
Best Nvidia CUDA Card for HQPlayer


juliocat


5 hours ago, sbenyo said:

Is it certain that dual-GPU setups (e.g. GTX 1070, 1080) are not supported?

 

At least not explicitly. Depending on how Nvidia implemented some of the CUDA functionality, multiple GPUs could possibly work when you also have convolution enabled. But since I don't have any multi-GPU machines, I have not checked whether this happens or not.

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
  • 2 weeks later...
28 minutes ago, john925 said:

Is an eGPU possible for CUDA offload?

 

I'd say it is, but:

1. TB3 has its limits and you lose some of your GPU power. From what I've read, the better the card, the more you lose.

2. External GPU cases are loud (in case you plan to keep it in the same room).

3. The eGPU cases are expensive.

Vinnie Rossi LIO (AVC/Tubestage, AMP Module with built in HPF 100Hz 24dB/octave, DAC 2.0), Harbeth P3ESR, Rythmik F8

Win10 i7-7700 -> Roon -> HQPlayer DSD512 -> LIO 100Hz HPF -> Harbeth P3ESR

                                                                                ->LIO  -> miniDSP <100Hz -> Rythmik F8  

 

 

 

Link to comment
  • 3 weeks later...
  • 2 weeks later...
On 8/24/2017 at 4:11 AM, bigbear2003 said:

I tried the 940MX on my Lenovo laptop and I don't see any improvement in CPU utilization.

 

Did you check with nvidia-smi whether HQPlayer is actually using the card?

 

My laptop is an HP zBook G2 17" with an i7-4940MX and an Nvidia Quadro K5100M. In my case, HQPlayer always uses the Nvidia card, or at least seems to use it, even if I deselect "CUDA offload". So the CPU load is always the same across the cores: 1-3% for plain DSD64.
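The nvidia-smi check suggested above can be scripted. The following is a minimal sketch, assuming the NVIDIA driver's `nvidia-smi` tool is on the PATH and that the HQPlayer process name contains "hqplayer" (an assumption; the name may differ by platform and edition). The helper names are made up for illustration. On some Windows/WDDM setups the memory column may read "[N/A]", which the parser simply passes through as text.

```python
import subprocess

# Ask nvidia-smi for the compute processes currently using a GPU.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' CSV lines into dicts."""
    apps = []
    for line in csv_text.strip().splitlines():
        try:
            # Split from both ends so a comma in the path does not break parsing.
            pid, rest = line.split(",", 1)
            name, memory = rest.rsplit(",", 1)
            apps.append(
                {"pid": int(pid), "name": name.strip(), "memory": memory.strip()}
            )
        except ValueError:
            continue  # skip blank or unexpected lines
    return apps


def processes_matching(csv_text, needle="hqplayer"):
    """Return the compute processes whose name contains `needle`."""
    return [a for a in parse_compute_apps(csv_text) if needle in a["name"].lower()]


def query_gpu_processes():
    """Run nvidia-smi and return its compute processes (needs the driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Calling `query_gpu_processes()` while music is playing and looking for an HQPlayer entry gives the same information as reading the "Processes" table by eye. An empty result with offload enabled would suggest the GPU is not being used.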

 

Have a nice day, Massimiliano

Link to comment
  • 3 weeks later...
On 2017. 07. 26. at 10:40 PM, Miska said:

 

At least not explicitly. Depending on how Nvidia implemented some of the CUDA functionality, multiple GPUs could possibly work when you also have convolution enabled. But since I don't have any multi-GPU machines, I have not checked whether this happens or not.

 

 

This should be interesting with 16x 1080 Ti cards for example :)

 

https://www.onestopsystems.com/product/4u-value-gpu-accelerator-system

 

 

 

Link to comment
  • 2 months later...
  • 4 months later...

Since my PC (Core i5 6400; 12GB RAM) doesn't handle a GTX 1060 well (it needs more power), I intend to use a GTX 1050 with it. Is there any benefit in that? Since I know that a too-slow GPU will bottleneck HQPlayer, I also want to ask whether there will be an option to choose what percentage of the work goes to the GPU and to the CPU, to make sure it works best.

Link to comment
  • 5 weeks later...
26 minutes ago, yamamoto2002 said:

This is my Titan V result.

About 6 TFLOPS double precision, one-seventh of the first-generation Earth Simulator supercomputer.

I hope upcoming GeForce Volta products will have some double-precision capability.

What a beast! Have you run any HQplayer heavy filter setting with it? Something like upsampling 44.1/16 → 48 x 512 using poly-sinc-xtr filter?

 

Meanwhile AMD and Intel are having the "multi-core" CPU competition. All good for HQplayer :)

 

Software: Roon, Tidal, HQplayer 

HQplayer PC: i9 7980XE, Titan Xp, RTX 3090; i9 9900K, Titan V

DAC: Holo Audio MAY L2, T+A DAC8 DSD, exasound e12, iFi micro iDSD BL

USB tweaks: Intona, Uptone (ISO) regen, LPS-1, LPS-1.2, Sbooster Vbus2, Curious cables, SUPRA Certified HiSpeed USB cable

NAA: Logic CL100 powered by Uptone JS-2

AMP: Spectral DMC 30SV, Spectral DMA 300RS

Speaker: Magico S3 MKII

Rack: HRS SXR signature

Link to comment
On 6/12/2018 at 11:31 PM, louisxiawei said:

What a beast! Have you run any HQplayer heavy filter setting with it? Something like upsampling 44.1/16 → 48 x 512 using poly-sinc-xtr filter?

 

 

 

 

No.

 

It seems that, in order to run on Volta, CUDA programs should be compiled with the latest version of the CUDA Toolkit, which dropped support for older Fermi-based GPUs such as the GeForce GTX 580 or Quadro 6000. I'm not sure whether this affects HQP.

 

> Meanwhile AMD and Intel are having the "multi-core" CPU competition. All good for HQplayer :)

 

Yes, it is a good thing. On Windows, apps that are not processor-group-aware can use up to 64 cores (or 64 hyper-threads). The process affinity mask is 64 bits wide, with one bit per core (or hyper-thread), so it can express at most 64 of them. With a 32-core, 64-thread CPU, every bit of the affinity mask is in use, and the free performance gain a multi-threaded app gets from more CPU cores ends there. If this trend continues and, say, a 64-core, 128-thread CPU arrives, apps will have to be rewritten to use multiple processor groups to squeeze out all the CPU resources.
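The affinity-mask arithmetic described above can be made concrete with a small sketch. It only models the counting (one bit per logical processor, 64 bits per mask). It does not call any Windows API, and the function names are made up for illustration. How Windows actually assigns processors to groups also depends on the hardware topology, so the group count here is a lower bound, not a prediction.

```python
import math

GROUP_SIZE = 64  # logical processors addressable by one 64-bit affinity mask


def full_affinity_mask(logical_processors):
    """Mask with one bit set per logical processor, capped at 64 bits."""
    usable = min(logical_processors, GROUP_SIZE)
    return (1 << usable) - 1


def processor_groups_needed(logical_processors):
    """Minimum number of 64-wide groups needed to cover all processors."""
    return math.ceil(logical_processors / GROUP_SIZE)


# The two CPUs mentioned in the post: 32C/64T and a 64C/128T part.
for cores, threads in [(32, 64), (64, 128)]:
    mask = full_affinity_mask(threads)
    groups = processor_groups_needed(threads)
    print(f"{cores}C/{threads}T: mask={mask:#018x}, groups={groups}")
```

For 64 threads the mask is already all ones, so a 128-thread CPU cannot be described by a single mask and needs at least two groups.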

Sunday programmer since 1985

Developer of PlayPcmWin

Link to comment
10 hours ago, yamamoto2002 said:

It seems that, in order to run on Volta, CUDA programs should be compiled with the latest version of the CUDA Toolkit, which dropped support for older Fermi-based GPUs such as the GeForce GTX 580 or Quadro 6000. I'm not sure whether this affects HQP.

 

The latest HQPlayer Desktop 3.21 is compiled with the latest CUDA 9.2, but earlier versions compiled with CUDA 9.1 already had full support for Volta. Availability of the latest CUDA toolkit actually delayed my release, because Microsoft's update to Visual Studio 2017 broke the previous CUDA toolkit version... But the CUDA-Z test application you are running is compiled against CUDA 5 or 6 or something really old.

 

10 hours ago, yamamoto2002 said:

Yes, it is a good thing. On Windows, apps that are not processor-group-aware can use up to 64 cores (or 64 hyper-threads). The process affinity mask is 64 bits wide, with one bit per core (or hyper-thread), so it can express at most 64 of them. With a 32-core, 64-thread CPU, every bit of the affinity mask is in use, and the free performance gain a multi-threaded app gets from more CPU cores ends there. If this trend continues and, say, a 64-core, 128-thread CPU arrives, apps will have to be rewritten to use multiple processor groups to squeeze out all the CPU resources.

 

Luckily I have very little Windows specific code and no limitations for number of CPU cores. So no need to rewrite anything... ;)

 

And I can also warmly recommend using Linux... ;)

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
3 hours ago, Miska said:

But the CUDA-Z test application you are running is compiled against CUDA 5 or 6 or something really old.

 

Thanks for your reply. I now understand CUDA binary forward compatibility, and that clears things up:

 

https://docs.nvidia.com/cuda/volta-compatibility-guide/index.html

Quote

 

  Applications that already include PTX versions of their kernels should work as-is on Volta-based GPUs. Applications that only support specific GPU architectures via cubin files, however, will need to be updated to provide Volta-compatible PTX or cubins.

 

 

So CUDA-Z contains PTX code for forward compatibility, and if you are lucky enough, the PTX code runs well on a future architecture.
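The compatibility rule quoted above can be sketched as a small model. It encodes the general CUDA rules as I understand them: a cubin built for compute capability X.y runs only on devices X.z with z >= y, while embedded PTX can be JIT-compiled by the driver for any device of equal or higher compute capability. The architecture lists in the example are hypothetical and are not the actual contents of CUDA-Z or HQPlayer.

```python
def cubin_runs_on(cubin_cc, device_cc):
    """A cubin runs within the same major version, on an equal or higher minor."""
    return cubin_cc[0] == device_cc[0] and device_cc[1] >= cubin_cc[1]


def ptx_runs_on(ptx_cc, device_cc):
    """PTX can be JIT-compiled for any device of equal or higher capability."""
    return device_cc >= ptx_cc


def app_runs_on(device_cc, cubins=(), ptx=()):
    """True if any embedded cubin or PTX image is usable on the device."""
    return any(cubin_runs_on(c, device_cc) for c in cubins) or any(
        ptx_runs_on(p, device_cc) for p in ptx
    )


VOLTA = (7, 0)  # compute capability of Volta (e.g. Titan V)

# Hypothetical old build: Fermi/Kepler cubins plus Kepler PTX.
print(app_runs_on(VOLTA, cubins=[(2, 0), (3, 0)], ptx=[(3, 0)]))
# Same build without PTX: nothing matches a 7.x device.
print(app_runs_on(VOLTA, cubins=[(2, 0), (3, 0)]))
```

This is why an application built with a very old toolkit can still start on a Volta card: the old cubins are ignored and the driver compiles the PTX at load time, usually at the cost of a slower first launch.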

 

Sunday programmer since 1985

Developer of PlayPcmWin

Link to comment
  • 2 weeks later...
On 6/15/2018 at 6:29 PM, Miska said:

Luckily I have very little Windows specific code and no limitations for number of CPU cores. So no need to rewrite anything... ;)

 

Last I checked, on Windows one process can handle up to 64 hyper-threads. So, to handle 128, another worker process has to be created, each process creates 64 threads, and the two processes talk to each other via inter-process communication. That is a significant rewrite of the casual multi-threading code of a Sunday programmer :)

Sunday programmer since 1985

Developer of PlayPcmWin

Link to comment
2 hours ago, yamamoto2002 said:

Last I checked, on Windows one process can handle up to 64 hyper-threads. So, to handle 128, another worker process has to be created, each process creates 64 threads, and the two processes talk to each other via inter-process communication. That is a significant rewrite of the casual multi-threading code of a Sunday programmer :)

 

I'm not a casual multi-threading Sunday programmer... ;) I also know how to program HPC clusters / supercomputers.

 

But I still recommend going for Linux if you have anything more than a traditional average PC. It is not about my programming, it is about Microsoft's.

 

Anyway, this is going off-topic for CUDA & HQPlayer.

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment

Thank you for your reply. I found that Linux has much better multi-threading support for a casual Sunday programmer. There are also several cross-platform libraries that work around this kind of OS-specific quirk. And sorry for the topic drift.

Sunday programmer since 1985

Developer of PlayPcmWin

Link to comment
  • 3 weeks later...

Wondering how to enable a K20 for CUDA offload correctly on Ubuntu 16.

The hqplayer process is using the correct GPU, but with CUDA offload it hiccups more (using an E5-2680 v2 10-core Xeon).

 


 

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37                 Driver Version: 396.37                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:02:00.0 N/A |                  N/A |
| 40%   41C    P8    N/A /  N/A |     95MiB /  2000MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20Xm         Off  | 00000000:04:00.0 Off |                  Off |
| N/A   77C    P0    59W / 235W |    106MiB /  6083MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
|    1      4660      C   /usr/bin/hqplayer                             95MiB |
+-----------------------------------------------------------------------------+

Link to comment
  • 4 months later...
On 7/20/2018 at 10:32 PM, 2a3set said:

Wondering how to enable a K20 for CUDA offload correctly on Ubuntu 16.

The hqplayer process is using the correct GPU, but with CUDA offload it hiccups more (using an E5-2680 v2 10-core Xeon).

 


 

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37                 Driver Version: 396.37                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:02:00.0 N/A |                  N/A |
| 40%   41C    P8    N/A /  N/A |     95MiB /  2000MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20Xm         Off  | 00000000:04:00.0 Off |                  Off |
| N/A   77C    P0    59W / 235W |    106MiB /  6083MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
|    1      4660      C   /usr/bin/hqplayer                             95MiB |
+-----------------------------------------------------------------------------+

 

Is this with the latest HQPlayer version? If so, your driver (396) is too old; you need the latest driver (>= 410) for CUDA 10 support. Does HQPlayer report that the offload is enabled when you start playback?
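The driver check described above can be automated by reading the version from the nvidia-smi header. This is a minimal sketch: the 410 threshold is taken from this post, the function names are made up for illustration, and only the first two components of the version number are compared.

```python
import re

MIN_DRIVER_FOR_CUDA_10 = (410, 0)  # threshold as stated in the post above


def parse_driver_version(smi_text):
    """Extract (major, minor) from an nvidia-smi 'Driver Version' header."""
    match = re.search(r"Driver Version:\s*(\d+)\.(\d+)", smi_text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def driver_supports_cuda_10(smi_text):
    """True if the reported driver meets the assumed CUDA 10 minimum."""
    version = parse_driver_version(smi_text)
    return version is not None and version >= MIN_DRIVER_FOR_CUDA_10


# Header line copied from the nvidia-smi output quoted above.
header = "| NVIDIA-SMI 396.37                 Driver Version: 396.37                    |"
print(parse_driver_version(header), driver_supports_cuda_10(header))
```

Feeding it the full text printed by `nvidia-smi` works the same way, since the regular expression only looks for the "Driver Version" field.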

 

Now the GPU utilization is shown as 0% so things are probably working as they should...

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
On Tuesday, December 04, 2018 at 11:14 PM, Miska said:

 

Ehh; should be "not working as they should"...

 

 

Hi Miska, for HQP Desktop, I understand that there is a message at the bottom left corner telling us that it is enabled during the first few seconds when the music starts playing.

 

But for HQPE, how do we know that the CUDA offload is working properly? Thanks.

Link to comment
