Best Nvidia CUDA Card for HQPlayer


juliocat

So I decided to upgrade from an RTX 3080 to an RTX A5000.

 

I currently have the latest CUDA software installed for the RTX 3080.

 

Do I need to do anything other than just swap the GPUs?

QNAP NAS w/minimserver, iBuypower  i7 13700kf,  RTXa5000 24g GPU, Ubuntu 22.04 LTS minimal server, HQPe v5 x64 avx2, HQPDcontrol4,  HQPlayer Client iOS, mconnect playerHD, JplayiOS, Daphile on Asus PN-51-s1 (AMD 5700u) in Akasa fanless case, Snakeoil OS NAA/NAA image on Fitlet2 , Lampizator Big 7 MKII Balanced, Pass XVR1, Pass X5, Pass XA 100.5’s, PSB Stratus Gold(i)’s, Vandersteen 2wq’s.

Link to comment

Now also Ada-generation A4500 (AD104):

https://www.nvidia.com/en-us/design-visualization/rtx-4500/

and A4000 (AD104):

https://www.nvidia.com/en-us/design-visualization/rtx-4000/

are available.

 

My A4500 is from the previous Ampere generation (GA102).

 

For some reason, double-precision performance actually went down with the newer generation, according to this:

https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units

The A5000 is pretty much on the same level, and for the A6000 it improved.

 

Those figures don't tell the whole truth about HQPlayer performance though, as we've seen before. They are mostly good for comparing relative performance within the same generation.

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment

That’s some wiki.

 

I think my A5000 is the generation right before Ada, which would be the same generation as yours.

 

 

Link to comment
6 hours ago, Triplefun said:

Is anyone using an NVIDIA Tesla K80? They have 4992 CUDA cores and 24 GB of RAM and are relatively cheap on eBay. They require 300 W of power.

Alternatively, see the cheaper K40 on eBay. The K40 requires 6-pin power, has a passive option, and has 2880 CUDA cores (compute capability 3.5), BUT it supports 1,430 GFLOPS double precision (i.e. FP64). The RTX 4070 has 5888 CUDA cores but supports only 455.4 GFLOPS FP64.

So would the Tesla K40 be better for HQPlayer than the k40?

Note there is also a passive version (no fan) of the K40!

Link to comment
K40 won't work, since it is too old and doesn't support the needed features. Minimum baseline at the moment is compute capability 5.2. K40 (Kepler) is 3.5.

 

Nvidia has also dropped support for the oldest generations from the CUDA SDK, which could be the reason you can find those old ones for cheap.

 

I likely wouldn't go older than the V100 if you are into those datacenter GPUs.
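
The baseline check described above can be sketched as a simple tuple comparison. This is only an illustration: the compute capability values are taken from Nvidia's published CUDA GPU list, the 5.2 minimum is the figure quoted in this post and may change in later releases, and the names `COMPUTE_CAPABILITY`, `MIN_CAPABILITY` and `meets_baseline` are made up for the example.

```python
# Compute capability (major, minor) of some GPUs mentioned in this thread.
# Values come from Nvidia's published CUDA GPU list, not from HQPlayer itself.
COMPUTE_CAPABILITY = {
    "Tesla K40": (3, 5),
    "Tesla K80": (3, 7),
    "Tesla P100": (6, 0),
    "Tesla V100": (7, 0),
    "Titan V": (7, 0),
    "RTX 3080": (8, 6),
    "RTX A5000": (8, 6),
    "RTX 4090": (8, 9),
}

# Minimum baseline quoted in the post above (assumption: still current).
MIN_CAPABILITY = (5, 2)


def meets_baseline(model, minimum=MIN_CAPABILITY):
    """Return True if the model's compute capability is at least `minimum`."""
    # Tuples compare element by element, so (3, 7) < (5, 2) < (6, 0).
    return COMPUTE_CAPABILITY[model] >= minimum


for model, (major, minor) in COMPUTE_CAPABILITY.items():
    verdict = "ok" if meets_baseline(model) else "too old"
    print(f"{model}: {major}.{minor} -> {verdict}")
```

With these values the two Kepler cards (K40, K80) fall below the baseline and everything from the P100 onward passes.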

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment

Looking for advice from those with experience using a GPU card when processing multi-channel audio in HQPlayer Embedded.  I am currently running an 8-channel exaSound s88 DAC, while using 8 Matrix Pipeline filter channels for digital crossover and room correction. With my i9-13900K and no GPU, I can achieve DSD256 (max DAC capability) using EC7 series modulators and filters like poly-sinc-gaus-xla with 44/48 kHz base rate source material. Trying to run sinc-MGa results in periodic dropouts. I would like to explore using some of the longer filters if possible or practical.

 

I am considering two GPUs to offload correction and oversampling filters: Nvidia RTX 3090 and RTX 4080. The basic question is, will I benefit more from the higher memory, lower speed 3090 (24 GB/1395 MHz) or the lower memory, higher speed 4080 (16 GB/2205 MHz)? There are probably other factors I am not aware of, but the two cards are in the same price range and not quite at the top of what is available on the market.

Link to comment
1 hour ago, sledwards said:

I am considering two GPUs to offload correction and oversampling filters: Nvidia RTX 3090 and RTX 4080. The basic question is, will I benefit more from the higher memory, lower speed 3090 (24 GB/1395 MHz) or the lower memory, higher speed 4080 (16 GB/2205 MHz)?

 

For the multichannel case with big filters plus convolution, focusing on the amount of GPU RAM is particularly important. You may find GPU models with more RAM in the professional RTX series.

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
On 10/24/2023 at 2:01 PM, Triplefun said:

Alternatively, see the cheaper K40 on eBay. The K40 requires 6-pin power, has a passive option, and has 2880 CUDA cores (compute capability 3.5), BUT it supports 1,430 GFLOPS double precision (i.e. FP64). The RTX 4070 has 5888 CUDA cores but supports only 455.4 GFLOPS FP64.

So would the Tesla K40 be better for HQPlayer than the k40?

Note there is also a passive version (no fan) of the K40!

I had the P100 12GB datacenter card. It has no fan either, but cooling is not optional; I used a fan plus a 3D-printed shroud for this.

The pinout of the power connector is also not standard, so I needed an adapter to connect the card to the PCIe power connectors on my PSU.

If you are interested in an older datacenter card, the table below shows the compute capabilities of the various models.

 

The P100 is the first model with HBM memory built into the GPU, and it holds its own quite well in HQP5. Used prices are dropping on this one, to about half of what I paid 4 months ago.

[Image: table of CUDA compute capabilities by GPU model]

Link to comment
On 10/30/2023 at 2:27 AM, paradis said:

@b0bb how would you compares the P100 to the RTX series, how performant is the P100? I understand that you've had RTX3080 for instance.

I use the RTX 3080 Ti, which is the 3090 with half the memory.

The P100 has about 50% of its performance.

I do not have data for other RTX devices.

 

For the filters I have tried (polysinc-gauss and SincMX) to DSD512 output rate, the P100 does the job without dropouts.

 

GPU load is close to 85% in the scenario above, based on numbers from the nvidia-smi and nvtop tools.

RTX3080Ti in this situation is 35-45% load.

 

The P100 struggles with non-integer conversions while simultaneously converting PCM to DSD, e.g. 48k/96k/192k to DSD512.
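
For anyone wanting to collect the same kind of load numbers, here is a minimal sketch of reading them from nvidia-smi. It assumes nvidia-smi is on the PATH and uses its `--query-gpu` CSV output; the function names and the sample line are made up for illustration (only the 85% figure comes from the post above, the memory values are invented).

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY = "name,utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not fields:
            continue  # skip blank lines
        name, util, used, total = (f.strip() for f in fields)
        rows.append({
            "name": name,
            "util_pct": int(util),        # GPU load in percent
            "mem_used_mib": int(used),    # memory in MiB
            "mem_total_mib": int(total),
        })
    return rows


def read_gpu_stats():
    """Run nvidia-smi once and return one dict per installed GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_stats(result.stdout)


# Illustrative sample line; the memory numbers are invented for the example.
sample = "Tesla P100-PCIE-12GB, 85, 9000, 12288\n"
for gpu in parse_gpu_stats(sample):
    print(f"{gpu['name']}: {gpu['util_pct']}% load, "
          f"{gpu['mem_used_mib']}/{gpu['mem_total_mib']} MiB")
```

On a machine with an Nvidia driver installed, calling `read_gpu_stats()` during playback should give the live values; sampling it in a loop would show whether load stays below 100% for a given filter and output rate.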

Link to comment
Thanks, that's helpful

Link to comment
  • 2 weeks later...

So I finished the build with a 7600X (6 cores, 12 threads, 4.7 GHz base, 5.3 GHz boost) and an RTX 4090 in a Fractal R5 case, fronting a Holo May K3 and running HQPlayer 5.3 and Roon on Windows 11 Pro. I can run ASDM7ECv2 at DSD512, but DSD1024 stutters intermittently. Would upgrading the CPU to a 7950X3D (16 cores, 32 threads, 4.2 GHz base, 5.7 GHz boost) make DSD1024 possible?

 

According to Core Temp, the CPU is running at only 7% load, and according to GPU-Z the RTX 4090 is at 70%. The CPU temperature is 85 °C.

 

We tried PBO 65w in the BIOS but DSD512 would not run.

 

I also played around with the advanced settings. Forcing all CPU cores was worse than automatic. We have forced the CUDA cores.

 

Alternatively, should I wait for the Ryzen 8000 Zen 5 processors, which are still supported by the AM5 socket (hopefully the same mobo) and supposedly have 25% more IPC than Zen 4?

 

I prefer the Ryzen sound to Intel, although this might have changed. Would changing to an Intel i9-14900K work? I note that, according to cpu.userbenchmark, the 14900K has 24 cores, 32 threads, 3.2 GHz base, and a 5.9 GHz overclock: lots of heat.

 

I also found this: "AMD's 'equivalent' 7950X3D is haunted by software bugs and 'features' such as needing to have Xbox game bar open 24/7 and having their motherboards overvolt and kill the CPUs." (MrNukes's User Profile - UserBenchmark).

Link to comment
  • 2 weeks later...

Good evening. 

 

I just started to get familiar with the CUDA topic. I am currently running HQPlayer Embedded. Is there anything else required for running CUDA on HQPlayer Embedded besides the OS? So do I basically just plug the GPU into the system and everything else runs automatically, or do I need to install some drivers etc.? Thank you.

Link to comment
I don’t believe it’s supported on HQP embedded. 

Founder of Audiophile Style | My Audio Systems

Link to comment
1 hour ago, The Computer Audiophile said:

Fully supported by @Miska?

 

I believe he doesn’t support it and has said so numerous times. 

CUDA is not supported on HQPlayer OS, which runs a realtime kernel.

But it is supported on HQPlayer Embedded installed on supported Linux distros running a low-latency kernel.
See https://www.signalyst.com/custom.html
DSP offload to GPU using NVIDIA CUDA (only on Ubuntu, Fedora and Debian -based systems)
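
As a rough way to see which family a given Linux install belongs to, the sketch below parses the standard /etc/os-release file and looks at its ID and ID_LIKE fields. It is only one reading of "Ubuntu, Fedora and Debian-based": the Signalyst page remains the authoritative list, and the names `SUPPORTED_FAMILIES`, `parse_os_release` and `is_cuda_offload_distro` are hypothetical.

```python
# Distro families named on the Signalyst page quoted above.
SUPPORTED_FAMILIES = {"ubuntu", "fedora", "debian"}


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")  # drop optional quotes
    return info


def is_cuda_offload_distro(text):
    """True if ID or ID_LIKE names one of the supported families."""
    info = parse_os_release(text)
    ids = {info.get("ID", "").lower()}
    ids.update(info.get("ID_LIKE", "").lower().split())
    return bool(ids & SUPPORTED_FAMILIES)


# Sample content as typically found on Ubuntu 22.04.
sample = 'NAME="Ubuntu"\nVERSION_ID="22.04"\nID=ubuntu\nID_LIKE=debian\n'
print(is_cuda_offload_distro(sample))
```

On a real system the text would come from reading /etc/os-release. Passing this check says nothing about the kernel flavour or the installed Nvidia driver, which still have to be in place.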

i7 11850H + RTX A2000 Win11 HQPlayer ► Topping HS02 ► 2x iFi iSilencer ► SMSL D300 ► DIY headamp DHA1 ► HiFiMan HE-500
Link to comment
Ah yes. I was combining HQP OS and embedded. 

Founder of Audiophile Style | My Audio Systems

Link to comment
  • 3 months later...

Just another data point: I got hold of a Titan V 12GB card for testing.

 

It upsamples from 44.1 kHz to DSD512 with Sinc-L and ASDM7ECv3 without breaking a sweat. It uses up to 11700 MB of the Titan V's 12288 MB (12 GB) of memory.

 

Based on the pricing of these cards, it seems either the Titan V or the Tesla P100 can be a great candidate for anything up to DSD512, as it handled most if not all filters up to DSD512 nicely.
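
To put the memory figure above in perspective, here is a small sketch that works out the remaining headroom from the two numbers in this post. The function name `memory_headroom` is made up for the example.

```python
def memory_headroom(used_mb, total_mb):
    """Return (free memory in MB, percent of total memory in use)."""
    free_mb = total_mb - used_mb
    used_pct = 100.0 * used_mb / total_mb
    return free_mb, used_pct


# Peak usage and capacity reported above for the Titan V.
free_mb, used_pct = memory_headroom(11700, 12288)
print(f"{free_mb} MB free, {used_pct:.1f}% used")  # 588 MB free, 95.2% used
```

That leaves well under 1 GB spare at DSD512 with this filter, which fits the earlier advice in this thread that the amount of GPU RAM matters for big filters.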

Link to comment
3 hours ago, AudioDoctor said:

I just came across this card elsewhere and thought to myself, I bet this will fit nicely in an HDplex H5 case, and at 70 watt power usage, may even be able to be passively cooled in one...

 

https://www.pny.com/nvidia-rtx-4000-sff-ada-generation?iscommercial=true

Nice for a 70 W card, and it has the potential to use the HDPlex's other heatsink for passive cooling. But I saw that its FP64 performance is worse than the Ampere A4000's, so I wonder whether it will be performant enough. Still, the 70 W power draw is the main attraction, along with the 20 GB of memory.

Link to comment
