
Building a high performance compute server for HQPlayer



5 hours ago, jabbr said:

Geekbench is just a number. For compute you want CUDA, because that’s what HQP uses (not OpenCL). I think @Miska’s idea of looking at the time Pro takes to encode a song relative to the song's real-time length is a great measurement of the system’s performance. Yeah, EC/DSD512 is not happening anytime soon.


The question is whether AVX512 @ 3.9 GHz is better than AVX2 @ 5 - 5.3 GHz. 

 

HQPlayer 4 Desktop has a benchmark feature, although I accidentally broke it with the latest start/stop change (already fixed for the next release). With a release from before that change it should work.
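
The measurement discussed here, offline encode time relative to the track's playing time, comes down to a simple ratio. The following is a minimal sketch and not HQPlayer's own benchmark code; the function name and the sample numbers are hypothetical.

```python
def realtime_factor(encode_seconds: float, track_seconds: float) -> float:
    """Return encode time divided by track duration.

    Below 1.0 the conversion runs faster than real time;
    above 1.0 the machine would not keep up during playback.
    """
    if track_seconds <= 0:
        raise ValueError("track_seconds must be positive")
    return encode_seconds / track_seconds


# Hypothetical numbers: a 4-minute track that takes 3 minutes to encode.
factor = realtime_factor(encode_seconds=180.0, track_seconds=240.0)
print(f"real-time factor: {factor:.2f}")
```

Comparing this ratio for the same track and settings across machines gives a figure that reflects the actual workload rather than a generic benchmark score.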

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
8 minutes ago, Solstice380 said:

I just ran their GPU Test and it returned the OpenCL result.  I was curious why @jabbr got a CUDA result and I got the OpenCL.   Difference between my GPU and the Ti?  Or?

 

Are you on Nvidia graphics? Since CUDA is Nvidia-only...

 

Also, CUDA needs a new enough graphics driver: the driver cannot be older than the SDK used to build the software.
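
The driver requirement can be expressed as a version comparison. This is a sketch under assumptions: the version strings below are made up, and the CUDA version the driver supports has to be obtained separately (recent drivers show it in the `nvidia-smi` header). The helper names are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '10.2' into a tuple of ints, (10, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(driver_cuda: str, build_cuda: str) -> bool:
    """True if the CUDA version supported by the driver is at least
    the CUDA SDK version the application was built with."""
    return parse_version(driver_cuda) >= parse_version(build_cuda)


# Hypothetical versions, for illustration only.
print(driver_is_new_enough("10.1", "10.2"))   # driver older than the SDK
print(driver_is_new_enough("10.10", "10.2"))  # numeric compare, not string compare
```

Comparing tuples of integers rather than raw strings avoids the trap where "10.10" sorts before "10.2".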

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment

My main reason to go with new Mellanox cards was RDMA and NIC offload ... used with stuff other than HQPlayer ... so I'm running some benchmarks ... first with HQPe and Roon on the same machine (note that xtr-mp/EC/DSD256 runs fine with Roon on the same machine):

top - 13:36:24 up 13:47,  1 user,  load average: 1.99, 0.89, 0.55
Tasks: 426 total,   1 running, 281 sleeping,   0 stopped,   0 zombie
%Cpu(s): 17.6 us,  0.1 sy,  0.0 ni, 82.1 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem : 32563588 total, 11880460 free,  6129732 used, 14553396 buff/cache
KiB Swap:  2097148 total,  2092284 free,     4864 used. 25912360 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                     
 1484 hqplayer  10 -10 9691188 738024 178712 S 245.7  2.3 169:29.35 hqplayerd                                                                  
 1743 root      20   0 11.189g 3.674g 423404 S  36.8 11.8 344:52.58 RoonAppliance                                                               
   11 root      20   0       0      0      0 I   0.3  0.0   0:20.85 rcu_sched                                                                   
 3326 root      20   0       0      0      0 S   0.3  0.0   2:18.68 cifsd

We see that cifsd has dropped to 0.3%; now let's move Roon to another server:

... except that roonlabs.com is down so hold that thought ...
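
The per-process %CPU figures in the `top` output above are relative to a single core (the default mode of `top`), so 245.7% means roughly two and a half cores busy. A small sketch for reading them out of such output, assuming the default column layout shown above and command names without spaces:

```python
def parse_top_line(line: str) -> tuple[str, float]:
    """Return (command, %CPU) from one process line of default `top` output.

    Assumed columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    fields = line.split()
    return fields[11], float(fields[8])


# Two process lines copied from the output above.
sample = """\
 1484 hqplayer  10 -10 9691188 738024 178712 S 245.7  2.3 169:29.35 hqplayerd
 1743 root      20   0 11.189g 3.674g 423404 S  36.8 11.8 344:52.58 RoonAppliance
"""

for line in sample.splitlines():
    command, cpu_percent = parse_top_line(line)
    print(f"{command}: about {cpu_percent / 100:.2f} cores busy")
```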

Custom room treatments for headphone users.

Link to comment

Killer NICs that for example Gigabyte puts on some motherboards are also good for offloads. They have pretty much entire TCP/IP stack on the NIC. Linux support is not great for all models, but the models I have work on recent Linux kernels. These are just more gaming oriented NICs aiming for minimizing latencies.

 

https://www.killernetworking.com

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
2 minutes ago, Miska said:

Killer NICs that for example Gigabyte puts on some motherboards are also good for offloads. They have pretty much entire TCP/IP stack on the NIC. Linux support is not great for all models, but the models I have work on recent Linux kernels. These are just more gaming oriented NICs aiming for minimizing latencies.

 

https://www.killernetworking.com

 

Wow, I haven't had a Killer NIC for many years. They seem cool, but many people are suspect. Hey, that's like audio :~)

Founder of Audiophile Style | My Audio Systems

Link to comment
5 minutes ago, The Computer Audiophile said:

Wow, I haven't had a Killer NIC for many years. They seem cool, but many people are suspect. Hey, that's like audio :~)

 

I have a couple of Gigabyte motherboards that have those, along with the DAC-UP USB ports.

 

Now the latest Gigabyte Z390 Designare motherboard I have for the 9900KS has just two Intel NICs (different models actually), but still comes with their newest DAC-UP2 USB ports.

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
On 4/15/2020 at 4:36 PM, Miska said:

 

This was a test build on Windows, so some production build may differ a little. I've already put a lot of effort to make EC modulators run like they now do. I'm not expecting any big gains on this front. No thermal throttling. Maybe @jabbr can test at some point how it runs with AVX512 on his new Xeon. Larger and faster cache may help too. But large part of the limitation is just how many instructions a CPU core can execute within available number of clock cycles.

 

 

I just tried a few settings. Both poly-sinc-ext2 and xtr-mp with ASDM7EC took almost 2x (a few seconds shy of 2x), with or without CUDA. CUDA with -mp does significantly decrease CPU usage. In one case Pro used 640% CPU (CUDA off), with all cores active.

 

I didn't spend a great deal of time optimizing anything but my results seem roughly the same as yours.

Custom room treatments for headphone users.

Link to comment
24 minutes ago, jabbr said:

I just tried a few settings. Both poly-sinc-ext2 and xtr-mp with ASDM7EC took almost 2x (a few seconds shy of 2x), with or without CUDA. CUDA with -mp does significantly decrease CPU usage. In one case Pro used 640% CPU (CUDA off), with all cores active.

 

Yes, filters are not usually the limiting factor, but instead EC modulators are the ones that set the pace. This can be different with non-EC normal modulators and heavier filters.

 

But if the speed is nearly the same, it means that AVX512 offsets the lower clock speed enough to be roughly on par with a 9900KS running at 5 GHz in per-core speed in this case. So at least there is no loss in that sense, and complex filters may run faster than on the 9900KS.
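
For context on the AVX512 @ 3.9 GHz versus AVX2 @ 5 GHz question, a back-of-the-envelope estimate of peak vector lanes per second per core is sketched below. It is an upper bound under strong assumptions: one vector operation per cycle, perfectly vectorizable code, no clock throttling, and 32-bit elements (the element size HQPlayer actually uses is not stated in this thread). The function name is hypothetical.

```python
def peak_lanes_per_second(vector_bits: int, clock_ghz: float,
                          element_bits: int = 32) -> float:
    """Theoretical element operations per second for one core,
    assuming one full-width vector operation per clock cycle."""
    lanes = vector_bits // element_bits
    return lanes * clock_ghz * 1e9


avx512 = peak_lanes_per_second(512, 3.9)
avx2 = peak_lanes_per_second(256, 5.0)

print(f"AVX512 @ 3.9 GHz: {avx512 / 1e9:.1f} G lanes/s")
print(f"AVX2   @ 5.0 GHz: {avx2 / 1e9:.1f} G lanes/s")
print(f"theoretical ratio: {avx512 / avx2:.2f}")
```

The theoretical ratio favors AVX512, yet the measured encode times are about equal. That is consistent with the earlier remark that a large part of the limit is how many instructions a core can execute per clock cycle, which vector width alone does not change.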

 

 

P.S. I just checked: the new HP workstation (Z4) equivalent to my current one, with a W-2245, costs about 3200€ here (32 GB RAM, 1 TB SSD).

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
21 hours ago, Miska said:

 

Because it provides nice amount of extra processing power, and is more powerful in some tasks than a CPU. But primarily it can free up CPU resources for other tasks.

 

Thanks for educating me. I know next to nothing about building music servers. Thus I am here attempting to learn.

 

JC

Link to comment

Even considering the current Dell sale, I specced out a workstation matching the one jabbr just purchased, added in the cost of his additional RAM, the pricey NIC, and the most reasonably priced 2080 that I could find, and it was over $3k. I was considering this as a possible path to run Roon and HQPlayer in conjunction, at DSD256 and with filters, in hopes of seeing the very significant sonic upgrade Chris mentioned in his RAAL SR1a discussion. This is, however, a pretty costly endeavor. And how long would a server PC build like that remain able to run HQPlayer optimally as described above, much less at DSD512?

 

JC 

Link to comment
2 hours ago, TubeLover said:

Even considering the current Dell sale, I specced out a workstation matching the one jabbr just purchased, added in the cost of his additional RAM, the pricey NIC, and the most reasonably priced 2080 that I could find, and it was over $3k. I was considering this as a possible path to run Roon and HQPlayer in conjunction, at DSD256 and with filters, in hopes of seeing the very significant sonic upgrade Chris mentioned in his RAAL SR1a discussion. This is, however, a pretty costly endeavor. And how long would a server PC build like that remain able to run HQPlayer optimally as described above, much less at DSD512?

 

You can get quite a bit cheaper with i9-9900K and RTX2080. You can also get started with just the i9-9900K and add the RTX2080 later; it has built-in graphics, so you get display output without a separate card, although HQPlayer gets no offload from it.

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
5 hours ago, TubeLover said:

Even considering the current Dell sale, I specced out a workstation matching the one jabbr just purchased, added in the cost of his additional RAM, the pricey NIC, and the most reasonably priced 2080 that I could find, and it was over $3k. I was considering this as a possible path to run Roon and HQPlayer in conjunction, at DSD256 and with filters, in hopes of seeing the very significant sonic upgrade Chris mentioned in his RAAL SR1a discussion. This is, however, a pretty costly endeavor. And how long would a server PC build like that remain able to run HQPlayer optimally as described above, much less at DSD512?

 

JC 


Start with the cheapest Precision 5820 and then upgrade the CPU, power supply, etc. It comes with 8 GB RAM, which is enough to get started, or add more. You don’t need an expensive NIC at all: get a Mellanox ConnectX-3 on eBay for <$50 or an Intel X520. Seriously, no need for something uber expensive.

You don’t need to start with an RTX 2080 either, so all in you should be under $2000.

Custom room treatments for headphone users.

Link to comment

To summarize the measurements so far (I started with the base 5820 and measured after each upgrade):

 

1) 8 GB ECC 2930 -> 32 GB 3200 ... the extra memory is needed for certain applications but does not appreciably affect the results with HQPe, nor did it allow me to run a modulator or filter that I couldn't have otherwise.

 

2) NIC: HQPe is not limited by the speed at which a music file is pulled from the NAS: the disk I/O (cifs process) uses 5-10% of one core *at most* at 1 GbE. NIC offload benefits for *this application* will be met by almost any fiber-optic NIC. There is something called "SMB Direct", which is enabled by NICs that support RDMA (remote direct memory access). This paper discusses the benefits: https://www.chelsio.com/wp-content/uploads/resources/T5-SMBOverRDMA-vs-NIC.pdf ... again, audio I/O rates do not need this.

The ConnectX-5 I am using supports RDMA, which is *not* generally needed or useful for home audio purposes. Although the 100GBASE NICs have extraordinarily low jitter, far better than *any* copper Ethernet by a long shot, I cannot hear a difference beyond any pro-grade 10 GbE NIC, e.g. Intel X520, Mellanox ConnectX-3 (RDMA), Solarflare.

 

Upgrading the NIC did not improve HQPe performance enough to let me run a modulator or filter that I couldn't have otherwise.
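
To put the audio I/O rates in perspective, the raw payload rates can be worked out from the format definitions. This is a rough sketch that ignores protocol overhead and file compression; the function names are hypothetical.

```python
BASE_RATE_HZ = 44_100          # CD sample rate; DSD64 runs at 64x this, 1 bit per sample
GIGABIT_MBIT_PER_S = 1000.0    # nominal 1 GbE line rate


def dsd_mbit_per_s(multiple: int, channels: int = 2) -> float:
    """Raw bit rate of a 1-bit DSD stream, e.g. multiple=256 for DSD256."""
    return multiple * BASE_RATE_HZ * channels / 1e6


def pcm_mbit_per_s(bits: int, rate_hz: int, channels: int = 2) -> float:
    """Raw bit rate of uncompressed PCM."""
    return bits * rate_hz * channels / 1e6


for label, rate in [
    ("PCM 24/192 stereo", pcm_mbit_per_s(24, 192_000)),
    ("DSD256 stereo", dsd_mbit_per_s(256)),
    ("DSD512 stereo", dsd_mbit_per_s(512)),
]:
    share = 100 * rate / GIGABIT_MBIT_PER_S
    print(f"{label}: {rate:.3f} Mbit/s ({share:.1f}% of 1 GbE)")
```

Even stereo DSD512 is under 5% of a 1 GbE link, which fits the observation that NIC speed is not what limits HQPe.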

 

3) GPU ... the GPU helps 24_96/24_192 -> DSD conversion, offloading about 50-60% of one core ... the GPU should enable multichannel room correction and digital crossovers with DSD256. There's a lot you can do, and many filters you can use, without GPU acceleration; that said, GPU/CUDA acceleration is useful in certain circumstances.

 

DSD256/ASDM7EC with any filter is doable on this workstation.

Custom room treatments for headphone users.

Link to comment

Thank you @jabbr for posting your findings and conclusions. I am just now building a fanless PC for HQPe and Roon. It won't be as high performance as your Xeon, but it is something I had planned anyway and need to finish.

 

  • HDPlex H5 v2 chassis
  • Asus TUF Gaming Wifi motherboard
  • Ryzen 3800X
  • Nvidia GTX 1650 (GDDR5 or 6 not sure yet)

 

I will be happy if I can run DSD256 with ASDM7EC

 

If you and @Miska could provide some recommendations or changes to this build (without altering the chassis), such as memory recommendations, ECC or unbuffered, etc., that would be greatly appreciated.

 

Thank you in advance

Link to comment

@luisma it would be great to run @Miska's test of looking at how long HQ Pro takes to encode ASDM7EC on a Ryzen because that looks to me to be the best overall test of the system performance.  Xeon W looks to be similar to 9900K overall for HQP but I suspect each would be better for different tasks ... Ryzen has its own tradeoffs so hard to say .. e.g. it doesn't have AVX512

 

Obviously many of the best results on Geekbench are overclocked and with active cooling! When running Pro, my workstation might work even better with better cooling 🤷‍♂️

Custom room treatments for headphone users.

Link to comment
25 minutes ago, jabbr said:

@luisma it would be great to run @Miska's test of looking at how long HQ Pro takes to encode ASDM7EC on a Ryzen because that looks to me to be the best overall test of the system performance.

 

You can do the same test with Desktop v4 too; it is just broken in the current release due to a regression. I fixed that and improved the time display to show m:ss.sss instead of just plain seconds.

 

You can also do a similar test with Embedded, but there's currently no display for the timing data, although the HQPlayer engine has it. Maybe I need to add this same functionality to the web interface.
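
For anyone logging their own timings, an m:ss.sss display like the one mentioned above can be produced as follows. This is an illustrative sketch, not HQPlayer's code; the function name is hypothetical.

```python
def format_m_ss_sss(seconds: float) -> str:
    """Format a non-negative duration in seconds as m:ss.sss."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # Work in whole milliseconds so rounding cannot produce '0:60.000'.
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(remainder_ms, 1000)
    return f"{minutes}:{secs:02d}.{millis:03d}"


print(format_m_ss_sss(125.5))
print(format_m_ss_sss(59.9996))
```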

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
8 hours ago, Miska said:

 

You can get quite a bit cheaper with i9-9900K and RTX2080. You can also get started with just the i9-9900K and add the RTX2080 later; it has built-in graphics, so you get display output without a separate card, although HQPlayer gets no offload from it.

 

Thanks for the input, Miska. What strategy would you recommend to keep things "quite a bit cheaper with i9-9900K and RTX2080"? Thanks.

 

JC

Link to comment
5 hours ago, jabbr said:


Start with the cheapest Precision 5820 and then upgrade the CPU, power supply, etc. It comes with 8 GB RAM, which is enough to get started, or add more. You don’t need an expensive NIC at all: get a Mellanox ConnectX-3 on eBay for <$50 or an Intel X520. Seriously, no need for something uber expensive.

You don’t need to start with an RTX 2080 either, so all in you should be under $2000.

Thanks jabbr. But in the end the cost will be the same; granted, I could stretch out the expense, as you noted. How much would a less expensive video card cost me in terms of performance, and is there a recommended budget card to go with? Thanks.

 

JC

Link to comment
