
Building a high performance compute server for HQPlayer



3 hours ago, zerung said:

Are you planning to upgrade the power supply or stay with the present one? Once the NVIDIA card is installed, do you think the fan noise and the impact of EMI, etc. on the system can be managed?

 

Kudos on taking another way for the HQPlayer server.

 

I have a 950W PSU but powering the RTX 2080 Ti is always a challenge with cables.

 

The EMI isn't an issue because the Mellanox NIC is designed to operate within extremely tight tolerances even in the presence of noise. The optical output will not transmit EMI, nor jitter for that matter, and the workstation remains far away from my audio system. I plan to use a Source Photonics 100GBASE-LR4 module, which uses duplex LC connectors and single-mode fiber.

Custom room treatments for headphone users.


CUDA only seems to add a little bit of performance, using an RTX 2080 Ti with 440 driver:

Neither can do ASDM7EC without stuttering: hqp goes to 100% CPU on two cores and only 7% CUDA per nvidia-smi.

 

When using AMSDM7 512+fs, the CPU utilization of HQP goes from 284% to 200%, with 14% CUDA.
 


@Miska does this seem about right? Thoughts?

On 4/5/2020 at 7:34 PM, Miska said:

Yeah, you get decently high GPU load on RTX when you do for example 48k to 44.1 x 512 with poly-sinc-xtr.

 

Alternatively, if you want to do 8 channels with a regular filter to DSD256 using the EC modulators, you end up with pretty high load on both GPU and CPU.

 

 

Yes, that's the eventual idea. Any consideration of supporting more than one DAC for multichannel? I am sure you are considering physically separate NAA/DACs.

 

I must say the Dell case is very well made, and quiet even with the RTX 2080 Ti running. 

 

I am waiting for my replacement NIC but I suspect that even with RDMA offload, the CPU load will not change appreciably - I think I'm very close to ASDM7EC / 512 but as they say, close only counts for ...

 

Interestingly, the CPU boosts (per $ lscpu) to 4.5 GHz with AMSDM7 but only to 3.9 GHz with ASDM7EC, so I think there's hope of tweaking something.

5 minutes ago, Miska said:

 

You could run null output test and see what kind of processing time you get for a track vs track length.

 

So far the highest published boost is the new flagship laptop CPU, the i9-10980HK, at 5.3 GHz, but that is only single-core boost. For this case you would need at least two cores boosted, and I don't know how much boost that CPU can have with two cores.

 

i9-9900KS can do all-core boost at 5 GHz, but it is not enough for ASDM7EC at DSD512.

 

Will be interesting to see how high clocks there will be for the 10th gen desktop CPUs.

 

 

True, but the W-2245 has AVX512. It's curious that ASDM7EC is not able to boost compared with AMSDM7 (3.9 vs 4.5 GHz). The W-2245 should be able to multicore boost. I may need to look at each core separately; htop shows % but not the per-core clock rate.


I created an AWK script to capture the /proc/cpuinfo output.

In both cases the output is 44.1x512.

First with AMSDM7. Note that 4 threads are boosting to 4.5 GHz:

Processor:  0  Mhz:  2943.395
Processor:  1  Mhz:  1200.102
Processor:  2  Mhz:  2802.146
Processor:  3  Mhz:  1200.199
Processor:  4  Mhz:  4499.999
Processor:  5  Mhz:  1200.046
Processor:  6  Mhz:  1201.438
Processor:  7  Mhz:  4561.842
Processor:  8  Mhz:  2436.413
Processor:  9  Mhz:  1201.864
Processor:  10  Mhz:  2671.769
Processor:  11  Mhz:  1200.799
Processor:  12  Mhz:  4492.542
Processor:  13  Mhz:  1200.343
Processor:  14  Mhz:  1200.020
Processor:  15  Mhz:  4514.522
CPU model:  Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz
1 CPU,  8 physical cores per CPU, total 16 logical CPU units

Now with ASDM7EC. Note that these 4 threads stay at 3.9 GHz:

Processor:  0  Mhz:  3965.971
Processor:  1  Mhz:  1200.031
Processor:  2  Mhz:  1200.164
Processor:  3  Mhz:  1265.660
Processor:  4  Mhz:  3965.246
Processor:  5  Mhz:  1201.363
Processor:  6  Mhz:  1201.073
Processor:  7  Mhz:  3076.107
Processor:  8  Mhz:  3900.650
Processor:  9  Mhz:  1200.346
Processor:  10  Mhz:  1201.619
Processor:  11  Mhz:  1270.791
Processor:  12  Mhz:  3899.409
Processor:  13  Mhz:  1200.285
Processor:  14  Mhz:  1200.009
Processor:  15  Mhz:  3145.273
CPU model:  Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz
1 CPU,  8 physical cores per CPU, total 16 logical CPU units
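
The AWK script itself was not posted. As a rough equivalent, here is a minimal Python sketch that pulls the same per-processor clock readings out of /proc/cpuinfo text. The function names are illustrative, and the output format only imitates the listings above.

```python
def parse_cpu_mhz(cpuinfo_text):
    """Return a list of (processor_index, mhz) pairs from /proc/cpuinfo text."""
    readings = []
    current = None
    for line in cpuinfo_text.splitlines():
        # Each line looks like "key<tabs>: value"; blank lines separate processors.
        key, _, value = line.partition(":")
        key = key.strip()
        value = value.strip()
        if key == "processor":
            current = int(value)
        elif key == "cpu MHz" and current is not None:
            readings.append((current, float(value)))
    return readings


def format_readings(readings):
    """Format readings in the same style as the listings in this thread."""
    return [f"Processor:  {cpu}  Mhz:  {mhz:.3f}" for cpu, mhz in readings]


def read_cpu_mhz(path="/proc/cpuinfo"):
    """Read the live values (Linux on x86 only; other platforms lack 'cpu MHz')."""
    with open(path) as f:
        return parse_cpu_mhz(f.read())
```

Calling format_readings(read_cpu_mhz()) while playback is running gives a snapshot. Each reading is an instantaneous value, so several snapshots taken in a row are more informative than one.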

 

6 minutes ago, Miska said:

 

Hard to say, but it could be AVX512 capping the clocks, because ASDM7EC puts more load on it.

 

Yes, as we've discussed. Is there a way to send more work to CUDA, which isn't getting used much? Perhaps a "prefer-cuda" flag, or "enable-AVX512"? That might just make 44.1x512 work.

4 hours ago, Miska said:

Btw, note that for full memory speed on a quad-channel CPU you need four DIMMs...


Yep, I'm not paying Dell for RAM, GPU, or NIC ... going to install the NIC, then the RAM, and test after each.
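
To put the four-DIMM point in numbers: theoretical peak DRAM bandwidth scales linearly with the number of populated channels. A minimal sketch, assuming DDR4-2933 (the rated memory speed for this CPU class; the actual DIMM speed in this machine is not stated in the thread) and a 64-bit (8-byte) bus per channel:

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes)."""
    return channels * mega_transfers_per_s * bus_bytes / 1000.0


# Assumed DDR4-2933; compare one, two and four populated channels.
for channels in (1, 2, 4):
    print(f"{channels} channel(s): {peak_bandwidth_gb_s(channels, 2933):.1f} GB/s")
```

These are theoretical ceilings. Real throughput is lower, and whether a given filter/modulator combination is bandwidth-bound at all is a separate question that only testing answers.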


Some more numbers:

Looking at the temps while running ASDM7EC at DSD256:

dell_smm-virtual-0
Adapter: Virtual device
fan1:        1000 RPM
fan2:         723 RPM
fan3:         714 RPM

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +65.0°C  (high = +88.0°C, crit = +98.0°C)
Core 0:        +64.0°C  (high = +88.0°C, crit = +98.0°C)
Core 2:        +52.0°C  (high = +88.0°C, crit = +98.0°C)
Core 3:        +51.0°C  (high = +88.0°C, crit = +98.0°C)
Core 5:        +51.0°C  (high = +88.0°C, crit = +98.0°C)
Core 8:        +65.0°C  (high = +88.0°C, crit = +98.0°C)
Core 10:       +49.0°C  (high = +88.0°C, crit = +98.0°C)
Core 11:       +49.0°C  (high = +88.0°C, crit = +98.0°C)
Core 12:       +51.0°C  (high = +88.0°C, crit = +98.0°C)

When I try to run at DSD512, the sound is on for a second, then off for a second, and so on. The CPU utilization doesn't get above 70%.

(32 GB RAM now)

dell_smm-virtual-0
Adapter: Virtual device
fan1:         991 RPM
fan2:         724 RPM
fan3:         680 RPM

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +44.0°C  (high = +88.0°C, crit = +98.0°C)
Core 0:        +44.0°C  (high = +88.0°C, crit = +98.0°C)
Core 2:        +41.0°C  (high = +88.0°C, crit = +98.0°C)
Core 3:        +42.0°C  (high = +88.0°C, crit = +98.0°C)
Core 5:        +41.0°C  (high = +88.0°C, crit = +98.0°C)
Core 8:        +43.0°C  (high = +88.0°C, crit = +98.0°C)
Core 10:       +39.0°C  (high = +88.0°C, crit = +98.0°C)
Core 11:       +40.0°C  (high = +88.0°C, crit = +98.0°C)
Core 12:       +41.0°C  (high = +88.0°C, crit = +98.0°C)

 

8 hours ago, The Computer Audiophile said:

This is interesting. I'm trying ASDM7EC and my old school Xeon E3-1241 v3 CPU doesn't get above 40-50%, yet the audio constantly stutters. 


Yeah, that's what I am seeing above: the stuttering throws off the measurement. The CPU usage spikes to 100%, then a stutter starts and usage drops during the stutter "recovery". What you are seeing is probably an average rather than instantaneous CPU usage.
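
One way to check the averaging hypothesis is to sample per-core counters at short intervals instead of relying on a monitor's smoothed figure. The sketch below computes per-core busy fractions from two snapshots of /proc/stat. The helper names and the 0.1 s interval are illustrative choices, not anything taken from HQPlayer or htop.

```python
import time


def parse_proc_stat(text):
    """Map 'cpuN' -> (busy_jiffies, total_jiffies) from /proc/stat text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep per-core lines (cpu0, cpu1, ...); skip the aggregate 'cpu' line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user nice system idle iowait irq softirq steal (guest fields are
        # already counted inside user/nice, so they are left out).
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values)
        counters[fields[0]] = (total - idle, total)
    return counters


def busy_fraction(before, after):
    """Per-core busy fraction between two parsed snapshots."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        result[cpu] = (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


def sample(interval_s=0.1, path="/proc/stat"):
    """Take two live snapshots interval_s apart (Linux only)."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return busy_fraction(before, after)
```

Calling sample() in a loop during playback would show whether individual cores hit 100% just before each dropout. With short intervals the counters are coarse (they advance in scheduler ticks), so the values are approximate.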

 

That modulator needs a lot of horsepower.

 

I can't get my machine to boost its clock rate above 4 GHz and suspect the AVX512 instructions are holding it back. That might mean that ASDM7EC is limited to DSD256 on the current generation of machines, unless maybe by fooling around with overclocking or even disabling AVX512 (counterproductive?). I have one more optimization trick to try before bailing on getting DSD512 working ...

10 hours ago, asdf1000 said:

 

With EC modulator? Forget about it at DSD512 (for real-time).

 

With current CPU clock rates and processing capabilities this seems to be true.

 

At DSD256 I am hitting 90-95% at 3.9 GHz (again, the AMSDM7 modulator "allows" the CPU to reach 4.7 GHz ... more evidence of the clock-limiting behavior of AVX512).
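
A back-of-the-envelope way to see why clock rate matters so much here: the DSD output rate fixes how many clock cycles a core has per output sample. The sketch below assumes, purely for illustration, that one core handles one channel's modulator; how HQPlayer actually distributes the work is not described in this thread.

```python
def dsd_rate_hz(multiplier, base_hz=44_100):
    """Output sample rate of a DSD stream, e.g. 512 x 44.1 kHz."""
    return multiplier * base_hz


def cycles_per_sample(clock_ghz, multiplier, base_hz=44_100):
    """Clock cycles available per output sample on one core (assumed model)."""
    return clock_ghz * 1e9 / dsd_rate_hz(multiplier, base_hz)


for label, mult in (("DSD256", 256), ("DSD512", 512)):
    for ghz in (3.9, 4.5, 5.0):
        budget = cycles_per_sample(ghz, mult)
        print(f"{label} @ {ghz} GHz: {budget:.0f} cycles per output sample")
```

Doubling the output rate halves the per-sample budget at a given clock, which is why a setup that sits at 90-95% for DSD256 has no realistic path to DSD512 without a much faster core.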

 

Increasing RAM channels didn't make a huge difference. I don't think optimizing the network, i.e. RDMA + NIC offload, will make a huge difference either. So here we have a Xeon W-2245 workstation with an RTX 2080 Ti, which I suspect will be able to do 6 channels of DSD256 / EC ... that would handle room correction + digital crossover.

 

But look at what we can do with this approach! Moreover, you can buy this prepackaged from Dell ... the workstation is surprisingly quiet. I installed my own GPU, RAM and NIC; the base price of ~$1600 is very reasonable for what you get.


closed-form-16m, ASDM7EC, DSD256, with either a 16/44.1 or a DSD64 source => 98%

Processor:  0  Mhz:  3905.090
Processor:  1  Mhz:  1200.102
Processor:  2  Mhz:  1200.071
Processor:  3  Mhz:  1951.016
Processor:  4  Mhz:  3905.621
Processor:  5  Mhz:  1200.017
Processor:  6  Mhz:  2586.072
Processor:  7  Mhz:  1200.296
Processor:  8  Mhz:  4439.282
Processor:  9  Mhz:  1200.089
Processor:  10  Mhz:  1200.032
Processor:  11  Mhz:  2353.270
Processor:  12  Mhz:  3897.773
Processor:  13  Mhz:  1201.184
Processor:  14  Mhz:  3163.338
Processor:  15  Mhz:  1201.437
CPU model:  Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz
1 CPU,  8 physical cores per CPU, total 16 logical CPU units

 

13 hours ago, asdf1000 said:

There's already a big thread where people shared builds:

 

https://audiophilestyle.com/forums/topic/56966-hqplayer4-ec-modulator-tips-and-techniques/

 

 

Yes I see. There are several threads in both the Software and Network subfora that delve into hardware ... I posted this in the Music Server subforum for that reason -- also want to get into other topics than EC in time, but obviously there are overlaps.

2 hours ago, Solstice380 said:

 

These are the results for my new system that I posted in the EC Modulators thread.

 

Nice! Yeah, the KS is not easily available for me, and if I wasn't using my crazy NIC that would've been a great option (I needed 2 x PCIe x16: one for GPU and one for NIC), but for normal purposes the KS is sweet ;)


Dell Inc. Precision 5820 Tower

Single-Core Score: 1234
Multi-Core Score: 8648
Geekbench 5.1.0 Tryout for Linux x86 (64-bit)

Result Information

User jabbr
Upload Date April 15 2020 02:25 PM
Views 5

System Information
Operating System Ubuntu 18.04.4 LTS 5.3.0-46-generic x86_64
Model Dell Inc. Precision 5820 Tower
Motherboard Dell Inc. 06JWJY
Processor Information
Name Intel Xeon W-2245
Topology 1 Processor, 8 Cores, 16 Threads
Identifier GenuineIntel Family 6 Model 85 Stepping 7
Base Frequency 4.70 GHz
L1 Instruction Cache 32.0 KB x 8
L1 Data Cache 32.0 KB x 8
L2 Cache 1.00 MB x 8
L3 Cache 16.5 MB x 1

Single-Core Performance

Test                    Score   Rate
Single-Core Score        1234
Crypto Score             1655
Integer Score            1186
Floating Point Score     1268
AES-XTS                  1655   2.82 GB/sec
Text Compression         1110   5.61 MB/sec
Image Compression        1242   58.7 Mpixels/sec
Navigation               1017   2.87 MTE/sec
HTML5                    1258   1.48 MElements/sec
SQLite                   1229   385.0 Krows/sec
PDF Rendering            1183   64.2 Mpixels/sec
Text Rendering           1173   373.8 KB/sec
Clang                    1287   10.0 Klines/sec
Camera                   1199   13.9 images/sec
N-Body Physics           1120   1.40 Mpairs/sec
Rigid Body Physics       1333   8259.3 FPS
Gaussian Blur             789   43.4 Mpixels/sec
Face Detection           1128   8.69 images/sec
Horizon Detection        1090   26.9 Mpixels/sec
Image Inpainting         1979   97.1 Mpixels/sec
HDR                      2414   32.9 Mpixels/sec
Ray Tracing              1531   1.23 Mpixels/sec
Structure from Motion    1162   10.4 Kpixels/sec
Speech Recognition       1058   33.8 Words/sec
Machine Learning         1047   40.5 images/sec
 

Multi-Core Performance

Test                    Score   Rate
Multi-Core Score         8648
Crypto Score             6766
Integer Score            8396
Floating Point Score     9509
AES-XTS                  6766   11.5 GB/sec
Text Compression         9803   49.6 MB/sec
Image Compression       10519   497.6 Mpixels/sec
Navigation               4366   12.3 MTE/sec
HTML5                    8734   10.3 MElements/sec
SQLite                   9618   3.01 Mrows/sec
PDF Rendering            8649   469.4 Mpixels/sec
Text Rendering           6824   2.12 MB/sec
Clang                   10724   83.6 Klines/sec
Camera                   8659   100.4 images/sec
N-Body Physics           8491   10.6 Mpairs/sec
Rigid Body Physics      13668   84679.6 FPS
Gaussian Blur            8072   443.7 Mpixels/sec
Face Detection           9988   76.9 images/sec
Horizon Detection        9103   224.4 Mpixels/sec
Image Inpainting        13236   649.3 Mpixels/sec
HDR                     18669   254.4 Mpixels/sec
Ray Tracing             14997   12.0 Mpixels/sec
Structure from Motion    9591   85.9 Kpixels/sec
Speech Recognition       5820   186.1 Words/sec
Machine Learning         3263   126.1 images/sec
 

 


Dell Inc. Precision 5820 Tower

CUDA Score: 170911
Geekbench 5.1.0 Tryout for Linux x86 (64-bit)

Result Information

User jabbr
Upload Date April 15 2020 02:39 PM
Views 2

System Information
Operating System Ubuntu 18.04.4 LTS 5.3.0-46-generic x86_64
Model Dell Inc. Precision 5820 Tower
Motherboard Dell Inc. 06JWJY
Processor Information
Name Intel Xeon W-2245
Topology 1 Processor, 8 Cores, 16 Threads
Identifier GenuineIntel Family 6 Model 85 Stepping 7
Base Frequency 4.70 GHz
L1 Instruction Cache 32.0 KB x 8
L1 Data Cache 32.0 KB x 8
L2 Cache 1.00 MB x 8
L3 Cache 16.5 MB x 1
CUDA Information  
Device Name GeForce RTX 2080 Ti
Compute Capability 7.5
Maximum Frequency 1.64 GHz
Multiprocessor Count 68
Maximum Threads Per Multiprocessor 1024
Device Memory 10.7 GB 7.00 GHz

CUDA Performance

Test                     Score   Rate
CUDA Score              170911
Sobel                   197778   51.2 Gpixels/sec
Canny                    99288   6.21 Gpixels/sec
Stereo Matching         560206   792.3 Gpixels/sec
Histogram Equalization  125229   22.1 Gpixels/sec
Gaussian Blur           179393   9.86 Gpixels/sec
Depth of Field          623547   7.23 Gpixels/sec
Face Detection           36686   282.5 images/sec
Horizon Detection       157285   3.88 Gpixels/sec
Feature Matching         58696   1.21 Gpixels/sec
Particle Physics        685846   18268.1 FPS
SFFT                    101541   1.40 Tflops
 

 

 


To give an idea of how the processing needs compare:

 

When running Roon upsampling to DSD512 from 24/96 material:

top - 12:18:30 up  2:19,  1 user,  load average: 2.42, 2.31, 2.33
Tasks: 440 total,   2 running, 271 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.0 us,  0.8 sy,  0.0 ni, 88.2 id,  4.7 wa,  0.0 hi,  0.4 si,  0.0 st
KiB Mem : 32563588 total,   295900 free,  3584900 used, 28682788 buff/cache
KiB Swap:  2097148 total,  2093564 free,     3584 used. 28452876 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                     
11225 root      20   0 7548264 1.776g 150132 S 107.0  5.7  46:53.18 RoonAppliance                                                               
 1672 root      20   0       0      0      0 R   3.6  0.0   2:24.89 cifsd                                                                       
13248 jon       20   0   51476   4168   3324 R   1.0  0.0   0:00.17 top                                                                       
   47 root      rt   0       0      0      0 S   0.3  0.0   0:00.66 migration/6                                                                 
  223 root      20   0       0      0      0 S   0.3  0.0   0:09.82 kswapd0    

 


With 16/44.1 -> DSD512:


  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                    
11225 root      20   0 7944056 2.320g 239944 S  47.5  7.5  75:05.66 RoonAppliance                                 
 1672 root      20   0       0      0      0 S   7.6  0.0   4:08.97 cifsd 

and two cores at 4.5 GHz:

jon@jon-w2245:~$ ./cpuspeeds.sh
Processor:  0  Mhz:  1200.020
Processor:  1  Mhz:  1200.009
Processor:  2  Mhz:  2206.086
Processor:  3  Mhz:  4499.161
Processor:  4  Mhz:  1200.003
Processor:  5  Mhz:  1200.002
Processor:  6  Mhz:  1754.388
Processor:  7  Mhz:  1200.003
Processor:  8  Mhz:  1200.021
Processor:  9  Mhz:  1200.012
Processor:  10  Mhz:  2253.244
Processor:  11  Mhz:  4483.826
Processor:  12  Mhz:  1200.046
Processor:  13  Mhz:  1200.005
Processor:  14  Mhz:  2022.015
Processor:  15  Mhz:  1200.012
CPU model:  Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz
1 CPU,  8 physical cores per CPU, total 16 logical CPU units

 

2 hours ago, Solstice380 said:

I'm not sure how to compare OpenCL to CUDA, though.  I understand that OpenCL code runs faster.  For my use case I'm extremely happy.  And @Jussi keeps killing us... DSD512 with EC modulators in realtime probably won't happen in my lifetime! 😂  2.5X the 9900KS.... ouch.


Geekbench is just a number. For compute you want CUDA because that's what HQP uses (not OpenCL). I think @Miska's idea of comparing the time it takes Pro to encode a song against the song's real-time length is a great measurement of the system's performance. Yeah, EC/DSD512 not anytime soon.


The question is whether AVX512 @ 3.9 GHz is better than AVX2 @ 5 - 5.3 GHz. 
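
One simplified way to frame that question is an Amdahl-style model in which a fraction f of the work vectorizes perfectly and the rest is scalar. Everything here is an assumption for illustration (perfect SIMD scaling, 16 vs 8 single-precision lanes, equal work per clock, no memory effects); it says nothing about how HQPlayer's modulators are actually written.

```python
def relative_time(clock_ghz, lanes, vector_fraction):
    """Relative run time when vector_fraction of the work uses all SIMD lanes."""
    scalar = (1.0 - vector_fraction) / clock_ghz
    vector = vector_fraction / (clock_ghz * lanes)
    return scalar + vector


def break_even_fraction(clock_a, lanes_a, clock_b, lanes_b):
    """Vector fraction at which configuration A and B take equal time."""
    # How much slower A is on scalar work, per unit of work.
    scalar_gap = 1.0 / clock_a - 1.0 / clock_b
    # How much faster A is on vector work, per unit of work.
    vector_gap = 1.0 / (clock_b * lanes_b) - 1.0 / (clock_a * lanes_a)
    return scalar_gap / (scalar_gap + vector_gap)


# A: AVX512 at 3.9 GHz (16 float lanes); B: AVX2 at 5.0 GHz (8 float lanes).
f = break_even_fraction(3.9, 16, 5.0, 8)
print(f"break-even vector fraction: {f:.3f}")
```

Under these idealized assumptions the wider-but-slower core only wins when more than about 86% of the work vectorizes. Any serial portion of the workload pushes the balance toward raw clock speed.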


My main reason to go with new Mellanox cards was RDMA and NIC offload ... used with stuff other than HQPlayer ... so I'm running some benchmarks ... first with HQPe and Roon on the same machine (note that xtr-mp/EC/DSD256 runs fine with Roon on the same machine):

top - 13:36:24 up 13:47,  1 user,  load average: 1.99, 0.89, 0.55
Tasks: 426 total,   1 running, 281 sleeping,   0 stopped,   0 zombie
%Cpu(s): 17.6 us,  0.1 sy,  0.0 ni, 82.1 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem : 32563588 total, 11880460 free,  6129732 used, 14553396 buff/cache
KiB Swap:  2097148 total,  2092284 free,     4864 used. 25912360 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                     
 1484 hqplayer  10 -10 9691188 738024 178712 S 245.7  2.3 169:29.35 hqplayerd                                                                  
 1743 root      20   0 11.189g 3.674g 423404 S  36.8 11.8 344:52.58 RoonAppliance                                                               
   11 root      20   0       0      0      0 I   0.3  0.0   0:20.85 rcu_sched                                                                   
 3326 root      20   0       0      0      0 S   0.3  0.0   2:18.68 cifsd

We see that cifsd has dropped to 0.3%. Now let's move Roon to another server:

... except that roonlabs.com is down so hold that thought ...

On 4/15/2020 at 4:36 PM, Miska said:

 

This was a test build on Windows, so some production build may differ a little. I've already put a lot of effort into making the EC modulators run like they now do. I'm not expecting any big gains on this front. No thermal throttling. Maybe @jabbr can test at some point how it runs with AVX512 on his new Xeon. Larger and faster cache may help too. But a large part of the limitation is just how many instructions a CPU core can execute within the available number of clock cycles.

 

 

I just tried a few settings. Both poly-sinc-ext2 and xtr-mp with ASDM7EC took almost 2x (a few seconds shy of 2x), with or without CUDA. CUDA with -mp does significantly decrease CPU usage. In one case Pro used 640% CPU (CUDA off), with all cores active.

 

I didn't spend a great deal of time optimizing anything but my results seem roughly the same as yours.
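
The offline timing comparison can be boiled down to a single ratio of processing time to track length. A minimal sketch follows; the function names, the headroom margin and the example durations are made up for illustration, and reading the "almost 2x" above as processing time relative to track length is an assumption.

```python
def realtime_factor(processing_seconds, track_seconds):
    """Processing time divided by track length; below 1.0 is faster than real time."""
    if track_seconds <= 0:
        raise ValueError("track length must be positive")
    return processing_seconds / track_seconds


def can_play_in_realtime(processing_seconds, track_seconds, headroom=0.1):
    """True if offline processing leaves the given margin below real time."""
    return realtime_factor(processing_seconds, track_seconds) <= 1.0 - headroom


# Hypothetical example: a 240 s track that takes 470 s to process offline.
print(realtime_factor(470, 240))
print(can_play_in_realtime(470, 240))
```

Because it measures sustained throughput without the stop/start behavior of a stuttering stream, this ratio is a steadier figure than CPU percentages read off a monitor during playback.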
