
Design a PC/Server for ROON and HQ Player


sgr


  • 4 weeks later...

So I'm trying to get to DSD512 with the poly-sinc-mp filters. My current system handles the -2s variant well but hiccups at poly-sinc-mp. I'm not sure that it's purely a CPU issue, because my system only loads to 32% (6x 3.4 GHz 4930K processor). The GTX 760 doesn't seem to help.

 

So:

i7-6700k (4.0) vs.

E3-1275 V5 (3.6)

vs.???

 

adding a GTX 1080 I'm assuming this will do DSD512 poly-sinc-mp fine ...

any idea about DSD1024? ...just because ... :cool:

Custom room treatments for headphone users.

Link to comment
Please check out the per-core load graphs in Resource Monitor. If any one of the cores is maxed out, then hiccups are likely to occur. Also note that for CPUs with HyperThreading, the CPU is practically fully loaded at 50% total load, because the virtual second CPUs from HT don't really help much with HQPlayer-type workloads.
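
A minimal sketch of the two checks described above, assuming you have already read the per-core percentages off Resource Monitor (the load values and the 95% threshold below are made up for illustration, and the doubling for HyperThreading is only the rule of thumb from this post, not a measured model):

```python
def saturated_cores(per_core_loads, threshold=95.0):
    """Return the indices of logical CPUs at or above `threshold` percent."""
    return [i for i, load in enumerate(per_core_loads) if load >= threshold]


def effective_load(total_load_pct, hyperthreading=True):
    """Rule of thumb: with HT, 50% total load is treated as ~100% of the physical cores."""
    scale = 2.0 if hyperthreading else 1.0
    return min(100.0, total_load_pct * scale)


# Hypothetical per-core readings for a 6-core / 12-thread CPU.
loads = [98.0, 12.0, 30.0, 25.0, 41.0, 18.0, 22.0, 15.0, 35.0, 28.0, 31.0, 29.0]

print("maxed-out logical CPUs:", saturated_cores(loads))
print("effective load at 32% total:", effective_load(32.0))
```

So a 32% total reading can still hide one maxed-out core, and by this rule of thumb it is closer to two thirds of what the physical cores can do.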

 

Here are the per-core loads: [image: 73e6618b56c59e4d973af1c80663b55b.jpg]

 

No real difference between short-mp and short-mp-2s but the former stutters and the latter works well?


Link to comment
  • 4 months later...
  • 3 months later...
  • 1 month later...

I've seen some presentations that compare GPU and FPGA acceleration in terms of FLOPS/W, and they seem to favor FPGAs. In particular, the SoC approach combines a CPU and an FPGA on one chip, and C/C++ tools that allow moving parts of the code to the FPGA make testing easier. Have you looked at this for HQPlayer? The Altera design tools are $$$ but I've got Xilinx SDSoC at a "special price" to try out for a year.

 

 


Link to comment
I don't want to start implementing floating point unit(s) in an FPGA. It is a huge amount of work, requires a huge FPGA and ends up costing much more than an equally or more capable CPU/GPU...

The main problem with FPGAs is also that cheaper models don't support high clock speeds. HQPlayer needs clocks in the GHz range so that there are enough clock cycles per input sample.

Power consumption is not an issue, so the FLOPS/W ratings are meaningless. The only things that matter are how many GFLOPS (64- and 80-bit precision) there are and the memory speed in GB/s, taking into account that HQPlayer requires at least 1 GB of RAM to run its DSP engine.
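
To put rough numbers on the "clock cycles per sample" point, here is a small sketch. The 200 MHz figure is a hypothetical clock for a cheaper FPGA fabric, 3.4 GHz is the CPU clock mentioned earlier in the thread, and real throughput depends on parallelism, so this is only the cycle budget of a single serial pipeline:

```python
def cycles_per_sample(clock_hz, sample_rate_hz):
    """Clock cycles available for each sample at the given sample rate."""
    return clock_hz / sample_rate_hz


PCM_44K1 = 44_100           # input sample rate in Hz
DSD512_RATE = 512 * 44_100  # 22,579,200 Hz output rate

# Clock values are illustrative assumptions, not vendor specifications.
for label, clock in (("200 MHz fabric", 200e6), ("3.4 GHz CPU core", 3.4e9)):
    per_in = cycles_per_sample(clock, PCM_44K1)
    per_out = cycles_per_sample(clock, DSD512_RATE)
    print(f"{label}: {per_in:,.0f} cycles per input sample, "
          f"{per_out:.1f} cycles per DSD512 output sample")
```

At a few hundred MHz there are fewer than ten cycles per DSD512 output sample, which is one way to read the argument for GHz-class clocks.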
Fair enough. I came across this recently:

1) https://www.xilinx.com/support/documentation/product-briefs/rfsoc-ieee-paper.pdf

2) https://www.xilinx.com/support/documentation/white_papers/wp489-rfsampling-solutions.pdf

These "RF" class hybrid CPU/FPGAs with 12-14 bit 2 GHz (multileaved 500 MHz units) ADC/DACs look like they'd be good for something :) perhaps a digital oscilloscope ... and if you ever thought there might be a limit to upsampling frequencies ;)

In any case power is always important, because the long-term cost of a solution needs to take into account the cost of the power used.

Food for thought in any case.


Link to comment
Seems to be for software defined radios mostly. Did you check how much those cost? Last time I checked, a possibly capable enough Xilinx Virtex-series FPGA cost $1k+. So if you are in the market for $100k DACs, then that could be the way to go. :D

 

SFDR figures from the first PDF are not very promising for audio...

Yah. I've been playing around with the "regular" Zynqs (e.g. PicoZed/Snickerdoodle/Parallella), which are more reasonably priced. I'm aiming at DSD1024 x 32 channels of direct DSD output from a 1GbE input (that's essentially saturating the 1GbE link).
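
A quick back-of-the-envelope check of the link budget, assuming DSD1024 means 1024 x 44.1 kHz at 1 bit per sample per channel and ignoring all Ethernet/TCP framing overhead (the function name is just for illustration):

```python
def dsd_bitrate(multiple, channels, base_rate=44_100):
    """Raw 1-bit DSD payload in bit/s for `channels` channels at `multiple` x base rate."""
    return multiple * base_rate * channels


GBE_LINE_RATE = 1_000_000_000  # 1 GbE in bit/s, before protocol overhead

total = dsd_bitrate(1024, 32)
print(f"DSD1024 x 32 ch: {total / 1e6:.1f} Mbit/s")
print(f"DSD512  x 32 ch: {dsd_bitrate(512, 32) / 1e6:.1f} Mbit/s")
print("DSD1024 channels that fit in 1 GbE:", GBE_LINE_RATE // dsd_bitrate(1024, 1))
```

Under these assumptions the raw payload for 32 channels of DSD1024 comes out around 1.45 Gbit/s, so the channel count or rate would need to come down a bit (or the link go up) for it to fit on 1GbE.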

 

In any case these have the more pedestrian 1Msps XADC so perhaps something to fool around with. Looking at: http://www.analog.com/media/en/technical-documentation/technical-articles/not-your-grandfathers-adc-rf-sampling-adcs-offer-advantages-in-systems-design.pdf

This raises the question of whether decimation would improve SFDR enough to be interesting ... just a thought, but not something I'm actively looking at (the XADCs are just sitting there on chip unused for me at the moment).

 

Regarding the Zynq, depending on performance, might go with zero-copy between the TCP input buffer and the FIFO sent to the PL, so the real consideration for me is integration with networkaudiod. In any case this is where the SDR techniques have popped up (reading tutorials etc).


Link to comment
It depends a bit, I didn't try that case yet with that particular hardware. But so far (on other machines) 44.1k PCM -> DSD512 with -2s filter has been roughly the same load as DSD64 -> DSD512.

...

 

That is my experience also. PCM 24/192 -> DSD512 not a problem ;)

 

OTOH DSD256 -> DSD512 chokes my aging system.


Link to comment
  • 2 years later...

Ok I'm going to post CPU usage as I upgrade my new server:

 

Starting with a Dell T420 with a single 6-core 2.4 GHz E5-2440 v2, 48 GB ECC DDR3 and stock 1 GbE Ethernet (Broadcom)

HQPe -> NAA (Pro-Ject S2D)

Ubuntu 18.10

DSD512 upsampling with the ASDM7 modulator (%CPU as reported by top):

 

poly-sinc-short-mp-2s: 350-360%

poly-sinc-xtr-mp-2s: 370%

poly-sinc-xtr-mp: 460% (stutters)


Link to comment
Just now, ddetaey said:

Could you please enlighten me on this measurement?

 

How can a processor usage be higher than 100%?

 

It is a measure across virtual CPUs, so 6 cores and 12 SMT threads. It's rarely possible to get all cores fully active given inefficiencies in software. That's presumably why I am hearing stuttering at 450% usage (6 cores each 100% active would be 600%).
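
The arithmetic above can be sketched like this. It assumes top is in its default per-CPU mode, where 100% means one fully busy logical CPU, and that SMT gives two threads per core; the function name is only for illustration:

```python
def normalized_load(top_cpu_pct, physical_cores, smt=True):
    """Express a top %CPU reading as a fraction of logical and of physical capacity."""
    logical_cpus = physical_cores * (2 if smt else 1)
    return {
        "of_logical": top_cpu_pct / (logical_cpus * 100.0),
        "of_physical": top_cpu_pct / (physical_cores * 100.0),
    }


# Readings from the single E5-2440 v2 (6 cores / 12 threads) post above.
for name, pct in (("poly-sinc-short-mp-2s", 355), ("poly-sinc-xtr-mp", 460)):
    load = normalized_load(pct, physical_cores=6)
    print(f"{name}: {load['of_logical']:.0%} of logical CPUs, "
          f"{load['of_physical']:.0%} of physical cores")
```

Seen that way, the stuttering case is already past three quarters of the physical cores even though it is under 40% of the logical CPUs.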


Link to comment

Now 2x E5-2470 v2 (20 cores) and 96 GB ECC DDR3:

 

poly-sinc-xtr-short-mp-2s: 360-400%

poly-sinc-xtr-mp-2s: 420-450%

poly-sinc-xtr-mp: 1400% (and still some stutters!)

poly-sinc-xtr-mp: 900% (DSD256) and plays beautifully

 

(OMG the web interface on HQPe makes this so vastly easier to do)


Link to comment

Sat Mar 16 14:49:23 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43       Driver Version: 418.43       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 45%   68C    P2   172W / 260W |    349MiB / 10989MiB |     47%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     28775      C   /usr/bin/hqplayerd                           339MiB |
+-----------------------------------------------------------------------------+


Link to comment

The above is for a PCM source. With a DSD64 source:

Sat Mar 16 14:55:18 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43       Driver Version: 418.43       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 40%   55C    P2    73W / 260W |    333MiB / 10989MiB |     10%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     28775      C   /usr/bin/hqplayerd                           323MiB |
+-----------------------------------------------------------------------------+
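
For anyone wanting to log these figures over time instead of pasting screenshots, here is a small sketch using nvidia-smi's CSV query mode. The helper names are made up; it assumes a single reading per line in the order given in QUERY, and that the card reports power draw (some report "[N/A]", which this simple parser would not handle):

```python
import subprocess

QUERY = "utilization.gpu,memory.used,power.draw"


def parse_gpu_line(line):
    """Parse one 'util, mem, power' CSV line as printed with noheader,nounits."""
    util, mem, power = (field.strip() for field in line.split(","))
    return {"util_pct": float(util), "mem_mib": float(mem), "power_w": float(power)}


def sample_gpus():
    """Run nvidia-smi once and return one dict per GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]


# Example line corresponding to the PCM-source reading above.
print(parse_gpu_line("47, 349, 172.00"))
```

Calling sample_gpus() in a loop while switching between PCM and DSD64 sources would give the same comparison as the two tables above.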


Link to comment
18 hours ago, Em2016 said:

 

The thousand dollar question that few of us can answer.... (at DSD512..) how does non-2s sound versus -2s?

 

Any different?

 

It’s at best subtle — I’m not going to give impressions until I optimize setup:

 

The filters (at least seemingly) only affect PCM -> DSD conversion, not DSD -> DSD upsampling, as reflected by the 10% GPU usage with DSD64 -> DSD512 versus 50% with PCM -> DSD512. I’m also getting some stuttering with HD PCM sources ... I need to do some work to optimize unless @Miska is able to shed some light...

 

(note that I’m using generic 4.18.0 kernel with Ubuntu 18.04.02 — yes I’ll be maximally compliant to reduce variables ;)


Link to comment
1 hour ago, Miska said:

 

I mean 44.1k source to 44.1k x512 with poly-sinc-xtr, so we can compare it with the other figures for 2080...

 

Was the first one for that case?

Yes DSD64 source -> 10-11%

PCM44,PCM96 -> 44-52%


Link to comment
3 hours ago, Miska said:

 

OK, so the Ti delivers pretty much the same performance ratio as the price ratio. About 1.5x price, about 2/3rds load. That is good since it means pretty exact bang for the buck.

 

 

Wow! When I upconvert 24/96 or 24/192 -> DSD512 at 22.6 MHz, as opposed to 24.6 MHz, the GPU utilization drops from 50% or so down to 25-30% and the hiccups stop ... in fact that is much, much better than 44.1 kHz, which uses 45-50% (?)

 

So folks, this is why I'm not giving listening impressions right off the bat...


Link to comment
40 minutes ago, Jud said:

 

I'm using the low latency kernel with the Lubuntu variant (which I think may be just a bit lighter weight, unless you're using Ubuntu Server with Embedded), and my experience has been positive.

 

I use Ubuntu Server with Xorg, then I ssh in:

 

$ ssh -Y [email protected]

 

any GUI apps I occasionally need pop up on my laptop

 

The “fix” was to upsample not in the same family but to 22.6 MHz. I don’t understand why but @Miska said so ;)

 

Now to optimize everything else, so I’m comparing +/- Roon, +/- NAS etc. and will put the low latency kernel in the mix ... but first I need to determine where the delays are, because if it’s done wrong, things get worse ... as best I can tell, when the GPU is at 25-30% load there are no hiccups, and then slight ones at 45-50% ... at the same time CPU goes from 200% to 300% (I have 20 cores so this is minimal), so the key is likely keeping the GPU well fed ;)


Link to comment
2 hours ago, Miska said:

44.1k xN to 22.5792M and 48k xN to  24.576M should end up in practically same load.

 

They aren't. Not close.

 

44,88 -> 22.57 use 44-50% (GPU) and 300% CPU

44,88,96,192 -> 24.57 use 44-50%

 

96,192 -> 22.57 use 25% (sounds the best too)

 

Quote

 

The notable differences are:

1) 44.1k xN to 24.576M - not going to work

2) 48k xN to 22.5792M - works fine, a bit heavier than the simple cases, but not so much

 

 

2) seems to use both less CPU and GPU here... I am using 180-200% CPU for this
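
The rate-family arithmetic behind these cases can be sketched as below. This only shows which conversions are integer multiples and which are fractional; it is not a description of how HQPlayer implements them, and the helper names are made up:

```python
from fractions import Fraction


def family(rate_hz):
    """Return the base rate (44100 or 48000) that `rate_hz` is an integer multiple of."""
    for base in (44_100, 48_000):
        if rate_hz % base == 0:
            return base
    return None


def conversion_ratio(src_hz, dst_hz):
    """Exact output/input rate ratio as a reduced fraction."""
    return Fraction(dst_hz, src_hz)


for src in (44_100, 88_200, 96_000, 192_000):
    for dst in (22_579_200, 24_576_000):
        ratio = conversion_ratio(src, dst)
        kind = "integer" if ratio.denominator == 1 else "fractional"
        same = "same family" if family(src) == family(dst) else "cross family"
        print(f"{src:>7} -> {dst}: x{ratio} ({kind}, {same})")
```

For example 44.1k -> 24.576M works out to 81920/147 while 96k -> 22.5792M is 1176/5, which at least fits with the two cross-family directions not costing the same.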


Link to comment

For reference with cuda="0"

------ 22.57 MHz ----
44.1k -> 1200% cpu
88.2k -> 1200%
96k -> 1200%
192k -> 1300%
DSD64 -> 380%

------ 24.57 MHz -----
44.1k -> 3900%
88.2k -> 3900%
96k -> 1345%
192k -> 1100%
DSD64 -> 450%
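
To make the CPU-only figures easier to compare, here is a small sketch that computes the 24.57 MHz / 22.57 MHz load ratio per source. It also expresses each reading as a share of 40 logical CPUs, which assumes SMT is enabled on the 2x 10-core box; that part is my assumption, not something measured:

```python
# %CPU from top with cuda="0": (22.57 MHz target, 24.57 MHz target)
cpu_pct = {
    "44.1k": (1200, 3900),
    "88.2k": (1200, 3900),
    "96k": (1200, 1345),
    "192k": (1300, 1100),
    "DSD64": (380, 450),
}

LOGICAL_CPUS = 40  # assumption: 2 sockets x 10 cores x 2 SMT threads

ratios = {src: high / low for src, (low, high) in cpu_pct.items()}

for src, (low, high) in cpu_pct.items():
    print(f"{src:>6}: 24.57M/22.57M = {ratios[src]:.2f}x, "
          f"share of machine = {low / (LOGICAL_CPUS * 100):.0%} / "
          f"{high / (LOGICAL_CPUS * 100):.0%}")
```

The 44.1k and 88.2k sources to 24.57 MHz stand out at 3.25x the load of the same-family case, which is close to everything the machine has.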

 

There may be an issue in the settings I applied with the GPU, so I will repeat the measurements.

 


Link to comment

96 and 192k both use 50% GPU

 

I reported this incorrectly yesterday. I had "auto rate family" checked and was setting 22.57 MHz as the max rate, but it was dropping down to 12 MHz. When I set the rate to 22.57 MHz and remeasure (properly) it uses 50%.
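
Here is a guess at why the rate fell to about 12 MHz. The sketch assumes that "auto rate family" picks the highest power-of-two multiple of the source's base rate that does not exceed the configured maximum; that is only a model consistent with what I saw, not HQPlayer's documented behavior, and the function name is made up:

```python
def auto_family_rate(source_rate_hz, max_rate_hz):
    """Highest base_rate * 2**n within the source's family that is <= max_rate_hz."""
    base = 44_100 if source_rate_hz % 44_100 == 0 else 48_000
    rate = base
    while rate * 2 <= max_rate_hz:
        rate *= 2
    return rate


MAX_RATE = 22_579_200  # 22.57 MHz limit from the settings

for src in (44_100, 96_000, 192_000):
    print(f"{src} -> {auto_family_rate(src, MAX_RATE) / 1e6:.4f} MHz")
```

Under that assumption a 96k or 192k source tops out at 12.288 MHz, because the next step in the 48k family (24.576 MHz) is above the 22.57 MHz limit, which would explain the lower GPU load I reported earlier.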


Link to comment
