
HQPlayer4 EC modulator tips and techniques


ted_b


16 hours ago, shadowlight said:

My setup is Dell Server with quad Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz six core server with 64GB of memory, running Ubuntu Bionic LTS with latest patches and latest version of embedded.  The result of top has cpu usage value around 1000 when using EC modulator and around 400 when using ASDM7.  I think I still have plenty of CPU cycles available, since 100% load across all CPU would be 2400 so I am currently under 50%.

@Miska

is there a way to optimize CPU load under embedded?

 

If you have 4 sockets of E5-2643 (quad core), that means total of 16 cores. So 1600 would be full load on all cores, not including threads.

 

However, if some of the cores reach close to 100% load individually, you are close to the maximum and get capped by the clock frequency of a single core. For EC modulators I would say you need at least four cores running at a minimum of 4 GHz, plus whatever is needed to run the filters (if not offloaded to GPU).
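The arithmetic above can be sketched in Python. This is only an illustration, not anything HQPlayer does itself: the function names, the 90% threshold, and the sample per-core loads are assumptions made up for the example. It follows the `top` convention where one fully busy core reports as 100.

```python
def full_load(sockets: int, cores_per_socket: int) -> int:
    """Aggregate top-style %CPU when every physical core is fully busy."""
    return sockets * cores_per_socket * 100


def saturated_cores(per_core_load, threshold=90.0):
    """Indices of cores whose individual load is at or above the threshold."""
    return [i for i, load in enumerate(per_core_load) if load >= threshold]


# Four quad-core sockets, as assumed in the reply above.
print(full_load(4, 4))  # 1600

# Hypothetical per-core loads: the aggregate looks low,
# yet two cores are close to their individual limit.
loads = [98.0, 95.0, 40.0, 35.0] + [5.0] * 12
print(f"aggregate utilization: {sum(loads) / full_load(4, 4):.1%}")
print(saturated_cores(loads))  # [0, 1]
```

With a six-core part, `full_load(4, 6)` gives 2400, which matches the figure in the quoted question. The point of the second function is that a low aggregate number says nothing about whether a single core is already capped.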

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

20 hours ago, shadowlight said:

The server in question has 6 cores per CPU - E5-2643 v2.  I will look into whether there is something I can do at the BIOS level.  Can I do anything similar at the OS / Embedded level?  Multicore is currently set to auto.

 

From hqplayerd log.

 

2019/08/11 23:09:52 Parallel threads: 24
2019/08/11 23:09:52 Nested parallelism: 4
2019/08/11 23:09:52 Parallel pipelines: 4

 

Ahh, yes, v2 is 6-core. What does the log say about number of CPU cores? Is it correct for total?

 

For filters, this is a very good setup with high memory bandwidth for each CPU. I could optimize such configurations further and hopefully will do so in near future.

 

The challenge with modulators is that there's a limit to how much the work can be parallelized; the rest relies on raw core performance. Multi-socket helps with that, since each socket can have fewer cores at higher clock speeds.
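One general way to picture this limit is Amdahl's law. The sketch below is a textbook model, not a description of how the modulators are actually implemented, and the parallel fraction of 0.5 is a purely hypothetical number picked for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload where half of the work can run in parallel.
PARALLEL_FRACTION = 0.5

for cores in (4, 8, 16):
    print(cores, round(amdahl_speedup(PARALLEL_FRACTION, cores), 2))
```

Under that assumption, going from 4 to 16 cores moves the speedup from 1.6 to about 1.88, and it can never reach 2. A faster clock, in contrast, speeds up the serial part as well, which is why per-core performance matters so much here.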

 

 I will continue trying to optimize the new modulators to run on wider variety of hardware.

 

15 minutes ago, k6davis said:

The forthcoming AMD Ryzen 9 3950X may present an opportunity for the best HQP performance yet, with 16 cores and 32 threads and the largest ever cache. But all of those cores run slightly slower than the cores of my i7-9700K. 

 

It will be very interesting to see if it's capable of anything that the i7-9700K is not regarding HQP EC. 

 

I will certainly try the new AMD chip when it becomes available here. Large cache should notably help with filters.

 

4 minutes ago, shadowlight said:

I will take a look at it tonight.  Is there a quick way to identify how many cores HQPd is seeing from the logs?

 

Yes, quite early on you have this kind of report:

  2019/07/24 23:48:23 Number of processor cores: 4
  2019/07/24 23:48:23 DSP thread pools disabled
  2019/07/24 23:48:23 DSP pipelines disabled
  2019/07/24 23:48:23 Pipelined DSP engine enabled
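To pull that figure out of a log automatically, something like the following Python sketch could work. It assumes only the line format shown in the excerpt above; the function name is made up for the example, and the caller is expected to supply the log text, since the log file location is not given in this thread.

```python
import re

# Matches the "Number of processor cores: N" line shown in the log excerpt.
CORES_RE = re.compile(r"Number of processor cores:\s*(\d+)")


def detected_cores(log_text: str):
    """Return the core count from the first matching log line, or None if absent."""
    match = CORES_RE.search(log_text)
    return int(match.group(1)) if match else None


sample = """\
  2019/07/24 23:48:23 Number of processor cores: 4
  2019/07/24 23:48:23 DSP thread pools disabled
  2019/07/24 23:48:23 DSP pipelines disabled
  2019/07/24 23:48:23 Pipelined DSP engine enabled
"""

print(detected_cores(sample))  # 4
```

Comparing the returned value with sockets x cores per socket for the machine shows whether all cores are being seen.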

 

On 8/11/2019 at 11:09 PM, Xoverman said:

I can run EC filters to DSD256 on a 4-core CPU @ 4GHz; do you think it would be possible to run DSD512 on an 8- or 12-core CPU @ 4GHz? Or doesn't it scale that well?

 

You'll need to scale clock speed instead of cores. That's what makes it tricky...

 

5 minutes ago, k6davis said:

So AMD is once again providing more cores, but less speed. The large cache will help with filters, but not (necessarily) with modulators. Just guessing, but it seems that even though the Ryzen 9 3950X will be a beast of a chip, it will not yield any better HQP EC performance than what we have now.

 

Of course we won't know until we know, but I'm lowering my expectations for the new AMD CPU. The good news is that EC DSD256 is easy to attain and sounds great!

 

Yes and no. Previously the question has been what is needed to run some single-stage filters. Now it is more about what is needed to run the new modulators. These two have pretty much opposing needs. Since it is hard to have, let's say, both an AMD and an Intel CPU in the same machine, it is easier to match modulator needs with the CPU and filter needs with the GPU. This was already the case before the EC modulators, with DSD1024. Now with the new modulators it is emphasized at lower DSD rates as well.

 

20 hours ago, randytsuch said:

This benchmark rates CUDA performance for many different video cards, and it's the only CUDA benchmark I've found, although I didn't spend a lot of time looking.

 

Problem is that it doesn't tell what kind of workload it is and for what kind of data format. This is similar to CPU benchmarks where it is hard to find representative benchmark.

 

You get some kind of idea from CUDA-Z. You can look here for some figures:

https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units

Look at the "Processing power (GFLOPS)" column and sub-column "Double precision" for some indicative figures. However, this is not directly related to HQPlayer performance and can be grossly misleading. For example for HQPlayer case RTX 2080 is much faster than GTX 1080, but the raw figures don't suggest such a large difference.

 

20 hours ago, randytsuch said:

It would be great to get a benchmark for hqp, but since no one running hqp has all the different graphics cards, I'm not really sure what you're suggesting.

 

This could be done in a contributory way, similar to many benchmark results. Everybody contributes one result...

 

10 hours ago, asdf1000 said:

As the i7-8086K isn't available anymore, what would be your personal recommendation for a new build, to do ASDM7EC, DSD256, ext2, on Ubuntu Embedded?

 

I understand your i7-8086K has max speed 5GHz, as does i9-9900K. Plus the i9-9900K has more cache which may help?

 

Assume of course proper cooling, to allow max turbo speed.

 

I was between i7-8086K and i9-9900K and picked the former because it had higher base frequency to guarantee enough clock rate. Got the last one in local dealer's stock. But likely i9-9900K is as good as well.

 

Temps don't seem to be a problem here, being far from limits. Likely one reason is that I don't use the integrated GPU for anything, so it doesn't consume any of the TDP either.

 

12 hours ago, ted_b said:

What if I'm running multichannel into an NAA (to an exaSound)?  Will the 9900K be better or will I simply need a GPU to offload?

 

You need at least as many CPU cores as you have channels. So 9900K is just enough. If you run filters on the CPU it is better to have double the number of cores. So best option is probably 9900K + powerful enough Nvidia GPU.

 

3 hours ago, k6davis said:

The 9900K and 9700K both have 8 cores.

 

One has HyperThreading and a bit more boost clock. I don't know how much difference it makes in practice; my guesstimate is around 10 - 15%. Typically the total number of threads with HQPlayer and the OS is some hundreds at least. If you run Roon on the same machine, it matters much more than if you run HQPlayer OS. It is not very straightforward to give a simple rule/figure.

 

Roughly speaking, for HQPlayer most optimal would be number of channels + 2 when using GPU and at least number of channels x2 + 2 without GPU. That's how I picked i7-8086K (for Holo Spring 2). But there are other aspects as well, if you run filters on CPU, amount and speed of cache matters a lot.
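The rule of thumb above is simple enough to write down as a small Python helper. The function name and signature are invented for this sketch; only the two formulas come from the post, and they are a rough guide rather than a guarantee.

```python
def recommended_cores(channels: int, gpu_offload: bool) -> int:
    """Rough core count from the rule of thumb in the post above.

    With GPU offload:    channels + 2
    Without GPU offload: channels * 2 + 2 (treated here as a minimum)
    """
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return channels + 2 if gpu_offload else channels * 2 + 2


# Stereo without a GPU: 2 * 2 + 2 = 6 cores.
print(recommended_cores(2, gpu_offload=False))  # 6

# Eight channels, with and without GPU offload.
print(recommended_cores(8, gpu_offload=True))   # 10
print(recommended_cores(8, gpu_offload=False))  # 18
```

The stereo case without a GPU comes out at 6, which lines up with the six-core i7-8086K chosen for the Holo Spring 2. As the post notes, cache size and speed also matter when filters run on the CPU, so the number is only a starting point.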

 

6 hours ago, dminches said:

What’s the minimum video card I should get for CUDA offload?  My processor is an i7-9700k. 

 

A rough rule of thumb is not to get anything cheaper than the CPU alone. With that CPU, I would get started with RTX 2080 (Ti/Super).

 

The point is that the GPU should be fast enough that the CPU doesn't have to wait for the GPU to complete its part of the job...

 

2 hours ago, lmitche said:

An i7-9700KF or i9-9900KF has the advantage of no GPU. I haven't heard either, but if it's anything like what I am hearing with a Ryzen with no GPU, it's a very worthwhile upgrade.

 

I think the die is just the same, but GPU fused out. Way to increase overall yields by using dies that have a faulty GPU. So instead of trashing those, they can still sell the chips.

 

Probably power management can reach the same result by powering down the GPU on normal K.

 

21 hours ago, lmitche said:

Perhaps one could power down the Intel based GPU with a boot script and install another GPU when needed.

 

It should fall asleep automatically when there's no display connected or the display is entered into DPMS powersave mode. Unless you actually run something on the GPU. You can tell kernel to redirect console somewhere else, like a serial port.

 

22 hours ago, lmitche said:

10 feet from the DAC connected via 5ghz WiFi to the endpoint.

 

There is something wrong with the DAC then, or you are having pollution through power lines. If you use NAA, it certainly doesn't matter what kind of things you have in the HQPlayer server. And if that's the case, then any other computer unrelated to HQPlayer would also make the same difference.

 

In my case I have multiple computers around (only one or two playing audio though) with and without GPU, and certainly the computers don't matter when playing through a NAA... Audio computers are behind their own power line filter and audio equipment behind another one. Half of the computers and networking gear are behind multiple UPS' (I want to keep my servers running even during power outage).

 

Most of my development is done using a Xeon workstation with Nvidia 10-series GPU (to be upgraded to 20-series soon), and it has both Holo Spring 1 and RME ADI-2 Pro connected directly through USB. That's how 90% of my listening during development is done. Works very well...

 

4 hours ago, lmitche said:

The SQ here is amazing, and simple and obvious tweaks like removing a GPU continue to enhance the experience.

 

My take is that GPU can enhance the experience by allowing more algorithms and digital room correction. If you can run what you want without, then it is also fine.

 

5 hours ago, lmitche said:

My expectation was you would be indifferent about the KF version for hqplayer upsampling given the similar performance as the K series. If one uses a Nvidia GPU for upsampling then the k series cpu is redundant, and with the KF performance is the same. A low power pcie based GPU can be had for $30 for setup and removed for playback for those not interested in a coprocessor.

 

KF version is certainly fine from technical perspective, but since it costs the same as K (and very likely is the same silicon) you get less for the same price. Power consumption / heat production of CPUs with GPU is usually less in headless HQPlayer case than with CPUs of same TDP but without GPU. For KF it likely doesn't make a difference though. But for something like i7/i9 vs Ryzen 7/9 it seems to be the case.

 

When the GPU is redundant, there are also Intel options that really don't have a GPU, like X-series and most Xeon models. For example I have the Windows 10 server with i7-6950X + GTX1080 combo. It has the T+A DAC8 DSD connected directly, but main point of that server is to provide 8 channel DSD256 output to exaSound and Merging DACs.

 

Probably Ryzen 3000 series improves power efficiency (GFLOPS per Watt) getting it closer to Intel. Nvidia is already quite good on that, but more limited in flexibility.

 

But you could also look at it from another perspective. Let's say you run HQPlayer Desktop and would also want to do something else with the same computer (browsing, using Roon, whatever) and don't need a very fast GPU for graphics. Then CPUs with an integrated GPU likely give the lowest noise and power, compared with getting a cheap $30 separate card for display output. I also have, for example, a cheap passively cooled GeForce GT710 card for the purpose you describe (mainly for AMD machines), but leaving it in would likely make the combo noisier than, or at least as noisy as, having a CPU with an integrated GPU.

 

5 hours ago, lmitche said:

If the SQ is improved with KF, what's the big deal?

 

I just don't think it makes a difference for sound quality, especially given that all the co-electronics (display interface hardware) are still there on the motherboard. So LGA2066 or LGA3647 solutions could be better in such a case.

 

For me, the main put-off with KF is the pricing: you essentially pay the same price to get a piece of faulty silicon. Not necessarily a big deal though!

 

For example, I'm wondering how something like the i7-9800X would perform, with a bit more base clock and twice as much memory bandwidth as the i9-9900K.

 

1 minute ago, bodiebill said:

Glass fiber cable for Ethernet is also a good option I guess?

 

Yes of course! WiFi also clears up ground current problems. Point in avoiding shielded cables is to keep the isolation so that you don't have dirty ground currents flowing between devices. It could still happen through other paths, such as earthed mains sockets (through earth connection), but it is less likely and in that domain bigger problem would be usually other household appliances and neighborhood.

 

4 hours ago, Outlaw said:

Have an 8086K CPU. Can run all filters in Linux or Windows. In embedded can't run the ASDM7EC filter without stuttering.

 

I have the same CPU and Ubuntu Server 16.04 LTS on it, and I'm using it with ASDM7EC + poly-sinc-ext2 just fine to DSD256 (Holo Spring 2).

 
