Miska Posted April 5, 2020

Yeah, you get a decently high GPU load on RTX when you do, for example, 48k to 44.1 x 512 with poly-sinc-xtr. Alternatively, if you want to do 8 channels with a regular filter to DSD256 using the EC modulators, you end up with a pretty high load on both GPU and CPU.

Signalyst - Developer of HQPlayer
Pulse & Fidelity - Software Defined Amplifiers

Miska Posted April 8, 2020

32 minutes ago, jabbr said: Yes that's the eventual idea. Any consideration of providing for >1 DAC with multichannel, I am sure you are considering physically separate NAA/DACs.

No, NAA is DAC-side clocked, so only one DAC for multichannel. There is no synchronization for multiple DACs on purpose, because you would have multiple clocks in such a case. Very few audiophile DACs support external clock synchronization anyway. With exaSound you can get 8 channels of 384/32 PCM and DSD256. With Merging Hapi/Horus you can get 384 PCM and DSD256 multichannel as well.

2 hours ago, jabbr said: I am waiting for my replacement NIC but I suspect that even with RDMA offload, the CPU load will not change appreciably - I think I'm very close to ASDM7EC / 512 but as they say, close only counts for ... Interestingly the CPU bursts ( $ lscpu ) to 4.5 Ghz with AMSDM7 but only to 3.9 Ghz with ASDM7EC so I think there's hope of tweaking something

You could run a null output test and see what kind of processing time you get for a track vs. the track length. So far the highest published boost is the new flagship laptop CPU i9-10980HK at 5.3 GHz, but that is only a single-core boost. For this case you would need at least two cores boosted, and I don't know how much boost that CPU can have with two cores. The i9-9900KS can do an all-core boost at 5 GHz, but it is not enough for ASDM7EC at DSD512. It will be interesting to see how high the clocks will be for the 10th gen desktop CPUs.

Miska Posted April 8, 2020

27 minutes ago, jabbr said: True but W2245 has AVX512. Its curious that ASDM7EC is not able to boost c/w AMSDM7 (3.9 vs 4.5). W2245 should be able to multicore boost. I may need to look at each core separately -- htop shows % but not per core clock rate.

Cores are boosted individually; you can check /proc/cpuinfo to see what clock each core has. But only testing will show how each CPU model performs at such tasks.

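The per-core check mentioned above can be sketched in a few lines of Python. This is only an illustration, assuming the x86 Linux layout of /proc/cpuinfo with `processor` and `cpu MHz` fields; the function names `per_core_mhz` and `boosted` are made up for this example and are not part of HQPlayer or any tool from the thread.

```python
from pathlib import Path


def per_core_mhz(cpuinfo_text):
    """Map logical CPU index -> current clock in MHz from /proc/cpuinfo text.

    Assumes the x86 Linux layout ("processor : N" followed by "cpu MHz : X").
    """
    clocks = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            current = int(value)
        elif key == "cpu MHz" and current is not None:
            clocks[current] = float(value)
    return clocks


def boosted(clocks, threshold_mhz):
    """Return the logical CPUs running at or above threshold_mhz."""
    return sorted(cpu for cpu, mhz in clocks.items() if mhz >= threshold_mhz)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only
    if path.exists():
        for cpu, mhz in sorted(per_core_mhz(path.read_text()).items()):
            print(f"Processor: {cpu} Mhz: {mhz:.3f}")
```

The readings are instantaneous, so it makes sense to sample a few times while the track is playing rather than rely on a single snapshot.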
Miska Posted April 9, 2020

2 hours ago, jabbr said: I created an AWK to capture the /proc/cpuinfo output: In both cases output 44.1x512

First from the AMSDM7: Note that 4 threads are boosting to 4.5 Ghz

Processor: 0 Mhz: 2943.395
Processor: 1 Mhz: 1200.102
Processor: 2 Mhz: 2802.146
Processor: 3 Mhz: 1200.199
Processor: 4 Mhz: 4499.999
Processor: 5 Mhz: 1200.046
Processor: 6 Mhz: 1201.438
Processor: 7 Mhz: 4561.842
Processor: 8 Mhz: 2436.413
Processor: 9 Mhz: 1201.864
Processor: 10 Mhz: 2671.769
Processor: 11 Mhz: 1200.799
Processor: 12 Mhz: 4492.542
Processor: 13 Mhz: 1200.343
Processor: 14 Mhz: 1200.020
Processor: 15 Mhz: 4514.522
CPU model: Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz
1 CPU, 8 physical cores per CPU, total 16 logical CPU units

Now with ASDM7EC: Note that these 4 threads stay at 3.9 Ghz

Processor: 0 Mhz: 3965.971
Processor: 1 Mhz: 1200.031
Processor: 2 Mhz: 1200.164
Processor: 3 Mhz: 1265.660
Processor: 4 Mhz: 3965.246
Processor: 5 Mhz: 1201.363
Processor: 6 Mhz: 1201.073
Processor: 7 Mhz: 3076.107
Processor: 8 Mhz: 3900.650
Processor: 9 Mhz: 1200.346
Processor: 10 Mhz: 1201.619
Processor: 11 Mhz: 1270.791
Processor: 12 Mhz: 3899.409
Processor: 13 Mhz: 1200.285
Processor: 14 Mhz: 1200.009
Processor: 15 Mhz: 3145.273
CPU model: Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz
1 CPU, 8 physical cores per CPU, total 16 logical CPU units

Hard to say, but it could be AVX512 capping the clocks, because ASDM7EC puts more load on it.

Miska Posted April 9, 2020

1 minute ago, jabbr said: Yes as we've discussed. Is there a way to send more cycles to CUDA which isn't getting used too much? Perhaps a "prefer-cuda" flag? That might just enable 44.1x512?

Modulators cannot be run on the GPU: because of their mathematical structure they would be badly sub-optimal there. You can only run the filters and the convolution engine there.

Miska Posted April 9, 2020

8 hours ago, jabbr said: https://lemire.me/blog/2018/08/15/the-dangers-of-avx-512-throttling-a-3-impact/

The tricky part is that things depend on the CPU model and the workload. Newer CPUs likely throttle less than previous generations. But I would conclude that the higher AVX-512 usage likely limits clocks to the base frequency in this case.

On 4/4/2020 at 5:04 PM, jabbr said: 8GB 1x8GB DDR4 2933MHz RDIMM ECC Memory

Btw, note that for full memory speed on a quad-channel CPU you need four DIMMs...

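To put rough numbers on the DIMM remark: a minimal sketch of theoretical peak DDR4 bandwidth, assuming a 64-bit (8-byte) data bus per channel and one DIMM per populated channel. These are paper figures only; real-world throughput is lower, and the function name is just for illustration.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, populated_channels, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    Assumes each populated channel has a 64-bit (8-byte) data bus.
    """
    return mega_transfers_per_s * 1e6 * bus_bytes * populated_channels / 1e9


# One DDR4-2933 DIMM vs. four DIMMs on a quad-channel CPU
print(f"1 DIMM : {peak_bandwidth_gbs(2933, 1):.1f} GB/s")
print(f"4 DIMMs: {peak_bandwidth_gbs(2933, 4):.1f} GB/s")
```

So a single 8 GB DIMM leaves three quarters of the theoretical bandwidth unused on a quad-channel platform.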
Miska Posted April 10, 2020

If you keep the filters offloaded to the GPU, that leaves the least amount of load on the CPU, which minimizes throttling when running the intensive EC modulators. So that's the way to get the highest CPU clocks (also on other models).

Miska Posted April 13, 2020

6 hours ago, jabbr said: When I try to run at DSD512, the sound is on for a second and then off for a second and repeat. The CPU utilization doesn't get >70%

When deadlines are systematically missed, the whole process goes into a spring-like motion that you may know from rush-hour traffic jams, where a queue of cars ends up in such a cycle of acceleration and deceleration.

Miska Posted April 13, 2020

32 minutes ago, The Computer Audiophile said: @Miska Have you been able to get the EC modulator working at DSD512 without hiccups?

No, I'm not aware of any CPU that would be able to do it in realtime. Only offline conversion with HQPlayer 4 Pro. I hope the development of CPUs will eventually make it possible in two years or something. Intel's 10th gen mobile CPUs can already boost to 5.3 GHz, so we'll see what the desktop CPUs will look like. Even though the i9-9900KS can run at an all-core boost of 5 GHz, it is still not enough. But you get the picture if you look at the highest per-core loads at DSD256 and then multiply that by two to go to DSD512.

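The rule of thumb above (take the busiest core at DSD256 and double it) can be written out as a tiny sketch. The load figures in the example are hypothetical, not measurements from this thread, and linear scaling with output rate is the assumption taken from the post rather than a guarantee.

```python
def estimate_dsd512_peak(per_core_loads_dsd256):
    """Estimate the busiest core's load at DSD512 by doubling the DSD256 peak.

    Loads are fractions of one core (1.0 = fully loaded). Linear scaling
    with the output rate is an assumption.
    """
    return 2 * max(per_core_loads_dsd256)


def may_run_realtime(per_core_loads_dsd256):
    """True if the doubled peak still fits within a single core."""
    return estimate_dsd512_peak(per_core_loads_dsd256) <= 1.0


# Hypothetical per-core loads observed at DSD256
loads = [0.45, 0.60, 0.10, 0.05]
print(estimate_dsd512_peak(loads), may_run_realtime(loads))
```

Anything above 1.0 means a single core would have to do more than it can, which is when the missed deadlines and stuttering show up even though total CPU utilization looks moderate.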
Miska Posted April 13, 2020

35 minutes ago, jabbr said: But look at what we can do with this approach! Moreover you can buy this prepackaged from Dell ... the workstation is surprisingly quiet, I installed my own GPU, RAM and NIC the base price ~$1600 is very reasonable for what you get.

Workstations designed for high CPU loads are pretty quiet. My old HP Z440 Xeon workstation is also very quiet and that doesn't change depending on load; it is just as quiet even at constant full load. So I would again like to go with a new HP Z4 or Z6 series. If that's not possible, I'd build a new workstation myself. But for me, for example, the three-year next-business-day on-site warranty was an important factor in the choice.

Miska Posted April 13, 2020

11 hours ago, The Computer Audiophile said: This is interesting. I'm trying ASDM7EC and my old school Xeon E3-1241 v3 CPU doesn't get above 40-50%, yet the audio constantly stutters.

My Xeon E5-1620v3 cannot do ASDM7EC even at DSD256. DSD128 is max. That's why I need a new one with W2245.

Miska Posted April 15, 2020

Yesterday I tested on Windows, using the free-running mode, how long it takes to process a track on the i9-9900KS with poly-sinc-ext2 and ASDM7EC to DSD512. The result for a 3:32 long track was 6:23. So almost 2x the track's playback time (about 1.8x), at a processing speed of 0.55x.

Miska Posted April 15, 2020

44 minutes ago, StreamFidelity said: What does that mean?

That processing runs at full speed and all the output is just thrown away. It is sort of an inherited feature from HQPlayer 4 Pro, where it lets the conversion run to the output file at full speed instead of waiting for the audible output.

44 minutes ago, StreamFidelity said: So you have to wait six minutes for a three-minute song? If the song is loaded into the buffer or how does it work?

Yes... Not sure I understand the question about the buffer part.

44 minutes ago, StreamFidelity said: What is the CPU usage in %, how many cores (Hyperthreading?) and what clock rate?

About 20%; mostly two cores are fully loaded. The CPU runs at a constant 5 GHz. So to run this in realtime, you'd need something that is about 2x faster per core than the i9-9900KS.

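The arithmetic behind the 0.55x figure from the free-running test (3:32 track, 6:23 processing time) can be reproduced with a short sketch. The helper names are invented for this example; the m:ss parsing also accepts fractional seconds such as m:ss.sss.

```python
def parse_mss(text):
    """Convert "m:ss" or "m:ss.sss" into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def processing_speed(track_length, processing_time):
    """Speed relative to realtime: 1.0 or more means realtime is possible."""
    return parse_mss(track_length) / parse_mss(processing_time)


speed = processing_speed("3:32", "6:23")
print(f"processing speed: {speed:.2f}x")
print(f"per-core speedup needed for realtime: {1 / speed:.2f}x")
```

The inverse of the speed is the minimum per-core speedup needed just to break even; some margin on top of that is needed to avoid dropouts, which is consistent with the "about 2x faster per core" estimate.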
Miska Posted April 15, 2020

31 minutes ago, Sagittarius said: I am curious though why the processing speed was only 0.55x. Cache not being able to feed the rest of the processor fast enough? ..other thread dependencies? ..thermal throttling? Something that can be addressed in future versions of HQ player?

This was a test build on Windows, so a production build may differ a little. I've already put a lot of effort into making the EC modulators run like they now do. I'm not expecting any big gains on this front. No thermal throttling. Maybe @jabbr can test at some point how it runs with AVX512 on his new Xeon. A larger and faster cache may help too. But a large part of the limitation is just how many instructions a CPU core can execute within the available number of clock cycles.

Miska Posted April 16, 2020

7 hours ago, Solstice380 said: @jabbr I ran the GeekBench benchmark so I could compare. i9-9900KS at 5GHz Asus RTX2080 Advanced I'm not sure how to compare OpenCL to CUDA, though. I understand that OpenCL code runs faster. For my use case I'm extremely happy. And @Jussi keeps killing us... DSD512 with EC modulators in realtime probably won't happen in my lifetime! 😂 2.5X the 9900KS.... ouch.

I'm pretty sure those figures are not for the same algorithm and using double-precision 64-bit floating point. OpenCL is not very nice unless you want to do graphics. For DSD512 with EC look for a CPU that has double the single-core score and that could be potential...

Miska Posted April 16, 2020

5 hours ago, jabbr said: Geekbench is just a number. For compute you want CUDA because that’s what HQP uses (not OpenCL). I think @Miska’s idea of looking at the time for Pro to encode a song / the song real time is a great measurement of the system’s performance. Yeah EC/DSD512 not anytime soon. The question is whether AVX512 @ 3.9 GHz is better than AVX2 @ 5 - 5.3 GHz.

HQPlayer 4 Desktop has the benchmark feature, although I accidentally broke it with the latest start/stop change (already fixed for the next release). But with releases before that change it should work.

Miska Posted April 16, 2020

3 hours ago, TubeLover said: Why an NVIDIA RTX 2080 GPU in a music server?

Because it provides a nice amount of extra processing power, and is more powerful in some tasks than a CPU. But primarily it can free up CPU resources for other tasks.

Miska Posted April 16, 2020

8 minutes ago, Solstice380 said: I just ran their GPU Test and it returned the OpenCL result. I was curious why @jabbr got a CUDA result and I got the OpenCL. Difference between my GPU and the Ti? Or?

Are you on Nvidia graphics? Since CUDA is Nvidia-only... Also, CUDA needs a new enough graphics driver version, which cannot be older than the SDK used to build the software.

Miska Posted April 16, 2020

55 minutes ago, Solstice380 said: I installed the latest driver from Nvidia: nvidia 442.59-desktop-win10-64bit-international-dch-whql dated 3/4/20.

That should be enough for all current CUDA stuff.

Miska Posted April 16, 2020

The Killer NICs that, for example, Gigabyte puts on some motherboards are also good for offloads. They have pretty much the entire TCP/IP stack on the NIC. Linux support is not great for all models, but the models I have work on recent Linux kernels. These are just more gaming-oriented NICs aiming to minimize latencies. https://www.killernetworking.com

Miska Posted April 16, 2020

5 minutes ago, The Computer Audiophile said: Wow, I haven't had a Killer NIC for many years. They seem cool, but many people are suspect. Hey, that's like audio :~)

I have a couple of Gigabyte motherboards that have those, along with the DAC UP USB ports. The latest Gigabyte Z390 Designare motherboard I have for the 9900KS has just two Intel NICs (different models, actually), but still comes with their newest DAC-UP2 USB ports.

Miska Posted April 16, 2020

24 minutes ago, jabbr said: I just tried a few settings. Both poly-sinc-ext2 and xtr-mp with ASDM7EC took almost 2x (a few seconds shy of 2x), with or without CUDA. CUDA with -mp does significantly decrease CPU usage. In one case Pro used 640% CPU (CUDA off), with all cores active.

Yes, the filters are usually not the limiting factor; instead, the EC modulators are the ones that set the pace. This can be different with normal non-EC modulators and heavier filters. But if the speed is close to the same, it means that AVX512 offsets the lower clock speed enough to get close to on par with the 9900KS running at 5 GHz on per-core speed in this case. So at least there is no loss in that sense, and complex filters may be running faster than on the 9900KS.

P.S. I just checked that the new HP workstation (Z4) equivalent to my current one, with W-2245, costs about 3200€ here (32 GB RAM, 1 TB SSD).

Miska Posted April 17, 2020

2 hours ago, TubeLover said: Even considering the current Dell sale, I specced out a workstation matching the one he JABBER just purchased, added in the cost of the his additional RAM, the pricey NIC card, and the most reasonably priced 2080 that I could find and it was over $3k. I was considering this as a possible path to allow me to run Roon and HQ Player in conjunction, at DSD256, and with filters in hopes of seeing the very significant sonic upgrade Chris mentioned in his RAAL SR1a discussion. This is, however, a pretty costly endeavor. And how long would a server pc build like that be viable to able to run HQ Player optimally as described above? Much less at DSD512.

You can get by quite a bit cheaper with an i9-9900K and RTX2080. But you can also get started with just the i9-9900K and add the RTX2080 later. The CPU has built-in graphics, so you get display output without a separate card, although without any offloads from HQPlayer.

Miska Posted April 17, 2020

25 minutes ago, jabbr said: @luisma it would be great to run @Miska's test of looking at how long HQ Pro takes to encode ASDM7EC on a Ryzen because that looks to me to be the best overall test of the system performance.

You can do the same test with Desktop v4 too; it is just broken in the current release due to a regression. I fixed that and improved the time display to show m:ss.sss instead of just plain seconds. You can also do a similar test with Embedded, but there's currently no display for the timing data, although the HQPlayer engine has it. Maybe I need to add this same functionality to the web interface.

Miska Posted April 17, 2020

4 hours ago, TubeLover said: Thanks for the input Miska. What strategy would you recommend to manage to keep things "quite a bit cheaper with i9-9900K and RTX2080". Thanks.

I would first get the type of server I like with a 9900K, one that has the space and the possibility to add something like an RTX2080 later. And then see whether I would eventually want to upgrade the machine with such a GPU. Essentially that means that the case is large enough and has a suitable PSU for the purpose. You would likely want to have a quiet machine too. The kind of approach to take depends on whether you'd like to build one yourself, get someone to build one for you (a small company or such), or get one from the big vendors.
