HQ Player


1 hour ago, Yviena said:

Yeah, CUDA is active. I can verify it by checking actual GPU usage, and it does go up when using xtr, so it seems to be working. The Nvidia driver is from the 436.xx branch, so the newest one.

But after reverting to 4.1.0.1 everything is fine. With 4.1.1 it stutters without my touching anything regarding core allocation. I do see that CPU usage in 4.1.1 maxes out at 18%, while the older build was using 19-22%.

Maybe your optimizations, depending on which CPU you used, led to regressions on AMD if, for example, you used Intel. That is probably hard to check without acquiring a 3700X/3900X.

 

What does the core mask in the log file look like for you? What are the per-core loads in Resource Monitor? Does it work any better without CUDA? Are you able to disable the CPU's threading in the BIOS settings and check whether that makes any difference?

 

I'm trying to figure out which change could cause it. Desktop 4.1.1 behaves exactly the same as Embedded 4.11.2, which seems to perform the best right now, while the Embedded version that matched Desktop 4.1.0.1 didn't. So something is behaving differently on Windows than on Linux, although it should behave exactly the same.

 

I will get the Ryzen 3950X, but it is not available yet...

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
6 hours ago, Miska said:

 

Strange... And I assume you thus don't have "CUDA offload" enabled in settings either? Just Multicore DSP set to "Auto" (grayed)? Maybe Windows is misreporting the CPU topology or something, but your CPU is a quite straightforward quad core without hyperthreading. Can you check the HQPlayer log file and tell me what "CPU core mask" is reported?

 

You should be getting higher load on cores 0 and 2 and somewhat lower load on cores 1 and 3.

 

 

No CUDA offload; Win10 1903; multicore DSP grayed or selected (same result); CPU core mask 00000000000000000000000000001111; load higher on cores 0 and 2, lower on 1 and 3. Total CPU load is not higher than with 4.1.0.1, but with 4.1.0.1 the load is equally distributed between the 4 cores.

Maybe this is the problem: with 4.1.1, two cores are more lightly loaded and two cores are too heavily loaded... So this could cause the stuttering.

Link to comment
45 minutes ago, Miska said:

 

What does the core mask in the log file look like for you? What are the per-core loads in Resource Monitor? Does it work any better without CUDA? Are you able to disable the CPU's threading in the BIOS settings and check whether that makes any difference?

 

I'm trying to figure out which change could cause it. Desktop 4.1.1 behaves exactly the same as Embedded 4.11.2, which seems to perform the best right now, while the Embedded version that matched Desktop 4.1.0.1 didn't. So something is behaving differently on Windows than on Linux, although it should behave exactly the same.

 

I will get the Ryzen 3950X, but it is not available yet...

 

Core mask is 00000000000000000101010101010101. Per-core loads in Resource Monitor are lower than with the previous build, but no thread hits 100%; they all peak at 90-92%, with most under 80%.

 

It's the same without CUDA enabled: it still stutters. I will try disabling SMT, but I doubt that will help.

 

I wonder if the CPU core allocation optimizations are to blame. It could be that it works better on Linux, and with your 8086K.

Link to comment
15 minutes ago, Yviena said:

I wonder if the CPU core allocation optimizations are to blame. It could be that it works better on Linux, and with your 8086K.

 

4.1.1 has been tested on my Xeon E5 under Linux, and on an iMac and a Mac Mini under macOS (the Mac version doesn't have core pinning because macOS doesn't support it). On Windows it has been tested on an i7-7700K and an i7-6950X. The matching Embedded version has been tested on an i7-8086K, i7-7700K and i5-7600T. None of these have problems doing what they are capable of...

 

I can make a build that disables that part for Windows, if that seems to be the problem. But it would be nice to figure out first whether there's some logic to why things don't work on some machines while they do work on others.

 

31 minutes ago, Luca72c said:

No CUDA offload; Win10 1903; multicore DSP grayed or selected (same result); CPU core mask 00000000000000000000000000001111; load higher on cores 0 and 2, lower on 1 and 3. Total CPU load is not higher than with 4.1.0.1, but with 4.1.0.1 the load is equally distributed between the 4 cores.

Maybe this is the problem: with 4.1.1, two cores are more lightly loaded and two cores are too heavily loaded... So this could cause the stuttering.

 

That looks correct, four cores and no hyperthreading. You can spread the load more by enforcing full multicore by checking the multicore DSP box, but I doubt it will help...

 

28 minutes ago, Yviena said:

Core mask is 00000000000000000101010101010101. Per-core loads in Resource Monitor are lower than with the previous build, but no thread hits 100%; they all peak at 90-92%, with most under 80%.

 

This also looks correct, eight cores with hyperthreading.
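For anyone who wants to read these masks themselves, a few lines of Python will decode them. This is only a sketch: it assumes the rightmost character of the logged mask is logical CPU 0, which the log does not state, and `decode_core_mask` is a made-up helper name, not anything from HQPlayer.

```python
def decode_core_mask(mask: str) -> list[int]:
    """Return the logical CPU indices whose bit is set in a binary mask string.

    Assumption (not confirmed by the log format): the rightmost character
    corresponds to logical CPU 0.
    """
    bits = mask.strip()
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError(f"not a binary mask: {mask!r}")
    # Walk the string from the right so that index 0 is the last character.
    return [i for i, bit in enumerate(reversed(bits)) if bit == "1"]


# The two masks reported in this thread.
quad_core = decode_core_mask("00000000000000000000000000001111")
eight_core_smt = decode_core_mask("00000000000000000101010101010101")

print(quad_core)       # [0, 1, 2, 3]
print(eight_core_smt)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

Under that assumption the second mask selects every other logical CPU, which would fit one thread per physical core on an 8-core CPU with SMT.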

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment

Hi @Miska, so it seems that AMD chips are not in favor with the new build :) (at least most of the regression messages come from AMD guys). I cannot get stable playback without CUDA offload now, with Multicore set to either "on" or "auto". (It was possible in the previous build with the same CPU setup: 8 cores and a 4.1 GHz overclock.)

The playback settings are ASDM5EC into DSD256. Windows is the latest build, with the power plan set to High Performance.

 

Here's the core setup from log:

Number of processor cores: 8
Core mask: 00000000000000000101010101010101
Reserved cores: 2

 

[Attached screenshots: multicore_auto, multicore_on]

 

Attached are Task Manager views with Multicore set to "on" (this is where the load is maxed out on two cores) and with "auto".

 

Edit: there is also a funny thing: even with CUDA offload, HQP begins to stutter once there is no other load on the CPU, be it an RDP session or the Ryzen Master software. When something besides HQP is running on one of the cores/threads, there is no stutter. It's as if HQP needs to have some load on other cores (or maybe that's a Windows thing). Again, this was not the case with the previous build.

 

Link to comment
1 hour ago, Miska said:

 

4.1.1 has been tested on my Xeon E5 under Linux, and on an iMac and a Mac Mini under macOS (the Mac version doesn't have core pinning because macOS doesn't support it). On Windows it has been tested on an i7-7700K and an i7-6950X. The matching Embedded version has been tested on an i7-8086K, i7-7700K and i5-7600T. None of these have problems doing what they are capable of...

 

I can make a build that disables that part for Windows, if that seems to be the problem. But it would be nice to figure out first whether there's some logic to why things don't work on some machines while they do work on others.

 

 

That looks correct, four cores and no hyperthreading. You can spread the load more by enforcing full multicore by checking the multicore DSP box, but I doubt it will help...

 

 

This also looks correct, eight cores with hyperthreading.

Just a quick report: 4.1.1 works fine here against all odds (unsupported WS2012 running on a 2012 Mac with ParkControl, which does not mess with core allocation per se but prevents parking of cores so that the CPU always runs at max frequency).

 

I did not try for long, but I could use xtr2s/DSD5EC with 192 while I can NOT with Embedded.

 

I don't feel in the mood for a comparison tonight, and I went swimming in the not-so-warm-anymore sea this evening, which might affect my hearing, but I analysed the sound as recessed, beefier but muddier, with hints of not-so-well-integrated highs compared (from memory) to Embedded.

Link to comment
3 hours ago, fred_com said:

Edit: there is also a funny thing: even with CUDA offload, HQP begins to stutter once there is no other load on the CPU, be it an RDP session or the Ryzen Master software. When something besides HQP is running on one of the cores/threads, there is no stutter. It's as if HQP needs to have some load on other cores (or maybe that's a Windows thing). Again, this was not the case with the previous build.

 

Sounds like some odd Windows behavior... Like it would be dropping clocks when it shouldn't...

 

Here's a build with some adjustments. Now the difference between the multicore DSP settings "auto" and "enabled" is bigger: auto never tries to use parallelized modulators, even if there are a lot of cores; that is enabled only when multicore DSP is checked. I also modified how parallelized modulators work when that is enabled. In addition, core allocation is now the same with and without CUDA; otherwise there are just too many possible combinations.

 

Let's see if this one works better or the same...

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment

I just updated to 4.1.1 on my mid-2012 rMBP. Using RBCD -> DSD128/44 with ASDM5EC, poly-sinc-ext2 and multicore (and no CUDA due to the old Nvidia GeForce GT 650M), it is using about 20% more CPU than 4.1.0.1 (360% vs 300%) and running about 10 °C hotter (and therefore pushing the fans more). Without Multicore, CPU is about the same for both versions (200%); I wasn't using this option with 4.1.0.1 because it runs about 10% hotter than with Multicore.

 

FYI, with 4.1.0.1 I have been able to do RBCD -> DSD256/44 with DSD5EC and a lighter 2s filter, even with this 7-year-old computer (2.6 GHz i7), but it runs near the CPU temperature limit, with help from TG-Pro providing a more aggressive fan speed. I've got a laptop cooler stand on order to see if that helps.

 

Going back to 4.1.0.1 for now.

Link to comment
56 minutes ago, Miska said:

Sounds like some odd Windows behavior... Like it would be dropping clocks when it shouldn't...

 

By the way, it could be worth trying ParkControl as @Le Concombre Masqué mentioned. Or, alternatively, the simpler method is to switch to the Ultimate Performance power plan.

 

The i7-6950X was unusable on Win10 without the High Performance power plan, and the Ultimate Performance power plan helped more. Otherwise Windows was putting cores to sleep too much.

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
2 hours ago, Miska said:

 

Sounds like some odd Windows behavior... Like it would be dropping clocks when it shouldn't...

 

Here's a build with some adjustments. Now the difference between the multicore DSP settings "auto" and "enabled" is bigger: auto never tries to use parallelized modulators, even if there are a lot of cores; that is enabled only when multicore DSP is checked. I also modified how parallelized modulators work when that is enabled. In addition, core allocation is now the same with and without CUDA; otherwise there are just too many possible combinations.

 

Let's see if this one works better or the same...

 

The beta build works okay if I manually assign core allocation. If all cores/threads are ticked it still stutters; if the SMT threads are unticked it also stutters; if the first 4 cores are ticked it also stutters occasionally. Setting core allocation to only 8/10/12/14 seems to fix the stuttering completely, so it seems to be a core allocation/threading issue, at least with AMD/Win 1903.

Link to comment
3 hours ago, Miska said:

 

By the way, it could be worth trying ParkControl as @Le Concombre Masqué mentioned. Or, alternatively, the simpler method is to switch to the Ultimate Performance power plan.

 

The i7-6950X was unusable on Win10 without the High Performance power plan, and the Ultimate Performance power plan helped more. Otherwise Windows was putting cores to sleep too much.

 

Could there be an equivalent of the magic formula (powercfg /DUPLICATESCHEME e9a42b02-d5df-448d-aa00-03f14749eb61) for Embedded?

Link to comment
4 hours ago, MikePid said:

I just updated to 4.1.1 on my mid-2012 rMBP. Using RBCD -> DSD128/44 with ASDM5EC, poly-sinc-ext2 and multicore (and no CUDA due to the old Nvidia GeForce GT 650M), it is using about 20% more CPU than 4.1.0.1 (360% vs 300%) and running about 10 °C hotter (and therefore pushing the fans more). Without Multicore, CPU is about the same for both versions (200%); I wasn't using this option with 4.1.0.1 because it runs about 10% hotter than with Multicore.

 

FYI, with 4.1.0.1 I have been able to do RBCD -> DSD256/44 with DSD5EC and a lighter 2s filter, even with this 7-year-old computer (2.6 GHz i7), but it runs near the CPU temperature limit, with help from TG-Pro providing a more aggressive fan speed. I've got a laptop cooler stand on order to see if that helps.

 

Going back to 4.1.0.1 for now.

Same machine here (but I much prefer HQPOS to macOS for music reproduction). Yes, the cooler stand will help, at least to feel better about the temperature issue. Tip: partially close the lid (about 1/3) to optimise the airflow.

Link to comment
4 hours ago, Yviena said:

The beta build works okay if I manually assign core allocation. If all cores/threads are ticked it still stutters; if the SMT threads are unticked it also stutters; if the first 4 cores are ticked it also stutters occasionally. Setting core allocation to only 8/10/12/14 seems to fix the stuttering completely, so it seems to be a core allocation/threading issue, at least with AMD/Win 1903.

 

With both multicore auto and enabled? Does that change core mask? I'm not sure what the stuff you are doing actually does. HQPlayer tries to avoid using SMT threads, leaving those to other tasks in the OS. With your CPU, only 8 real cores are used.

 

My core allocation/pinning just specifically assigns different tasks to different cores so that the OS doesn't try to put two heavy tasks on the same core. That was sometimes a problem with earlier releases: tasks were not locked to particular cores and sometimes, non-deterministically, the OS made a bad choice about where things go.
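As a rough illustration of the general idea (this is not HQPlayer's actual code), pinning amounts to giving each heavy task its own core and then restricting the task to it. The sketch below plans a one-task-per-core assignment in plain Python; the task names and both helper functions are hypothetical. On Linux, the standard-library call `os.sched_setaffinity` can then apply such a restriction; Windows and macOS are not covered here.

```python
import os


def plan_pinning(tasks: list[str], cores: list[int]) -> dict[str, int]:
    """Assign each task its own core so that no two heavy tasks share one.

    Raises ValueError when there are more tasks than cores, because sharing
    a core is exactly what pinning is meant to avoid.
    """
    if len(tasks) > len(cores):
        raise ValueError("more heavy tasks than available cores")
    return dict(zip(tasks, cores))


def pin_current_process(core: int) -> None:
    """Restrict the caller to one logical CPU (Linux only, pid 0 = caller)."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available here")
    os.sched_setaffinity(0, {core})


# Hypothetical task names; the core list is the even-numbered logical CPUs.
plan = plan_pinning(
    ["filter_left", "filter_right", "modulator_left", "modulator_right"],
    [0, 2, 4, 6, 8, 10, 12, 14],
)
print(plan)
```

With a fixed plan like this, the placement is the same on every run instead of being left to the scheduler.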

 

Essentially you are likely adjusting clock frequency vs number of cores. Since your CPU has only 8 cores, your setting of 8 means it really operates as a quad core? With 12 it would be 6 cores?

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
1 hour ago, Miska said:

 

With both multicore auto and enabled? Does that change core mask? I'm not sure what the stuff you are doing actually does. HQPlayer tries to avoid using SMT threads, leaving those to other tasks in the OS. With your CPU, only 8 real cores are used.

 

My core allocation/pinning just specifically assigns different tasks to different cores so that the OS doesn't try to put two heavy tasks on the same core. That was sometimes a problem with earlier releases: tasks were not locked to particular cores and sometimes, non-deterministically, the OS made a bad choice about where things go.

 

Essentially you are likely adjusting clock frequency vs number of cores. Since your CPU has only 8 cores, your setting of 8 means it really operates as a quad core? With 12 it would be 6 cores?

With auto/off I'm always limited to 13.5% CPU usage, which is not enough for playback without stuttering. Clock frequency is set to fixed in the BIOS; the only thing setting core affinity does is limit which CPU cores/threads HQPlayer can run on.

 

In my case, cores 8/10/12/14 correspond to cores 5/6/7/8 without SMT, so a quad core, while 9/11/13/15 are the SMT threads belonging to those cores. HQPlayer is still using SMT threads, aka cores 1/3/5/7/9/11/13/15, if I leave core affinity at default: I can see thread 1 at 65-70% CPU usage, which means SMT is used.
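The numbering described here can be made concrete with a short sketch. It assumes that logical CPUs are numbered from 0 and that the two SMT threads of a physical core sit next to each other (0 and 1 on the first core, 2 and 3 on the second, and so on). That is a common enumeration but is not confirmed anywhere in this thread, and the helper names are made up for illustration.

```python
def physical_core(logical_cpu: int, threads_per_core: int = 2) -> tuple[int, int]:
    """Map a 0-based logical CPU to (1-based physical core, SMT thread index).

    Assumes SMT siblings are enumerated adjacently.
    """
    if logical_cpu < 0:
        raise ValueError("logical CPU index must be non-negative")
    core, thread = divmod(logical_cpu, threads_per_core)
    return core + 1, thread


def affinity_mask(logical_cpus: list[int]) -> int:
    """Build an affinity bitmask with one bit set per selected logical CPU."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask


selected = [8, 10, 12, 14]
print([physical_core(cpu) for cpu in selected])  # [(5, 0), (6, 0), (7, 0), (8, 0)]
print(hex(affinity_mask(selected)))              # 0x5500
```

Under these assumptions, 8/10/12/14 are the first threads of physical cores 5 to 8, and 9/11/13/15 are their SMT siblings.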

 

Also, if I leave it at default without touching affinity, CPU usage is around 27-30% while still stuttering very frequently, but limiting it to 4 real cores brings it down to 19-22% CPU usage. That is a few percent higher than the previous 4.1.0.1, which averaged around 17-19%, but that is no problem.
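To put those whole-machine percentages in perspective, they can be converted into "fully busy logical CPUs". A minimal sketch, assuming 16 logical CPUs (8 cores with SMT, as the core mask suggests) and that the percentage is averaged over all of them; `busy_cpu_equivalents` is a made-up name.

```python
def busy_cpu_equivalents(total_percent: float, logical_cpus: int) -> float:
    """Convert a whole-machine CPU percentage into fully busy logical CPUs."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return total_percent / 100.0 * logical_cpus


# Figures quoted in this post, on an assumed 16 logical CPUs.
for label, percent in [
    ("auto/off ceiling", 13.5),
    ("default affinity, low", 27.0),
    ("default affinity, high", 30.0),
    ("4 real cores, low", 19.0),
    ("4 real cores, high", 22.0),
]:
    print(label, round(busy_cpu_equivalents(percent, 16), 2))
```

On that reading, 13.5% is only a little more than two logical CPUs' worth of work, so a low total figure does not rule out individual threads being saturated.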

Link to comment
18 hours ago, Miska said:

 

What does the core mask in the log file look like for you? What are the per-core loads in Resource Monitor? Does it work any better without CUDA? Are you able to disable the CPU's threading in the BIOS settings and check whether that makes any difference?

 

I'm trying to figure out which change could cause it. Desktop 4.1.1 behaves exactly the same as Embedded 4.11.2, which seems to perform the best right now, while the Embedded version that matched Desktop 4.1.0.1 didn't. So something is behaving differently on Windows than on Linux, although it should behave exactly the same.

 

I will get the Ryzen 3950X, but it is not available yet...

 

Debian 9, Jussi's original 4.9.158 jl+ kernel, i5-7500: HQPe 4.11.2 stutters with DSD128 7EC ext2, and the CPU allocation is two cores at 90% and the other two at 30-40%.

But with HQPe 4.11.1-31 there is no stuttering at all, and it sounds great! The 4 cores average 50-70%.

A kernel which I compiled with the BFQ scheduler plus other optimizations also happened to be OK.

So maybe it is CPU related?

Link to comment
17 minutes ago, maya said:

Debian 9, Jussi's original 4.9.158 jl+ kernel, i5-7500: HQPe 4.11.2 stutters with DSD128 7EC ext2, and the CPU allocation is two cores at 90% and the other two at 30-40%.

But with HQPe 4.11.1-31 there is no stuttering at all, and it sounds great! The 4 cores average 50-70%.

A kernel which I compiled with the BFQ scheduler plus other optimizations also happened to be OK.

So maybe it is CPU related?

So if Desktop 4.1.1 behaves exactly like Embedded 4.11.2, obviously it stutters...

No strange Windows behaviour, then: same results on Linux and macOS. I think maybe people are too inclined to blame Windows every time something goes wrong...

Link to comment
12 hours ago, Miska said:

 

Sounds like some odd Windows behavior... Like it would be dropping clocks when it shouldn't...

 

Here's a build with some adjustments. Now the difference between the multicore DSP settings "auto" and "enabled" is bigger: auto never tries to use parallelized modulators, even if there are a lot of cores; that is enabled only when multicore DSP is checked. I also modified how parallelized modulators work when that is enabled. In addition, core allocation is now the same with and without CUDA; otherwise there are just too many possible combinations.

 

Let's see if this one works better or the same...

 

Yes, this build works better, thanks! There is no stutter when multicore is set to "on" without CUDA. And it looks like the load is more evenly distributed between the real cores.

 

As for the core parking: I've installed the Ultimate Performance profile, and ParkControl as well, but it still stutters once I disconnect the RDP session. Seems like some weird Windows behavior, indeed. And it's not the case on Ubuntu.

 

The ASDM7EC is still out of reach, though :)

Link to comment
14 minutes ago, fred_com said:

Yes, this build works better, thanks! There is no stutter when multicore is set to "on" without CUDA. And it looks like the load is more evenly distributed between the real cores.

 

As for the core parking: I've installed the Ultimate Performance profile, and ParkControl as well, but it still stutters once I disconnect the RDP session. Seems like some weird Windows behavior, indeed. And it's not the case on Ubuntu.

 

The ASDM7EC is still out of reach, though :)

 

... But it's the case for Debian, as the previous post clearly shows. So the problem should not be in Windows, or at least not only in Windows.

Link to comment
12 minutes ago, Luca72c said:

 

... But it's the case for Debian, as the previous post clearly shows. So the problem should not be in Windows, or at least not only in Windows.

 

I was replying to Jussi's suggestion on how to remedy this specific condition on my PC, where HQP begins to stutter when there is no other load on the CPU in Windows. If I'm connected to the PC via RDP, for example, then there is no stutter with the new 4.1.1.1 build.

Link to comment
2 hours ago, Luca72c said:

 

So if Desktop 4.1.1 behaves exactly like Embedded 4.11.2, obviously it stutters...

No strange Windows behaviour, then: same results on Linux and macOS. I think maybe people are too inclined to blame Windows every time something goes wrong...

 

4.11.2 went through a couple of cycles of testing by me and others, where each build had some tuning updates.

 

2 hours ago, maya said:

Debian 9, Jussi's original 4.9.158 jl+ kernel, i5-7500: HQPe 4.11.2 stutters with DSD128 7EC ext2, and the CPU allocation is two cores at 90% and the other two at 30-40%.

But with HQPe 4.11.1-31 there is no stuttering at all, and it sounds great! The 4 cores average 50-70%.

A kernel which I compiled with the BFQ scheduler plus other optimizations also happened to be OK.

So maybe it is CPU related?

 

For me, 4.11.1 performance depends on the phase of the moon and the time of day... 4.11.2 systematically gives the same load distribution.

 

Have you tried the 4.11.2 HQPlayer OS bootable image? That's a good baseline because it is the same OS and build for everyone. If you want to test on the same baseline as Windows and macOS, you need to use HQPlayer OS or the Ubuntu build.

 

On the Debian/Fedora build I turned off one extra optimization which does not always improve performance. I can turn it back on for testing. But otherwise, for a quad core there shouldn't be much difference from 4.11.1 apart from core pinning, if you have multicore set to auto. And if the kernel is throwing the high-load threads between CPU cores all the time, it certainly won't help performance.

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
2 hours ago, fred_com said:

As for the core parking: I've installed the Ultimate Performance profile, and ParkControl as well, but it still stutters once I disconnect the RDP session. Seems like some weird Windows behavior, indeed. And it's not the case on Ubuntu.

 

Windows also changes WASAPI devices when you connect over RDP: it hides the real hardware devices and replaces them with a network endpoint that redirects audio back to the RDP client. Luckily it doesn't know about ASIO, so it cannot mess with that one. But since it does such things when an RDP client is connected, it likely does something else too, like deciding that the session is idle when RDP is disconnected and putting the computer to sleep anyway.

 

I don't use RDP myself, but I use VNC instead. That way Windows doesn't understand about remote sessions. See https://tightvnc.com

 

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment

Always wanting to be up to date is not necessarily a good idea. I still play any of the EC modulators at DSD256 fine, without any problem, with an old Windows 1803 and an i7-6700K 😉

PC audio /Roon + HQPLAYER / HOLO Spring 2 / / DIY AD1 SET tube amp  /  DIY Altec 2 way horn Speaker

Link to comment
