
Visual Studio 2012 C++ and WASAPI minimalist player


sbgk


Hi, I always wondered why different players sound different and why they sounded digital, so I decided to develop a minimalist WASAPI WAV player that does nothing but load WAV files into memory and play them. There is no GUI: just select the required WAV files or folder(s) in Explorer, right-click and select Copy (to copy the filenames to the clipboard), then double-click MQn.bat to start playback. Close the console to stop.

The aim was to see what an absolute bare-bones player would sound like, and I was more than surprised to find that by reducing functionality to a minimum and optimising the memory and render loop the sound quality improved and the hard digital sound disappeared.

Here is the link to download the files; just download them into a directory and make a shortcut to the MQn.bat file on the desktop. https://rapidshare.com/#users|45980080|0ae609ce616a35c8de7ac5fda4b6194c|11541

It requires .Net framework 4.5 to be installed, if not already installed. At the moment it is Win 7/8 64-bit only and can play 16-bit at 44.1 and 48 kHz, and 24-bit in a 32-bit container from 48 to 192 kHz (my DAC can't play 24 bit). It plays to the default device.

Anyway, I originally compiled it using VS2010 and optimised it to the point where it was the best WASAPI player I had heard (it uses some innovative ways to render the data to the audio device, compared to other players). Then I tried compiling it in VS2012 and this has made a tremendous difference to the sound. The VS2012 version is the file called MQn.exe2012 (just rename it). I think VS2012 optimises the code to take advantage of modern processors much better than VS2010 did, and that this will have quite an impact on music players' sound quality.

There is no harm in doubt and skepticism, for it is through these that new discoveries are made. Richard P Feynman

 

http://mqnplayer.blogspot.co.uk/

Link to comment
plays to the default device

 

As we know, WASAPI has two modes, shared and exclusive. I'm assuming you are using exclusive mode.

 

it uses some innovative ways to render the data to the audio device, compared to other players

 

How do you know what other players do?

 

requires .Net framework

 

Why does it need the .NET framework? It is an interpreted-bytecode, non-native code environment and adds lots of overhead. I would recommend writing a minimalist player with a combination of plain C and assembler.

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
As we know, WASAPI has two modes, shared and exclusive. I'm assuming you are using exclusive mode.

 

WASAPI has 4 modes: shared, exclusive, event driven and push, plus combinations of these. Of course, for the best sound exclusive, event driven is used.

 

 

How do you know what other players do?

 

I look at their code, where available

 

 

Why does it need the .NET framework? It is an interpreted-bytecode, non-native code environment and adds lots of overhead. I would recommend writing a minimalist player with a combination of plain C and assembler.

 

So would I, but it would probably take me years to get to that stage, and I think I have got it to a good enough standard. If anyone else wants to try developing one or knows of a better sounding player, then I am happy to pass on what I have tried to do or to try other players (so far I have not found one that doesn't sound digital).

 

Here are the ideas I have implemented

 

1. preload the buffer in the render loop so that the data can be released immediately after the load buffer event is fired.

2. use fixed values in the render loop

3. use goto for the render loop

4. use optimised memcpy to copy data to the device buffer in the render loop

5. don't check for end of file, just let it read past the end and throw an exception.

6. no result checking in the render loop, just load the buffer, release the buffer and copy to the device.

7. use VirtualAlloc to allocate the buffer instead of malloc so that full pages are allocated. switch off page protection after data loaded.

8. align the device period to the system memory page size (4096 bytes for x86 machines), i.e. load whole memory pages into the device buffer.

9. change the buffer page protection to readonly after loading

10. gapless by appending wav data to the buffer rather than merging etc

11. use vs2012 parallelisation

12. use vs2012 fast transcendentals

13. optimisations - inline function expansion, intrinsic functions, favor fast code, omit frame pointers, don't enable fiber safe, whole program optimisations, no string pooling, no buffer security check, enable function level linking, MT runtime library, no C++ exceptions, favor intel 64, opt ref and opt icf, target machine x64, embedded manifest.

14. no user interaction apart from start and shutdown.
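
A minimal sketch of the arithmetic behind ideas 7 and 8, assuming the 4096-byte page size quoted above. The helper name `round_up_to_pages` is hypothetical and not taken from MQn. VirtualAlloc already rounds a request up to whole pages when no address is given, so this only shows what size such an allocation ends up being.

```cpp
#include <cassert>
#include <cstddef>

// Page size assumed from the post (4096 bytes); real code should ask the OS.
const std::size_t kPageSize = 4096;

// Round a byte count up to a whole number of pages (hypothetical helper).
inline std::size_t round_up_to_pages(std::size_t bytes, std::size_t page = kPageSize) {
    assert(page != 0);
    return (bytes + page - 1) / page * page;
}
```

For example, one second of 16-bit stereo audio at 44.1 kHz is 176400 bytes, which rounds up to 44 pages, or 180224 bytes.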

 

Here is the render loop for 16/44.1. Each sample/bit rate has its own loop with precalculated frame and buffer sizes so that variables aren't used and they are aligned to the memory page size; the buffer is preloaded and released immediately after the buffer-fill event triggers.

 

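loop1644:
    // block until the device signals that it needs more data
    WaitForSingleObject(hNeedDataEvent, INFINITE);
    // release the 256 frames copied on the previous pass (the buffer is preloaded before the loop starts)
    hr = pAudioRenderClient->ReleaseBuffer(256, 0);
    // fetch the next 256-frame device buffer; hr is deliberately not checked (idea 6)
    hr = pAudioRenderClient->GetBuffer(256, &pData);
    // advance 1024 bytes through the preloaded WAV data and copy that chunk; no end-of-data check (idea 5)
    A_memcpy (pData, sound_buffer += 1024, 1024);
    goto loop1644;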
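
The fixed values 256 and 1024 in the loop follow from the format. The sketch below assumes stereo (two channels), which the post does not state but which is what makes 256 frames of 16-bit audio come to 1024 bytes. All function names are hypothetical, and `whole_periods` shows the count a bounded loop would stop at instead of reading past the end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes in one period of PCM audio: frames * channels * bytes per sample.
inline std::size_t period_bytes(std::size_t frames, std::size_t channels,
                                std::size_t bits_per_sample) {
    return frames * channels * (bits_per_sample / 8);
}

// Duration of one period in microseconds (integer division, rounds down).
inline std::uint64_t period_micros(std::uint64_t frames, std::uint64_t sample_rate) {
    assert(sample_rate != 0);
    return frames * 1000000u / sample_rate;
}

// Number of complete periods held in a preloaded buffer of the given size.
inline std::size_t whole_periods(std::size_t buffer_bytes, std::size_t bytes_per_period) {
    assert(bytes_per_period != 0);
    return buffer_bytes / bytes_per_period;
}
```

With these, `period_bytes(256, 2, 16)` gives the 1024 bytes copied per pass, and at 44.1 kHz each pass covers roughly 5.8 ms of audio.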

 

It's easy enough to try, all you need is a win64 laptop/pc and headphones, that is enough to judge whether it is any good.

There is no harm in doubt and skepticism, for it is through these that new discoveries are made. Richard P Feynman

 

http://mqnplayer.blogspot.co.uk/

Link to comment

If one of your speaker cables is undone, do you need a blind listening test to verify it? If a change is big enough then I don't see the value in unnecessary tests. Obviously anything I say would have to be independently verified, so I'll leave that to others.

There is no harm in doubt and skepticism, for it is through these that new discoveries are made. Richard P Feynman

 

http://mqnplayer.blogspot.co.uk/

Link to comment
I look at their code, where available

 

I'm curious, for which players is the source code available?

 

Here is the render loop for 16/44.1. Each sample/bit rate has its own loop with precalculated frame and buffer sizes so that variables aren't used and they are aligned to the memory page size; the buffer is preloaded and released immediately after the buffer-fill event triggers.

 

IMO, you shouldn't be doing any preload/release from the render loop at all...

 

It's easy enough to try, all you need is a win64 laptop/pc and headphones, that is enough to judge whether it is any good.

 

For me, it's missing oversampling digital filters and delta-sigma modulation, and the ones built into DACs I have sound poor. :)

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
I'm curious, for which players is the source code available?

 

 

 

IMO, you shouldn't be doing any preload/release from the render loop at all...

 

 

For me, it's missing oversampling digital filters and delta-sigma modulation, and the ones built into DACs I have sound poor. :)

 

PlayPCMWin, Pureplayer, portaudio, audacity, MSDN articles, MediaPortal, vlc media player etc

 

You are proposing not releasing inside the render loop, so how would the data get to the device? In fact, what is a render loop without rendering to the device?

 

I was trying to demonstrate the effects of optimised code in the player, an area that seems to be overlooked by most player developers yet absolutely key to the sound quality IMO.

 

I have a NAD M51, so the sound is of an acceptable quality.

There is no harm in doubt and skepticism, for it is through these that new discoveries are made. Richard P Feynman

 

http://mqnplayer.blogspot.co.uk/

Link to comment
PlayPCMWin, Pureplayer, portaudio, audacity, MSDN articles, MediaPortal, vlc media player etc

 

The first two are new to me; the rest are not audiophile players anyway.

 

You are proposing not releasing inside the render loop, so how would the data get to the device? In fact, what is a render loop without rendering to the device?

 

OK, maybe I misunderstood your statement. I was thinking you meant the sound_buffer. Anyway, there's not much to optimize in that four-call loop, so playing with compiler optimizations won't change that loop anyway, since it just makes four outside calls. And that memcpy() should be written in SSE assembler anyway, so there is nothing for the compiler to optimize there either.

 

I was trying to demonstrate the effects of optimised code in the player, an area that seems to be overlooked by most player developers yet absolutely key to the sound quality IMO.

 

IMO, the most overlooked thing is all the DSP you could do with all the CPU power you have, instead of burning those CPU cycles in the OS kernel's idle loop.

 

My own approach is to move this very optimized playback part to a small battery-powered embedded ARM computer running just the Linux kernel and the playback part. Audio goes there over an Ethernet connection for playback, after it has been processed by the player on a normal computer. CPU load on the 800 MHz ARM is less than 1%.

 

I have a NAD M51, so the sound is of an acceptable quality.

 

I have not heard it, but it is hard to imagine it couldn't be improved by a good-quality external digital filter always feeding it at 192/24.

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
I tested MQn on my 3 audio interfaces but I was not able to play music after all.

 

I guess your program runs only with a very limited set of hardware combinations.

Did you test your program with any audio interfaces other than the NAD M51?

 

Do you have any details of the hardware, OS, sample rates, DAC etc.? It should play to the laptop/PC internal device as a minimum.

There is no harm in doubt and skepticism, for it is through these that new discoveries are made. Richard P Feynman

 

http://mqnplayer.blogspot.co.uk/

Link to comment

Thanks for your reply.

 

Mac Mini (Mid 2010)

Windows 7 x64 SP1

48000Hz 24bit 2ch WAV file created by SoX

 

Tested audio interfaces are RME Fireface 400, Lynx Hilo and TEAC UD-501.

 

RME Fireface 400 and Lynx Hilo behave the same way: MQn.exe starts and appears in the application list of the Windows mixer, but there is no sound.

 

With TEAC UD-501, MQn.exe prints "music" and exits immediately.

 

Hope this helps

Sunday programmer since 1985

Developer of PlayPcmWin

Link to comment

so playing with compiler optimizations won't change that loop anyway, since it just makes four outside calls. And that memcpy() should be written in SSE assembler anyway, so there is nothing for the compiler to optimize there either.

 

I have tried various combinations of code and compiler settings and there are audible differences. memcpy produces quite a hard digital effect, typical of what I hear from most memory players; otherwise I would still be using it. Agner Fog's optimised memcpy A_memcpy seems to solve that particular issue. http://www.agner.org/optimize/asmlib-instructions.pdf

 

upsampling has always produced unsatisfactory results for me, I just concentrate on delivering what's there, 16/44.1 can sound surprisingly good once the noise has been removed.

 

I think I have proved to my own satisfaction that it is the noise produced by the player that is the issue and that DSP settings are merely trying to repair the damage. I prefer less CPU load.

There is no harm in doubt and skepticism, for it is through these that new discoveries are made. Richard P Feynman

 

http://mqnplayer.blogspot.co.uk/

Link to comment

Hi,

 

You're an expert at this sort of thing, any ideas what the problem is? The player plays 16/44.1 in 16 bits; 24/48 plays in a 32-bit container, and I just tested it with a 24/96 track downsampled using dBpoweramp. My M51 is connected via HDMI, but I shall try USB to MF V-Link 192 to NAD M51 and USB to MF V-Link 192 to Benchmark DAC1 (24 bit) over the weekend. The only other person who has had issues was using 32-bit Windows. Is your DAC 16, 24 or 32 bit?

There is no harm in doubt and skepticism, for it is through these that new discoveries are made. Richard P Feynman

 

http://mqnplayer.blogspot.co.uk/

Link to comment

SBGK,

 

This is a wild guess: it seems your code fixes the sample buffer size to the optimum value for your device, but that value varies from device to device and it must be calculated by the "alignment dance".
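
For readers who have not met the term: Microsoft's documentation for IAudioClient::Initialize describes the "alignment dance" as the recovery path when Initialize fails with AUDCLNT_E_BUFFER_SIZE_NOT_ALIGNED. The client asks GetBufferSize for the aligned frame count, converts it to a period in 100 ns units, and initializes a fresh client with that period. The sketch below covers only the conversion step, so it makes no WASAPI calls; `aligned_period_hns` and `ReferenceTime` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// WASAPI's REFERENCE_TIME is a 64-bit count of 100-nanosecond units.
typedef std::int64_t ReferenceTime;

// Convert an aligned buffer size in frames to a period in 100 ns units,
// rounding to nearest as in the documented formula.
inline ReferenceTime aligned_period_hns(std::uint32_t aligned_frames,
                                        std::uint32_t sample_rate) {
    assert(sample_rate != 0);
    return static_cast<ReferenceTime>(
        10000.0 * 1000 / sample_rate * aligned_frames + 0.5);
}
```

For example, 1056 frames at 48 kHz becomes a period of 220000 units (22 ms). Because the aligned frame count comes from the driver, a value hard-coded for one device need not be accepted by another.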

 

RME Fireface 400 and Lynx Hilo accept 24/48 in a 32-bit container.

TEAC UD-501 plays 24/48 in a 24-bit container.
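
To make the container difference concrete: packed 24-bit audio uses 3 bytes per sample, while a 32-bit container uses 4 bytes with the 24 valid bits in the most significant bytes and a zero low byte. This is a hedged sketch of the conversion for little-endian PCM; `pack24_into_32` is a hypothetical name and this is not code from MQn or PlayPcmWin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand little-endian packed 24-bit samples (3 bytes each) into 32-bit
// containers (4 bytes each): audio in the upper three bytes, zero low byte.
inline std::vector<std::uint8_t> pack24_into_32(const std::vector<std::uint8_t>& packed24) {
    assert(packed24.size() % 3 == 0);
    std::vector<std::uint8_t> out;
    out.reserve(packed24.size() / 3 * 4);
    for (std::size_t i = 0; i < packed24.size(); i += 3) {
        out.push_back(0);               // padding byte (least significant)
        out.push_back(packed24[i]);     // low byte of the sample
        out.push_back(packed24[i + 1]); // middle byte
        out.push_back(packed24[i + 2]); // high byte, carries the sign
    }
    return out;
}
```

A player that only ever offers one of the two layouts will be rejected, or play silence, on devices that accept only the other, which would fit the behaviour reported above.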

Sunday programmer since 1985

Developer of PlayPcmWin

Link to comment
upsampling has always produced unsatisfactory results for me, I just concentrate on delivering what's there, 16/44.1 can sound surprisingly good once the noise has been removed.

 

Well, the NAD M51 does it internally anyway. I'm usually not happy with those implementations.

 

I think I have proved to my own satisfaction that it is the noise produced by the player that is the issue and that DSP settings are merely trying to repair the damage. I prefer less CPU load.

 

I would start by fixing the strong hardware interference issues you are observing. If computer-generated electrical noise has such a big impact on the sound quality of your DAC, I would first reduce that noise through the hardware route.

 

Trying to minimize computer-generated noise through one small piece of software running on such a large multitasking OS as Windows, with all the garbage collectors and other stuff of .NET, is a fight against windmills. I don't mind someone trying it out, but I would rather do it by replacing the entire hardware with much less noisy hardware and running a custom-built small OS and software for the purpose, without anything else.

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
Agner Fog's optimised memcpy A_memcpy seems to solve that particular issue. http://www.agner.org/optimize/asmlib-instructions.pdf

 

Seems to be somewhat similar to my own, but mine has the advantage of knowing the size of the smallest possible copy unit. So instead of bytes the copy size is defined in samples, and each different sample size has its own optimized copy.

 

These generic memcpy() implementations have a number of problems:

1) they must be able to handle sizes that are not multiples of sample size

2) they must be able to handle copies to/from non cache line aligned addresses
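
A rough illustration of the idea of a copy defined in samples rather than bytes. This is only a sketch under that assumption, not HQPlayer's implementation; `copy_samples` is a hypothetical name. Because the length is a sample count and the pointers are typed, the compiler knows the element size and there is never a partial sample at the end to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy `count` whole samples. The byte length is always a multiple of
// sizeof(Sample), so no odd-sized tail case exists.
template <typename Sample>
inline void copy_samples(Sample* dst, const Sample* src, std::size_t count) {
    assert(count == 0 || (dst != nullptr && src != nullptr));
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = src[i]; // fixed-size element copy; compilers can vectorize this loop
    }
}
```

Calling `copy_samples<std::int16_t>(dst, src, 512)` copies 512 samples (1024 bytes). Cache-line alignment of the buffers would still have to be guaranteed by the allocator, which this sketch does not address.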

Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers

Link to comment
If one of your speaker cables is undone, do you need a blind listening test to verify it? If a change is big enough then I don't see the value in unnecessary tests. Obviously anything I say would have to be independently verified, so I'll leave that to others.

 

So you did not verify, and are not interested in verifying, your technically highly unlikely findings. Why was that to be expected...

Link to comment
Won't have impact on sound quality in this scope anyway due to extremely low data rate....

 

But the optimizations have impact on overall CPU load for some heavy DSP operations.

 

I agree with you on this topic. I'm always impressed by your honest and insightful posts.

Sunday programmer since 1985

Developer of PlayPcmWin

Link to comment
