DSD encoding with SoX

mansr · August 25, 2015

When I recently decided to take a closer look at the DSD phenomenon to see for myself whether it lived up to the hype, I noticed a dearth of open source tools with support for this format, so I set about rectifying this situation by adding the necessary functionality to SoX. The resulting code is available from https://github.com/mansr/sox with the following features:

- DSF read/write

- DFF read-only

- DSD to PCM conversion using the existing resampler

- PCM to DSD conversion

The DSD encoding supports four settings with increasing quality and decreasing speed: fast, hq, audiophile, goldenear. Below are RightMark Audio Analyzer results showing the performance. I encoded the 96/24 test signal to DSD64 and back to PCM at each of the quality levels. For reference, JRiver is also included in the analysis.

[Figures: RMAA tabular report, followed by plots of frequency response, noise level, dynamic range, and harmonic distortion]

wgscott · August 25, 2015

> I set about rectifying this situation by adding the necessary functionality to SoX.

You da man!

Just to clarify, will this be in future (current?) versions of sox, or is this a fork?

mansr · August 25, 2015

> Just to clarify, will this be in future (current?) versions of sox, or is this a fork?

That's up to the maintainer.

wgscott · August 25, 2015

I guess I was asking if you were one of the maintainers. I hope they include it. That looks like a great contribution. Thanks for doing it, and making it open-source.

Jud · August 30, 2015

Hi, what's the reason for the spike in harmonic distortion at 1kHz? (Sorry, I'm just ignorant of this stuff but happy to learn.)

And yes, thanks for the contribution to SoX/open source.

mansr · August 30, 2015

> Hi, what's the reason for the spike in harmonic distortion at 1kHz?

1kHz is the test tone. Harmonic distortion is visible as blips at multiples thereof, i.e. 2kHz, 3kHz etc.

Jud · August 30, 2015

> 1kHz is the test tone. Harmonic distortion is visible as blips at multiples thereof, i.e. 2kHz, 3kHz etc.

Got it. (Wow, that *was* an ignorant question!)

rikm · August 31, 2015

Thanks mansr, this looks to be very useful, but I have one too [an ignorant question, I mean]...it compiles well enough and:

sox filename.flac -r 2822400 -b 1 filename.dsf

produces, after thinking about it for a few minutes, a file that plays, but can you say what the syntax for the 'hq', 'audiophile' and 'goldenear' parameters would be? I have had no luck getting any of them accepted...

thanks again,

rikm

mansr · September 1, 2015

> Thanks mansr, this looks to be very useful, but I have one too [an ignorant question, I mean]...it compiles well enough and:
>
> sox filename.flac -r 2822400 -b 1 filename.dsf
>
> produces, after thinking about it for a few minutes, a file that plays, but can you say what the syntax for the 'hq', 'audiophile' and 'goldenear' parameters would be? I have had no luck getting any of them accepted...

Add "sdm -f hq" to the end of that command. You might also want to increase the quality setting of the resampler, like this:

sox file.flac file.dsf rate -v 2822400 sdm -f hq
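For reference, the 2822400 in these commands is the DSD64 sample rate, i.e. 64 times the 44.1 kHz CD rate. A small Python sketch of that arithmetic follows; the helper name `dsd_rate` is made up for illustration and is not part of SoX.

```python
CD_RATE_HZ = 44_100  # base rate that the DSD multiples are defined against


def dsd_rate(multiple: int) -> int:
    """Return the 1-bit sample rate in Hz for DSD<multiple>, e.g. DSD64."""
    return multiple * CD_RATE_HZ


for multiple in (64, 128, 256):
    print(f"DSD{multiple}: {dsd_rate(multiple)} Hz")
```

The DSD64 value is what goes after "rate -v" in the command above.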

bibo01 · September 1, 2015

Can I also do a DSD64 to a DSD256?

What command?

mansr · September 1, 2015

> Can I also do a DSD64 to a DSD256?
> What command?

Yes. To do that you need to remove the high-frequency noise from the original, resample to 256x rate, and quantise to 1-bit. Like this:

sox in.dsf out.dsf rate -v 88200 gain 6 rate -v 11289600 sdm

The "gain 6" is needed to restore the volume level from the -6dB reference level for DSD/SACD. However, some DSD files (e.g. from dsdfile.com / Opus3) are encoded at a higher (nonstandard) level, and those require a lower gain setting to avoid clipping. You can also omit the "gain" step entirely and simply get a quieter output file.
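To put rough numbers on the gain step: 6 dB is very nearly a doubling of amplitude, so a signal peaking at the -6dB reference lands at about full scale, while a file encoded hotter would overshoot. The sketch below only illustrates the decibel arithmetic; the function name and the -3dB "hot" level are hypothetical examples, not measurements of any particular file.

```python
def db_to_amplitude(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


gain = db_to_amplitude(6)            # factor applied by "gain 6"
standard_peak = db_to_amplitude(-6)  # peak at the -6 dB DSD reference level
hot_peak = db_to_amplitude(-3)       # hypothetical file encoded 3 dB hotter

print(f"gain factor:            {gain:.3f}")
print(f"standard peak after gain: {standard_peak * gain:.3f}")
print(f"hot peak after gain:      {hot_peak * gain:.3f}  (above 1.0 means clipping)")
```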

Note that although the command above will produce valid output, the filters are optimised for DSD64. Using them for higher rates simply raises the frequency where the noise level starts rising without improving anything in the low part of the spectrum. With filters tuned for DSD128 or DSD256, other characteristics would be possible. I simply haven't got around to exploring those options yet.
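For anyone unfamiliar with the "quantise to 1-bit" step, here is a minimal Python sketch of a textbook first-order sigma-delta modulator. It is not the code of the sdm effect, and the tuned filters discussed above are not reproduced here; it only shows the basic idea of feeding the quantisation error back so that the average of the bitstream follows the input.

```python
import math


def sdm_first_order(samples):
    """Quantise samples in [-1, 1] to a +1/-1 stream using error feedback."""
    integrator = 0.0
    previous = 0.0  # last output value, fed back into the loop
    bits = []
    for x in samples:
        integrator += x - previous  # accumulate input minus previous output
        previous = 1.0 if integrator >= 0 else -1.0
        bits.append(previous)
    return bits


# A constant input: the mean of the 1-bit output converges towards it.
dc_bits = sdm_first_order([0.25] * 10_000)
print(sum(dc_bits) / len(dc_bits))

# A 1 kHz tone at the DSD64 rate, 10 ms long.
rate = 64 * 44_100
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate // 100)]
tone_bits = sdm_first_order(tone)
print(len(tone_bits), sorted(set(tone_bits)))
```

A first-order loop like this leaves far more noise in the audio band than a practical DSD encoder would tolerate, which is where higher-order, carefully tuned filters come in.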

rikm · September 1, 2015

Thanks again mansr, that works just fine...and as you cautioned in your OP, it does take significantly longer for hq...and as was also mentioned above, hope this makes it to the release version...

rikm

Archimago · September 5, 2015

mansr: Great job!

Wondering, is there any move towards a better file format for DSD? Something that can handle both tagging and (DST-like?) compression for DSD64/128/256. To have software like SoX be capable of handling conversion of .dff/.dsf to this new format, MP3Tag to allow easy tagging, and JRiver for playback would, I suspect, be a great step forward!

We need a new "FLAC" but for DSD :-).

One and a half · September 5, 2015

> Wondering, is there any move towards a better file format for DSD? Something that can handle both tagging and (DST-like?) compression for DSD64/128/256. To have software like SoX be capable of handling conversion of .dff/.dsf to this new format, MP3Tag to allow easy tagging, and JRiver for playback would, I suspect, be a great step forward!
>
> We need a new "FLAC" but for DSD :-).

Not quite sure what to make of this post. MP3Tag can edit metadata in a DSF file just as easily as in a FLAC file. If you wish that SoX had these features, that's another story.

Metadata edits to DSF files made in JRiver don't stick for some reason; no big loss, as I don't use the program much.

We need another audio format like a hole in the head unless there's a REALLY good justification. Now if what you're proposing is that a 4min DSD file can be squeezed to <100kB with the same resolution and SQ, there would be interest and it would save me copious $ on buying RAID.

mansr · September 5, 2015

> We need another audio format like a hole in the head unless there's a REALLY good justification. Now if what you're proposing is that a 4min DSD file can be squeezed to <100kB with the same resolution and SQ, there would be interest and it would save me copious $ on buying RAID.

I agree, file format proliferation is bad enough as it is. The main problem with the currently popular DSF is the size. DSD is inherently large, and since much of the information contained is noise, it is hard to compress well. DST still manages pretty well, and doing better would likely be very difficult. The trouble with DST is its complexity. It's not easy to write a decoder, much less an encoder, and making it fast is a challenge on top of that. I wonder if there might be a use for a simpler compression scheme that (obviously) didn't compress as well but was easy to implement efficiently. Extending DSF to accommodate compressed data would be trivial, or some other well-known container format could be used.
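To put "inherently large" in numbers, here is a rough Python sketch of the raw payload size for the 4-minute track mentioned above, compared with CD-quality PCM. Stereo is an assumption, the helper name is made up, and container overhead (headers, tags) is ignored.

```python
def raw_audio_bytes(rate_hz: int, bits_per_sample: int, channels: int, seconds: int) -> int:
    """Size in bytes of uncompressed audio, ignoring container overhead."""
    return rate_hz * bits_per_sample * channels * seconds // 8


SECONDS = 4 * 60  # the 4-minute track from the quoted post

dsd64 = raw_audio_bytes(2_822_400, 1, 2, SECONDS)  # 1-bit, stereo (assumed)
cd_pcm = raw_audio_bytes(44_100, 16, 2, SECONDS)   # 16-bit, stereo

print(f"DSD64:  {dsd64 / 1e6:.1f} MB")
print(f"CD PCM: {cd_pcm / 1e6:.1f} MB")
print(f"ratio:  {dsd64 / cd_pcm:.1f}x")
```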

Miska · September 5, 2015

> The main problem with the currently popular DSF is the size. DSD is inherently large

I don't think there's any real problem with the size. 6 TB of disk space costs peanuts these days.

> I wonder if there might be a use for a simpler compression scheme that (obviously) didn't compress as well but was easy to implement efficiently. Extending DSF to accommodate compressed data would be trivial, or some other well-known container format could be used.

ZIP and other similar ones work fairly OK. The compression ratio is not high, but it still makes a difference compared to uncompressed.

I've thought about adding gzip compression support to DSF and why not to DSDIFF too. That would be a bit like TIFF for image data.

audiventory · September 5, 2015

> ZIP and other similar ones work fairly OK. The compression ratio is not high, but it still makes a difference compared to uncompressed.
>
> I've thought about adding gzip compression support to DSF and why not to DSDIFF too. That would be a bit like TIFF for image data.

In some cases ZIP can save about 25% of space (depending on the material). However, it is outside the DSF and DFF standards, so it needs support on the audio player side, in both software and hardware.

Miska · September 5, 2015

> In some cases ZIP can save about 25% of space (depending on the material). However, it is outside the DSF and DFF standards, so it needs support on the audio player side, in both software and hardware.

Sure, it is non-standard, but it would be fairly easy to establish as an extra, just like ID3v2 tags on WAV or DSDIFF. Although it naturally wouldn't play on software that doesn't support it. Given how DSDIFF is designed, it would be less likely to cause conflicts there.

But overall, I don't feel a pressing need for such things given how cheap storage is. If storage space is an issue for someone, time will fix that...

audiventory · September 5, 2015

Then we will again have endless sound quality comparisons, like WAV vs. FLAC.

mansr · September 5, 2015

> I don't think there's any real problem with the size. 6 TB of disk space costs peanuts these days.

True, size isn't much of an issue these days, but it's still the biggest drawback of DSF.

> ZIP and other similar ones work fairly OK. The compression ratio is not high, but it still makes a difference compared to uncompressed.

The problem with standard zip is that it operates on bytes whereas a bit-oriented compression may perform better on DSD data.

Miska · September 5, 2015

> The problem with standard zip is that it operates on bytes whereas a bit-oriented compression may perform better on DSD data.

Sure, but it (zlib to be specific) is a fairly widely supported, generic compression standard that is also fairly fast to decompress even on lighter CPUs. It also achieves some space savings on DSD, depending on the content.

It doesn't really matter if the compression is byte-based or something else, as a general-purpose compressor cannot make any assumptions about the alignment or word lengths of the data being compressed.

It would be easy enough to add, without a big negative impact on performance; that's why I've considered it. Although I don't have a big "need" for it.

mansr · September 5, 2015

> Sure, but it (zlib to be specific) is a fairly widely supported, generic compression standard that is also fairly fast to decompress even on lighter CPUs. It also achieves some space savings on DSD, depending on the content.
>
> It doesn't really matter if the compression is byte-based or something else, as a general-purpose compressor cannot make any assumptions about the alignment or word lengths of the data being compressed.

It's not a question of alignment, it's that a compressor probably has a better chance of finding patterns at the bit level. As a stupid example, consider a sequence like 000111000111... repeating. Viewed as bits, the pattern is obvious. When bundled into bytes, this sequence looks like 00011100 01110001 11000111 etc. It eventually repeats, but the period is much longer.
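The example can be checked with a few lines of Python. This is only a sketch of the point about periods, not a statement about how any real compressor works; `smallest_period` is a throwaway helper written for this illustration.

```python
def smallest_period(seq):
    """Return the smallest p such that seq[i] == seq[i + p] for every valid i."""
    for p in range(1, len(seq)):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p
    return len(seq)


bits = "000111" * 16  # 96 bits of the repeating pattern
byte_values = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]

print("first bytes:", [f"{b:08b}" for b in byte_values[:3]])
print("period in bits: ", smallest_period(bits))
print("period in bytes:", smallest_period(byte_values))
```

Seen as bits the pattern repeats every 6 symbols; seen as bytes it repeats every 3 bytes, i.e. 24 bits, and none of the three byte values resembles the others.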

On the other hand, there's so much randomness (noise) inherent in a DSD bitstream that it perhaps doesn't make much difference.

> It would be easy enough to add, without a big negative impact on performance; that's why I've considered it. Although I don't have a big "need" for it.

Oh sure. The ubiquity of zlib is a compelling argument in its favour.

Miska · September 5, 2015

> It's not a question of alignment, it's that a compressor probably has a better chance of finding patterns at the bit level. As a stupid example, consider a sequence like 000111000111... repeating. Viewed as bits, the pattern is obvious. When bundled into bytes, this sequence looks like 00011100 01110001 11000111 etc. It eventually repeats, but the period is much longer.

Usually you can certainly get better compression ratios when you have some knowledge about the content. For example, if the compressor knew that the data contains 16-, 24- or 32-bit words instead of operating on bytes or something else.

Huffman coding does a pretty good job with DSD, just like it does with JPEG and MP3, which also don't have a byte-oriented data stream but can use the exact number of bits they need for a given entity.

mansr · September 5, 2015

> Huffman coding does a pretty good job with DSD, just like it does with JPEG and MP3, which also don't have a byte-oriented data stream but can use the exact number of bits they need for a given entity.

The Huffman coding in JPEG and MP3 doesn't operate on bytes.

Anyhow, this discussion is veering into the pointless...

manisiutkin · September 6, 2015

> DST still manages pretty well, and doing better would likely be very difficult. The trouble with DST is its complexity. It's not easy to write a decoder, much less an encoder, and making it fast is a challenge on top of that.

I wouldn't agree about the complexity in the algorithmic sense. It's a kind of "traditional" FIR filter prediction plus arithmetic coding of the residuals. The actual problem is performance, because DST tries to predict each bit of the DSD stream, and at high DSD sample rates that is a hard job even for modern CPUs, just as running a dozen-line SDM gives a performance issue. I will probably do a DST encoder as well, but I don't see where it could be used.
