Upsampling MQA files to original resolution with sox will sound like the original resolution

soxr · July 9, 2017

Patching ffmpeg for sox minimum phase looks trivial to me.

In libswresample/soxr_resample.c observe the following chunk of code:

static struct ResampleContext *create(struct ResampleContext *c, int out_rate, int in_rate, int filter_size, int phase_shift, int linear,
        double cutoff, enum AVSampleFormat format, enum SwrFilterType filter_type, double kaiser_beta, double precision, int cheby, int exact_rational){
    soxr_error_t error;

    soxr_datatype_t type =
        format == AV_SAMPLE_FMT_S16P? SOXR_INT16_S :
        format == AV_SAMPLE_FMT_S16 ? SOXR_INT16_I :
        format == AV_SAMPLE_FMT_S32P? SOXR_INT32_S :
        format == AV_SAMPLE_FMT_S32 ? SOXR_INT32_I :
        format == AV_SAMPLE_FMT_FLTP? SOXR_FLOAT32_S :
        format == AV_SAMPLE_FMT_FLT ? SOXR_FLOAT32_I :
        format == AV_SAMPLE_FMT_DBLP? SOXR_FLOAT64_S :
        format == AV_SAMPLE_FMT_DBL ? SOXR_FLOAT64_I : (soxr_datatype_t)-1;

    soxr_io_spec_t io_spec = soxr_io_spec(type, type);

    soxr_quality_spec_t q_spec = soxr_quality_spec((int)((precision-2)/4), (SOXR_HI_PREC_CLOCK|SOXR_ROLLOFF_NONE)*!!cheby);
    q_spec.precision = precision;
#if !defined SOXR_VERSION /* Deprecated @ March 2013: */
    q_spec.bw_pc = cutoff? FFMAX(FFMIN(cutoff,.995),.8)*100 : q_spec.bw_pc;
#else
    q_spec.passband_end = cutoff? FFMAX(FFMIN(cutoff,.995),.8) : q_spec.passband_end;
#endif

    soxr_delete((soxr_t)c);
    c = (struct ResampleContext *)
        soxr_create(in_rate, out_rate, 0, &error, &io_spec, &q_spec, 0);
    if (!c)
        av_log(NULL, AV_LOG_ERROR, "soxr_create: %s\n", error);
    return c;
}

We need to modify the quality spec (q_spec).

Observe how they already override a field in the quality spec:

q_spec.precision = precision;

Looking at the soxr header file and what properties can be changed:
https://github.com/chirlu/soxr/blob/master/src/soxr.h

Just add one extra line that overrides the phase response:

q_spec.precision = precision;
q_spec.phase_response = 0;

And this is based on the possible values in the soxr header file:

double phase_response; /* 0=minimum, ... 50=linear, ... 100=maximum; typically 50 */
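For anyone who would rather not hard-code the 0, the scale from that header comment can be wrapped in a small helper. This is only a sketch: `clamp_phase_response` and `phase_from_sox_switch` are hypothetical names, not part of ffmpeg or soxr, and the value 25 for sox's `-I` (intermediate) switch is an assumption taken from the sox manual's description of `rate -p`, not from this thread.

```c
/* Hypothetical helper (not in ffmpeg or soxr): keep a requested value inside
 * the 0..100 range that the soxr.h comment documents for phase_response. */
static double clamp_phase_response(double requested)
{
    if (requested < 0.0)
        return 0.0;
    if (requested > 100.0)
        return 100.0;
    return requested;
}

/* Hypothetical helper: translate the phase switches of sox's `rate` effect
 * (-M, -I, -L) to the same 0..100 scale. Unknown switches fall back to
 * linear phase (50). */
static double phase_from_sox_switch(char sw)
{
    switch (sw) {
    case 'M': return 0.0;   /* minimum phase */
    case 'I': return 25.0;  /* intermediate phase (assumed from the sox manual) */
    case 'L': return 50.0;  /* linear phase */
    default:  return 50.0;
    }
}
```

With something like this in place, the extra line could read `q_spec.phase_response = phase_from_sox_switch('M');`, which assigns the same 0 as the hard-coded version.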

Hifi Bob · July 10, 2017

Or just submit the feature request to ffmpeg?

Sloop John B · July 19, 2017

On 09/07/2017 at 11:35 PM, soxr said:

Patching ffmpeg for sox minimum phase looks trivial to me.

Trivial, that's what I was thinking too.

.sjb

Mihaylov · July 21, 2017

On 10.07.2017 at 1:35 AM, soxr said:

Patching ffmpeg for sox minimum phase looks trivial to me.


But not to me. Can anyone build a static ffmpeg.exe for 32-bit Windows?

Hydralesia · March 10, 2020

Sorry to revive this old topic but I was also reading into MQA recently and figured that there must be a way to replicate what MQA-enabled decoders would do.

As a test, I selected "MOZART Violin Concerto in D major KV 218, I. Allegro", specifically the following files:

2L-038_MQA2016-352k-24b_01.flac (the desired result)
2L-038-MQA-2016_01_stereo.mqa.flac (the MQA)

However, the result is not what I expected after applying the recommended sox resampling using the following command:

sox -S "2L-038-MQA-2016_01_stereo.mqacd.mqa.flac" "2L-038-MQA-2016_01_stereo.mqacd.mqa.upsampled.flac" rate -vsM 352800

So, the desired spectrum result should look similar to this:

However, using the above mentioned command I get the following result instead:

The original MQA file spectrum for reference:

Any clues what to do here?

Btw.: current SoX version is v14.4.2 and I am on Linux.

misterspense · March 12, 2020

On 3/10/2020 at 7:10 PM, Hydralesia said:
Sorry to revive this old topic but I was also reading into MQA recently and figured that there must be a way to replicate what MQA-enabled decoders would do.

As a test, I selected "MOZART Violin Concerto in D major KV 218, I. Allegro", specifically the following files:

2L-038_MQA2016-352k-24b_01.flac (the desired result)

2L-038-MQA-2016_01_stereo.mqa.flac (the MQA)

However, the result is not what I expected after applying the recommended sox resampling using the following command:
sox -S "2L-038-MQA-2016_01_stereo.mqacd.mqa.flac" "2L-038-MQA-2016_01_stereo.mqacd.mqa.upsampled.flac" rate -vsM 352800
So, the desired spectrum result should look similar to this:

However, using the above mentioned command I get the following result instead:

The original MQA file spectrum for reference:

Any clues what to do here?

Btw.: current SoX version is v14.4.2 and I am on Linux.

The second and third pictures look a lot like each other; only the scaling of the vertical axis is different. The second picture shows the frequency spectrum up to 176 kHz, while the third picture only shows up to 22 kHz.

The first picture is interesting, it shows the aliasing effect, when a 'leaky' filter is used for upsampling. The major difference between the first and second picture is that the first was created using a filter that allowed aliasing, and the second had a steeper filter with no aliasing.
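The different axis ranges follow directly from the sample rates: a spectrum plot can only extend to half the sample rate. A minimal sketch, assuming the MQA file is at 44.1 kHz (the thread only implies this through the 22 kHz axis limit); the helper names are hypothetical.

```c
/* Highest frequency a spectrum plot can show for a given sample rate
 * (the Nyquist frequency). */
static long nyquist_hz(long sample_rate_hz)
{
    return sample_rate_hz / 2;
}

/* Integer upsampling factor between two rates, or 0 if the ratio is not a
 * whole number (or the input rate is not positive). */
static long upsample_factor(long in_rate_hz, long out_rate_hz)
{
    if (in_rate_hz <= 0 || out_rate_hz % in_rate_hz != 0)
        return 0;
    return out_rate_hz / in_rate_hz;
}
```

For the rates in this thread, `nyquist_hz(352800)` gives 176400 and `nyquist_hz(44100)` gives 22050, and `upsample_factor(44100, 352800)` gives 8.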

Miska · March 12, 2020

3 hours ago, misterspense said:

The first picture is interesting, it shows the aliasing effect, when a 'leaky' filter is used for upsampling. The major difference between the first and second picture is that the first was created using a filter that allowed aliasing, and the second had a steeper filter with no aliasing.

First one is just the original DXD, there's no upsampling or aliasing. The noise at top is the noise shaping left-over of the delta-sigma ADC chip.

On 3/10/2020 at 8:10 PM, Hydralesia said:

However, using the above mentioned command I get the following result instead:

You need to perform the first unfold in software first and then do the upsampling to get something closer to what you'd expect. The result you are getting now is just as expected, with the MQA data showing up as encoded noise below 22.05 kHz.

Hydralesia · March 12, 2020

9 hours ago, Miska said:

First one is just the original DXD, there's no upsampling or aliasing. The noise at top is the noise shaping left-over of the delta-sigma ADC chip.

Yes, that's what I assume as well, as the original source at 2L states DXD.

9 hours ago, Miska said:

You need to perform the first unfold in software first and then do the upsampling to get something closer to what you'd expect. The result you are getting now is just as expected, with the MQA data showing up as encoded noise below 22.05 kHz.

Oh yes, how could I forget about the unfolding... my bad.

Is there any open-source way to do this in software to this date?

granosalis · May 13, 2020

Hello,

This thread was really very interesting; it's a pity that it has gone quiet.

Is there any way to perform the MQA upsampling of MQA files in MinimServer (using MinimStreamer) with sox?

Thanks and regards,

Giuseppe

Cebolla · May 14, 2020

On 5/13/2020 at 11:23 AM, granosalis said:

Is there any way to perform the MQA upsampling of MQA files in MinimServer (using MinimStreamer) with sox?

It should be possible, but as has been pointed out by Miska two posts above yours:

On 3/12/2020 at 11:40 AM, Miska said:

You need to perform the first unfold in software first and then do the upsampling to get something closer to what you'd expect. The result you are getting now is just as expected, with the MQA data showing up as encoded noise below 22.05 kHz.

How do you propose to get the first unfold from the MQA files into Minimserver/Minimstreamer for it to do the upsampling with filter in the first place?

granosalis · May 14, 2020

2 minutes ago, Cebolla said:

It should be possible, but as has been pointed out by Miska two posts above yours, how do you propose to get the first unfold from the MQA files into Minimserver/Minimstreamer for it to do the upsampling with filter in the first place?

Is the first unfold strictly necessary?

Cebolla · May 14, 2020

Possibly not for 16-bit MQA-CD file tracks, as it's hard to see how much higher-frequency info can be uncompressed from just the one bit that's MQA encoded, especially as the same bit is also used to carry other information, such as identifying the track as MQA encoded and what it should be upsampled to. However, the same could not be said for the 24-bit MQA source file tracks, which have 9 of those bits MQA encoded.

Miska · May 14, 2020

The number of bits they use for MQA uses some bit reservation, so it is the same throughout the track, but can vary from one track to another. For example, one test track I'm using (24-bit MQA encoded) has 15-bit resolution because it has so much high frequency content that MQA needs to reserve quite a lot of bits for it.

So I wouldn't be surprised to see MQA CD have just 12 bits of actual resolution, although I have not encountered any MQA CD myself, so I have not checked.
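To put those bit counts in perspective, the textbook figure for an ideal N-bit quantizer with a full-scale sine is roughly 6.02 N + 1.76 dB. The sketch below applies only that generic formula; it ignores dither and noise shaping, it is not a measurement of any MQA file, and `ideal_snr_db` is a hypothetical name.

```c
/* Theoretical SNR in dB of an ideal N-bit quantizer for a full-scale sine:
 * 20*log10(2)*N + 1.76, with 20*log10(2) rounded to 6.0206. */
static double ideal_snr_db(int bits)
{
    return 6.0206 * bits + 1.76;
}
```

Under that assumption, 16 bits corresponds to about 98 dB, 15 bits to about 92 dB, and 12 bits to about 74 dB.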

Hydralesia · May 15, 2020

On 5/14/2020 at 9:56 PM, Miska said:

The number of bits they use for MQA uses some bit reservation, so it is the same throughout the track, but can vary from one track to another. For example, one test track I'm using (24-bit MQA encoded) has 15-bit resolution because it has so much high frequency content that MQA needs to reserve quite a lot of bits for it.

I might have missed that in this thread, but what do you use for testing and how did you find out about the actual bits?

On 5/14/2020 at 9:56 PM, Miska said:

So I wouldn't be surprised to see MQA CD have just 12 bits of actual resolution, although I have not encountered any MQA CD myself, so I have not checked.

Taking bits away is taking information away from the original, no matter how good algorithms get, trying to "restore" the original.

Wouldn't it be really cool if we - at some day - have MIDI-style music (actual encoded notes and everything around it) together with track-specific instrument/voice profiles and algorithms/code for how to accurately restore what the musicians actually wanted it to sound like? At least that would be the "cleanest" solution - taking any source of limitations or errors (microphone, room, instrument, player/singer, ...) and analog-to-digital conversion out of the equation.

Just kidding. I like to hear where things have been recorded, as each recording/microphone/... in a specific room has its own noise floor (as long as mastering doesn't try to eliminate all of that and ruin the sound). On the other side, there are more and more combos/projects that record things individually and remotely, using completely different equipment of different quality.

So, for MQA, what was the selling point again?

When it comes to perceptual quality vs. lower-resolution FLAC, it certainly might have a point, but then I could also use a proper AAC encoder (and decoder, probably) to save even more download bandwidth/storage space (even open-source encoders can do a pretty good job these days). In the end, I will lose information compared to a pure FLAC, no matter what marketing says: bits taken away are taken away, period.

The perceptual quality is a different point, but then you could argue the same when comparing a 16 kHz 8 bit FLAC with a 48 kHz 16 bit AAC @ 128 kbit/s. That comparison lets AAC stand out as superior. Although that is a bit drastic and the mathematics are completely different (no technical comparison intended here), it is the exact same type of comparison, perceptually.

I have no doubt that a lot of brains, maths and research have been put into MQA, but it completely messes with "lossless" as it misuses the FLAC format. The bad thing here is: if I were a music streaming provider who believed the MQA guys and had always taken for granted that FLAC is truly lossless, I could simply offer MQA'd FLACs and no non-MQA version, to save on money (for mass-upsampling all MQA material using dedicated certified hardware or software) and storage space. That would betray any potential customer who is not interested in MQA or does not have hardware or software for proper decoding/upsampling. They would still just see FLAC files and assume that they are lossless, which they are not, although they might "sound" very similar.

Concerning listening tests, even AAC at 256 kbit/s was more than enough to "fool" almost everyone, no matter how well the listeners' ears were (supposedly) trained or how good the equipment was. From that perspective, the actual point of having FLAC or anything similar is to be able to encode into a newer lossy format later, achieving even better perceptual quality with even less storage space than is possible today. OTOH, I personally recall that certain decoders (even in expensive hardware) failed to decode MP3, OGG or even AAC well enough, which might be another source of errors to eliminate in the listening chain.

One point I always miss in all these discussions is energy efficiency for decoding. There is generally little work done in this area, but as we all want to reduce our energy footprint, it is becoming much more of a point. Usually, you'd listen not just for a few minutes but for hours. I'd be interested in seeing some numbers comparing an MQA-certified DAC playing a true lossless FLAC at 192 kHz / 24 bit vs. a 48 kHz / 24 bit MQA-FLAC being upsampled to the same, and how that compares to other lossy codecs (whatever that DAC supports).

Miska · May 15, 2020

46 minutes ago, Hydralesia said:

So, for MQA, what was the selling point again?

Lossy codec with DRM style features (full resolution decoding is strictly controlled)

Main selling point I guess to record labels.

46 minutes ago, Hydralesia said:

When it comes to perceptual quality vs. lower-resolution FLAC, it certainly might have a point, but then I could also use a proper AAC encoder (and decoder, probably) to save even more download bandwidth/storage space (even open-source encoders can do a pretty good job these days).

As I have shown before, MQA doesn't really save bandwidth compared to normal FLAC if you are to have equivalent resolution. That is mainly because it is delivered in a FLAC container, and lossless encoders like FLAC cannot compress noise-like data, which is what the MQA-encoded part is. Just like ZIP is bad at compressing random data (so you always first compress and then encrypt, not vice versa).
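The point about noise-like data can be illustrated with a toy experiment. The sketch below is an assumption-laden stand-in: it uses a simple run-length encoder rather than FLAC or ZIP, and pseudo-random bytes from an LCG rather than real MQA-encoded bits; all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a toy run-length encoding of buf: each run of up to 255
 * equal bytes would be stored as a (count, value) pair. Only the encoded
 * size is computed, not the encoded data itself. */
static size_t rle_size(const uint8_t *buf, size_t n)
{
    size_t out = 0;
    size_t i = 0;
    while (i < n) {
        size_t run = 1;
        while (i + run < n && run < 255 && buf[i + run] == buf[i])
            run++;
        out += 2;   /* one count byte plus one value byte per run */
        i += run;
    }
    return out;
}

/* Fill buf with deterministic pseudo-random bytes from a simple LCG,
 * standing in for noise-like encoded data. */
static void fill_noise(uint8_t *buf, size_t n, uint32_t seed)
{
    uint32_t state = seed;
    for (size_t i = 0; i < n; i++) {
        state = state * 1103515245u + 12345u;
        buf[i] = (uint8_t)(state >> 16);   /* keep bits 16..23 of the state */
    }
}

/* Fill buf with one repeated value: an extreme case of predictable data. */
static void fill_constant(uint8_t *buf, size_t n, uint8_t value)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = value;
}
```

A 4096-byte constant buffer shrinks to a few dozen bytes under this scheme, while the noise buffer comes out larger than the input, because almost every byte starts a new run. Real lossless codecs are far more sophisticated, but they face the same limit with data that has no exploitable structure.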

And network bandwidth for audio is non-issue anyway.

But this topic was already discussed to death a long time ago, so there is no reason to go through everything all over again.

Hydralesia · May 16, 2020

13 hours ago, Miska said:

Lossy codec with DRM style features (full resolution decoding is strictly controlled)

Main selling point I guess to record labels.

Yes, I just don't get why they fell for it instead of demanding something more open.

13 hours ago, Miska said:

But this topic was already discussed to death a long time ago, so there is no reason to go through everything all over again.

Sorry, I was just writing where my thoughts went.

Nevertheless, I just hope that some day some clever girls and boys reverse engineer that unfolding stuff and just publish it.

lucretius · May 16, 2020

20 hours ago, Miska said:

Main selling point I guess to record labels.

I believe the labels can embed their signature in the file. I suppose then they could check up on streaming services to see that they are using only files provided directly by the labels (i.e. revenue control).

lucretius · May 16, 2020

7 hours ago, Hydralesia said:

Nevertheless, I just hope that some day some clever girls and boys reverse engineer that unfolding stuff and just publish it.

Since it's a lossy format, you cannot go from the MQA format back to the original source file.

bhobba · November 7, 2021

On 5/17/2020 at 5:51 AM, lucretius said:

Since it's a lossy format, you cannot go from the MQA format back to the original source file.

No. But the issue is whether it is audibly the same. The downsampling is pretty neat, but as this thread showed, upsampling filters other than the ones MQA uses can do a good job of producing a good-sounding result. I like MQA - to me it sounds a bit leaner - like a layer of grit has been removed. It may be the result of the upsampling filter removing the post ringing - I do not know. I use an M-Scaler to upscale it and it sounds good to me. Shannon's sampling theorem says you get an exact reconstruction of a bandlimited signal with a sinc filter, so I don't get this minimum phase filter stuff.

Thanks

Bill
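For reference, the reconstruction that the sampling theorem describes is the Whittaker-Shannon interpolation formula, written here with sample period T = 1/f_s and valid for a signal bandlimited below f_s/2. The notation is the standard textbook one, not something taken from this thread.

```latex
x(t) = \sum_{n=-\infty}^{\infty} x[n]\,\operatorname{sinc}\!\left(\frac{t - nT}{T}\right),
\qquad
\operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u}
```

The sinc kernel is symmetric around each sample, so it rings equally before and after an impulse. A minimum-phase filter trades that symmetry for ringing that occurs only after the impulse.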

Miska · November 8, 2021

On 11/7/2021 at 8:46 AM, bhobba said:

No. But the issue is whether it is audibly the same. The downsampling is pretty neat, but as this thread showed, upsampling filters other than the ones MQA uses can do a good job of producing a good-sounding result. I like MQA - to me it sounds a bit leaner - like a layer of grit has been removed. It may be the result of the upsampling filter removing the post ringing - I do not know. I use an M-Scaler to upscale it and it sounds good to me. Shannon's sampling theorem says you get an exact reconstruction of a bandlimited signal with a sinc filter, so I don't get this minimum phase filter stuff.

With MScaler you get maximum pre-ringing and post-ringing. About one second long... Total opposite of what MQA is after, which is to use as few taps as possible for the filters. Their downsampling result comes with some aliasing and such due to use of extremely short leaky filters.
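How long a filter rings is simply its tap count divided by the rate it runs at. The sketch below uses purely illustrative numbers; they are assumptions chosen for easy arithmetic, not the specifications of the M Scaler or of any MQA filter, and `fir_length_seconds` is a hypothetical name.

```c
/* Duration in seconds of an FIR filter with `taps` coefficients running at
 * `sample_rate_hz`. */
static double fir_length_seconds(long taps, double sample_rate_hz)
{
    return (double)taps / sample_rate_hz;
}
```

For example, a filter lasting one second at 705.6 kHz would need 705,600 taps, whereas a 32-tap filter at 352.8 kHz lasts under a tenth of a millisecond.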
