KSTR Posted April 11, 2021 Hi, I did not follow the whole thread and do not know how many people have noticed what I have noticed on the demo tracks provided, as well as on all decodings I ran myself on various music. I saw at least some have while skimming through the thread, though. Besides the general core function (downward expansion), which works well, there is a very strong overall EQ curve on the resulting output data which in almost all cases spoiled the tonal balance for me big time: tubby lower mids and harsh treble. I have not evaluated the various EQ command-line switches in my trials; I just used the defaults.

Being an accomplished engineer, I tried to find out what is going on, and this is what I found on basically all examples I've tried (about 20 so far) and with the two decoder versions I've tested myself (V2.2.3E and V2.2.4E): https://www.audiosciencereview.com/forum/index.php?attachments/dysondecoder-fr-mag-phase-png.122626/ A dip/peak valley (blue curve) spanning more than 12dB(!) in the range from 50Hz to 20kHz (there might be small differences between versions, but you'll get the picture). Ignore the absolute levels, just look at the span. Note those kinks at exactly 3kHz and 9kHz and the irregularity at ~1.5kHz which don't have corresponding wiggles in the phase (red), which means this overall EQ is not fully minimum phase (not a big deal, but still an interesting detail).

I've had an encounter with Mr. Dyson before and therefore will not engage again; everything has been said. I just wanted to share that bit of information, backed up with data (which was created properly and competently), which is IMHO elementary to understanding why the decoder sounds the way it sounds; at least this is the dominating factor. I've also created a compensating EQ that restores the original tonal balance, which I could share if anyone is interested (or just follow the link and find it there).
Once this is applied, it's much easier to hear (and judge) the dynamic processing that takes place, as it is then the only variable left (a key requirement for fair comparisons in any field of engineering/science). One can still apply any needed (or preferred) EQ after (or before) decoding to get the best results. On 4/10/2021 at 1:01 AM, John Dyson said: 2) Source code. It would be a pity if this valuable project were lost.
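For anyone wanting to reproduce this kind of measurement: when the decoder runs above its action threshold it behaves as a fixed (LTI) EQ, so the large-signal frequency response can be estimated by driving it with noise and dividing the averaged cross-spectrum by the input auto-spectrum. A minimal numpy sketch; the "black box" below is a stand-in one-tap FIR invented for the demo, not the decoder itself:

```python
import numpy as np

def transfer_function(x, y, fs, nfft=4096):
    """Estimate H(f) = Sxy/Sxx from input x and output y, Welch-style."""
    hop = nfft // 2
    win = np.hanning(nfft)
    sxx = np.zeros(nfft // 2 + 1)
    sxy = np.zeros(nfft // 2 + 1, dtype=complex)
    for i in range(0, len(x) - nfft + 1, hop):
        X = np.fft.rfft(win * x[i:i + nfft])
        Y = np.fft.rfft(win * y[i:i + nfft])
        sxx += (X.conj() * X).real     # input auto-spectrum, averaged
        sxy += X.conj() * Y            # cross-spectrum, averaged
    return np.fft.rfftfreq(nfft, 1 / fs), sxy / sxx

# Demo black box: a one-tap FIR (gentle treble lift), standing in for the
# decoder running above its action threshold -- NOT the decoder itself.
fs = 88200
rng = np.random.default_rng(0)
x = rng.standard_normal(5 * fs)
y = np.concatenate(([x[0]], x[1:] - 0.3 * x[:-1]))

f, H = transfer_function(x, y, fs)
mag_db = 20 * np.log10(np.abs(H))
```

The same estimate works with a log sweep or MLS as the stimulus; averaging over many segments is what keeps the estimate clean.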
KSTR Posted April 12, 2021 My FR measurement is correct and fully applicable to the decoder's function, and that is easily proven. The decoder does nothing at high signal levels (say, at least down to -10dB); this has been pointed out many times by the designer. When the log sweep, MLS noise or other content used to obtain the transfer function plays above the "action threshold", the result exactly resembles any static EQ applied. I've used all three methods and the results were always identical. Only when the content falls below the action threshold (actually there is a multitude of thresholds because of the multi-band and multi-layer nature) do the additional(!) dynamic EQ effects Mr. Dyson explained pop up and spoil the accuracy of the FR.

The proof that the FR is correct: when I precisely de-embedded that FR so I could successfully subtract the original and processed file, a null was obtained down at -40dB or thereabouts, which is exactly what's expected. The difference showed what the decoder actually does in great detail at those levels and below (see the ASR thread, including a listening example; I won't repeat everything here). Had I erred on the total FR, the subtraction would have completely failed on the large-signal sections; there is no way to get a 40dB null (the content is the same down to 1% of full-scale level).

It all started because in listening trials I immediately noticed the skewed tonal balance on all the test snippets provided, which hugely dominated all the subtle dynamic detail changes. Same when trying to decode my own source tracks. Proper analysis then showed why. And I've not been the first one to find this and point it out. So that overall static EQ is real; there cannot be a single microsecond of doubt about that.
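The quality of such a null can be put in numbers as the RMS of the residual relative to the original. A small sketch; the -40dB error signal is synthetic, purely to illustrate the metric:

```python
import numpy as np

def null_depth_db(original, processed):
    """RMS of the residual (processed - original) relative to the original."""
    resid = processed - original
    return 20 * np.log10(np.sqrt(np.mean(resid ** 2)) /
                         np.sqrt(np.mean(original ** 2)))

# Synthetic check: add an uncorrelated error 40 dB below the signal and
# confirm the metric reads it back.
rng = np.random.default_rng(1)
sig = rng.standard_normal(48000)
err = 10 ** (-40 / 20) * np.std(sig) * rng.standard_normal(48000)
depth = null_depth_db(sig, sig + err)   # close to -40 dB by construction
```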
I'm not saying that no EQ whatsoever should be applied to obtain the best results (which are strongly subjective anyway). What I'm saying is that this gross tonal change of more than +-6dB variation across the audio band, impressed on each and every decoding, is 100% sure *not* what the original master tape (without any DolbyA compression to pimp the CD versions) sounded like. We can only speculate about Mr. Dyson's rationale for this EQ, as he never explained why so much static EQ is deemed appropriate to "restore" the original. Even comparison with vinyl would not warrant that much change, plus vinyl masters are also processed to unknown amounts, both static (EQ) and dynamic stuff (the last in chain being the cutting head's vertical limiter).
KSTR Posted April 12, 2021 EDIT: I see my original post has been removed... this one will share that fate, I guess. I'm out of here. @Moderator: please delete my account while you're at it.
KSTR Posted April 13, 2021 OK, I see my previous posts have been restored, at least the textual sections. Thanks to all who supported this.

I've now made a series of measurements to show the level-dependent characteristics of the decoder (V2.2.4E). The test stimulus was (periodic) pink noise at various levels, starting at -3dB(rms) and going down to -73dB(rms) in 10dB steps. Pink noise resembles typical music signals quite well, notably being very close to reverb tails and such. It is static, though; it has no dynamic changes, so temporal effects (attack and decay times of the expander) will not show up -- a time-domain example will be shown in the last paragraph. Command line (in a batch file) to create the decoder outputs: da-win.exe --info=11 --fz=opt --fw=classical --fa --input="%1" --outgain=0 --floatout --overwrite --output="%2"

First, a set of curves that represent the output signal vs. input signal over the various levels: Since we are interested in the output-vs-input characteristic allowing a direct reading, the input has to be flat. This was achieved by using 1/48th-octave bins in the spectrum instead of the direct FFT bins (which are spaced evenly at x.yHz intervals), so that the pink spectrum (with its -10dB/decade slope) is rendered flat; the corresponding trace is labelled "Input". We can see that the first three levels (at -10dB, -20dB and -30dB relative to the -3dB(rms) input) have the known shape (spanning some 12dB of level change, with those kinks at 3kHz and 9kHz mentioned previously), let's call it the "JD house curve", and they are evenly spaced 10dB apart (the dB levels in the legend are taken at 300Hz, and we can see the corresponding -3.9dB, -13.9dB and -23.9dB level point sequence). This means there is basically no action from the decoder (other than the house curve). At -30dB and below, additional effects are introduced: the shape vs.
frequency changes slightly, notably at the low bass and high treble, and the curves are now spaced more than 10dB apart. This means there is now downward expansion, and it is frequency dependent.

A better visual representation of that frequency-dependent downward expansion is achieved when we plot the gain reduction vs. frequency over the set of input levels: This plot is "normalized" to the output signal of the -3dB(rms) input signal, so that the constant house EQ (as measured at higher levels) is factored out and we can see the actual gain reduction. In other words, it is normalized to the "IN: 0dB" trace of the previous plot. The dB values in the legend show the gain reduction at 1kHz. The first two lines, for input levels down 10dB or 20dB respectively, again show (and must show, since it's the same data as before) that there is no gain reduction at all -- maybe a hint (0.2dB) of expansion for the dark green -20dB curve below 100Hz. At -30dB, the very low bass is reduced by 4dB, and interestingly there is a slight gain increase at 100Hz. At the subsequent lower input levels, we see more and more gain reduction the lower the signal is (reaching 50dB of gain reduction at 1kHz for a -70dB input), and also that the overall curve shape changes, showing the multi-band nature of the expander with different expansion profiles in the individual bands. This relates very nicely to what happens to a slowly decaying reverb tail: it dies off disproportionately in level vs. the input, and even more so at the high treble. Likewise, a constant pink noise floor in the recording at -70dB is reduced by at least 50dB, making it totally disappear. This all happens multi-band, which means, for example, that a high signal level in the bass band does not keep the decoder from expanding and reducing noise elsewhere, according to the signal levels in those other bands.
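The 1/48th-octave binning mentioned above can be sketched as follows: summing FFT power bins into constant-percentage-bandwidth bands renders a pink spectrum flat, since each band then integrates (nearly) equal power. The band-edge choices and the pink-noise synthesis below are my own, not taken from the actual measurement setup:

```python
import numpy as np

def fractional_octave_levels(x, fs, frac=48, fmin=50.0):
    """Sum FFT power bins into 1/frac-octave bands. Constant-percentage
    bandwidths integrate equal power from a -10 dB/decade (pink) spectrum,
    so pink noise reads back flat."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    f = np.fft.rfftfreq(len(x), 1 / fs)
    centers, levels = [], []
    fc = fmin
    while fc * 2 ** (0.5 / frac) < fs / 2:
        band = (f >= fc * 2 ** (-0.5 / frac)) & (f < fc * 2 ** (0.5 / frac))
        if band.any():
            centers.append(fc)
            levels.append(10 * np.log10(spec[band].sum()))
        fc *= 2 ** (1 / frac)
    return np.array(centers), np.array(levels)

# Synthesize pink noise by shaping a random spectrum with 1/sqrt(f)
n, fs = 2 ** 18, 48000
rng = np.random.default_rng(2)
shape = rng.standard_normal(n // 2 + 1) + 1j * rng.standard_normal(n // 2 + 1)
freqs = np.fft.rfftfreq(n, 1 / fs)
shape[0] = 0.0
shape[1:] /= np.sqrt(freqs[1:])
pink = np.fft.irfft(shape)

fc, lv = fractional_octave_levels(pink, fs)   # lv is flat across the band
```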
Finally, a short look at the temporal behavior, showing the settling to the new characteristic when dropping a pink noise signal from -23dB(rms, input) down to -63dB in a step change: Here we can see that it takes roughly 250ms to settle to the new output for this example, basically morphing from the -20dB characteristic to the -60dB characteristic shown above.

I hope these visual representations help to better understand the basic function of the decoder. While Mr. Dyson's textual explanations are very elaborate and extremely detailed, I feel that the overall picture of the decoder's working is easily lost in all those details.
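That settling-time reading can be automated along these lines. The expander below is a crude one-pole envelope follower with a made-up 50ms time constant, only there to give the measurement something to chew on; the real decoder's constants differ:

```python
import numpy as np

fs = 48000
rng = np.random.default_rng(3)
x = rng.standard_normal(fs)            # 1 s of noise...
x[fs // 2:] *= 10 ** (-40 / 20)        # ...dropping 40 dB at t = 0.5 s

# Crude stand-in for an expander side-chain: one-pole envelope follower.
# tau = 50 ms is invented for the demo, not the decoder's own constant.
tau = 0.05
a = np.exp(-1.0 / (tau * fs))
env = np.empty_like(x)
e = 1.0
for i, s in enumerate(x):
    e = a * e + (1 - a) * abs(s)
    env[i] = e

# Settling time: last instant after the step at which the envelope still
# deviates by more than 10% from its final value.
final = env[-fs // 10:].mean()
late = np.nonzero(np.abs(env[fs // 2:] - final) > 0.1 * final)[0]
t_settle = (late[-1] + 1) / fs if late.size else 0.0
```

The same "last excursion outside a tolerance band" definition is what I read off the measured output envelope.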
KSTR Posted April 14, 2021 5 hours ago, John Dyson said: I mentioned this because a lot of people DO want a stronger midrange -- I just cannot do it without post-decoding EQ. I think a "--no-global-eq" switch would be highly appreciated by almost anyone here, with the output rendered flat for higher "pass-through" input levels. As applying EQ during playback or after rendering is so trivial for users these days, let them decide if and how much global EQ they want. Also, with that tonal skewing it is impossible to judge whether a decoding sounds better than the original; by first principles, it never does. The fine-grained dynamic stuff is completely buried in the gross EQ change, perception-wise. I'm working on a precision convolution kernel to undo the EQ (with all its effects, notably those non-minimum-phase kinks), but as you know, inversion of a transfer function is very tricky when the only data is empirical... Therefore, if you see a way to remove the EQ (even to first order only, with a simple min-phase post-correction with some IIRs), please do so.
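One standard way to tame the inversion of an empirical transfer function is Tikhonov-style regularization of the spectral division. A sketch with a synthetic "house curve"; the Gaussian dip is invented for the demo and is not the decoder's actual EQ:

```python
import numpy as np

def regularized_inverse(H, eps=1e-3):
    """Tikhonov-regularized spectral inversion: conj(H)/(|H|^2 + eps).
    eps keeps the gain finite where the measured |H| is small or noisy."""
    return H.conj() / (np.abs(H) ** 2 + eps)

# Synthetic 'house curve': a smooth 6 dB dip (invented for this demo).
n = 4096
f = np.fft.rfftfreq(n)
H = 1 - 0.5 * np.exp(-((f - 0.1) / 0.05) ** 2)

H_inv = regularized_inverse(H)
kernel = np.roll(np.fft.irfft(H_inv), n // 2)     # centered FIR, ready to window
flatness_err = np.max(np.abs(np.abs(H * H_inv) - 1))
```

The residual flatness error scales roughly as eps/|H|^2, so eps trades noise amplification against correction accuracy in the dip.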
KSTR Posted April 14, 2021 BTW, I just found out that even "--fw=classical" or "--stw=0" still applies about -20dB (10%) of out-of-phase crossfeed (giving noticeable widening). Is this intentional? Zero should be zero in my book.
KSTR Posted April 14, 2021 55 minutes ago, John Dyson said: Without --fw=classical, the decoding is Mid/2*Side, while with --fw=classical, the decoding is Mid/Side. I've found out that --wof=, --stw= and --fw= have to be placed after any --fa option, otherwise the setting is overridden by --fa. I was not aware that the processing of the command-line parameters is strictly sequential. Now it works as expected/described.
KSTR Posted April 14, 2021 4 hours ago, John Dyson said: I really want to make sure that you know that I feel sorry for my irritated response to you, SO I DO APOLOGIZE. Dear John, apology accepted ;-) In these crazy pandemic times it's so easy to get upset; I've noticed myself that I'm not as cool and polite in online conversations as I used to be, pre-pandemic. Best regards and take care, Klaus
KSTR Posted April 15, 2021 The original master is from 1952, so no DolbyA ;-) On the CD releases, they may or may not have used DolbyA to pimp it, on whatever n-th generation copy of the master they had as source. The source could just as well have been a professional vinyl rip with all the bells and whistles. What we know: - John's raw version has been resampled to 48kHz (why?) - the decoded version(s) (which would have been 88.2kHz) have been resampled to 48kHz as well (why?) - the decoded version(s) have the JD house EQ as usual, making un-skewed comparisons very hard (IMHO) One way to check which CD version John used (unless he tells us) would be to run DeltaWave against the available CD versions. DW takes care of the resampling back to 44.1, and any larger differences in the mastering would quickly pop up (setting filters at 20kHz to keep the resampling stuff out of the picture would certainly be needed).
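The resample-and-filter step before differencing could look like this with scipy; 147/160 is the exact 48k-to-44.1k ratio, while the 8th-order Butterworth at 20kHz is my own choice, and DeltaWave's internals may well differ:

```python
import numpy as np
from scipy.signal import butter, resample_poly, sosfiltfilt

def match_rate_and_band(x48, f_cut=20000.0):
    """48 kHz -> 44.1 kHz (exact 147/160 ratio), then low-pass at 20 kHz so
    resampling-filter differences near Nyquist stay out of the null."""
    y = resample_poly(x48, 147, 160)
    sos = butter(8, f_cut, btype='low', fs=44100, output='sos')
    return sosfiltfilt(sos, y)       # zero-phase, so no added phase skew

# Demo: a 1 kHz tone passes, a 21 kHz component is largely removed
fs = 48000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 21000 * t)
y = match_rate_and_band(x)
rms = np.sqrt(np.mean(y ** 2))       # close to the 1 kHz tone's own RMS
```

Both files get the same treatment, so whatever the filter does tonally cancels in the subtraction.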
KSTR Posted April 15, 2021 1 minute ago, John Dyson said: The items above might take several hours to review and fix. Therefore, the release isn't planned for +5Hrs, but instead +17Hrs. Take your time, John. I think many of us would be perfectly happy with weekly updates. This gives you the headroom to check things out, let impressions settle (and collect impressions from others), catch early bugs sneaking in, and of course start working on side topics like the --no-global-eq switch ;-) Plus the time to update the docs. Take a break when needed.
KSTR Posted April 15, 2021 15 minutes ago, PeterSt said: [...]yesterday I tried to get some sense out of "original" ABBA albums. Well, they are terrible.[...] Assuming you're an audiophile (in the non-derogatory meaning of the term), you're the first one I've heard of who actually owns and listens to stuff like ABBA, haha! ABBA is great for teaching people about excellent song-writing and the clever arrangement of pop songs for the masses, but listening to this for recreational purposes on a HiFi rig? Of course everybody is entitled to like what they like...
KSTR Posted April 16, 2021 1 hour ago, John Dyson said: * with a spec or a schematic, even just an accurate circuit diagram without parts values, this project would result in functioning software in about 2 months. https://audio-circuit.dk/wp-content/uploads/simple-file-list/d-other/Dolby-361-sm.pdf Unreadable part values, though. But good general information. EDIT: A much more readable schematic is located close to the end of the manual.
KSTR Posted April 16, 2021 With this manual, the path is open to do a Spice simulation with LTspice. LTspice can use .WAV files for time-domain input and output, so actual music files could be run through the simulated circuit and then decoded, to see how close one gets to the original. The only problem is those JFET models and trimming their parameters to match what Dolby had selected them for, though some educated guesses could surely be arrived at. An even better scan is here: https://www.richardhess.com/manuals/Dolby/Dolby CAT 22 Schematic scan 01.pdf
KSTR Posted April 17, 2021 6 hours ago, PeterSt said: Do you really expect anyone to read this all ? Yes, this is getting out of control. @John Dyson, if I were in your position this would be my ToDo list (my 2ct.):
- Zip the source tree, the important toolchain settings and any relevant docs, and stash the archive away in a safe place (real, or in a private net cloud) where someone, say your best friend or family, has access to it. Just in case something fatal happens.
- Make a public note so people know, and then take a real break from the project. Something like two to four weeks, minimum. Don't read/write forums etc., shut down all non-essential computers, etc. Try to enjoy your time concentrating on other beautiful things in life.
- When you revisit the project, make a sane first-things-first prioritisation.
From my point of view, wrt the code itself, the top topic is the house EQ. This EQ spoils everything, really, I mean it. Neither you nor anybody else will ever be able to do any meaningful comparisons as long as the main and very dominant effect is that EQ, spanning some 12(!!!) dB (and even if it were only +-1dB, that would still be too much). Comparing/judging the low-level dynamics requires the same large-signal frequency response to within +-0.1dB, and +-0.1dB of level matching; there is no way around it. We know this EQ is not helping; you know it is not helping. It is completely superfluous. If some post-EQ is deemed necessary to polish up the result (which may or may not be the case), this can always be done in an extra pass, with different means.
KSTR Posted April 17, 2021 1 hour ago, pkane2001 said: Klaus, not sure if you've tried it, but this is what I do to de-EQ John's files. I simply use DeltaWave to match the RAW and Decoded file, and use non-linear level EQ correction only (uncheck phase). The result is to undo the large-level EQ in processed files. You can then play or export the corrected files. Hi Paul, I thought about it but haven't tried it. I've used my own de-embedding which, as you might know, suffers in that it applies the correcting IR to the original, rather than applying 1/IR to the decoding. But it compensates all magnitude and phase errors to full precision (good enough to do a "simple load" only in DW and get the best null possible). The overall EQ is still present, but now it's the same for both files. Then I would apply a simple min-phase correction: a curve-fit EQ filter parameter set obtained in REW and transformed to an IR in rePhase. For quick checks, I only apply the latter... which leaves 1dB of wiggle room in the difference and, more importantly, does not correct those linear-phase level jumps at 3kHz and 9kHz. With level EQ only applied by DW, that would mean applying a linear-phase correction(?), which isn't fully correct. Most of John's EQ is minimum phase, except for those mentioned step changes. Probably not a big deal, as the overall curvature is low-Q, reducing the chances of nasty pre-ringing.
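For reference, a minimum-phase correction can be constructed from a measured magnitude alone via the standard real-cepstrum method (for a minimum-phase system, log-magnitude and phase are Hilbert-related), which is the kind of filter tools like rePhase produce. A numpy sketch, verified against a known minimum-phase one-pole:

```python
import numpy as np

def min_phase_from_magnitude(mag):
    """Minimum-phase spectrum matching a given magnitude, via the real
    cepstrum (log-magnitude and phase are Hilbert-related for min-phase)."""
    n = 2 * (len(mag) - 1)
    cep = np.fft.irfft(np.log(np.maximum(mag, 1e-12)), n)
    cep[1:n // 2] *= 2          # fold the anti-causal half onto the causal one
    cep[n // 2 + 1:] = 0
    return np.exp(np.fft.rfft(cep, n))

# Check against a known minimum-phase system: H(z) = 1/(1 - 0.5 z^-1)
n = 1024
w = 2 * np.pi * np.arange(n // 2 + 1) / n
H_true = 1.0 / (1 - 0.5 * np.exp(-1j * w))
H_rec = min_phase_from_magnitude(np.abs(H_true))
max_err = np.max(np.abs(H_rec - H_true))   # tiny: phase recovered from |H| alone
```

By construction this cannot reproduce the linear-phase steps at 3kHz and 9kHz; it yields the minimum-phase filter with the same magnitude, which is exactly the limitation discussed above.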
KSTR Posted April 17, 2021 59 minutes ago, John Dyson said: Have you listened to the latest demos? There isn't much difference that I'd call "EQ". There never was any intentional EQ Not yet, John, but I will (and will analyze them, or the new binary directly). At the moment there IS a difference that is plain EQ, as explained; anything more than +-0.1dB is counterproductive for the task. It doesn't matter whether the EQ was introduced intentionally or by accident.
KSTR Posted April 17, 2021 We do not have general access to pre-encoded masters (other than the occasional old vinyl edition). We only have CDs where DolbyA units (sometimes even modified ones) have been intentionally mis-used artistically to "improve" the final sound. Most certainly there were additional EQ, sum compressors etc. applied before the master for a CD was sent out. All of this is a great unknown. And therefore undoing the specific low-level dynamic compression of the DolbyA is obviously close to impossible with a one-size-fits-all approach. Exactly as you say, way too many unknowns. For example, you cannot know the pre-/de-emphasis required to undo a potential final mastering EQ, and hence the levels for DolbyA de-embedding will be off anyway, even if the general level match were otherwise close. FWIW, I happened to find the Simon & Garfunkel LP in my wife's collection and, sorry, there is no way your decoded version (from two weeks back or whatever) is anywhere close tonally to that vinyl. Therefore, best leave the post-EQ alone. It can be applied separately if needed (which again is mostly personal preference).
KSTR Posted April 18, 2021 V2.2.6C, Anne Murray "Danny's Song" (a critical piece, as the levels are quite low, not too many sources are playing at once, and the general quality is excellent): compare RAW vs. DEC-V2.2.6C vs. DEC-V2.2.6C de-EQ'd with DeltaWave.

RAW vs. DEC-V2.2.6C: EQ dominates (though the dynamic effects can be heard if one tries to mentally ignore the EQ as much as possible). It is a bit better balanced than earlier versions, with less 300Hz honk and less 7kHz screech. But still completely unbearable. Again, I don't believe the original (whatever that was) sounded that way.

RAW vs. DEC-V2.2.6C de-EQ'd: failed for me; the decoder turned the steel-string guitar into a nylon-string, very muffled. Plate reverb on vocals... the same: killed decay and killed HF. Once levels get louder when the whole band starts playing, things get better as the expander works less.

Comparison of large-signal frequency responses for V2.2.4E vs. V2.2.6C: it has gotten better with V2.2.6C in relative terms. In absolute terms, still completely off. And no attempt seen to fix those strange jumps at 3kHz and 9kHz, where I'm still waiting for an explanation from John (also for the less severe roughness around 1.3kHz). Those jumps are linear phase and thus must produce pre-ringing when hit by a transient. And, surprise, surprise, that is exactly what is seen in the large-signal impulse response: the snippet highlighted is 30 samples long and part of the pre-ringing building up before the main pulse (it also produces post-ringing, but that is audibly much more benign); at the 88.2kHz sample rate that equates to 3kHz. The 9kHz overlay can readily be seen thanks to the integer ratio of the frequencies. It is not clear whether that pre-ringing is at all audible, though. The effect is almost impossible to judge in isolation with so much else going on; to get there, a more thorough investigation and analysis would be needed.
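Pre-ringing can be put in numbers as the fraction of impulse-response energy that arrives before the main peak. A sketch contrasting a linear-phase (sinc) response with a causal, minimum-phase-like exponential decay; both IRs are synthetic illustrations, not the decoder's:

```python
import numpy as np

def pre_ringing_db(h):
    """Energy arriving before the main peak, relative to total energy (dB)."""
    peak = np.argmax(np.abs(h))
    pre = np.sum(h[:peak] ** 2)
    return 10 * np.log10(max(pre, 1e-30) / np.sum(h ** 2))

# Two synthetic IRs: a linear-phase sinc (rings symmetrically, half of it
# BEFORE the peak) and a causal exponential decay (no pre-ring at all).
n = 2048
t = np.arange(n) - n // 2
linear_phase = np.sinc(t / 4)
min_phase = 0.9 ** np.arange(n)

lp_db = pre_ringing_db(linear_phase)   # a few dB below 0: lots of pre-ring
mp_db = pre_ringing_db(min_phase)      # hugely negative: essentially none
```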
KSTR Posted April 19, 2021 2 hours ago, John Dyson said: The decoder doesn't care at all about the signal output level because the decoder output is all FP Yes, that's one of the really good things about it. I've hit the decoder with +40dBFS Diracs (to keep it from expanding) etc. with no apparent issues. SoX is more limited: it's 32-bit integer internally, so while it can take FP input and output, it still clips internally when "overdriven".
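The float-vs-fixed-point headroom point can be shown in a few lines; the ±1.0 full-scale convention and the 32-bit scaling are the usual ones, a mimicry of the behavior rather than SoX's exact internals:

```python
import numpy as np

# Full-scale is +-1.0 by convention; 32-bit fixed point clips there, while
# 32-bit float just keeps counting (mimics, not reproduces, SoX internals).
x = np.array([0.5, 1.5, 100.0], dtype=np.float32)   # 100.0 = a +40 dBFS peak

def to_fixed32(samples):
    """Render to 32-bit integer scaling: anything beyond +-1.0 hard-clips."""
    return (np.clip(samples, -1.0, 1.0) * 2 ** 31).astype(np.int64)

fixed = to_fixed32(x) / 2 ** 31    # the 1.5 and 100.0 samples flatten to 1.0
scaled = x * 0.01                  # FP path: scale later, nothing was lost
```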