
'FeralA' decoder -- free-to-use



6 hours ago, John Dyson said:

Whatever is happening, let's try to keep this discussion going until it is resolved --

 

 

4 hours ago, KSTR said:

SoX is limited: it's 32-bit integer internally, so while it can take FP input and output, it still clips internally when "overdriven".

 

 

On 4/17/2021 at 8:11 PM, PeterSt said:

But if you ask me, things are digitally clipping (+32768 becoming -32767 instead of +32767). Audible on tracks 03 and 07 of Crime, and tracks 07 and 08 of Crisis.

 

Probably not related, but still noticing ...
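
A minimal numpy sketch (hypothetical sample values, not taken from these tracks) of the difference between the wrap-around overflow described in the quoted posts and ordinary saturating clipping:

```python
import numpy as np

# A float signal that exceeds full scale somewhere after a gain stage.
x = np.array([0.5, 1.0, 1.001, 1.2, -1.3])

# Wrap-around: casting to int16 without clamping overflows, so a sample
# at or just above +1.0 full scale wraps to a large negative value.
wrapped = (x * 32768.0).astype(np.int64).astype(np.int16)

# Saturation: clamp to the int16 range first, which is what a
# well-behaved converter should do instead.
clipped = np.clip(x * 32768.0, -32768, 32767).astype(np.int16)

print(wrapped)  # 1.0 * 32768 wraps to -32768: the audible "flip" to negative
print(clipped)  # stays pinned at +32767
```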


20 hours ago, PeterSt said:

One reason why others may not be bothered: I have my volume normalized as standard practice, and thus any lowest-level "congestion" works out to an enormous boost in level.

 

@John Dyson, I hope it is clear that this is the same as what you refer to as ReplayGain?

(I just don't call it that because in normal circumstances no gain as such is used - only attenuation.)

 

7 hours ago, John Dyson said:

The difference on the .wav file (31.41 - 8.24) is pretty much the same as (23.66 - 0.49).

 

So why then do I see the 24 dB of difference that I told you about??

(The 24 dB is an estimate, because it is my Volume Normalization that makes it visible, and it varies per track or album**.)

 

**) With the snippets I need to do it per track, while with full albums I do it per album.

 

To be hopefully clear: if I did not apply my Normalization, then the snippets would be "without sound" at the same -dBFS output volume. With a pre-amp one would normally turn up the volume (knob). Not so when you don't use a pre-amp; then you'd use -dBFS values, which are readily visible.

 

[screenshot]

 

This is totally normal for any good album without too much compression. Notice that the -31.5 is a norm I determined from a randomly chosen compressed album. Everything on the right side which is less attenuated (like -21dBFS) is less compressed. Higher numbers (smaller minus values) on the right side than on the left side mean more compression and are not - or at least less - on par (by my standards).

 

Here you see Crime and how it Normalizes as the Album :

 

[screenshot]

 

 

Here you see the first track and how it Normalizes as the Track - coincidentally the same as the Album :

 

[screenshot]

(notice the T now being active in the top-left corner)

 

 

Here you see the snippet of that track and how it Normalizes - mind you, this is PLUS 6 now :

 

[screenshot]

 

(and the PLUS indeed would be gain - I allow a maximum of 6dB of sheer boost)

 

FYI: if we don't allow the gain, I'd need to attenuate a bit more, and this is what comes out of it:

 

[screenshot]

 

Lastly, if I make it the same output level (right-hand side) as the 88.2, then this is what shows:

 

[screenshot]

 

(this is attenuated 1 dB more because of the step size of the volume at those levels; see the -60dB by now)

 

For your convenience, the "original" from your decode is repeated here (from above) for comparison:

 

[screenshot]

 

So in this case this is a 28.5dB difference. But notice: the snippet is heavy on silence because of the overweight of the lead-in (the schoolyard sounds), and it is faded at the end. So logically it is softer overall.

 

Anyway, the original message:

a. if I don't apply the normalization then nothing is audible (say 24dB too soft);

b. if I do apply the normalization then the windows go out.

 

Ad b.: As discussed before (also by you, John), this is because of very low compression; see the first paragraph of this post.

But this is what I always thought, because the 88.2 shows very normal behavior on the compression.

 

(I am afraid this all is not very clear - my English.)

 

It would be interesting now to listen to the "full demos" - the ones I judged and described earlier on. I will do that in today's listening session (12+ hours from now).

 

 

 

 

 


On 4/20/2021 at 12:04 AM, PeterSt said:

I am not familiar with the SW that you are using, and I'll come back and read this a few times so that I can comprehend your message (all of the comprehension problem is mine.)   There appears to be something incompatible going on.   My SW is 'bare bones' other than the decoder itself, which is actually fairly active about metadata, including updating BEXT, etc. SoX even likes to destroy metadata.

 

This really seems like some kind of metadata or secret flag in the files -- or perhaps a behavior resulting from an eccentric form of the materials that I create.  I am asking this question almost totally blind, because text is very difficult for me (I can write it much more easily than read it) -- is there the possibility of a database lookup that expects a certain behavior of a snippet?   What I am trying to say is: if I produce snippets without any hints about the recording, does it still work the same?   I noticed that software on MS Windows sometimes does things automatically that can be confusing in a research-type situation.  I am NOT making any assertions at this point, but this is only a guess, NOT EVEN A HYPOTHESIS -- just a possibility?

 

There IS something strange going on -- for QC purposes, I do listen to at least a few recordings in each block.  (For example, a few minutes ago, I found a bug in the 'Even In the Quietest Moments' test recording that we are working on right now.)   I DO listen, and levels are very important to me.   Since I normalize on an album basis, in the case of full songs sometimes it will be -6dB instead of 0dB, but I do care about signal levels.

 

This really does seem to be a miscommunication between the test examples and the program that you use.  I'll review the 'bits' in the files, watching both the 'bits' and 'metadata' more closely.   This is an 'interesting' problem for sure.

 

 


These samples have not been reviewed yet, but the changes in results have come from feedback and some rather embarrassing revelations about mistakes in the architecture.   This set of decoder versions is still a little too 'green' to offer for use.  Since there have been no internal reviews yet, there might be significant changes coming.   Feedback is welcome -- I am most interested in the amount of HF: are the vocals clear?  Is there too much bass?  Is the bass muddy, or is the bass insufficient?

 

Relative to previous versions, is the midrange more 'complete'?  Should there be even more midrange?  (This is a hard one to determine -- midrange comes along with high-end and bass, so the delineation can be tricky to do.)

 

When doing the comparisons, do NOT expect that every example will sound similar between RAW and DEC.   I fully expect that in some cases RAW might sound better, but on the other hand, when there is hiss, DEC usually has much, much less.   Also, the DEC version has less of the 'telephone' sound, but sometimes that 'telephone sound' makes vocals sound more clear.   There are good things, sometimes profoundly good things, about the decoded versions, and sometimes there are unexpected differences also -- which might even disappoint those who really expect a certain sound.  This is not an ALL or NOTHING thing.   The decoder is intended to provide an option.

 

Here we go!!!

https://www.dropbox.com/sh/i6jccfopoi93s05/AAAZYvdR5co3-d1OM7v0BxWja?dl=0

 


 

 

The more I think about this, the more it seems like 'expected signal levels' per a database somewhere, and that clashes with the normalization that I do.  Not only that, decoded material can have a very different signal level than undecoded.   Also, I believe that @PeterSt did mention different file formats.  If that truly ends up being the problem, I'll have to be more careful to use consistent file formats.   I never thought that it made a difference, but I guess it just might...

 

Here are my standards (sometimes not fully achieving them):

1)  Both versions of snippets are normalized to the same level.   The snippets are normally normalized on a per-snippet basis and not over the entire recording.   The reason for normalizing per snippet is that sometimes a recording has stronger dynamics after being decoded, and the decoded snippet will be pushed down in level.  The method used is an attempt at fairness when doing comparisons.

2) When doing album decodes, they are done in-place, and whatever level each song comes out with is left as-is.   Then, after the entire album is decoded, the maximum level across all songs is determined, and everything is normalized based on that.   Since this normalization is done manually, after manually measuring the levels, the target maximum peak for the entire album is approximate, usually between -0.10dB and -1.0dB, but it seldom ends up as low as the -1dB maximum.   My target is -0.80dB; it usually ends up being a little higher, like -0.50dB.

3) When a single song is decoded, most often I'll automatically normalize it to -0.80dB (see the sketch below).   On request, I'll certainly leave it alone, but when decoding without modifying levels, FA materials usually end up between -6/-7dB and -2/-3dB.    Very seldom do they approach 0dB, and very, very seldom are they below -7dB after decoding.
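
A minimal sketch of the peak normalization described in the list above (my own illustration, not John's actual tool); the file name and the use of the soundfile package are assumptions:

```python
import numpy as np
import soundfile as sf  # assumption: files are read/written as float arrays

def normalize_peak(path_in, path_out, target_dbfs=-0.80):
    """Scale a file so its absolute peak lands at target_dbfs (e.g. -0.80 dB)."""
    data, rate = sf.read(path_in, dtype="float64")
    peak = np.max(np.abs(data))
    if peak == 0.0:
        raise ValueError("silent file, nothing to normalize")
    target_linear = 10.0 ** (target_dbfs / 20.0)   # -0.80 dB -> ~0.912 of full scale
    gain = target_linear / peak
    sf.write(path_out, data * gain, rate, subtype="PCM_24")
    return 20.0 * np.log10(peak), gain             # original peak in dBFS, applied gain

# normalize_peak("snippet_decoded.wav", "snippet_decoded_norm.wav")
```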

 

As an experiment, I believe that we should obscure the metadata on some test recordings, but that just might not work, because there are methods for getting a signature of a recording by processing the beginning, which can then be found in a database.

 

If my guess is getting close to the actual problem, we might have to look into explicitly disabling the recognition of recordings.   This might not be the problem -- but obviously something odd is going on.

 

I promise you -- I almost never create something at -20dB or worse (unless it is test material), and I certainly cannot go above 0dB unless handing off FP files.

Maybe I'll check the user manuals for the tool that seems incompatible with the files.

This is a very interesting problem, because I am so very careful to produce REASONABLE (however imperfect) levels.

I am actually hoping that there might be a metadata flag somewhere that suggests disabling any magic processing.

 

BOTTOM LINE -- I truly don't know yet.

 

 


You know how difficult this project has been -- probably more difficult than it should have been.   This note talks about resolving one of the persistent problems, and about how much easier the testing will be, with less variability, in the near future.

GOOD GOOD THINGS ARE HAPPENING (shouting with glee.)

 

--------------------------------------

Of course you know about the hearing problems, etc.   But there is another technical matter that I am resolving (with the help of reviewers): the software architecture has been workable, but it hasn't been perfect, so the program previously needed too many adjustments (steps/EQ) to make the sound correct.

 

One historical example change -- done a few weeks ago...   Remember that long LF EQ problem that I never did make correct?   There was a sequence of about 10-15 LF shelving filters to bring about 30dB of excess LF gain back to normal.   The result could become plausible, but it was very sensitive to changes.

 

Instead of the 10-15 shelving filters at the end, I found that the mass of EQ filters can be replaced by two very simple EQs between each layer.   The EQ is 250Hz, -3dB and 75Hz, -3dB.   THAT'S IT!!!   All of that adjustment, trying to find the right answer, listening judgement, etc. -- NOW GONE!!!   Forever, the LF EQ is now close to correct no matter what changes are made.  As a result of the work a few weeks ago on the LF, there are now 2 simple EQs per layer instead of a complex morass at the end -- and the simple EQ is also more correct.

 

There was another similar problem with HF.   Previously there were 3 HF rolloffs starting at 9kHz and stopping at 21kHz.  The rolloffs did result in pretty good sound, but wow -- previously there needed to be a lot of HF rolloff (a total of -21dB when you count all three EQs).  As of right now, the correct HF rolloff appears to be 3 each of 18kHz at -3dB between each layer.  That's IT!!!  Simple as can be.   Things seem suspicious when messing with 15-30dB of signal; I do feel that 1.5, 3, 6dB, etc. -- those are NICE numbers.  30dB is NOT a nice number :-).
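
To make the EQ numbers above concrete, here is a minimal sketch of a per-layer shelving chain using the corner frequencies and -3dB gains mentioned (250Hz and 75Hz low shelves plus an 18kHz high shelf). The RBJ-cookbook biquad topology and the scipy usage are my own assumptions for illustration, not the decoder's actual code:

```python
import numpy as np
from scipy.signal import lfilter

def shelf_coeffs(f0, gain_db, fs, kind="low", S=1.0):
    """RBJ audio-EQ-cookbook shelving biquad, normalized so a0 = 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / 2.0 * np.sqrt((A + 1.0 / A) * (1.0 / S - 1.0) + 2.0)
    cosw, sq = np.cos(w0), 2.0 * np.sqrt(A) * alpha
    if kind == "low":
        b = [A * ((A + 1) - (A - 1) * cosw + sq),
             2 * A * ((A - 1) - (A + 1) * cosw),
             A * ((A + 1) - (A - 1) * cosw - sq)]
        a = [(A + 1) + (A - 1) * cosw + sq,
             -2 * ((A - 1) + (A + 1) * cosw),
             (A + 1) + (A - 1) * cosw - sq]
    else:  # high shelf
        b = [A * ((A + 1) + (A - 1) * cosw + sq),
             -2 * A * ((A - 1) + (A + 1) * cosw),
             A * ((A + 1) + (A - 1) * cosw - sq)]
        a = [(A + 1) - (A - 1) * cosw + sq,
             2 * ((A - 1) - (A + 1) * cosw),
             (A + 1) - (A - 1) * cosw - sq]
    b, a = np.array(b) / a[0], np.array(a) / a[0]
    return b, a

def per_layer_eq(x, fs=44100):
    """One layer's worth of the EQ described above: two LF shelves and one HF shelf."""
    for f0, g, kind in [(250.0, -3.0, "low"), (75.0, -3.0, "low"), (18000.0, -3.0, "high")]:
        b, a = shelf_coeffs(f0, g, fs, kind)
        x = lfilter(b, a, x)
    return x
```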

 

The primary benefit of these EQs being more correct is that they also make the signal dynamics more correct.   This is because each layer now has the correct signal sent to it, so the dynamics will be more precisely correct.

 

=======================================

In the last few days I have done some aborted test runs; this is because of the rather large changes and my recently being very mistake-prone.

 

The eventual goal -- remove almost all adjustments; and if things keep going the way that they are, the only real adjustment will be a slight difference in HF and LF.   When the code adjustments are made that minimal, all of these subjective issues diminish in difficulty.

 

The investment over the last several days/weeks is towards a more robust program and simpler code, making it less likely that opinions about sound characteristics will have a negative impact.  We still need to listen and check out the results, but there will be less adjustment based on sound characteristics.

 

=======================================

The delay in correcting the code isn't about being stupid or lazy; it is just that I didn't fully understand what the original designer was thinking & doing until the last few days.  NOW I truly understand!!!   It has taken a long time, and I guess that delay might show stupidity, but there are a lot of things going on!!!

 

Nice thing:  the EQ lists are now much shorter!!!

 

John

 

9 hours ago, John Dyson said:

when decoding without modifying levels, FA materials usually end up being between -6/-7dB to -2/-3dB.

 

I am not sure whether I already said something similar, but we do realize that 6dB is half of the total headroom, don't we? It would mean a max (digital) plus value of 8191 (minus the same). This can-not-work ...

 

THUS, if you first, for all your good reasons, threw out half of the higher (!) resolution data in order to get the result, and then down-convert to 44.1 (or 48), you will have thrown out the data for real, never mind that it is still 24 bits. The "horizontal" sampling rate can't catch that resolution (this is harder for me to explain, but with this hint you can work it out yourself while waiting on a decode ;-)).

 

Apart from throwing out resolution data, the sound will be grainy from it.

Notice that the original Crime of the Century suffers from the same problem at its base; it comes with a 6dB-too-low level, and that can't be restored to anything normal - there will be no bass and way, way too much dynamics (you'd expand the transient jumps without smoothing support).

 

Summarized, you can use 1.5 - 2dB for filter activity and leave it at that (expanding that to -0dBFS would already introduce inconsistencies), but you surely can not throw out half of the data and think it can be brought back by raising the level. OK, doing that from -6dBFS could at least be done consistently (multiply all by 2), but still half of the resolution would be lost.

--> This is not so when you expand to 24/88.2 first: do your operations there, and leave it at that. Normalizing level in there would also be fine. After that, downsample to 24/44.1 by decent means ... also fine (but better not).
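
For anyone who wants to double-check the dB arithmetic in this exchange, a quick sketch (plain arithmetic, not specific to either tool): halving the sample values is about -6 dB, and at 16 bits that costs one bit of headroom/resolution:

```python
import math

def db_from_ratio(r):
    """Convert an amplitude ratio to decibels."""
    return 20.0 * math.log10(r)

print(db_from_ratio(0.5))              # -6.02 dB: half amplitude
print(db_from_ratio(16384 / 32768))    # the same -6 dB expressed in 16-bit sample values
print(db_from_ratio(1 / 32768))        # about -90.3 dB: the smallest nonzero 16-bit step
```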


13 hours ago, John Dyson said:


 

In regard to normalisation, wouldn't it make sense to measure the Leq (Equivalent Continuous Sound Pressure Level) and then use that for level matching?
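
A minimal sketch of what that suggestion could look like in practice: match clips by their RMS level (an unweighted, crude stand-in for Leq) instead of by peak. This is only an illustration, not anyone's actual tool:

```python
import numpy as np

def rms_db(x):
    """Unweighted RMS level in dBFS (a crude stand-in for Leq: no frequency weighting)."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(x))) + 1e-12)

def match_loudness(x, reference, max_boost_db=6.0):
    """Scale x so its RMS level matches the reference clip, limiting any boost."""
    gain_db = min(rms_db(reference) - rms_db(x), max_boost_db)
    return x * 10.0 ** (gain_db / 20.0)

# Example with synthetic data: a quiet clip matched toward a louder reference.
quiet = 0.05 * np.random.randn(44100)
loud = 0.20 * np.random.randn(44100)
matched = match_loudness(quiet, loud)
print(rms_db(loud), rms_db(matched))   # now within the allowed boost of each other
```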

"Science draws the wave, poetry fills it with water" Teixeira de Pascoaes

 

HQPlayer Desktop / Mac mini → Intona 7054 → RME ADI-2 DAC FS (DSD256)


I have lots of good comments and private PMs in my mailbox right now, and will answer later today, but I wanted to give a status update as early as possible today.

 

There is also a good reason for the jerky, strange releases, etc...   There HAS been a lot of change and even confusion on my part.   The message below explains some of it (this is a cut 'n' paste to help me organize and control the blather).   BOTTOM LINE -- GOOD PROGRESS HAS BEEN MADE (capitals for emphasis, not 'yelling'.)

 

This several-day delay isn't because of problems making Supertramp sound right.   The delay has happened because there is a bit of a 'whack-a-mole' game, where all recordings have to sound correct, not just a few test recordings.

 

A lot of rework is being done so that the whack-a-mole game is less and less of a problem.  If all of the decoder's internal interactions aren't perfect, then making Supertramp have enough bass can destroy the sound of other recordings (and vice versa).   The dynamics must be correct, and any sloppiness results in sloppy sound.  Recent versions might have been better, but they still had too strong and slightly 'fake-sounding' bass on Supertramp.  However, some other recordings had too little bass, and other older recordings had too much.  No recordings sounded really correct. So -- the real solution isn't about making a perfect Supertramp, but about studying a large array of different recordings.   Studying lots of recordings, while also studying the interactions, takes lots of time and patience.

 

Recently, these inconsistencies caused by dynamics problems have forced my focus onto studying, and perhaps reworking, the architecture. This rework ended up helping me to find the more 'basic' architecture that was originally designed 40+ years ago. As time has gone on, the program is less and less a forced fit into the original design, and it is starting to better implement a closer, natural reverse replica of the original encoding method.   The design is still complicated, in fact just as subtle, but it also implies the need for less HW in the original encoder design.

 

In the current test code, the bass seems to be more consistent across different dynamics and different frequency response balances.   The resulting internal EQ scheme appears to be a more 'natural', less 'forced' structure.   As mentioned above, it also implies less hardware -- that is a GOOD thing, because the original design couldn't waste hardware.

 

Perhaps there might still be a few eccentricities, but the goal of consistency in the sound of the bass, plus a nice, clean high end, is much closer to being met than in any previously released or demoed version.  At times, when my hearing is working, I check back by running previous decoder versions, and there is generally a profound improvement over previous/historical versions.  Given any remaining eccentricities, along with a week or two more patience from those who are actively trying to help, the decoder will likely be completed.

 

Notice that 'the high end' (HF) was mentioned.   There are interacting factors between almost every aspect affected by the dynamics processing, and as the decoder architecture becomes more and more correct, there are fewer adjustments for me to screw up.

 

Supertramp is perhaps the most important group of test recordings, but, believe it or not, ABBA also has some characteristics that are good to test.  I am not claiming that the ABBA recordings are of the highest quality, but they do things in their recordings that create 'tells' for testing.   In fact, the very highest quality recordings, with the fewest mistakes, might not always be the best test subjects.   The 'super good' stuff is good for final testing and verification.   To make sure that the program is perfect, we almost need 'junk' recordings that test the limits of the program.

 

All of this said, there will always need to be help from those who can hear better than I can.   Now that there are many fewer 'knobs' that need to be internally adjusted, the complexity of communicating with those who can really hear will be lessened.   There are fewer selections of EQ, or of whether to use -1.5 or -3.0 or whatever dB.   The choices are fewer, and the sound of the decoder will be better.  Most of this simplification was done over the last several days, yet the sound is MORE consistent from recording to recording.

 

For success, we have needed both a more naturally designed decoder and people who have good hearing & listening skills.  Without specs, the quest for the proper design has been onerous, and not being able to hear well (without knowing it) has also been a lot of trouble.

 

We have one good group of people who have faith, along with others who have been dabbling.   The code is close to being a winner, and I am hoping to do demos and a release tonight.  If the test recordings running today pass my own subjective tests, I'll provide both the public- and private-style examples for review, along with a copy of the decoder for the most intrepid individuals :-).

 

* When the results pass MY subjective tests, that only means that it is ready for those 'who can hear' to tell me how bad it really sounds :-).

 

 

 

 


The V2.2.8D decoder is available, along with snippets (same place as before):

https://www.dropbox.com/sh/i6jccfopoi93s05/AAAZYvdR5co3-d1OM7v0BxWja?dl=0

 

OPEN FOR CRITIQUE -- IDEAS TO IMPROVE?   Catching problems and describing them is so very welcome.

 

Please enjoy the comparisons - and enjoy the decoder if you are intrepid enough to use it yet.  The tech-speak and excuse-making below simply say that there is still another significant bug.   The decoder usage hasn't changed at all, even though the internals are much simplified in important respects.  (Still not trivial, but simpler.)

 

I had lots of trouble with compatibility on one group of recordings (Supertramp), but otherwise these snippet results are pretty nice.   I do regret my continued Supertramp problems -- I could just palm them off, and most people might not notice them -- but I DO.  I cannot hear frequency balance very well, but I can hear dynamics problems -- and my Supertramp results have one H*LL of a dynamics problem.


These last few days have been about fixing another case of overdesign, with the ability to further simplify the decoder in some ways.   Alas, this left me with one MAJOR final problem -- getting compatible results between Supertramp and the other recordings.   Most recordings sound 'okay', but Supertramp gets either a 'metallic' sound or 'expansion surging'; frustratingly, I cannot get a nice general sound.

 

This problem is about calibration level sequencing, and apparently, over all of these years, the decoder has NOT matched Supertramp recordings.  There are some other eccentric recordings also, but the decoder has a built-in mechanism to deal with the other troubles.


This is for all of the private and public discussions and feedback.

A profoundly improved bass and midrange version is coming, very likely today.

A little more can be moderately easily added (perhaps 1.5->2dB in the 50->100Hz range) beyond the characteristics of the planned 'F' or later demos/examples & code release today, but I'd rather the bass/midrange be a little constrained until there is more feedback.

 

As many/most of you know, I have a really difficult problem with bass and midrange.  It isn't just my hearing, even though that is a major component; it is that I am bass-averse.  I LOVE clarity, and unless bass/midrange is done perfectly, the result can be muddy.  Also, even though yesterday I 'reached' to increase the midrange (while following the rules), I then became accommodated to it.   This morning, I heard the effects of the accommodation and also received some feedback.  I believe that an improvement is coming in about +4 Hrs or so.  (It might be delayed, as I am the family cook for larger meals -- and I am making a really nice meal today.)  Very important:  the 'rules' are being followed to keep chaos from happening.

 

The wonderful thing about the basis of the new release coming in a few hours is that even with more bass and midrange, there is NO muddiness.   In fact, the previously released version, as it still stands, could support more bass without muddiness, but my habit is naturally LESS midrange/LESS bass.

 

The new version has significantly more midrange and bass (perhaps about 3dB), but also the bass/midrange is wider, so it reaches up higher, the bass/lower midrange in the 200->100Hz range is stronger by some amount, and the lower bass is strong without being overwhelming.

 

There is a relatively well-received, but also too-little-midrange/bass release, V2.2.6D-2 (-2 for the specific version of the associated demos), and this new version has, over V2.2.6D:

 

1) Just as much bass.

2) Less grainy & greater midrange.

3) Pristine/clear highs.

 

The less muddy midrange/bass is related to more gain control being active in that region, where the greater gain control holds back the midrange/bass when it needs to.  That keeps the 'mud' around the various elements of detail from overwhelming the higher midrange and highs.  Even 'Reason To Believe' from the Carpenters doesn't have an overwhelming sound anymore.

 

There is still some room (not much) for increasing the bass & midrange.  Perhaps 1.5->2dB can be added, but not much more without rework.

 

I'd prefer to keep the bass/midrange a little constrained, because that is what I remember from doing recordings in the 'good old days', but I could be wrong and the 'processed' recordings might have more bass.

 

Better/more bass/midrange coming later on today.

 

 


This is a response to someone who basically said that the previous release 'sucks' (paraphrased, and correct.)   The new 'V2.2.8J' version probably 'sucks' for the opposite reason -- too much bass.   This is being zeroed in on.   Below is the full status report with anything personal removed:

 

*  ALL of the EQs below are architecturally plausible, as if I am reading the design engineer's mind...   However, all I can do is guess and try to do what makes engineering sense.  This still leaves LOTS of leeway and possibility of error.   If my hearing were reliable, the result would have been perfect on the 2nd try.   The situation sucks.   If we had someone who could do this level of programming AND hear AND understand the kind of stuff going on in the code, then this could be fixed VERY quickly.

 

Added more bass in the new V2.2.8J version -- too much I think.

The old version was too thin.   Trying to zero-in on the right value.

 

I understand about it not having enough bass in the recent version.

I tried to improve, but in the 'J' version, I can actually hear intermod from too much bass on the Carpenters.

 

Here is the problem -- my 'taste' is for less bass, I lack the ability to hear it, and the basic way that 'bass' works has changed because of some realizations about the architecture.   It is a matter of bisection, but my hearing/taste also tends to cause a bias.  I did too much bass attenuation in the pre-'J' version.  (The 'J' version is brand new/current as of this posting; the 'K' version is in process right now.)

 

The 'J' version has LOTS more bass...  It has LOTS more midrange. (TOO MUCH.)

 

Because of my 'bass' paranoia over-compensating in 'J' (current), a mistake of using 20Hz +3dB instead of 20Hz -3dB, and an inability to hear it most of the time, the bass from 20-50Hz was 'boosted' in the 'J' version.   There is nothing more evil-sounding (to me) than 20Hz +3dB (too much, that is.)

 

The 'K' version or later (whichever I should upload) should have a happy medium, where the heavy, ugly excess 20Hz is gone, but the rest should stay the same.

 

I am pretty sure that 150Hz is flat and below that is too much -- I had to move the 500Hz -3dB (it must be -3dB or all hell breaks loose) to 150Hz -3dB, and that brought up the midrange by a few dB.  But I have probably been diddling with the super-lows too much.


Also, instead of one batch of 80Hz EQ, I did the more architecturally correct thing (WRT DolbyA) and do the level correction on each layer.  This 'level correction' is the 80Hz boost.

 

LOTS of changes were made before 'J', but it was a learning experience.  'K' (which will probably be +18Hrs away, though I will try to do it before +12Hrs) should be close to correct.   I would NOT have posted the earlier versions except that I couldn't hear the problem.  I wanted to post 'J' mostly to show that the program CAN do the bass; it is just that I am normally conservative (probably too conservative.)

If you don't want a letdown with too much bass, wait for at least +15Hrs, but maybe +12Hrs if this new 'K' version being built right now ends up being satisfactory.

8 hours ago, jabbr said:

@John Dyson the link to the software at the top of this thread is broken ... says "removed"

(Personal matter delaying a good response -- will post an update to the link.)

 

PS:  I don't think that some recordings will ever be at their best once decoded; as I always say, sometimes the original, non-decoded version IS better!!!  I still vacillate on Take Five.  It seems to depend on my mood at the time.   I can say that 'sometimes' non-decoded is better, even while advocating that the FA decoding option is a good thing.

 

 


Still making slow and painful progress.   I've been trying to give updates on the 'sites', but they are generally pretty bad.   My hearing usually catches up in a couple of hours, and then I have to pull back.  This just keeps bouncing around because of the attempt to 'please', but it instead just ends up being 'frustrating'.  My new process should help control the results of my extremely variable hearing.

 

Before writing further, I want to make sure that everyone knows that I have been listening to private comments; it is just that I had been stuck on a problem that has resulted in frustration and bad test versions, and I had to protect my thinking to figure out this very technical, internal problem.   Once I get this thing stabilized, and with a REAL, OFFICIAL release, feedback will make sense again.  (Also, I got REALLY ill a few days ago, which also delayed things.)

 

The big problem has been the very desirable move of the HF EQ from the end of the last layer to after each layer, while still resulting in sound that is correctly HF-balanced.   Just today, a few hours ago, I found the problem.   I am writing in generalities, and note that this is one case where my 'rules' screwed me up:

 

When doing almost anything in the decoder, there is a standard set of frequencies for the EQ.   These standard freqs include 9kHz, 12kHz, 18kHz, 21kHz on the high side.   You could also add 3kHz and 6kHz if you want to look lower.   The EQ needed between each layer is a kind of -6dB (or -3dB -- whatever the design requires) per layer.  I inferred this from a lot of features of the architecture.

 

So I kept trying various combinations of 18kHz-based EQ and 21kHz-based EQ, and we got either a 'shrill' or a 'dead' sound.   Most of the test copies were 'shrill', but the problem of being 'shrill' or 'dead' sounding is the same:  wrong choice of EQ freq.   I finally decided to break my rules and try 20kHz -- 20kHz is a standard frequency of sorts, but not normally encountered in the design.   Magically, the sound cleared up.

 

The rest of the EQ, etc. must be corrected back to the original design -- there aren't all that many degrees of freedom, even though I did try some variations.  Of course, doing so is usually fruitless and frustrating 'tweaking'.   After moving to 20kHz, -6dB per layer, magically I could return to the other values elsewhere that I wanted.

 

When the HF EQ was messed up, then in trying to balance the HF and LF I screwed that up also.  Now that the HF makes sense, the LF has been reverted to normal.

 

There will be more information in the release notes, but the entire point of moving the EQ into the appropriate structural location is MUCH MUCH MUCH cleaner vocals, etc.  Also, I prefer correctness, once I figure out what 'correct' really is.

 

My current testing is going to have to span at least one day, maybe longer so that my perception can take into account the various cycles in my hearing ability.


Even though the results in a day or so might not be perfect, they WILL be perfect per my hearing over the period of a day.   Also, my hearing is NOT used to tweak; it is used to 'choose' between technically defensible choices.

 

I do feel pressure, and without this recent improvement, there would always be that 'crackle' or 'grainy'  sound in the Supertramp vocals for example.

 

 

 

On 4/25/2021 at 8:47 PM, John Dyson said:

Personal matter delaying a good response -- will post an update to the link.

I hope things are working out. 
 

I'm thinking that it will be better for me to focus on more highly compressed/processed vocals.  It's hard to process the progress details until I have some concrete software to test.


11 hours ago, jabbr said:

I hope things are working out. 
 

I'm thinking that it will be better for me to focus on more highly compressed/processed vocals.  It's hard to process the progress details until I have some concrete software to test.

The release IS getting closer.   There is a non-released V2.2.9L right now; the snippets for this non-release will be in the usual place, and the private place already has a recording or two.   THE DEFECTS THAT I DISCUSS have NOTHING to do with competency or technical issues.  I simply cannot reliably hear, so after a night of sleep (or 3-4Hrs for me), I hear the disaster.   This will require time for me to 'zero in'.  I hate, totally hate, subjective measurements!!!

 

Snippets (just for information, imagine less lower-middle bass and less garble): (When bass is not very strong, the vocals are BEAUTIFUL.)

https://www.dropbox.com/sh/i6jccfopoi93s05/AAAZYvdR5co3-d1OM7v0BxWja?dl=0

 

ALL of the current test materials were done with a relatively low quality of decoding, so there IS some garble in the sound.  It takes LOTS and LOTS of processing to remove the garble from all kinds of sidebands distorting the signal, but this processing is mostly disabled in the tests.   The current tests are for frequency balance, however bad it is right now; modulation distortions will be cleaned up for the official demos.

 

I expect that the snippets for the 'L' non-release will be uploaded within the hour.   The snippet examples are now decoded; the conversions and uploads still need to be done, so +1 hour from posting time should be long enough.   Again, the 'L' version is NOT a release, and even though the audiophile listener will find defects, I can hear the progress.

 

The architecture for EQ IS correct now -- the re-factoring of the HF required some other changes, and the changes cause interactions in various places.   The current version has proper HF and very clean vocals, if you can hear them hiding behind the approximately several-dB-strong 30-70Hz.   On this test version 'L', the LF is WAY WAY too 'full' sounding, and this could be because one EQ freq is in error, or perhaps a slightly too aggressive EQ sequence is adding too much bass.   Those who know that I don't like too much bass will get their jollies on this non-release version of the demos.

 

There has been a serious tension between the Supertramp and other recordings.  I did find that the Supertramp recordings DO have a different kind of FA encoding, and that has been troublesome since day one.  The decoder can adapt to the Supertramp style of encoding precisely now.  In fact, checking with the SoX program, the L+R channels are almost perfectly balanced for peak and RMS levels on 'Crime', 'Breakfast' and 'Crisis'.   This is a good indication, because it shows that the decoding is reversing the encoding VERY VERY well.  The results have the characteristics of a well-mixed recording.  The 'Quiet' recording has been problematic, but the decoder produces a plausible (though imperfect) result.
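
For anyone wanting to reproduce that kind of L/R balance check without SoX, a minimal sketch (the soundfile package and the file name are assumptions) that reports per-channel peak and RMS in dBFS:

```python
import numpy as np
import soundfile as sf  # assumption; any reader that yields a float array works

def channel_stats(path):
    """Print peak and RMS (dBFS) per channel, to compare L/R balance after decoding."""
    data, _ = sf.read(path, dtype="float64")     # expects a multichannel file: (samples, channels)
    for ch in range(data.shape[1]):
        x = data[:, ch]
        peak = 20.0 * np.log10(np.max(np.abs(x)) + 1e-12)
        rms = 20.0 * np.log10(np.sqrt(np.mean(x * x)) + 1e-12)
        print(f"ch {ch}: peak {peak:6.2f} dBFS, RMS {rms:6.2f} dBFS")

# channel_stats("crime_decoded.wav")
```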

 

Once *I* am happy with the demos, then it will be time for people to give feedback.  I get really confused when three or four people use different terminologies and have different opinions about the errors in the sound.  I RESPECT the opinions, but I also want to avoid going totally crazy bouncing around the settings.

 

As soon as I read the comments, I'll try to follow the suggestions, because I very strongly want to please everyone -- but sometimes the various suggestions conflict.   I believe that making sure my hearing is NOT misleading me, and then being very careful making choices based upon multiple tests during different parts of the day, will be the best way to go UNTIL THIS RELEASE IS READY.

 

I am only trying to do the best that I can do for getting the very best results.

 

 


The V2.2.9P version has been on the various test/evaluation demo sites today.   I didn't want to advertise it, except to those who check once in a while.  No sense in seriously distracting anyone until things are much better settled down.   V2.2.9P goes a LONG LONG LONG way towards getting the 'right' sound, but there was a latent problem, even with perfect frequency response balance:  a certain 'buzz' in some recordings hasn't gotten any better.  For example, the Carpenters' 'For All We Know' had a kind of odd distortion in it.  I knew that it 'sounded like' a modulation of the high frequencies by the lows, but I didn't know of a mechanism for that to happen until now.

 

There is a necessary pre-emphasis and de-emphasis of the low frequencies at (in the current test version) LF shelves of 25Hz, 50Hz and 100Hz at -6dB each.   With this, the LF waveshape and envelope modulate the higher frequencies at a lower modulation depth.

 

With the more precise pre/de-emphasis, the result is profoundly cleaner.  I KNEW that there had to be some pre-emphasis/de-emphasis, and in fact I implemented a less precise version a month or so ago, but my original implementation was not precise enough to cancel out this 'modulation distortion'.   I *really* thought that it was an attribute of the recordings, but since this modification significantly improves the modulation effects, I was wrong about this noticeable buzz in the recordings.

 

A good secondary effect is on recordings like 'Super Trouper', where the 'carnival-like bass' is restored.   Other material, like 'Dreamer' on Crime of the Century, has the vocal chorus sounding much more natural and less synthesized.

 

If the currently running decoding operations are successful during tonight, and there are no regressions from the V2.2.9P that has been on the sites all day today, then I'll post this new V2.2.9Q along with the decoder binaries in the morning at about +11Hrs from now.

 


I started the V2.2.9Q release, and after listening to more recordings, I believe that there is a suboptimal setting in one of the 'crossovers'.   The likely change is only moving an equalizer by another step; the change in midrange will be slight but still very important.   The result will be a slightly more 'full' sound than the current, aborted V2.2.9Q release.  If it were life or death to use a decoder 'RIGHT NOW', the V2.2.9Q version can be used, but frankly there is no need to bother, since the V2.2.9R release is coming in a few hours.  At most, it might take until the normal 9:00PM (+14Hrs) timeframe, but I am pretty sure that it will be ready in +5Hrs.

 

The possible improvement was JUST CAUGHT while uploading the V2.2.9Q version.  It is a SMALL change, but it will be important for more closely mirroring the original recording.

No matter what, below is a TRUNCATED set of snippets for V2.2.9Q, just uploaded as I am posting this message.   This change amounts to correcting an approximately 1dB error in the lower midrange -- such matters are very important though.

 

I plan that the corrected  V2.2.9R will be coming in about 5Hrs.   The 9:00AM or 9:00PM rule will be put aside this time, because the change is very small, the risk is very low, but the benefit will be more than noticeable...  If you are time-starved, don't even bother with the V2.2.9Q snippets, because V2.2.9R will be even better...

 

https://www.dropbox.com/sh/i6jccfopoi93s05/AAAZYvdR5co3-d1OM7v0BxWja?dl=0

 

After 'R' is released, I'll be taking a break from the internet for a day or so, then will look at both private and public messages & feedback after a few days.

You can imagine the burn-out and my desire to please people -- I just cannot do any more until I get a bit of rest.

 

So, look for 'R' in about +5Hrs from now, or Noon USA Eastern time.  If I don't announce by +5Hrs, then the normal 9:00PM time (+14Hrs) will be operative.

 


The V2.2.9R release is available.

If you think that BASS has been lacking, well -- it has been lacking.  The bass isn't lacking any longer, while also being as true as possible right now.  There is a set of tradeoffs (like a Tetris game), along with my hearing playing games.  In fact, my sense of frequency response balance has again started distorting in the last hour.

 

Listen to the bass on 'Toughen Up' for what is possible, given the material.   The actual bass needs to be in the material though -- just because you hear the bass on the FA copy doesn't mean that it is so strong on the original recording.  FA does lots of LF compression, boosting the level of bass in weird ways.

 

I found the formula that takes care of the bass rather more easily than what I had to do before.    Adding more bass might be trouble, because it will interfere with the ambient sound of certain instruments (e.g. pianos become excessively muddy.)   I can thin out the bass a little bit.   It would be safe to trim 3dB at 20Hz, or even 1.5dB.   I wouldn't suggest trimming more than that.   If the lower bass (like 40Hz) seems excessive, I can trim that also instead of just focusing on the 20Hz area.   There is a set of 'rules' that must be followed, yet the bass can be trimmed.  The bass CAN be boosted, but like I wrote above, room and instrument ambience becomes really ugly and muddy.   The settings are right on the edge for maximum bass and controlling muddiness.

 

Important news about the bass -- the resulting response through the entire FA process (encoding/decoding) shows no pronounced lower-MF or bass peaks.   This is good, because orchestral material sounds REALLY BAD if there are peaks in the lower midrange.   I hear none of the strange effects of lower midrange or bass peaks.   Frankly, I dislike too much bass; in fact, I probably dislike even the correct amount of bass and prefer less.  However, I do think that we got it CLOSE to correct now.

 

This version will have all of the bass (true bass) that you love and desire....

Snippets:

https://www.dropbox.com/sh/i6jccfopoi93s05/AAAZYvdR5co3-d1OM7v0BxWja?dl=0

Binaries (actually in the snippet location, in a subdirectory):

https://www.dropbox.com/sh/5xtemxz5a4j6r38/AADlJJezI9EzZPNgvTNtcR8ra?dl=0

 

THIS IS AN IMPORTANT RELEASE, not just informational.   I'll be looking at private and public postings starting tomorrow.   There are some other demos which will be finished up in the next few hours, and I'll be doing some casual testing, etc.   I might look at private messages today, but most likely tomorrow.    I MUST TAKE A BREAK and not even try to respond, even to very good and helpful comments, until I get a rest!!!   I GREATLY RESPECT the private comments, and really appreciate the interest, but my brain will explode if I don't step back for a few hours!!!

 

 


Message deleted...

I do suggest giving some feedback -- the results are pretty close to being good, but my hearing is bad.

 

Truly, I am 100% burnt out, and will be back on Friday night (+24-36Hrs or so.)  I'll fix any bugs and address any comments ASAP.

There are a few possible improvements, but my hearing lies to me...

 

(I just had a scare -- the sound sucked worse than usual :-) -- then I realized my hearing was bad.  I simply got up, answered the door, and then my hearing was back, and the recording sounded okay again.)   THIS IS A PROFOUND PROBLEM FOR ME.


Again, thanks to all of you who have sent comments.  This last panic just about burnt me out totally.   Luckily, I now hear what I had expected all along...


Sorry for the troubles over the last few years -- if my hearing hadn't screwed us all, the decoder would probably have been perfect 6 months earlier.  (Okay, probably not perfect yet, but GIVE IT A LISTEN!!!)

 

 

 

 

 


Had a crazy time today.  Math library problems.  Sometimes the failure happens, sometimes not.  Some of the transcendental math functions use the square root of 2 for some of the range reduction of some variables.   Alas, that is essentially the same value as a 3dB gain factor (1.414...).   The math libraries were giving either good results, bad results, or crashing.

 

I use the AVX512 version of the software, which, on Linux, gives good results, at least for me.  When I did a quick verification of AVX2 on Linux, I got weird results, but it sounded okay on the quick check.  Then, sometimes on Windows, using the SSE3 version, the program would crash.  I do check the program EVERY RELEASE on Linux, but when I got the crash there on my last build, I did some more reviews and found that the library was fragile and the results were not consistent from machine type to machine type.

 

I worked around the problem by using an offset value for 3dB/-3dB, and then the math library appears to work correctly.  I am doing a full retest of the program, but MUST be careful when I do the verifications.  (I am using exactly 1.414 and 0.7071 instead of the higher-precision values.)
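
For context on why the two constants are so easy to conflate, a quick arithmetic check (not the decoder's code): an exact +3 dB voltage ratio is about 1.4125, while sqrt(2) ≈ 1.41421 corresponds to about 3.0103 dB:

```python
import math

exact_3db = 10.0 ** (3.0 / 20.0)      # 1.41254..., the true +3 dB amplitude ratio
root2 = math.sqrt(2.0)                # 1.41421..., what the math library range-reduces with
print(exact_3db, root2)
print(20.0 * math.log10(root2))       # 3.0103 dB: sqrt(2) expressed in dB
print(20.0 * math.log10(1.414))       # the rounded constant used as the workaround
```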

 

Today I was chasing this rabbit down some rabbit holes; sometimes the program sounded perfect (per my expectations), but whether it worked depended on the version of the software, the phase of the moon, etc.   The workaround should avoid the problems and the continued embarrassment.   I wouldn't be surprised if the crazy results were transient within each decode.

 

This is one REALLY REALLY big DSP program; such programs are historically 10k instructions at most.   In equivalent terms, the decoder is well over 100k, or even 200k-300k, of tightly packed SIMD instructions, and it depends on a free vector transcendental math package.  The transcendental math package (exp, log, pow, sin, cos, etc.) has actually worked well, and it integrates nicely into my SIMD architecture.  Unfortunately, something tickled the bug, and this problem requires a *total* retest to establish a baseline.

 

When testing is done, I'll be doing a V2.3.0A release, and will be reviewing the comments over the last week.  Until I have something controllable and stable, I don't want to worry about frequency response bugs...   There are worse problems (basic math routines) that need to be stabilized.

 

If the decoder is working correctly, it sounds pretty good and clean.   When it acts stupid, it sounds grainy and bad.

(this was supposed to be my day off...)

 


I came up with another reason for troubles in the decoding results:  phase scrambling problems between the bass frequency region and the rest of the MF/HF regions.  The problem is now likely fixed in the working development version...   The 'bass' problem is part of the more global FA 'phase scrambling' issues.   This message talks about preliminary results -- so things might change in the future.   However, GOOD THINGS HAVE HAPPENED TODAY.

 

(no need to read further unless you enjoy reading my rambling text.) :-).

 

When making the decoder produce the right amount of bass, there had to be lots of EQ beyond what I'd expect.   In essence, the bass didn't sound as good as it should, for three reasons:

 

In the previous versions (before today's test versions):

 

1)  To produce the AMOUNT of bass needed, the 'rules' had to be violated.

2)  When there was enough bass, then the vocals were buried.

3)  (the forever excuse -- my hearing kept guiding me in all kinds of random directions -- let's forget that for now.)

 

Did you ever notice that in a well-made *pure* recording there is usually a space in the stereo image field for the vocal to reside?   The vocal isn't just 'mixed' in; consideration is made so that the vocal is more obvious than just blasting it into both L+R so that it is loud enough.   There is more to a good stereo mix than just 'signal levels'.   In this terrible FA world, there is very little 'stereo image' left over, therefore other, brute-force methods are used to make vocals stand out.

 

When doing the FA decoding, in previous versions the vocal was 'buried' deeper than it should be.    Other than EQ issues, the levels weren't all that bad, but there was something really wrong with the decoded stereo image.   The basic FA signal has a seriously damaged stereo image, but to partially compensate, the FA signal also takes advantage of upper-midrange compression to make the voice just seem louder.   To help overcome the effects of telescoping and phase scrambling, the FA signal has a grotesque, brute-force enhancement of vocals, because the stereo image is badly damaged by the compression process.

 

The older versions of the FA decoders could successfully mitigate the problems with signal levels & dynamics, but the stereo image was still partially damaged. The vocals on the decoded result were sometimes worse because the FA vocal 'enhancement' side-defect is removed.   The FA decoder removes this upper-midrange enhancement, but it still previously retained the 'squirrelled-up' FA phase.  This phase damage is NOT just because of the telescoping caused by the FA compression; there appears to be a phase scrambler in the design of FA encoding.   The phase scrambling is probably 'yet another way' of smoothing peaks in the signal -- maybe it helps the sound of the brutally compressed signal? (I need to figure out the reason some day.)

 

In the last few days, I have been able to deal with frequency response by imposing very strong discipline while using my hearing, but there has STILL been something wrong.  I have been spending hours and hours, in between the 10-15 minute intervals when I can hear well, trying to figure out what the heck is wrong.  Much of my time has been spent trying to find the 'correct' EQ answer.  My mistake:  correcting frequency balance with EQ is NOT the only answer.   The phase also needed to be descrambled with special phase-modifying EQ.   The decoder already uses phase descrambling by default, but only in the >3kHz range.

 

A few hours ago, I realized that NONE of the LF EQ had the 'special sauce' anymore.   I removed the 'special sauce' when creating the corrected LF EQ code.   I have an ethic of not just adding things to code -- I need to prove that the processing is necessary.   In a single sentence: the 'special sauce' is my answer to the phase scrambling in the FA signal -- and when I added the 'special sauce' processing back, magically the vocals now have their own space in the stereo image again!!!

 

Bottom line:  FA decoding is NOT just an expander utilizing DA units with proper EQ.   FA decoding also needs to consider the phase scrambling in the encoding process.  Additionally, the 'weak' but existent stereo image when decoding classical recordings is now much more realistic.  (Note the term 'realistic', not 'real'...)
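
To illustrate the general idea of 'phase-modifying EQ' (a filter that rotates phase while leaving magnitude essentially flat), here is a minimal first-order allpass sketch. This is only a generic example of the technique, with an assumed 200Hz break frequency, and not the decoder's actual 'special sauce':

```python
import numpy as np
from scipy.signal import lfilter, freqz

def allpass1_coeffs(f_break, fs):
    """First-order allpass: |H| = 1 at every frequency; phase rotates around f_break."""
    t = np.tan(np.pi * f_break / fs)
    c = (t - 1.0) / (t + 1.0)
    return [c, 1.0], [1.0, c]           # H(z) = (c + z^-1) / (1 + c*z^-1)

b, a = allpass1_coeffs(200.0, 44100.0)  # hypothetical 200 Hz break frequency

# Magnitude is untouched; only the phase (the part being 'descrambled') changes.
w, h = freqz(b, a, worN=8)
print(np.abs(h))     # ~1.0 everywhere
print(np.angle(h))   # frequency-dependent phase shift

# Applying it to audio leaves the spectrum's magnitude alone:
y = lfilter(b, a, np.random.randn(1024))
```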

 

 

 

 

 

