
A toast to PGGB, a heady brew of math and magic



Hello everyone!

 

With great interest I read about the PGGB project - especially after the use of Mscaler plus DAVE transformed my listening experience - and am a bit confused about the various tuning options offered in PGGB. Logically speaking, shouldn't there be only one correct setting?

 

A little "proof of concept" experiment could be interesting: digitizing an analogue tape, for instance, and then comparing the tape master with the PGGB file. If the experiment is done carefully, there should be only one PGGB setting that sounds absolutely identical to the tape. Furthermore, if PGGB is doing what it is expected to do, the same file played through Mscaler + DAVE should sound ever so slightly different from the tape.

 

Rob Watts recently wrote about PGGB: "If it is a windowed sinc function it is no longer true sinc following Whittaker-Shannon". I will not pretend to have any deeper understanding of these things. I just thought an experiment done by these extraordinary minds and ears of yours could be intriguing. Personally I am not even able to reliably discern a difference between USB and optical input to Mscaler.
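Rob Watts's distinction can be illustrated numerically. Below is a toy Python sketch (my own illustration, not PGGB's or Chord's actual code): it reconstructs an off-grid value of a sampled tone per Whittaker-Shannon, once with the raw sinc kernel over the whole record and once with a finite, Hann-tapered kernel. Both come close to the true value, but they are not identical, which is the sense in which a windowed sinc "is no longer true sinc".

```python
import numpy as np

def reconstruct(x, fs, t, taps=None, hann=False):
    """Whittaker-Shannon estimate of the signal underlying samples x at time t.

    taps=None  -> sinc kernel over every available sample ("true" sinc,
                  limited only by the finite record).
    taps=N     -> keep only the N samples nearest t, optionally Hann-tapered
                  around t (a windowed-sinc approximation).
    """
    k = t * fs - np.arange(len(x))            # distance from t, in samples
    h = np.sinc(k)                            # ideal reconstruction kernel
    if taps is not None:
        if hann:
            h = h * (0.5 + 0.5 * np.cos(2 * np.pi * k / taps))
        h = np.where(np.abs(k) <= taps / 2, h, 0.0)
    return float(x @ h)

fs = 44_100.0
f0 = 1_000.0
n = np.arange(4096)
x = np.sin(2 * np.pi * f0 * n / fs)           # a 1 kHz tone, well below Nyquist

t = 2000.37 / fs                              # an instant between two samples
exact = np.sin(2 * np.pi * f0 * t)
full = reconstruct(x, fs, t)                  # "true" sinc over the record
wind = reconstruct(x, fs, t, taps=64, hann=True)  # 64-tap Hann-windowed sinc

print(abs(full - exact), abs(wind - exact), abs(full - wind))
```

Both errors are tiny for a clean mid-band tone; the point is only that the windowed kernel gives a slightly different answer than the unwindowed one.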

 

Cheers 

 

 

 

 


@Zaphod Beeblebrox

Thank you so much for your insightful reply!

 

18 minutes ago, Zaphod Beeblebrox said:
2 hours ago, hanshopf said:

A little "proof of concept" experiment could be interesting. Something like digitizing an analogue tape and then comparing the tape master with the PGGB file. If the experiment is done carefully there should be only one PGGB-setting, which sounds absolutely identical to the tape. And furthermore, if PGGB is doing what it is expected to do, the same file played through Mscaler + DAVE should sound ever so slightly different to the tape.

I don't think the setup required to accomplish this exists today. You would need an A/D converter able to record at 16FS or a higher rate; the same converter should also record at a lower rate (say 2FS). More importantly, both conversions should be of very high quality. Unless I am misunderstanding you, in which case please elaborate on what that setup would look like, including what rates to use.

 

I thought a conventional digitization would be sufficient. Even as low as 16/44.1 may do, because theory seems to suggest that CD standard might be good enough if upsampling with PGGB is done afterwards. I would do that, as well as digitize the analogue tape at 24/192.

 

A conventional high-quality A/D converter should do, as my basic understanding suggests that A/D conversion poses far fewer problems than D/A conversion.

 

If the result then shows that no difference from the analogue tape can be discerned in either the 16/44.1 or the 24/192 file after PGGB processing, while the same file processed through MScaler sounds different, then we would know four things:

 

a) conventional A/D converters are sufficient

b) CD standard is sufficient for recording

c) PGGB works in a way we all wish it would

d) it is superior to MScaler

 

This at least would be the result I'd hope for. Other outcomes are possible.

 

 

 

38 minutes ago, Zaphod Beeblebrox said:

Here, how will the analog tape be played back? How does one compare the PGGB-upsampled file with the analog tape? What sort of playback mechanism would one use?


I would connect the output of the tape machine to the same headphone amp to which I would connect DAVE.

Thus the sound of the PGGB file should be identical to the tape (or at least more similar to it than the Mscaled file), if PGGB works as theory predicts.


Hello everyone,

 

I would like to add something to my recent suggestion for a proof of concept experiment for PGGB.

 

Thanks to the new cloud-based option of upsampling files with PGGB, I today had the opportunity to try it out for myself. I played the upsampled files through Audirvana and sent them directly to DAVE. The original reference files for comparison were likewise played through Audirvana and sent to MScaler.

 

I am a trained listener, recording classical music concerts on a regular basis myself (not as a sound engineer but as a video director), but still did not expect to make out a clear difference, because I am not able to reliably discern differences as subtle as some in this forum seem able to.

That said, I was stunned by how obvious the difference between Mscaler and PGGB was. At first I perceived a greater depth of soundstage and air between the instruments. But later I very clearly heard the PGGB sound as brighter and sharper than MScaler's. Trumpets, violins and soprano voices had an artificially bright touch. After my initial enthusiasm I therefore clearly find Mscaler sounding more natural or, if you like, neutral.

 

Is PGGB then maybe acting like some kind of presence filter, giving the illusion of more space and air?

 

The only way to find out what is going on, I guess, would be the kind of proof of concept I recently suggested: comparing an analogue master tape with a digitized file through MScaler as well as PGGB. One of the two should sound more similar to the analogue tape. My bet would be MScaler, even though I can see that there's potential for improvement.

 

 

 

 

 

 


Thank you all for your replies. You may be right, even though I tend to doubt the notion that a battery-driven source into DAVE's galvanically isolated USB input, or noise shaping from 32 to 24 bit inside DAVE, should be the reason for the PGGB files acquiring the perceived unnatural brightness. But I will try again, next time going a step further: recording an LP and then comparing it with the files through the same headphone amp. Let's see which of them sounds more similar to the vinyl.
 

17 minutes ago, Zaphod Beeblebrox said:

I assume this latest test was also via PGGB.IO? (thanks for trying it)

 

 

Yes, indeed, because I currently have not enough computing power at hand to do it myself.

 

 

17 minutes ago, Zaphod Beeblebrox said:

If you are able to download and try PGGB, there are a few settings I can suggest you try. If that is not feasible, you can email me the track (share a link via Dropbox or WeTransfer) and I will be happy to provide you the same file with 4 different settings to see what you think.

 

Very much appreciated. I will send you the file tomorrow!

 

 

17 minutes ago, Zaphod Beeblebrox said:

Edit: Only other thing I can suggest is (as a way to fully bypass Mscaler), try SRC-DX if you are able to do so.  

 

Ok, maybe. This is unfortunately a rather expensive device for a relatively simple function. 

 

 

15 minutes ago, Fourlegs said:

The biggest flaw I can see in your method is that you were playing files through the MScaler on the so called pass through setting.

 

When I recently played them via USB directly into DAVE, I was criticised for thereby not doing an apples-to-apples comparison. So I had to pass them through Mscaler.

4 minutes ago, chrille said:

Hmm, I suspect you still consider LP a better reference point than the actual live sound in the hall?

 

Hello, no offense intended, but how on earth would you make the live sound in the hall a reference point for comparing different upsampling methods? I suppose you slightly missed the point of my trial. :)

 

Apart from that: there is no such thing as "the actual live sound in the hall". In every seat in the hall you hear a different sound. And microphones do not "hear" what you would hear at the same point in the hall. In addition, in the hall you listen with your eyes as well.

5 hours ago, Zaphod Beeblebrox said:

Purely based on math, a 32-bit noise-shaped track direct to DAVE via USB 'should' sound better than a truncated 32-bit track to MScaler. Have you tried the 32-bit noise-shaped version of the LP track I sent direct to DAVE's USB? If yes, it will help to know in what way it was different. Some have reported slightly 'softer' transients when 32-bit 16FS is played direct via DAVE's USB. I did not notice it myself at first (and in fact found it hard to believe), but after listening to 16FS 24-bit tracks via SRC-DX to DAVE's USB, when I go back to direct to DAVE's USB, I feel the same way now.

 

Yes, I tried the 32-bit noise-shaped version as well. I found it very slightly less "clean" and sharper. I wouldn't have said it has "softer" transients, because it was not only slightly less brilliant but also slightly harder. But then that may very well be the result of softer transients.
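As an aside, the difference between plain truncation and a dithered word-length reduction has a simple numerical footprint. The Python sketch below is only an illustration at 8 bits (for visibility) with plain TPDF dither - it is not DAVE's truncation or PGGB's actual noise shaper - but it shows the key property: truncation error is biased by about half an LSB, while dithered requantisation error averages out to zero.

```python
import numpy as np

rng = np.random.default_rng(42)
bits = 8
q = 2.0 ** (1 - bits)                 # LSB step for a full-scale +/-1.0 signal

n = np.arange(1 << 14)
x = 0.5 * np.sin(2 * np.pi * 0.013 * n)       # a clean test tone

# Plain truncation: the error always lies in (-q, 0] and tracks the signal.
trunc = np.floor(x / q) * q

# TPDF dither (+/-1 LSB, triangular) added before rounding removes the bias
# and decorrelates the error from the signal.
tpdf = rng.random(n.size) + rng.random(n.size) - 1.0
dith = np.round(x / q + tpdf) * q

err_t = trunc - x
err_d = dith - x
print(err_t.mean(), err_d.mean())     # truncation biased near -q/2, dither near 0
```

Real noise shaping goes a step further and pushes the (already unbiased) error out of the audible band, but the bias difference above is the first-order reason truncated and noise-shaped versions of the same track can sound different.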

  • 1 month later...

I have not posted in quite a while, because I had ended up confused by placebo effects and went back to the more convenient way of just using m-scaler, which at times sounded better to me than PGGB. Honestly, I had come to a point where I did not trust my perception anymore.

 

But hearing about the recent progress made me update my PGGB software and prepare a short test track in full 256 bit. I was told that a window function is no longer being used, so I got a bit confused still reading on the homepage that "long tracks benefit from better reconstruction accuracy when remastered using PGGB 256". I always thought that had to do with PGGB using a very long window spanning the full length of a music track.

This is actually one thing that had thrown me off a year ago, because I could not get the "combine tracks" function to work. And I did not understand the reasoning behind it: logically, what would be the sense of, say, a 27-minute window for the 4th movement of Beethoven's 9th, when the track itself consists of several recording takes edited together into one? Logic would have me assume that the window should then cover only the duration of one unedited passage of music. Or what am I missing here?

And what still continues to confuse me is that "long tracks benefit from better reconstruction accuracy" basically means that the uninterrupted full two hours of music of the 1st act of Wagner's "Götterdämmerung" would lead to technically superior reconstruction results in PGGB compared with your average 2-minute song. Can this really be true? I can already hear Rob Watts laughing in the back of my mind. I would really be grateful if somebody could explain to me why a window function is not used anymore while track length is still a decisive factor, and furthermore how this works with music, which in most cases will have been edited together from several takes into a continuous track anyway.

 

Thank you so much!

 

P.S. As my license upgraded to "128 perpetual", I do not understand the meaning of "256 bit processing is limited to 2 tracks at a time". Does it just mean I have all options, but process more slowly compared to "256 perpetual"?

20 minutes ago, Zaphod Beeblebrox said:

If a recording is continuous, then yes, using the whole recording is better than chopping it into pieces; editing it into shorter pieces with post-processing during production would make it impossible to combine them.

 

Thank you so much for taking the time to explain things, which helps a lot! With one thing, though, there seems to be a misunderstanding: it is one thing to combine two tracks into one (which would make sense for the 4th movement of Beethoven's 9th, often spread over two tracks). The other thing is - and this is what my doubts hinted at - that most music tracks have been patched together in production from different takes in the recording studio anyway. So while this 4th movement of Beethoven's 9th may have an uninterrupted length of roughly 27 minutes, it has still been edited together from several takes into one. Shouldn't the time window only cover the length of an unedited passage of music to be effective? That of course would be impossible, because we don't know where the edits are; they are in most cases inaudible. Or does this not make sense, and the time window doesn't "care" about the edits inside any given track of music?

 

10 minutes ago, Zaphod Beeblebrox said:

will only do two tracks at a time at 256 bit precision

 

Sorry, I still don't understand what "two tracks at a time" means. Is the result the same, and it only takes longer because I cannot process 10 tracks at the same time?

16 minutes ago, hanshopf said:

Or does this not make sense, and the time window doesn't "care" about the edits inside any given track of music?

 

P.S. Adding to this: recordings are often edited together from several live concerts or sessions in a studio, sometimes sessions several months apart. Of course the engineers try to position the microphones at the same spots and make the acoustic environment as similar as possible. Still it often sounds different. 

If the time window is now spread over several edits with slightly changed acoustics - which it automatically will be with most recordings of classical music - how should it then be possible to gather the more exact information about room ambience and so on that you mention, if the room ambience inside one track of uninterrupted music changes slightly at every edit point? Sometimes the acoustics are so different that one can clearly hear the edit point. Logically it does not make sense to me to have a time window overlapping with edits in the music. Or what am I getting wrong here?

27 minutes ago, Zaphod Beeblebrox said:

Everything I said assumes a continuous recording, not one that was edited together, in which case not much can be done without knowing where to split them.

 

Thank you again for helping me understand. I really appreciate that! But then how much sense does the assumption of a continuous, unedited recording make, when this is almost never the case? At least not in the field of classical music recording, where you nowadays often have literally hundreds of edits. Even in analogue times you often had countless takes from different recording sessions patched together.

Would a smaller time window not make more sense under these circumstances? Assuming a one-minute time window, you would cover many passages without an edit, especially in older recordings, when editing was less frequent. Of course this would be guesswork as well, but at least you'd get it right sometimes. A huge time window the length of a track, however, will almost never cover a continuous recording without edits and therefore must be flawed from the start, because it cannot work perfectly under most circumstances. Would there then still be arguments for going this path instead of using a smallish time window? What is the length of the time window Rob Watts is using?

I am really sorry for the fuss; my questions are driven by curiosity, not by bad intentions!


I just now made a listening test myself: m-scaler vs 128 bit vs 256 bit. Unfortunately I was not able to come to a reliable result. More often than not m-scaler sounded best to me, but that may be due to confirmation bias or just because I am used to it. I also felt that 128 bit sounded rounder and fuller than 256 bit. But this should be impossible, and therefore I can only conclude that I must keep testing repeatedly over several weeks.

 

Anyway: a Rob-Watts-like time window of 1.5 to 25 s makes more sense to me, because then at least the majority of windows will not contain an edit. The base assumption of long unedited music passages unfortunately does not correspond to reality. You will basically have a hard time finding unedited music tracks at all.

If you then argue that listening tests have shown otherwise, then I hope the test listeners are better than me, which may very well be the case (even though I am a trained listener working professionally in the field, with still good hearing). I would have hoped for a less vague explanation for using a time window the length of a whole music track. My guess is that a compromise might offer a better solution: a time window of, say, one minute would still give you 32M taps, but would also most probably capture the majority of unedited passages of music inside a track.
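To put rough numbers on the window-length question (my own back-of-the-envelope counting, where "taps" is simply the number of output-rate samples the window spans - PGGB's internal accounting may differ):

```python
base_fs = 44_100                 # 1FS source rate
out_fs = 16 * base_fs            # 16FS output rate = 705,600 Hz

# Tap counts implied by the window lengths discussed in this thread,
# under the naive assumption taps ~= window duration * output rate.
for window_s in (1.5, 25.0, 60.0):
    taps = int(window_s * out_fs)
    print(f"{window_s:5.1f} s window ~ {taps / 1e6:.1f} million taps at 16FS")
```

By this counting, even a one-minute window corresponds to tens of millions of taps at 16FS, so shortening the window from full track length would still leave an enormous reconstruction kernel.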

I would really like to trust the ears of your test listeners, but that's not easy when I feel that I cannot even trust my own. I wish it were easier to decide mathematically what is more correct. As it seems to me right now, Rob Watts is still in the game, even though I had hoped a software solution could be far ahead.

 


Thank you all for your kind and insightful replies! I will do my homework and test repeatedly over the next weeks. Anyway, I matched the volume correctly in today's test and went into DAVE directly via USB. Unfortunately I cannot do a blind test because of the need to change the setup, but I really would like to test this blindly. It's a bit frustrating not to hear what I want to hear (and of course I want to prefer PGGB, because some 10000€ for the next-generation M-scaler is out of the question for me). Puzzling to me as well was that 128 bit sounded more natural and 256 bit more compressed to my ears today. But this simply cannot be. Placebo effects and confirmation bias can be hard to handle. That's what makes me wonder how it can be so much easier for others. And I do not doubt at all that others might simply be better at making out the finest differences. DAVE alone versus DAVE with m-scaler was much easier - a huge improvement.

I will come back to you, but first it's on me now to do a really thorough testing!

4 hours ago, Zaphod Beeblebrox said:

Rob Watts discusses the need for 'billions and billions of taps'

 

Thank you so much for the link, very intriguing!! Will listen to it tomorrow. 

 

I prepared files for my further listening and applied -2.8 dB to exactly match m-scaler. In addition I made one "dither only" file at 0 dB for comparison with m-scaler's noise shaper.
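For reference, the -2.8 dB level match converts to a linear gain factor in the usual way (a generic dB-to-amplitude conversion, nothing PGGB-specific):

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

print(db_to_gain(-2.8))   # each sample is scaled by roughly 0.724
```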

 

I could send you my original file and, if technically possible, you could make a version with a small window (like 25 s) in 256 bit. To circumvent bias one could even create a second file with the current track-length window, so that I do not know which is which. Could be fun to test this blindly.

 

Anyway, I will report back after further testing. My quick first impression today was that some aspects, like timing, indeed sound more accurate with PGGB, but paradoxically the soundstage appeared smaller than with m-scaler (I expected it to be more spacious with more accurate timing) and the timbre had a more "processed" character. But these were only initial impressions.

8 hours ago, austinpop said:

Am I understanding this correctly? You are arguing that every edit point within a track represents a discontinuity, and could/should be treated as a separate fragment for PGGB to handle? 


Yes, that would seem logical to me, especially when different takes come from sessions with slightly changed acoustics (for example because they were days apart and the microphones had to be set up again, or because corrections were recorded in an empty hall after the audience left, etc.).

 

8 hours ago, austinpop said:

how achievable and reliable is it for an analysis tool to accurately detect these discontinuities? I suspect not easy at all. Just dropping down to 23 second fragments does not feel very satisfying either.


It would be impossible for the majority of edit points.
Short windows, however, would guarantee that most windows do not overlap with an edit point. I fear the only way to find out whether this makes sense and could lead to further improvements would be listening tests.

15 hours ago, Zaphod Beeblebrox said:

So you can run an experiment.


So I took the first four minutes of a track, divided them into eight parts of roughly 30 s each, will process everything in 256 bit, put them into one album and replay them with gapless playback. This gives me small windows at best quality for comparison with the complete track-length window.
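For anyone wanting to reproduce the segmentation step, here is a minimal sketch in Python using only the standard-library `wave` module (it assumes uncompressed PCM WAV input; the `part` filename prefix is my own invention):

```python
import wave

def split_wav(path, seconds=30.0, prefix="part"):
    """Split a PCM WAV file into consecutive chunks of `seconds` each
    (the last chunk may be shorter) and return the number of chunks.
    Format parameters are copied verbatim so the pieces can be tagged
    as one album and replayed gaplessly."""
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(seconds * params.framerate)
        count = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            with wave.open(f"{prefix}{count:02d}.wav", "wb") as dst:
                dst.setparams(params)        # nframes is fixed up on close
                dst.writeframes(frames)
            count += 1
    return count
```

Note this cuts at fixed sample counts, not at edit points, which is exactly the compromise discussed above.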

  • 1 month later...
On 4/7/2023 at 9:42 AM, hanshopf said:


So I took the first four minutes of a track, divided them into eight parts of roughly 30 s each, will process everything in 256 bit, put them into one album and replay them with gapless playback. This gives me small windows at best quality for comparison with the complete track-length window.

 

So, I wanted to report back on my listening test, for which I separated one music track into several 30 s parts to find out what audible effects a smaller-than-track-length window would have in PGGB (following the idea that a full-track-length window - in most cases covering several edit points in the music - could potentially be a compromise solution).

 

Even though I repeated my test several times over a span of six weeks, I wasn't able to come to a conclusive result. There are obvious audible differences - the shorter-window tracks present a slightly darker and more defined quality of sound - but I could not decide whether either variant sounds superior. My initial impressions favored the shorter-windowed tracks, but later on I wasn't sure about that anymore and remained undecided. If this topic is of interest to anybody else, I can only advise repeating the experiment. Maybe different ears with different equipment can come to a less ambiguous result.

 

 

Another matter: I am not happy that PGGB'd tracks don't play gaplessly in Audirvana (and probably anywhere else). This is especially annoying in classical music, where an uninterrupted piece of music is regularly divided into several tracks. Is there a way to circumvent this problem? The "combine tracks" function is no realistic solution: first of all I couldn't make it work, and secondly it is not an option to manually rearrange 20% of my classical music albums. I really would be extremely grateful for a practical solution to this problem.
