The Optimal Sample Rate for Quality Audio



Academic papers on the subject by Kunchur conclude that people can discriminate such transients at about half the rise time of a 22.05 kHz harmonic wave, indicating that double the CD sample rate may be necessary to reproduce them *in the mathematically ideal case with perfect filtering*.

 

A fair number of academic researchers have expressed concerns about Kunchur's work. There have been a number of attempts to replicate the results, and they have concluded that *if you remove the intermodulation distortion that causes the ultrasound to produce intermodulation components in the audible spectrum*, there is no audible difference when the ultrasonic component is removed.

 

- The Lavry paper first says there is no reason to retain any information above the audible range, then notes such information can have a substantial effect within the audible range. These two statements, it seems to me, are contradictory. If real world instruments produce ultrasonic information (and they do) which, as the Lavry paper admits (in fact emphasizes), affects the audible sound, then removing the ultrasonics alters the original audible sound, i.e., introduces distortion.

 

It might be the other way around - the ultrasonic information might cause intermodulation distortion that is audible in the sonic frequency range.


Chris,

 

I can find several engineers and AES Fellows who contradict much of what Dan says. My point is there's not one set of facts.

 

What part of what Dan says do you disagree with? The thing about the timing resolution is pretty much basic sampling theory and is well supported by people working in the field (a good example is the Meridian white paper by J. Robert Stuart often quoted here on CA). For basic, undithered signals, the time resolution of a sampled signal is the sample interval divided by the number of digital levels. Dithering improves it further. Thus the time resolution of a 44.1 kHz / 16 bit signal is 1/(44100*2^16) s, or about 0.35 ns.
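As a quick sanity check, here is that arithmetic in a few lines of Python (just the rule of thumb stated above; the variable names are mine):

```python
# Rule of thumb quoted above: the time resolution of a sampled signal
# is the sample interval divided by the number of quantization levels.
sample_rate = 44_100       # Hz, Red Book
bit_depth = 16             # bits per sample
levels = 2 ** bit_depth    # 65,536 quantization levels

resolution_s = 1 / (sample_rate * levels)
print(f"{resolution_s * 1e9:.2f} ns")  # about 0.35 ns
```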

I'd appreciate references if you have them handy.

 

Griesinger, Perception of mid-frequency and high-frequency intermodulation distortion in loudspeakers, and its relationship to high definition audio (http://www.davidgriesinger.com/intermod.ppt)

 

Yes, absolutely. My point was that the electrical impulse representing the ultrasonic and audible range waves cannot "know" whether their intermodulation creates distortion or a correct representation of an original event in which both ultrasonics and audible range waves were involved. Once one grants that ultrasonics affect audible frequencies, one cannot then simply declare that such interaction is always distortion and by declaring make it so.

 

The Griesinger experiment shows that the intermodulation distortion was only audible when driving the amplifier hard.

Julf, I don't understand what you intend by this.

 

My apologies - I made my posting while on the road, from a car ferry crossing from Holland to England, using a very slow satellite internet connection. As a result, I didn't recheck the Kunchur documents, and confused his findings with those of Kiryu and Ashihara. While Kiryu and Ashihara actually claim to show some ability of the human ear to hear frequencies above 22 kHz (addressed by the Griesinger presentation), Kunchur's experiments only show that the ear can detect quite small timing differences. The problem with Kunchur is that he makes statements that, especially when taken out of context, imply that a 44.1 kHz sample rate system can't reproduce those timing differences (despite the fact that his experiments don't address that issue at all).

 

So, again, my apologies for a misleading link.

Not to rehash old discussions, but: Kunchur's experiments reported in 2007 did indeed focus as you say on timing differences between two continuous signals. For audible signals, Shannon-Nyquist proves 44.1kHz is adequate to handle such timing differences to any arbitrarily brief length of time.

 

Indeed, full agreement here.

 

But certainly by 2010, Kunchur was focusing more on the audibility of inharmonic transients with such a steep rise in so short a time that 44.1kHz sampling rates are not adequate to reproduce them.

 

The only fairly recent work I can find is the AES panel contribution you already linked to. I find it interesting that the only actual experimental research work he refers to is the original work about temporal differences - all the rest is pretty much just theoretical speculation. It might be right, but it is still pretty much unconfirmed by any actual observations.

I will say that the evident care with which he designed, set up, and conducted his experimental work published in 2007 (building his own equipment in some critical cases to ensure it met the desired specs) went some way in convincing me he wasn't just making stuff up, but was discussing something about which he had some actual data.

 

I agree that the 2006/2007 work seems very solid, but I will personally continue to be a bit sceptical about his more recent thoughts until he publishes some actual data. As to the ears/brain being able to process ultrasound, I would definitely find work from actual neuroscientists more interesting (and credible), but it is definitely way outside my area of expertise.

Last time I checked, the Nyquist-Shannon theorem only applies to infinite-duration, continuous signals.

 

Indeed. Fortunately what happens with real, non-infinite signals is well understood. Because of finite amplitude resolution / signal-to-noise ratio, time resolution is also finite - but still more than sufficient. And yes, filters are always compromises. With infinite processing power, you can definitely avoid ringing, but most of us don't like the idea of a supercomputer bolted on to our DACs.

 

It is interesting that while people can discuss capacitor choice and cable quality until the cows come home, I see very little discussion of the processing power of the various DSP processors used in the DACs...

No you can't, if the ratio of the passband to the Nyquist frequency is too small. I've spent an enormous amount of time optimizing a filter to be as short as possible while still being "perfect", but it's just math: it's impossible to optimize the filter down to one tap and still make a transition band with 144 dBFS of attenuation fit into the 20 - 22.05 kHz band.

 

Sure, within those constraints it is impossible. Or, as a computer scientist would say, "hard" :)

 

But if you remove some of the constraints - allow massive oversampling and a practically-endless number of taps (both for filtering and for compensation), you have a situation where throwing CPU power at the problem is actually a solution.
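To put rough numbers on that trade-off, here is a sketch using the standard Kaiser-window estimate of FIR length (the 144 dB and 20 - 22.05 kHz figures come from the post above; the function name is mine, and this is a textbook approximation, not a description of any particular DAC):

```python
import math

def kaiser_taps(atten_db, transition_hz, fs_hz):
    """Kaiser-window estimate of the FIR length needed for a given
    stopband attenuation and transition bandwidth."""
    delta_omega = 2 * math.pi * transition_hz / fs_hz
    return math.ceil((atten_db - 7.95) / (2.285 * delta_omega))

# 144 dB of attenuation with the transition crammed into 20 - 22.05 kHz,
# filtering at the original 44.1 kHz rate:
print(kaiser_taps(144, 22_050 - 20_000, 44_100))  # 204 taps
```

Halving the transition band roughly doubles the tap count, so a true brick wall (zero-width transition) needs an unbounded filter - hence the appeal of throwing taps and CPU power at the problem.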

 

Then there is the entirely separate debate about how harmful the ringing actually is - as long as it is post-ringing and not pre-ringing.

Interesting post, interesting comments. As is often the case in the world today, everything comes down to "objective data" and conjecture/disagreement as to what it all "means". This "means" comes about because you always end up with a pesky, variable human being at the end of the data stream.

 

Absolutely. But the measurements exist pretty much exactly to take that pesky human out of the loop :). There will always be disagreements about what sounds good and what doesn't, and people will hear things differently. But what a design/electronics engineer has to do is try to find the common ground - because without common, objective criteria, every piece of equipment would have to be a bespoke design for a specific customer.

 

Somebody around here said you have your system just right when you start tapping your foot.

 

That's a great criterion for music, but not a very good one for equipment - because the right music gets my foot tapping even when played through an old transistor radio, a cheap car stereo or a portable PA speaker.

 

Now I like and enjoy objective analysis, but don't get so wound up in it, you forget to tap your foot... :0)

 

My problem is trying to type while tapping my foot. I would never have been any good as a drummer. :)

So, my personal listening experience matches precisely what Bob Stuart told Robert Harley in the interview that I linked earlier in the thread.

 

That interview covers a fair number of different topics, so I am not sure which part you are referring to.

It starts at the bottom of page 2:

 

And it continues onto the next page:

 

Ah, OK! My bad - I was looking for comments on ringing, whereas what you were talking about was listening tests.

 

The replies from Bob Stuart made so much sense to me that, recently, I started reading up on psychoacoustics myself ("Psychoacoustics - Facts and Models" 3rd ed. by Hugo Fastl & Eberhard Zwicker, and "Auditory Neuroscience" by Jan Schnupp, Israel Nelken & Andrew King). Especially the chapters on binaural hearing have been very informative to me, even though I have to admit my scientific and technical knowledge is mostly limited to the world of IT.

 

Have to agree, the psychoacoustics are fascinating reading - and give a very different perspective on sound reproduction. I am also impressed by how all the work that has gone into lossy encodings has actually helped further our understanding of how both our ears and our brains work.

Almost all DACs do oversampling to reach some much-higher-than-RedBook rate (e.g., 352.8 or 384kHz) before filtering. Doesn't this constitute at least a tacit, if not explicit, admission by most audio engineers who've worked on the problem that filtering at these much higher rates produces better results than filtering at RedBook sample rates?

 

Not sure I would call it an "admission". Yes, oversampling is the simplest and most common solution to the steep filtering issue. I don't think anyone would claim the steep filtering isn't an issue.
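For concreteness, the textbook form of that oversampling step is zero-stuffing followed by an interpolation lowpass. The sketch below is a toy 2x oversampler with a short half-band filter (all names and the 31-tap length are my own illustrative choices, not anything from a real DAC):

```python
import math

def halfband_taps(n=31):
    """Windowed-sinc half-band lowpass (cutoff at fs/4), odd length."""
    mid = n // 2
    taps = []
    for k in range(n):
        x = k - mid
        h = 0.5 if x == 0 else math.sin(math.pi * x / 2) / (math.pi * x)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1))  # Hamming window
        taps.append(h * w)
    return taps

def upsample_2x(samples):
    """Insert a zero after every sample, then lowpass-filter;
    a gain of 2 restores the original signal level."""
    stuffed = []
    for s in samples:
        stuffed.extend([s, 0.0])
    taps = halfband_taps()
    out = []
    for i in range(len(stuffed)):
        acc = 0.0
        for j, t in enumerate(taps):
            if 0 <= i - j < len(stuffed):
                acc += t * stuffed[i - j]
        out.append(2.0 * acc)
    return out
```

After doubling the rate, the first spectral image of a 20 kHz tone sits at 68.2 kHz instead of 24.1 kHz, so the analog reconstruction filter can roll off gently - which is the whole point of the exercise.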

 

This being so, what would be the preference of most on this board for attaining these higher rates - start at RedBook rates and interpolate most values through sample rate conversion, or start at rates that are as high as possible and interpolate few or even no values?

 

If we can start out with a high-sample-rate recording, and disk space and network bandwidth aren't an issue, it is probably best to use a consistently high sample rate instead of upsampling - with the small "but" that some amplifiers and speakers might not handle the high frequencies very gracefully, and might actually show a decrease in sound quality. In real life, the additional disk space and bandwidth have to be weighed against the possible difference in sound quality.

 

If, on the other hand, we start out with a recording that has been recorded in 44 or 48 kHz, 16 bit, there is only a very small benefit in upsampling earlier in the chain - the possibility of using more advanced upsampling algorithms than is possible with the processing power in the DAC.

Is it more difficult to use 8x sample rates with traditional interfaces like AES/EBU? More jitter? Is sending an AES stream from something like a Mykerinos card more difficult at 8x? I also wonder if async USB is a completely different story when it comes to 8x.

 

A very good point, Chris. Red book only needs 1.4 Mbit/s, while 384/24 requires 18.5 Mbit/s - pretty serious transmission speeds. Remember original Ethernet was 10 Mbit/s. Another issue is disk space - a red book CD is 0.6 GB, the same album in 384/24 is 8 GB. Yes, disk capacities are constantly increasing, and prices are decreasing, but still... Again, perhaps justifiable if those extra bits actually contain real information, but if the music is just upsampled, it is just fluff - better do the upsampling at the DAC instead of wasting bandwidth and disk space.
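The arithmetic behind those figures is straightforward for uncompressed stereo PCM (the 60-minute album length and the function names are my assumptions):

```python
def pcm_bitrate_mbps(fs_hz, bits, channels=2):
    """Raw PCM bitrate in Mbit/s."""
    return fs_hz * bits * channels / 1e6

def album_size_gb(fs_hz, bits, minutes=60, channels=2):
    """Uncompressed size of an album in GB."""
    return fs_hz * bits * channels / 8 * minutes * 60 / 1e9

print(pcm_bitrate_mbps(44_100, 16))   # 1.4112 Mbit/s
print(pcm_bitrate_mbps(384_000, 24))  # 18.432 Mbit/s
print(album_size_gb(44_100, 16))      # about 0.64 GB
print(album_size_gb(384_000, 24))     # about 8.3 GB
```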

Protocol-wise, sample rates above 192 kHz are not a problem with AES/EBU -- it can theoretically support any sample rate as long as the clock can be recovered from the data. Bandwidth is usually not an issue, as it is common to run AES/EBU over good old 75 ohm RG6U coax or Cat 5e (350 MHz) cables.

 

Yes, I should have been more precise. Raw analog bandwidth is not the issue, reliably achievable digital bandwidth is. On the other hand, USB 3.0 is supposed to give us 5 Gbit/s.

 

AES/EBU was designed to handle long distance transmission in less than ideal environment, rather than sheer transmission speed. With an active equalizer (distribution amp), you can easily go 2-3 football fields over cat 5.

 

It is amazing what you can do with properly balanced, impedance-matched and equalized stuff - a lot of that technology came out of the work done for transatlantic cables. And then we agonize over 1.5 m interconnects... :)

Seeing the measurable differences between sample rate converters at SRC Comparisons

 

That link seems to illustrate a downsampling from 96 to 44.1 kHz. I am not sure that is a valid illustration of the possible harmful effects of a factor-of-2 upsample.

 

eliminating SRC in the chain to the extent possible may be a worthwhile goal.

 

It may be a worthwhile goal, but I am afraid it would require getting the recording industry to record *and distribute* all the music we want in true hi-res. Not sure there are enough of us to make it financially viable for the record industry.


Barry,

 

While I'd love to see more work being done at higher rates, the overwhelming majority of multitrack recordings and mixes I've seen from most studios have been 24/44.1. While the interfaces can often handle higher rates (most up to 24/96 and some up to 192 - though fewer do this well than have that number in their spec sheets), it appears to be computer power that is lacking most often.

 

That is sad to hear - I was actually under the (clearly false) impression most studios had upgraded. There really is no excuse - I can understand the computing power being a limitation 5 years ago, but with the latest generations of (multicore) CPUs, the required power has become really affordable.

 

Sure, I still (vaguely) remember how proud we were of being able to do 2 channels in real time at 12 bit / 32 kHz, but that was 25 years ago :)

 

perhaps more investment is needed in faster computer systems.

 

Perhaps :)

 

But that would require there to be a "business case" for it (= demand)

when applying EQ or other processing, I find the results sound better at higher rates.

 

Absolutely - it definitely makes sense to do as much of the processing as possible at higher resolution (in order not to throw away information because of lack of precision and "headroom"), and only do the sample rate conversion at the last possible stage.
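A quick sketch of why that matters: re-quantizing to 16 bits after every processing stage accumulates error that high-precision (here, float) processing avoids. The 12 dB attenuate-and-restore chain below is a made-up worst-ish case to illustrate the precision/headroom point, not any real mastering workflow:

```python
import math

def quantize(x, bits):
    """Round to the nearest level of a signed fixed-point grid."""
    scale = 2 ** (bits - 1)
    return max(-1.0, min(1.0, round(x * scale) / scale))

def process(signal, stages, bits=None):
    """Attenuate 12 dB then restore, `stages` times; if `bits` is set,
    re-quantize after the attenuation, as a fixed-point chain would."""
    down, up = 10 ** (-12 / 20), 10 ** (12 / 20)
    out = list(signal)
    for _ in range(stages):
        out = [s * down for s in out]
        if bits is not None:
            out = [quantize(s, bits) for s in out]
        out = [s * up for s in out]
    return out

# Half-scale 997 Hz test tone at 44.1 kHz.
sig = [0.5 * math.sin(2 * math.pi * 997 * n / 44_100) for n in range(1024)]
err16 = max(abs(a - b) for a, b in zip(process(sig, 10, bits=16), sig))
err64 = max(abs(a - b) for a, b in zip(process(sig, 10), sig))
# err16 ends up orders of magnitude larger than err64
```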

Jud - I'm just suggesting that when we're thinking of what our computers should feed our DACs, and what our DACs should be able to process, to the extent practical we ought to aim for equipment and software that can preserve hi-res material in its original format all the way through the DAC;

 

That is definitely something I agree with. Preserve the original format as far as possible.

 

and furthermore, that we ought to aim for material that needs little or no oversampling once it hits the DAC, since I'm guessing SRC in the DAC may not work as well as something like, e.g., iZotope.

 

If that implies upsampling everything to the highest rate the DAC is capable of, I don't see much point in just adding empty air into the audio files. And anyway, if you remove the role of the upsampling algorithm in the DAC, what can we then spend our time tweaking and arguing about? :)

That wasn't actually what I was trying to say.

 

In that case my apologies for misunderstanding what you were saying.

 

I think the interpolation facilities of the best SRC software are good enough that while the additional samples are not actual recorded samples, they are not quite "empty air" either.

 

I have to agree that the best non-realtime sample rate conversion algorithms are probably better than the on-the-fly ones used in the current crop of DACs. The engineer in me still thinks "early upsampling" is solving the wrong problem. With the processing power of the chips used increasing constantly, the non-realtime algorithms used on a PC today will be implemented in real time in a DAC in a couple of years. Do we want to increase the cost of our equipment significantly to overcome that temporary performance gap? Some people will say "yes", but I would say that money is probably better spent elsewhere in the chain.

 

What I was referring to is that it would be quite nice to have material recorded in 176.4/192 or even 352.8/384, or DSD, that would require only 2x (in the case of 176.4/192) or no oversampling at all in either the computer or the DAC. There is at least some non-negligible amount of material available in 176.4/192 and DSD, though one could wish for both more and cheaper.

 

And it will be a cost issue - and again I am questioning (but that is just my personal opinion) whether the potential improvement in going beyond 96/24 justifies the complexity and cost, compared to concentrating the effort on areas where the returns are much clearer (speakers and room). For some people it might, for others not.

Quite pathetic speed, goes just fine even over WLAN transmission link. Or easily at eight channels over gigabit ethernet.

 

Not all of us have gigabit ethernet to our homes yet - somewhat ironically, as I spent a lot of time 10 years ago preaching the benefits of gigabit ethernet as a delivery media instead of SONET/SDH. :)

 

Why would it ever go to disk? I'm performing upsampling on the fly during playback.

Because that is what a lot of the discussion was about - pre-upsampling (or better, recording in high resolution in the first place) versus upsampling at playback time. Architecturally what you are doing is moving the processing from the DAC to the main computer, but it is really just a different way to do the DAC processing - a slightly different thing from what Jud was discussing.

Those of us who were into sampling from the stone age (remember the 8-bit Ensoniq Mirage or original Emu Emax?) can attest to what quantization error does to sound quality. Going from 8 to 12 then to 16 bit sampling tremendously improved sound quality.

 

Oh yes :). I remember how, after having played a bit with the 8-bit Emulator, I got my hands on the 12-bit Yamaha TX16W - such an improvement in sound quality (once you had loaded the OS from floppy disks).

 

Speaking of 8-bit sampling, my favourite performance is still Peter Langston's version of Some Velvet Morning ("by Eedie & Eddie And The Reggaebots")

 

(done on the ancient DECtalk speech synthesizer (made famous by Stephen Hawking), an Ensoniq Mirage, a Casio CZ-101, and a set of classic Yamaha gear (DX7, TX816 and RX11))
