
Has there ever been a conclusive listening test?


Norton

Question

Has there ever been a methodologically robust blind listening test of digital audio formats, codecs, etc. that resulted in subjects being able to distinguish between, identify, and/or express a preference for particular formats to a statistically significant degree? For example, has any test demonstrated conclusively that subjects could distinguish between 16/44 and 24/96, and/or were able to identify which was which, and/or expressed a preference for 24/96?


9 answers to this question


You can download the full text PDF here:

https://www.researchgate.net/publication/257068631_Sampling_Rate_Discrimination_441_kHz_vs_882_kHz

 

They were comparing 88.2 kHz against the same material downsampled to 44.1 kHz. All samples were 24 bit. The gear used was top quality; you will rarely buy recordings that are this pure and unmolested.

 

I might pick a few nits with a couple of statements in their conclusions, and there are a couple of oddities in the results.

 

None of this argues against my idea that 88 or 96 kHz might be audible, but not enough to make a big difference. If you did everything at 44 or 48 kHz you'd be losing very little, if anything. It is not as if a recording where the only difference is the sample rate produces one wonderful result at the higher rate and a horrid, hard-to-listen-to result at the lower one.

And always keep in mind: cognitive biases, like optical illusions, are a sign of a normally functioning brain. We all have them; it's nothing to be ashamed of, but it is something that affects our objective evaluation of reality.


Thanks, I thought I’d wait for any contrary replies before responding. My impression was the same: I’d never read of a conclusive listening test comparing formats, although it’s not a subject I’d particularly seek out.

 

But it does raise the question: is this because:

 

a. all formats beyond (and maybe including) RBCD are just a “confidence game” in which the consumer equates numbers with sound quality in the absence of a legitimate quality reference point, what the cultural theorist Roland Barthes called the quantification of quality; or

 

b. listening tests to date have been flawed? I’m guessing that many involve a small number of subjects having to make relatively quick decisions with unfamiliar (and maybe disliked) music and equipment. I wonder whether something more conclusive might result if subjects were allowed to thoroughly familiarise themselves with the samples on their home system for a week or two before being tested blind, again at home, with subjects and sample set broadly aligned by musical taste. Of course the system would then be a variable.

 

It does also point up that it is highly selective to use inconclusive listening tests to dismiss any one format in particular, when the same result would likely be obtained for any format beyond higher-quality MP3.

 

2 hours ago, Norton said:

Thanks, I thought I’d wait for any contrary replies before responding. My impression was the same: I’d never read of a conclusive listening test comparing formats, although it’s not a subject I’d particularly seek out.

 

But it does raise the question: is this because:

 

a. all formats beyond (and maybe including) RBCD are just a “confidence game” in which the consumer equates numbers with sound quality in the absence of a legitimate quality reference point, what the cultural theorist Roland Barthes called the quantification of quality; or

I think (a.) just about describes it as it is.  

2 hours ago, Norton said:

 

b. listening tests to date have been flawed? I’m guessing that many involve a small number of subjects having to make relatively quick decisions with unfamiliar (and maybe disliked) music and equipment. I wonder whether something more conclusive might result if subjects were allowed to thoroughly familiarise themselves with the samples on their home system for a week or two before being tested blind, again at home, with subjects and sample set broadly aligned by musical taste. Of course the system would then be a variable.

 

It does also point up that it is highly selective to use inconclusive listening tests to dismiss any one format in particular, when the same result would likely be obtained for any format beyond higher-quality MP3.

 

There have been some tests addressing your wondering whether spending more time would help, and the results, such as they are, have shown that shorter listening segments work more reliably and down to smaller levels of discernment.

 

On the flip side, detection of actual degradation does respond to training. MP3 is a good example. Artifacts of the encoding can be demonstrated to listeners at very low bitrates like 92 kbps, where they are obvious; then at 128 kbps, then 160 kbps, and so on. Listeners can then hear artifacts in some suitable material at the higher bitrates where they would have missed them before the training.

 

One could use that approach with sample rates. Start with an 8 kHz rate, which everyone could hear as different from 88.2 or 96 kHz. Then raise the rate a little at a time and see where people no longer hear a difference versus those hi-rez rates. Is it only at those high rates, or lower? I don't know of that having been done by somewhere like McGill University or another academic outfit. That would be a much better approach: test some young, trained listeners and see where they stop hearing a difference.
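That raise-the-rate-until-the-difference-vanishes idea is essentially an adaptive staircase procedure from psychophysics. As a rough sketch, here is a simulated 2-down/1-up staircase against a hypothetical listener model; the threshold, starting rate, and step size are invented for illustration, not taken from any real test:

```python
import random

def run_staircase(threshold_khz: float, start_khz: float = 8.0,
                  step_khz: float = 4.0, trials: int = 200,
                  seed: int = 1) -> float:
    """2-down/1-up staircase: raise the sample rate under test after two
    consecutive correct discriminations, lower it after any miss.
    The listener model is hypothetical: always hears the difference
    below `threshold_khz`, and guesses (50/50) at or above it."""
    rng = random.Random(seed)
    rate, correct_in_a_row, history = start_khz, 0, []
    for _ in range(trials):
        heard = rate < threshold_khz or rng.random() < 0.5
        if heard:
            correct_in_a_row += 1
            if correct_in_a_row == 2:   # two right in a row -> make it harder
                rate += step_khz
                correct_in_a_row = 0
        else:                           # one miss -> make it easier
            rate = max(start_khz, rate - step_khz)
            correct_in_a_row = 0
        history.append(rate)
    # the staircase ends up hovering around the listener's threshold
    return sum(history[-50:]) / 50

print(run_staircase(20.0))   # hovers near the simulated 20 kHz threshold
```

A real study would of course use actual resampled audio and real listeners; the point of the sketch is only that the procedure converges on the rate where discrimination falls apart, rather than testing one fixed pair of rates.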

 

I don't consider that I was using an inconclusive listening test to dismiss a format. There have been many inconclusive tests, which leads me to think the reason is that there is very little to no difference: a preponderance of evidence, if not proof. There have been many blind listening tests in which MP3 was detected versus 44.1 kHz uncompressed sound, so the method works. MP3 was developed by such testing.
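To put a number on "detected" in a blind test: for two-choice trials the usual yardstick is a one-sided binomial test against pure guessing. A minimal sketch (the trial counts below are made up for illustration):

```python
from math import comb

def blind_test_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: the probability of scoring at least
    `correct` out of `trials` two-choice trials by guessing alone."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# 12 right out of 16 is below the usual 0.05 bar; 10 out of 16 is
# entirely consistent with guessing.
print(round(blind_test_p_value(12, 16), 3))   # ~0.038
print(round(blind_test_p_value(10, 16), 3))   # ~0.227
```

This is also why a pile of inconclusive tests carries weight: each one that fails to beat the guessing baseline is another data point against an audible difference.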

 

To my knowledge, testing like that wasn't used to create the CD standard. Rather, experience with even earlier digital recording methods at sample rates of 30, 32, and 37 kHz showed they weren't enough, so the CD standard was designed by spec from what was known about human hearing limits and how PCM systems worked. It would appear that even at this late date they didn't choose too badly, whether it is capable of 100% audible fidelity or just 99%.

 

 


13 minutes ago, esldude said:

I don't consider that I was using an inconclusive listening test to dismiss a format

 

Thanks, my final comment above was not aimed at anything in this thread, but rather at the wider practice of pointing to a lack of discernment in listening tests as evidence that one specific format is not worthwhile, which carries the erroneous implication that such listening tests did validate other formats.


I had wanted to include this test, with its unusual method and results. I couldn't find it at the time, but tripped over someone else discussing it today.

http://www.extra.research.philips.com/hera/people/aarts/RMA_papers/aar07pu4.pdf

 

It is a surround-setup test of mechanical sounds, and it has even more curious results. Over both conditions of the test, the average results indicate no winner between DXD and 44.1 kHz when using direct analog as a reference. The two conditions were a system with 100 kHz bandwidth in the microphones and playback gear, and one with the microphones and playback gear limited to 20 kHz. DXD is a 352.8 or 384 kHz rate at 32 bits.

 

They had a good surround setup with really good gear. They put microphones in an anechoic chamber, one for each channel of the surround rig, available as a direct analog real-time feed with no processing. They concurrently had a DXD ADC/DAC chain and a 44.1 kHz ADC/DAC chain to listen to. Listeners could use the direct analog feed as a reference and had to choose which of the two unknown digital systems was closest to it.

In the 100 kHz system, listeners showed a statistically significant preference for 44.1 kHz as sounding closer to the direct analog than DXD. These were forced-choice tests: you had to select one or the other. In the 20 kHz-limited system, listeners showed a significant preference for the DXD system. Odd and curious results, don't you think? And not ones that unambiguously point to high sample rates being better in either paper. It would have been nice to include an 88.2 or 96 kHz shootout as well.

There is an edge case where the results make some sense. With digital filtering very near the audible band, and with microphones and speakers that also roll off at that point, you have in a sense compounded filters at that frequency, and there are interactions that can become audible when that happens, especially when part of the filtering is nonlinear, which with speakers, and maybe microphones, is the case. That would point to 88.2 or 96 kHz moving those effects further apart, so there may sometimes be a benefit to those rates. It doesn't necessarily indicate that more is better, or that there is any benefit beyond 96 kHz. And even then it may only sometimes make a difference, depending on the quality of the recorded sound and the playback gear.



Thanks, that certainly is an unusual test with a somewhat counterintuitive outcome, although the author still seems to consider that the results justify the use of DXD over 16/44 for archiving.

 

Overall, it doesn’t do much to dispel my impression that either 1. listening tests as conducted to date are simply not a meaningful tool for comparing digital formats, or 2. there really is no significant difference between formats. I tend to think the former is more likely; the author does note: “One can argue that providing unfamiliar music or sonic environment has made it more difficult for listeners to effectively judge audio quality”. I suspect that factor has played a significant role in the apparent lack of conclusive results from listening tests in general.


I have found that the server and DAC influence the sound far more than the source resolution. While I might think a high-resolution rip sounds slightly this way or that compared to the native CD rip, a better DAC or server upgrade always produces an obvious audible improvement, or shall I say an audible alteration in sound quality.


Norton,

About ten of us did a listening test earlier of the Portland State University Chamber Choir's latest, "The Doors of Heaven." All of us are Portland State alumni, like the music, and are familiar with Magnepan 1.7i speakers. The group preference, out of FLAC 24/88.2, CD, MP3 and iTunes, was the iTunes version. This album was recorded by Stereophile editor John Atkinson.

 

I've made and been around enough recordings to know that the extra headroom of 24 bits is nice to work with, but listening to good CD recordings has always been enough. I might be a bit biased, because the only way I'm going to listen to Americana and alt-country in high resolution is to record them myself.

 

And as I remind people always "You are going to find high resolution a very hard sell."
