Audio Blind Testing

mevdinc · January 6, 2018

First, I wish everyone a Happy New Year.

Not sure if this has been covered here before but just came a cross this article on audio blind testing with a link to the original archives with lots of samples and information.

https://headfonics.com/2018/01/seven-keystones-accurate-audio-blind-testing/

Speedskater · January 6, 2018

A problem with blind testing is that they try to do a big test with many listeners and many repetitions. They should start with a very small informal test. That would be one interested audiophile, where he chooses the music, the volume and the sample time. If he can't easily hear a difference then there's no reason to expand the test.

Ralf11 · January 6, 2018

Thx - writing style is odd; was it translated?

gmgraves · January 6, 2018

7 hours ago, Speedskater said:

A problem with blind testing is that they try to do a big test with many listeners and many repetitions. They should start with a very small informal test. That would be one interested audiophile, where he chooses the music, the volume and the sample time. If he can't easily hear a difference then there's no reason to expand the test.

Actually, I think that you do need a fairly large participant pool. What we hear, and to what we attribute what we hear, is largely predicated on such variables as how we feel at that particular time, our familiarity with the sound source (audio system, room) and whether or not we are tuned to hear the differences that are being investigated by the test. The idea of the "golden-eared audiophile" is no myth. While we don't actually have hearing that is more acute than that of the average joe, we have trained ourselves to be sensitive to characteristics in reproduced music that your average Joe neither knows or cares about. Many such things most people won't even notice. OTOH, different audiophiles tend to focus their critical sensibilities upon those characteristics that particularly interest them. Some focus on soundstage, and don't particularly care whether or not their audio system has ruler-flat frequency response, while another might be very sensitive to even small changes in perceived distortion. In order to statistically level the "listening field" it is necessary, in my opinion, to incorporate a number of different listeners and to use written private ballots to register whether or not differences were heard in each individual trial.

IOW, just because I don't hear a particular difference, don't mean that you won't hear it either.

Kal Rubinson · January 6, 2018

8 hours ago, Speedskater said:

A problem with blind testing is that they try to do a big test with many listeners and many repetitions. They should start with a very small informal test. That would be one interested audiophile, where he chooses the music, the volume and the sample time. If he can't easily hear a difference then there's no reason to expand the test.

4 minutes ago, gmgraves said:

Actually, I think that you do need a fairly large participant pool.

I agree. How is the "one interested audiophile" selected and qualified? The whole significance of statistical assessment is to not rely on any single anecdote.

Ralf11 · January 6, 2018

True, if a magazine evaluation, etc. (or for Magneplanar deciding if they want to change an existing product).

These would be 'cells' in an analysis of variance design.

For a person testing equipment for themself, the number of interested audiophiles is 1. Multiple tests should still be done.

fas42 · January 6, 2018

The biggest problem in audio at the moment is, "Testing equipment issues". The Foobar200 ABX mechanism is hopelessly flawed, because its internal processing changes what is being tested, and degrades the quality substantially; Lacinato ABX has the potential to be far superior, but I haven't assessed it either.

All the other things mentioned are irrelevant if the core functionality is not right - good for debating sessions, only ...

gmgraves · January 6, 2018

1 hour ago, Kal Rubinson said:

I agree. How is the "one interested audiophile" selected and qualified? The whole significance of statistical assessment is to not rely on any single anecdote.

Absolutely. There must be some consensus of opinion on differences heard, and how can there be a consensus of one?

GUTB · January 7, 2018

I have nothing to add to the topic of audio blind testing -- except to say that it won't work unless there are very large differences in sound. Not to any degree of acceptable mathematical rigor, anyway. This is due to the ear-brain issues that are difficult to control for.

I just wanted to post that whenever I see "audio blind testing" I want to read it as "audio bling testing", which would be a much more entertaining thread.

Ralf11 · January 7, 2018

which ear-brain issues are difficult to control for?

GUTB · January 7, 2018

1 minute ago, Ralf11 said:

which ear-brain issues are difficult to control for?

Phenomena like stress-induced loss of listening discernment and audio memory.

Kal Rubinson · January 7, 2018

3 hours ago, GUTB said:

Phenomena like stress-induced loss of listening discernment and audio memory.

What stress? I do it for fun.

motberg · January 7, 2018

4 hours ago, Ralf11 said:

which ear-brain issues are difficult to control for?

inability to unlearn an auditory illusion

gmgraves · January 7, 2018

3 hours ago, GUTB said:

4 hours ago, GUTB said:

I have nothing to add to the topic of audio blind testing -- except to say that it won't work unless there are very large differences in sound. Not to any degree of acceptable mathematical rigor, anyway. This is due to the ear-brain issues that are difficult to control for.

I just wanted to post that whenever I see "audio blind testing" I want to read it as "audio bling testing", which would be a much more entertaining thread.

One again GUTB exhibits a profound ignorance of the subject being discussed. It has been proven beyond a doubt that double-blind testing (where no one, not the assembled listeners, and not the people performing the test have any idea of which device under test is being played at any given instant) is the only way to reliably hear differences in components. when nobody knows, other than by a designation such as "A" or "B" or "1" or "2" what they are listening to. This absolutely removes from the listening equation any sighted or expectational bias. IOW, you won't pick your new $800/pair interconnects as sounding better simply because you just paid $800 for them! Believe me that kind of thinking colors any other kind of evaluation. Many would include ABX testing in this, but I don't simply because nobody has been able to ever convince me that the ABX comparator doesn't color the results. The only way is to manually swap out those devices under test.

sandyk · January 7, 2018

8 minutes ago, gmgraves said:

Many would include ABX testing in this, but I don't simply because nobody has been able to ever convince me that the ABX comparator doesn't color the results. The only way is to manually swap out those devices under test.

I agree.

The only way is to have somebody else not connected with the listening decisions part, to manually swap out the devices under test behind the scenes.

Ralf11 · January 7, 2018

4 hours ago, GUTB said:

Phenomena like stress-induced loss of listening discernment and audio memory.

I'd be interested in any citations to the literature finding either of the above.

GUTB · January 7, 2018

6 minutes ago, Ralf11 said:

I'd be interested in any citations to the literature finding either of the above.

These are areas I don’t think which have undergone any clinical research. They’re very well-known phenomena. @gmgraves admits to have faced it himself, but he chooses to believe the differences he heard were in his head after being influenced by the stress of blind testing. I’ve experienced these phenomena myself in ABX testing and just regular A/B-ing where I do it enough times.

Fremer of Analog Planet did a public shoot-out of a bunch of cartridges ranging from $100 or so up to around $1k (as I recall) on a fairly modest turntable and tonearm. He digitized the output of each cartridge, kept secret which file was from which cartridge, and put out a public poll asking which one people liked best. The results showed that, as expected, the tendency was to prefer the more expensive cartridge, with the most expensive (Ortofon Quintet Black?) being the most popular. But what was super interesting was that opinion was not unanimous. I can’t imagine anyone can’t tell the difference between the bundled StudioTracker that came with my Studio Deck and the AT-OC9ML/II that I upgraded to, let alone prefer the former, yet, Fremer’s poll showed just that and worse; some people actually preferred the cheap budget junk the best. What happened very likely was that people listening to all these files one after the other started to mix them together in thier heads. The difference between some cheap elliptical MM and a highly regarded high performance line contact MC isn’t a minor one. You can’t blame setup because it was set up by one of the leading authorities in turntable setup.

Another example of audio memory at work. Have you ever listened to some familiar music through some good headphones and you picked up a detail you hadn’t noticed before? From then on you will be able to hear the detail; you’re brain has decided it should be there, so it’s there. The sound IS there, but just much less obvious due to a speaker’s poor low level sound production.

In the Music Server forum I recently complained that a tweak (low noise regulator on SSD) I went through a lot of trouble and time to implement resulted in a huge downgrade. I immediately noticed a big increase in glare / sibilance and collapse of soundstage, issues that were fixed after removing the regulator. Shouldn’t I have convinced myself by then that I was opening the gates to audio nirvana? Why did I instead perceive a big reduction in quality? I’m sure I could ABX the difference with certainty — but could I do it after 20 plays? 30? 50? I’m not sure! I say this because I’ve experienced first hand how ABX testing quickly mixes everything together.

esldude · January 7, 2018

7 hours ago, GUTB said:

snip

I just wanted to post that whenever I see "audio blind testing" I want to read it as "audio bling testing", which would be a much more entertaining thread.

We already know how important the bling factor is to you.

esldude · January 7, 2018

2 hours ago, GUTB said:

These are areas I don’t think which have undergone any clinical research. They’re very well-known phenomena. @gmgraves admits to have faced it himself, but he chooses to believe the differences he heard were in his head after being influenced by the stress of blind testing. I’ve experienced these phenomena myself in ABX testing and just regular A/B-ing where I do it enough times.

Fremer of Analog Planet did a public shoot-out of a bunch of cartridges ranging from $100 or so up to around $1k (as I recall) on a fairly modest turntable and tonearm. He digitized the output of each cartridge, kept secret which file was from which cartridge, and put out a public poll asking which one people liked best. The results showed that, as expected, the tendency was to prefer the more expensive cartridge, with the most expensive (Ortofon Quintet Black?) being the most popular. But what was super interesting was that opinion was not unanimous. I can’t imagine anyone can’t tell the difference between the bundled StudioTracker that came with my Studio Deck and the AT-OC9ML/II that I upgraded to, let alone prefer the former, yet, Fremer’s poll showed just that and worse; some people actually preferred the cheap budget junk the best. What happened very likely was that people listening to all these files one after the other started to mix them together in thier heads. The difference between some cheap elliptical MM and a highly regarded high performance line contact MC isn’t a minor one. You can’t blame setup because it was set up by one of the leading authorities in turntable setup.

Another example of audio memory at work. Have you ever listened to some familiar music through some good headphones and you picked up a detail you hadn’t noticed before? From then on you will be able to hear the detail; you’re brain has decided it should be there, so it’s there. The sound IS there, but just much less obvious due to a speaker’s poor low level sound production.

In the Music Server forum I recently complained that a tweak (low noise regulator on SSD) I went through a lot of trouble and time to implement resulted in a huge downgrade. I immediately noticed a big increase in glare / sibilance and collapse of soundstage, issues that were fixed after removing the regulator. Shouldn’t I have convinced myself by then that I was opening the gates to audio nirvana? Why did I instead perceive a big reduction in quality? I’m sure I could ABX the difference with certainty — but could I do it after 20 plays? 30? 50? I’m not sure! I say this because I’ve experienced first hand how ABX testing quickly mixes everything together.

Okay so Mikey didn't match volume levels. I mean didn't even attempt to match them. The files you can download have a level difference that ranges over 10 db. Unless you carefully match levels this is completely useless. Of those who did try and match levels how many did matching by ear? Worse than useless because they will think levels are matched (and they will be within 1 or 1.5 db), but in fact the remaining differences are plenty to skew the results dramatically. Oh and just for good measure two of the files have clipping in them.

Oh and results. The two loudest (and clipped files) and the loudest non-clipped file garnered the most votes.

Speedskater · January 7, 2018

17 hours ago, Kal Rubinson said:

I agree. How is the "one interested audiophile" selected and qualified? The whole significance of statistical assessment is to not rely on any single anecdote.

You start with an audiophile that reports that he can hear a difference. Only than can we move on.

Statistical assessment doesn't mean much. Those with a vested interest in blind tests not working, will always find fault in any statistical test. The real question should be 'can some listeners hear a difference'? If they can then we should investigate why it is that they hear the difference.

Kal Rubinson · January 7, 2018

21 minutes ago, Speedskater said:

You start with an audiophile that reports that he can hear a difference. Only than can we move on.

Statistical assessment doesn't mean much. Those with a vested interest in blind tests not working, will always find fault in any statistical test. The real question should be 'can some listeners hear a difference'? If they can then we should investigate why it is that they hear the difference.

"Why" they hear a difference is not the issue until we find out "if" they really do hear a difference and that cannot be determined from an anecdotal report.

(And, in case this inspires a "back at you" response, I am not excluding my own reviews which I do not purport to be doctrine but my honest opinion based on my reported experiences.)

Speedskater · January 7, 2018

24 minutes ago, Kal Rubinson said:

"Why" they hear a difference is not the issue until we find out "if" they really do hear a difference and that cannot be determined from an anecdotal report.

Isn't the "if" part, the first reason for doing the blind test? Only than can we move on to the "why" part. How did anecdotal reports sneak in?

Kal Rubinson · January 7, 2018

8 minutes ago, Speedskater said:

Isn't the "if" part, the first reason for doing the blind test? Only than can we move on to the "why" part. How did anecdotal reports sneak in?

I agree. Where did we determine if there is a difference? So far, we have only subjective and anecdotal opinions.

Ron Scubadiver · January 7, 2018

There is a well known blind test which showed people could not tell the difference between SACD's and CD's. Now either they cheated somehow, or a lot of people are wasting money on SACD's. You can find this on the Wikipedia page for SACD. There is another one where one set of gear was a consumer DVD player hooked up to $200 class AB amplifier (A500) with a $5 interconnect and the other set of gear was $12k of high end CD transport and so forth. A slight preference was shown for the DVD player and about a third of the subjects had no preference. Some nice stand mounts were used as speakers. Note the input level controls were used on the A500 instead of a preamp and these have documented measurable problems with introducing distortion.

It kind of makes me wonder...

Ralf11 · January 7, 2018

Ron - do a search for SACD + meta-analysis here for a cite I posted some time ago. It appears there a marginal improvement with SACDs over CDs.

GUTB - Thx for posting what you had. But, it just isn't enough...

Audio Blind Testing

Recommended Posts

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Create an account or sign in to comment

Create an account

Sign in