The walking dead and "blind listening"

Sal1950 · October 18, 2015

It's puzzled me for years why some always try to discredit an audio review by calling it invalid if it wasn't a "blind listening" test.

What's so hard for you subjective listeners to understand?

It's very simple,

Without blind listening confirmation any review will always be just that person's opinion.

You can protest all you like and tell the world what "golden ears" you have, but any review has no more validity than the next guy's without proof of blind tests and measurements.

sandyk · October 18, 2015

You can protest all you like and tell the world what "golden ears" you have, but any review has no more validity than the next guy's without proof of blind tests and measurements.

Very few have the ability to organise properly conducted and scientifically verifiable UNBIASED Blind Tests.

Properly organised Blind Tests take a lot of organising and correctly setting up, and are rarely performed due to the time and cost constraints involved.

Many of those who keep insisting on Blind Tests couldn't even organise a "Chook Raffle" in a Pub !!!

Sal1950 · October 18, 2015

Many of those who keep insisting on Blind Tests couldn't even organise a "Chook Raffle" in a Pub !!!

Cheap insults add no validity to your position.

kumakuma · October 18, 2015

Very few have the ability to organise properly conducted and scientifically verifiable UNBIASED Blind Tests.
Properly organised Blind Tests take a lot of organising and correctly setting up, and are rarely performed due to the time and cost constraints involved.

Many of those who keep insisting on Blind Tests couldn't even organise a "Chook Raffle" in a Pub !!!

I understand that a blind test may take a bit longer to perform but what are the "cost constraints" involved?

sandyk · October 18, 2015

I understand that a blind test may take a bit longer to perform but what are the "cost constraints" involved?

Suitably high quality amplification and speakers, optimised room acoustics, highest quality source material ,( NOT via flawed USB either) and optimised seating for all participants, (no big heads/upper bodies in the way) and to be scientifically valid, numerous repeats without generating listener fatigue which will result in false negatives.

The switching would need to be seamless, preferably using well implemented remote controlled relay switching instead of a piss poor test like using Foobar 2K which is far from "The Gold Standard" as a software player.

If the S/W route was used you would probably need to use something like Miska's HQ player.

Obviously, it would need a lot more care in setting up than at a typical HiFi show with tiny rooms etc. and your typical home listening room is far from suitable.

kumakuma · October 18, 2015

Suitably high quality amplification and speakers, optimised room acoustics, highest quality source material ,( NOT via flawed USB either) and optimised seating for all participants, (no big heads/upper bodies in the way) and to be scientifically valid, numerous repeats without generating listener fatigue which will result in false negatives.
The switching would need to be seamless, preferably using well implemented remote controlled relay switching instead of a piss poor test like using Foobar 2K which is far from "The Gold Standard" as a software player.

If the S/W route was used you would probably need to use something like Miska's HQ player.

Obviously, it would need a lot more care in setting up than at a typical HiFi show with tiny rooms etc. and your typical home listening room is far from suitable.

Other than the switching, everything you mention would seem to apply to sighted tests as well.

sandyk · October 18, 2015

Other than the switching, everything you mention would seem to apply to sighted tests as well.

But we aren't talking about sighted tests here. Even at our regular Sydney listening sessions, there is so much gear powered up to avoid warm up problems, that some of us aren't even aware which actual component was being used until told later.

( I usually don't know , as after the Spinal op. last year, I remain seated while they fiddle around behind the equipment, changing interconnects etc.)

kumakuma · October 18, 2015

But we aren't talking about sighted tests here. Even at our regular Sydney listening sessions, there is so much gear powered up to avoid warm up problems, that some of us aren't even aware which actual component was being used until told later.
( I usually don't know , as after the Spinal op. last year, I remain seated while they fiddle around behind the equipment, changing interconnects etc.)

Sounds like you are already doing blind testing. May not be "scientifically verifiable" but good enough for 99.9% of the folks out there.

mmerrill99 · October 18, 2015

I understand that a blind test may take a bit longer to perform but what are the "cost constraints" involved?

I believe the "cost constraints" are the time inputs from various people needed for a properly conducted blind test

I saw this description of blind testing from ThorstenL on DIYaudio once & took note of it:

First - Statistics, understand that the underlying analysis method is statistics. The issue is that we try to determine if the test result was likely due to chance or an actual difference being present.

Such statistical analysis is subject to two possible errors, one is a "Type A" error, where we conclude in error that a difference is present when the test results were caused by chance and a type B error where we conclude that no difference exists when in fact one is present.

Les Leventhal presented the required math to work out the likelihood of both errors. The upshot is simple: small datasets (the N I have been referring to) combined with a high significance level (such as commonly demanded by the ABX/DBT Mafia) will result in a very large risk of Type B errors.

IN PRINCIPLE one may perform a much more advanced statistical analysis that gives more meaningful results, but the ABX/DBT Mafia in their never ending quest to rid audio from the Bogey Man and Voodoo refuse to engage in such practices, as this would stop returning reliably "null" results from their tests, which invariably feature very small N and high significance.

So much for the statistics. It means if you want a test that will not reliably gloss over existing differences and you want to put one over on the ABX/DBT Mafia you require a very large N. If you do the test with very few participants and sensibly low numbers of trials you are playing "their" game and you will return "null" results.

So much on statistics.
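The trade-off described above can be checked with an exact binomial calculation. The sketch below is only an illustration, not ThorstenL's or Leventhal's own code: it assumes a forced-choice test where guessing is right 50% of the time, and it assumes a listener who truly hears the difference on 70% of trials (`p_true = 0.7` is a made-up figure). What the quoted text calls "Type A" and "Type B" errors are conventionally named Type I and Type II errors.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_score(n, alpha):
    """Smallest number of correct answers whose probability under
    pure guessing (p = 0.5) is at or below the significance level."""
    for k in range(n + 1):
        if binom_tail(n, k, 0.5) <= alpha:
            return k
    return n + 1  # no score can reach this significance level

def type_2_risk(n, alpha, p_true):
    """Probability of a "null" result even though the listener
    really detects the difference with probability p_true (assumed)."""
    k = critical_score(n, alpha)
    return 1.0 - binom_tail(n, k, p_true)

# Compare a small test with a large one at the same significance level.
for n in (16, 100):
    k = critical_score(n, 0.05)
    risk = type_2_risk(n, 0.05, 0.7)
    print(f"N={n}: need {k} correct, Type II risk = {risk:.3f}")
```

Under these assumptions the 16-trial test misses a real difference more often than not, while the 100-trial test rarely does. A different assumed `p_true` shifts the numbers but not the direction of the effect.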

Second, human perception. It is fickle and deceptive.

NB1 - Test situations create stress (remember when you did any final exams).

Any stress militates against perception sensitivity. As we cannot avoid the knowledge of a "test situation" we need to reduce stress.

Long test sequences show increasing stress. Usually five sequential presentations in one go tend to be an observed limit, before attention flags off and stress rises and sensitivity goes to hell.

So make sure to have multiple breaks, preferably activity filled that can re-set stress & attention levels, between presentations. As you are going to have very few participants the best you can do is run multiple tests with the same participants. While in my view it does not really add to confidence levels, it is acceptable to the ABX/DBT Mafia and hence you may as well employ it, to maximise your chances.

NB2 - beware of expectation bias.

Expectation bias is the result of having expectations about the outcome while undertaking the test. I once did a blind test at an audio society meeting, comparing my own modifications of a Marantz CD-67 to a stock machine. I knew my mods were better. I knew my machine would win.

(Note, Sy's measurements would have pegged them "not different", Note 2 - Yes it is the CD-67 from the TNT-Audio modification article)

Well, once the "blinds" were down I could NOT hear any difference. I literally could not. But over 70% of those present (a group of maybe a little under 20, mostly seated very sub-ideally) could hear differences with good reliability (4/5) and expressed a preference for the modified unit.

Intrigued by my own failure I did another test, in which I told the participants we would test mains cable sound differences. The group was very small four plus me, so statistical significance is low, we may treat this as "anecdotal evidence".

Most had no particular opinion on this, all cables were cheap to make DIY Mains cables and should they prove effective, anyone could build their own. One participant was an EE with a BBC background and a very vocal cable sceptic. I professed openly that I had not listened either and I had no idea if there was a difference or not, but would like to find out.

What I really did was to reverse the polarity of one speaker in the stereo set. Everyone heard the differences correctly 10/10, except our "cable sceptic" WHO SCORED RANDOM! Due to the small N no real significance, but one may assert that in this case expectation bias was sufficient to mask something as gross as wiring one speaker out of polarity.
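For a sense of scale, the scores quoted in these two anecdotes (4/5 and 10/10) can be compared with what pure guessing would produce. This is a minimal sketch that assumes each trial was a two-way choice with a 50% chance of a lucky guess; the post does not describe the exact test format, so that is an assumption. It covers one listener's score only and says nothing about how far a result generalises across listeners.

```python
from math import comb

def chance_of_at_least(correct, trials):
    """Probability of getting `correct` or more right out of `trials`
    by coin-flip guessing (assumed 50% per trial)."""
    favourable = sum(comb(trials, i) for i in range(correct, trials + 1))
    return favourable / 2**trials

print(chance_of_at_least(4, 5))    # 0.1875
print(chance_of_at_least(10, 10))  # 0.0009765625
```

So a single 4/5 is fairly easy to reach by luck, while a single 10/10 is not.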

Third - Calibrate your tests.

Just as I ask that any measurements are accompanied by demonstrations of the limits the equipment can resolve and as proof that the tests are competently implemented, any listening test should be calibrated with "known audibly different" stimuli, to ensure sufficient sensitivity to at least distinguish known audible phenomena.

I suggest as a minimum polarity reversal of one channel, 1dB level difference and 0.3dB level difference, I personally feel that at times adding both channel polarity reversal helps to screen out "clothears", but this is not generally acknowledged as "audible", consider this as one to do for "extra credits" only.
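The calibration stimuli suggested above are simple to derive from a reference signal. The sketch below is a hypothetical helper, not something from the original post: it assumes audio held as a list of `(left, right)` float samples, converts a level change in dB to a linear amplitude factor with the usual `10 ** (dB / 20)` rule, and reverses polarity by negating a channel.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10 ** (db / 20)

def calibration_variant(frames, level_db=0.0, invert_right=False):
    """Return a copy of stereo frames [(left, right), ...] with a level
    offset applied to both channels and, optionally, the polarity of
    the right channel reversed."""
    gain = db_to_gain(level_db)
    right_sign = -1.0 if invert_right else 1.0
    return [(left * gain, right * gain * right_sign) for left, right in frames]

# Tiny made-up reference signal, just to show the three suggested stimuli.
reference = [(0.5, 0.5), (-0.25, 0.25)]
stimuli = {
    "one channel polarity reversed": calibration_variant(reference, invert_right=True),
    "1 dB louder": calibration_variant(reference, level_db=1.0),
    "0.3 dB louder": calibration_variant(reference, level_db=0.3),
}
for name, frames in stimuli.items():
    print(name, frames)
```

A 1 dB step is roughly a 12% change in amplitude and a 0.3 dB step roughly 3.5%, which shows how small the second calibration difference is.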

In fact, the routine lack of any such calibration in the ABX/DBT Mafia's testing is one of my strongest arguments (together with "awfully bad statistics") towards the need to simply disregard their extensive set of published "null" results as evidence.

Fit the fourth - let's get down to the dirty work

I would suggest that you should try for an N no lower than 100, This allows you to select a significance level that has about equal risks of type A & B errors. With 20 participants and 5 trials each this requirement is satisfied.
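One way to read "about equal risks" is to search for the pass mark at which the two error probabilities are closest. The sketch below does that for N = 20 × 5 = 100 pooled trials. It rests on assumptions that are not in the original post: guessing is right 50% of the time, all trials are treated as independent, and a real difference is heard on 60% of trials (`p_true = 0.6` is purely illustrative).

```python
from math import comb

def tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def balanced_criterion(n, p_true):
    """Pass mark whose Type I risk (false positive under guessing) and
    Type II risk (missed real difference) are closest to equal."""
    def gap(k):
        alpha = tail(n, k, 0.5)
        beta = 1.0 - tail(n, k, p_true)
        return abs(alpha - beta)
    k = min(range(n + 1), key=gap)
    return k, tail(n, k, 0.5), 1.0 - tail(n, k, p_true)

participants, trials_each = 20, 5
n = participants * trials_each
k, alpha, beta = balanced_criterion(n, p_true=0.6)
print(f"N={n}: pass mark {k}, Type I risk {alpha:.3f}, Type II risk {beta:.3f}")
```

With a more audible difference (a higher assumed `p_true`) both risks can be pushed lower at the same N.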

As we would like some "calibration" and would like to remove "clothears" but keep "goldenears" you probably will need to start with a larger number of participants and not to be rude, you should not dismiss the "clothears" early.

So perhaps 50 Participants with the 20 who score highest in the preliminary calibration tests having their actual tests included in the final analysis (you should not cherry-pick people who score highly in the final real tests, but excluding participants that show a low sensitivity to known audible stimuli is acceptable).
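The screening rule can be sketched as two separate steps, so that the real test results never influence who is included. Everything in this example is hypothetical: the participant names, the scores, and the function names are invented for illustration.

```python
def select_panel(calibration_scores, panel_size=20):
    """Pick the participants with the highest calibration scores.

    Only the calibration round is consulted, so nobody is chosen
    (or dropped) because of how they did in the real test.
    """
    ranked = sorted(calibration_scores,
                    key=lambda name: calibration_scores[name],
                    reverse=True)
    return ranked[:panel_size]

def pool_trials(test_scores, panel, trials_each=5):
    """Total correct answers and total trials for the selected panel."""
    correct = sum(test_scores[name] for name in panel)
    return correct, trials_each * len(panel)

# Made-up data for 50 participants: calibration score out of 10,
# real-test score out of 5.
calibration = {f"P{i:02d}": (i * 7) % 11 for i in range(1, 51)}
real_test = {f"P{i:02d}": (i * 5) % 6 for i in range(1, 51)}

panel = select_panel(calibration)
correct, total = pool_trials(real_test, panel)
print(f"{len(panel)} listeners kept, {correct}/{total} correct in the real test")
```

The pooled `correct` and `total` are what would then go into the statistical analysis.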

Now, you can of course use much smaller numbers and all that, but not only do you find that the significance (that is the general applicability of your test results) is very low, but equally the risk of type B errors (erroneously returning null results) becomes unacceptably large.

If following all that you do not feel much like undertaking a "controlled listening test", I cannot blame you.

kumakuma · October 18, 2015

I believe the "cost constraints" are the time inputs from various people needed for a properly conducted blind test

I saw this description of blind testing from ThorstenL on DIYaudio once & took note of it:

Thanks for sharing this. Makes a lot of sense.

Jud · October 19, 2015

Thanks for sharing this. Makes a lot of sense.

I think there's even more to it than this, involving some specifics regarding the science of human hearing. Been reading academic papers and hope to write an article eventually. But Real Life is busy and it's proving very hard to find the time, so no one should be holding his/her breath in anticipation.

Sal1950 · October 19, 2015

"If following all that you do not feel much like undertaking a "controlled listening test", I cannot blame you."

Then you fall right back to any review done without these controls is strictly an opinion with no proven scientific validity.

You can't avoid scientific confirmation and call your proclaimed results fact. It just don't work that way.

speavler · October 19, 2015

Suitably high quality amplification and speakers, optimised room acoustics, highest quality source material ,( NOT via flawed USB either) and optimised seating for all participants, (no big heads/upper bodies in the way) and to be scientifically valid, numerous repeats without generating listener fatigue which will result in false negatives.
The switching would need to be seamless, preferably using well implemented remote controlled relay switching instead of a piss poor test like using Foobar 2K which is far from "The Gold Standard" as a software player.

If the S/W route was used you would probably need to use something like Miska's HQ player.

Obviously, it would need a lot more care in setting up than at a typical HiFi show with tiny rooms etc. and your typical home listening room is far from suitable.

all participants? i would never participate in a listening test unless i was smack dab in the sweet spot at all times

Teresa · October 19, 2015

"...Then you fall right back to any review done without these controls is strictly an opinion with no proven scientific validity.
You can't avoid scientific confirmation and call your proclaimed results fact. It just don't work that way.

That is not what an audio review is! I have never read an audio equipment review that claimed to have any proven scientific validity in regards to the sound the reviewer heard in their system. The only thing scientific are any measurements they made, the rest is the opinion of the reviewer.

If a reviewer recommends a component in a review they might say it might be worth your audition but never go so far as to try to tell you how it will sound to you. If they did I would be screaming bloody hell. So….

Teresa · October 19, 2015

What's so hard for you subjective listeners to understand?
It's very simple,

Without blind listening confirmation any review will always be just that person's opinion.

You can protest all you like and tell the world what "golden ears" you have, but any review has no more validity than the next guy's without proof of blind tests and measurements.

You are way too gullible IMHO. Even with blind listening confirmation it will still only be that individual reviewer's opinion since we all hear differently, have different listening rooms, etc. One person cannot listen for another person, that's impossible!

I am skeptical of all reviews, specifications and tests, I have to hear the thing with my ear/brain system, with my audio system, in my room, with a wide variety of music over many weeks before I can determine if I like it or not.

This can be done blind to calm any fears you might have if you have someone willing to assist you. However, avoid quick A-B'ing as your brain will defeat you. I explained this in more detail in Post 26

Blind or sighted AB or ABX testing does not work as it fails to reveal anything except very, very large differences. The biggest problems being:

Cognitive bias - your brain will fill in missing information thus making both sound the same on repeated switching.
Listener Fatigue - switch back and forth too much and both will sound like crap.

To conclude: there is no replacing long term listening with an iron-clad money-back guarantee. A/B'ing doesn't work with any of the five human senses as our brain tends to equalize things. Comparing anything audio is one of the hardest and most time consuming things I have ever done, I abhor it! I prefer listening to music for pleasure, and only compare when I absolutely have to.

Jud · October 19, 2015

Blind or sighted AB or ABX testing does not work as it fails to reveal anything except very, very large differences. The biggest problems being:

Cognitive bias - your brain will fill in missing information thus making both sound the same on repeated switching.

Listener Fatigue - switch back and forth too much and both will sound like crap.

There is solid peer-reviewed scientific literature supporting what you say here.

r_w · October 19, 2015

LOL...

I believe it's also called "having a life".

;-)

I have the answer...invite them over for a beverage of their choice and have a listen to their favorite tracks. Then go to their place and have a listen with your favorite beverage and your favorite tracks. Then go together to a concert and discuss the music over your favorite beverages.

It's called friendship...

John

mmerrill99 · October 19, 2015

"If following all that you do not feel much like undertaking a "controlled listening test", I cannot blame you."

Then you fall right back to any review done without these controls is strictly an opinion with no proven scientific validity.

You can't avoid scientific confirmation and call your proclaimed results fact. It just don't work that way.

This is a silly, unscientific attitude - you want to accept a finding just because it has a smell of science about it but when it's pointed out what you need to do to make it truly scientific you balk at the requirements?

Results are either valid & rigorous or not. Both sighted & pseudo-scientific "blind" tests are neither.

Live with it!

Daudio · October 19, 2015

Then you fall right back to any review done without these controls is strictly an opinion with no proven scientific validity.
You can't avoid scientific confirmation and call your proclaimed results fact. It just don't work that way.

"None so deaf as those that will not hear. None so blind as those that will not see"

mmerrill99 · October 19, 2015

I think there's even more to it than this, involving some specifics regarding the science of human hearing. Been reading academic papers and hope to write an article eventually. But Real Life is busy and it's proving very hard to find the time, so no one should be holding his/her breath in anticipation.

Hey, just give us an executive summary or links, please - very interested in this, Jud.

firedog · October 19, 2015

I have no possibility of auditioning even a small portion of the components I'm interested in. Reviews help me narrow down the components I decide to make a special effort to hear. If I'm also familiar with how someone writes and what they like, I can be pretty sure I will like/prefer components they prefer. At the moment, there are even a couple of reviewers who have the same DAC and speakers as me, so I give extra weight to their reviews.

So measurements are welcome, but I'm more interested in hearing what these people think of various components. I've heard lots of devices with great measurements that I personally didn't like (others did).

mmerrill99 · October 19, 2015

I have no possibility of auditioning even a small portion of the components I'm interested in. Reviews help me narrow down the components I decide to make a special effort to hear. If I'm also familiar with how someone writes and what they like, I can be pretty sure I will like/prefer components they prefer. At the moment, there are even a couple of reviewers who have the same DAC and speakers as me, so I give extra weight to their reviews.

So measurements are welcome, but I'm more interested in hearing what these people think of various components. I've heard lots of devices with great measurements that I personally didn't like (others did).

Indeed & this is the sensible approach to evaluating what works for you - a personal evaluation in your own playback system.

Anybody who decides on (or rules out) a purchase of audio equipment based solely on audio reviewers (or "blind tests") is operating on the basis of a belief system founded on very shaky premises.

Paul R · October 19, 2015

What's so hard for you subjective listeners to understand?
It's very simple,

Without blind listening confirmation any review will always be just that person's opinion.

You can protest all you like and tell the world what "golden ears" you have, but any review has no more validity than the next guy's without proof of blind tests and measurements.

Actually, you make a perfectly valid point.

The question back to you is, so what's wrong with taking the opinion of someone who is both talented and skilled in listening to audio equipment?

In fact, those are exactly what you want to gather, opinions. Then listen and form your own.

Would you say that it makes more sense to buy a piece of kit based purely on the technical specifications, without listening to it? Or does it make more sense to buy a piece of kit based upon a few expert opinions, again without listening to the gear first?

Me? I want the specs first, then the opinions, then a listening trial of my own. Perhaps two or three trials even.

-Paul

davide256 · October 19, 2015

Quote on the wall in Albert Einstein's Princeton office

"Not everything that counts can be measured. Not everything that can be measured counts"

christopher3393 · October 19, 2015

On this issue, Tyll Hertsens' recent summary of "Big Sound 2015" sounds very sensible to me overall. It includes subsections on blind and sighted listening. Would be interested in responses:

Big Sound 2015 Wrap: What I Learned | InnerFidelity
