
Does BIAS affect audio test results?



7 minutes ago, Audiophile Neuroscience said:

 

Paul, I am merely pointing out that you may be suffering from the very condition you are exploring. I applaud your interest in bias. However, the conclusions you shared in the opinion piece and the Head-Fi post are just opinions that would appear to support your confirmation bias, IMO.

 

If you care to discuss individual studies, okay. But the whole "clinical studies show you will have brighter teeth in just two weeks" approach doesn't cut it; you must examine each study. You were the one stating it was easy. That's not my experience.

 

You obviously want me out of this thread. Bedtime here, anyway. 🙂

 

 

OK, you keep diagnosing my condition :) and I'll keep asking for actual information. I'm not sure where I stated anything is easy. The whole reason I started this thread was to find out what, if anything, can be done about proper testing methodology to eliminate obvious biases. Insofar as you think any of the shared studies are mistaken, I'd love to hear why and how, and to see any evidence that demonstrates this. Accusing me of bias doesn't help, since I'm not the one doing these studies; I'm just citing them.

Just now, Audiophile Neuroscience said:

 

I didn't know that inviting discussion on individual studies is not appropriate in an objective thread. These studies appear to support "Blind test or it didn't happen."

 

I also feel that questioning confirmation bias in a thread about bias is germane to the topic.

 

 

 

 

If you can provide objective information, it's welcome in this thread. We have a complete forum for all other information and I encourage you to use it. You can even create your own thread and post a link to it in this one. 

1 minute ago, The Computer Audiophile said:

If you can provide objective information, it's welcome in this thread. We have a complete forum for all other information and I encourage you to use it. You can even create your own thread and post a link to it in this one. 

 

I invited Paul to pick a study, any study that he cited. He did not. As there were over 15 cited, I was offering that he should choose rather than me cherry-picking. Again, I do not see how this offer to discuss bias and the cited studies fits into *your* description of a "complete forum for all other information".

 

The OP wants me out, so, as I said, I will respect his wishes.

1 minute ago, Audiophile Neuroscience said:

 

I invited Paul to pick a study, any study that he cited. He did not. As there were over 15 cited, I was offering that he should choose rather than me cherry-picking. Again, I do not see how this offer to discuss bias and the cited studies fits into *your* description of a "complete forum for all other information".

 

The OP wants me out, so, as I said, I will respect his wishes.

 

I don't have a favorite; you pick one. I shared them as I came across them. Start with the first one, if you'd like.

On 5/23/2020 at 12:25 AM, pkane2001 said:

This thread is specifically about bias in audio testing, how it may affect the results, and any mitigation strategies that can help deal with it.

 

 

Specifically about the last point, Paul, I'll mention the general methods I use when assessing:

 

First of all, the word "preference" is meaningless to me - that concept is alien to how I think ... either a system is acceptably accurate to the recording, or it's not. If the former, then when comparing two genuinely very highly performing systems I might favour one over the other - but this situation has never arisen for me.

 

What I find I'm always doing is determining whether a setup is acceptably accurate. And how I go about that is as follows: I have a range of recordings that have very distinct characteristics; they have 'signatures' which may prove difficult for a particular rig to reproduce acceptably - or, they may have little trouble doing so. When faced with a new replay setup, I'll throw an almost random set of recordings at it and see what first impressions tell me. Then, based on that feedback, I'll narrow down to just a few recordings which most clearly highlight where I feel the rig is not working at its best ... I will play these at various volumes and explore what further information that gives me. The better a particular system functions, the more 'difficult' the recordings I will use, the louder I will listen to them, and the 'deeper' into the sound image I will focus, to see if the finest details still come across correctly.

 

IOW, I'm always attentive to any signs of misbehaviour; any failures of the playback to retrieve what's on the recording to an acceptable standard ... by this approach I feel that bias plays no part - the concept is merely to identify what is incorrect in the playback.

On 6/19/2020 at 7:47 PM, fas42 said:

 

Specifically about the last point, Paul, I'll mention the general methods I use when assessing:

 

First of all, the word "preference" is meaningless to me - that concept is alien to how I think ... either a system is acceptably accurate to the recording, or it's not. If the former, then when comparing two genuinely very highly performing systems I might favour one over the other - but this situation has never arisen for me.

 

What I find I'm always doing is determining whether a setup is acceptably accurate. And how I go about that is as follows: I have a range of recordings that have very distinct characteristics; they have 'signatures' which may prove difficult for a particular rig to reproduce acceptably - or, they may have little trouble doing so. When faced with a new replay setup, I'll throw an almost random set of recordings at it and see what first impressions tell me. Then, based on that feedback, I'll narrow down to just a few recordings which most clearly highlight where I feel the rig is not working at its best ... I will play these at various volumes and explore what further information that gives me. The better a particular system functions, the more 'difficult' the recordings I will use, the louder I will listen to them, and the 'deeper' into the sound image I will focus, to see if the finest details still come across correctly.

 

IOW, I'm always attentive to any signs of misbehaviour; any failures of the playback to retrieve what's on the recording to an acceptable standard ... by this approach I feel that bias plays no part - the concept is merely to identify what is incorrect in the playback.

 

While maybe that's true about you, Frank, it may not be true for most people :)

 

When you say there is no preference except for what is accurate, this is certainly not true in all cases, and for one, it would be great if you could demonstrate that it is true with some objective evidence. Do we even know what "accurate reproduction" means for the whole audio system, including speakers, the room, and the listener? How can you be sure that what you think is "accurate reproduction" is not, in fact, seriously distorted? Just because you think it is doesn't make it so. As you can see in many of the studies cited here, even highly trained professionals fall for simple sighted bias and prefer something based on appearance and expectation rather than on the sound, despite being aware of bias and being taught to avoid it.

 

You may also find @Archimago's latest blind test results interesting. There's a result with a p-value better than 0.05 that points to the test subjects preferring a little distortion (-75 dB THD) with their music over the completely undistorted version. This is not proof of anything, but it does point to the possibility that what sounds better to many isn't necessarily what is most accurate.
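
As an aside, the arithmetic behind such a p-value is a simple binomial (sign) test on the preference counts. A minimal sketch, with made-up counts for illustration (not Archimago's actual numbers):

```python
# Sign test for a two-alternative preference result.
# The counts below are hypothetical, for illustration only.
from scipy.stats import binomtest

n_prefer_distorted = 45  # listeners preferring the -75 dB THD version (made up)
n_total = 60             # listeners who expressed a preference (made up)

# Null hypothesis: no real preference, i.e. each listener is a 50/50 coin flip.
result = binomtest(n_prefer_distorted, n_total, p=0.5, alternative='two-sided')
print(f"p-value = {result.pvalue:.4f}")  # < 0.05 is unlikely under "no preference"
```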

 

 

2 hours ago, pkane2001 said:

 

While maybe that's true about you, Frank, it may not be true for most people :)

 

I agree it's not true for most audio enthusiasts - generally, the intent is for certain prescribed recordings to sound impressive, and most everything else much less so - and cutting myself off from so much good music is not particularly appealing 😉.

 

2 hours ago, pkane2001 said:

 

When you say there is no preference except for what is accurate, this is certainly not true in all cases, and for one, it would be great if you could demonstrate that it is true with some objective evidence.

 

That it's true that most people would prefer, or not prefer, accurate playback?

 

2 hours ago, pkane2001 said:

Do we even know what "accurate reproduction" means for the whole audio system, including speakers, the room, and the listener? How can you be sure that what you think is "accurate reproduction" is not, in fact, seriously distorted? Just because you think it is doesn't make it so. As you can see in many of the studies cited here, even highly trained professionals fall for simple sighted bias and prefer something based on appearance and expectation rather than on the sound, despite being aware of bias and being taught to avoid it.

 

Accurate reproduction can only be assessed at the interface between the speakers and the listening environment - headphones are of course the classic means for doing this, but this is not the ideal listening situation for many people. How I get around this is by what I've mentioned many times - listening to an individual speaker as if it were one half of a headphone ... strangely enough, the physical principles in place are actually rather similar ... 😜.

 

How it works in practice is that if one can use a single speaker like a headphone driver, then the SQ is of a sufficient standard. Seriously distorted replay is completely obnoxious to listen to in this manner; it makes it trivially obvious that the replay chain is faulty.

 

2 hours ago, pkane2001 said:

You may also find @Archimago's latest blind test results interesting. There's a result with a p-value better than 0.05 that points to the test subjects preferring a little distortion (-75 dB THD) with their music over the completely undistorted version. This is not proof of anything, but it does point to the possibility that what sounds better to many isn't necessarily what is most accurate.

 

 

 

I mentioned in his thread on this forum about that test why something like this might happen - purely speculation on my part; I haven't listened to his samples on a well-enough-performing rig to check further.


I've previously posted the ITU-R recommendation on proper testing procedures for small audio impairments, in other words, for testing for small audible differences. But what if the differences are, say, "night and day" or "not even subtle," as is often stated by audiophiles and the press? It turns out there's a detailed recommendation for proper objective testing in these cases as well:

 

ITU-R BS.1534-3, 10/2015

https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1534-3-201510-I!!PDF-E.pdf 

 

This recommends the MUSHRA testing methodology: a double-blind, multi-stimulus test method with hidden reference and hidden anchor(s). This method is designed to help weed out test subjects who are either not sensitive to real differences or who might be biased in one way or another. More about this testing methodology in another post.
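
To make the weeding-out concrete: one of the post-screening rules in the recommendation excludes any assessor who scores the hidden reference below 90 (on the 0-100 MUSHRA scale) for more than 15% of the test items. A minimal sketch of that check; the data layout here is an assumption, not anything the standard prescribes:

```python
# Post-screening sketch per ITU-R BS.1534-3: drop assessors who rate the
# hidden reference below 90 for more than 15% of the test items.

def passes_screening(ref_scores, threshold=90, max_fail_fraction=0.15):
    """ref_scores: one assessor's hidden-reference scores, one per test item."""
    fails = sum(1 for s in ref_scores if s < threshold)
    return fails / len(ref_scores) <= max_fail_fraction

scores = {
    "assessor_1": [95, 100, 92, 98, 100, 90, 97, 99],  # reliable listener
    "assessor_2": [80, 100, 70, 98, 100, 65, 97, 99],  # misses the reference often
}

kept = [name for name, refs in scores.items() if passes_screening(refs)]
print(kept)  # ['assessor_1']
```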

 

What I found interesting in this recommendation (quoting the salient parts):

 

4 Selection of assessors 

The higher the quality reached by the systems to be tested, the more important it is to have experienced listeners. 

 

4.1 Criteria for selecting assessors 

Only assessors categorized as experienced assessors for any given test should be included in final data analysis. 

 

5.1 Description of test signals 

It is recommended that the maximum length of the sequences be approximately 10 s, preferably not exceeding 12 s. This is to avoid fatiguing of listeners, increased robustness and stability of listener responses, and to reduce the total duration of the listening test. 

 

5.2 Training phase 

In order to achieve reliable results, it is mandatory to train the assessors in special training sessions in advance of the test. 

 

5.5 Recording of test sessions 

In the event that something anomalous is observed when processing assigned scores, it is very useful to have a record of the events that produced the scores 

 

The recommendation also goes into the details of recommended statistical analysis of the results and even into how to present those results in a report. Very thorough!
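
The core of that analysis is the mean score per condition with a 95% confidence interval based on the t-distribution. A minimal sketch, with invented scores:

```python
# Mean score and 95% confidence interval for one test condition -
# the basic summary statistic the recommendation calls for.
# The scores are invented, for illustration only.
import numpy as np
from scipy import stats

scores = np.array([82, 75, 90, 68, 85, 79, 88, 73, 81, 77])  # one condition

mean = scores.mean()
sem = stats.sem(scores)  # standard error of the mean
ci_lo, ci_hi = stats.t.interval(0.95, df=len(scores) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.1f}, 95% CI = [{ci_lo:.1f}, {ci_hi:.1f}]")
```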

 

 


There are often objections raised that blind audio test results are invalid because it's too difficult to do them right. That argument is usually made by those who haven't tried blind tests or who want to dismiss the results of such tests, and usually without evidence. So, let's see what professionals do to reduce possible errors in DBTs.

 

As most of the research and studies already cited here demonstrate, there's a very large, measurable error caused by sighted testing. In some cases the result is wrong by as much as 40%, as reported in one of the studies. So, if blind tests need to be used to reduce this error, then what, if anything, can we do to make blind sensory tests more accurate?

 

Here are a few slides from a presentation already shared earlier. They speak to what a test administrator should do to improve the quality of the test results. Since that presentation is on sensory preference testing not specific to audio, I edited out some of their examples that didn't relate to audio:

 

(I highlighted a few that I find are often overlooked in audio testing, both blind and sighted)

 

Training a panel

• Expectation error – any information a panelist receives influences the outcome; panels find what they are expected to find
• Trick – provide only enough information for panelists to be able to do the test
• Try not to include people already involved in the experiment (single blind)
• Avoid codes that create inherent bias (1, A, etc.)
• Motivated panelists
• Leniency error – rating products based upon feelings about the researcher
• Suggestion effect – response of other panelists to the product (need to isolate panelists and keep them quiet)

 

Testing times

• Panelists must not be too tired
• Late morning or mid-afternoon are good
• Early morning is bad for testing
• Late in the day – lack of panelist motivation

 

Stimulus Error

• Influence of irrelevant questions
• Try to mask unwanted differences (e.g. colored lights)
• Logical error – associated with stimulus error; the tendency to rate characteristics that appear to be logically associated. Control by masking differences

 

Halo and Proximity Effects

• Halo effect – caused by evaluating too many factors at one time; panelists already have an impression about the product when asked about a second trait and will form a logical association (e.g. dry → tough)
• Best to structure testing so that only one factor is tested at a time (difficult to do)
• Proximity error – rating characteristics more similarly when they follow in close proximity

 

Convergence Effect

• Convergence effect – a large difference between two samples will mask small differences between others, causing results to converge; use random order to reduce this

 

Positional and Contrast Effects

• Positional effect – tendency to rate the second product higher or lower
• When two products are very different, panelists will exaggerate the differences and rate the 'weaker' sample lower than they otherwise would
• Use random order, and use all possible presentation orders

 

Controls

• Include a reference sample in the test as part of the mix
• Use random numbers
• Balanced order of presentation to reduce physiological and psychological effects (see the sketch below)
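
As an illustration of the balanced-order control, here's a small sketch that cycles panelists through every possible presentation order, so each sample appears in each position equally often (a simple stand-in for a formal Latin-square design; the sample and panelist names are made up):

```python
# Balanced order of presentation: cycle through all permutations of the
# samples so that, across panelists, each sample appears in each position.
from itertools import cycle, permutations

samples = ["A", "B", "C"]              # e.g., three devices under test
orders = cycle(permutations(samples))  # 6 possible orders for 3 samples

for i in range(1, 13):                 # 12 panelists, 2 per order
    print(f"panelist_{i} ->", " ".join(next(orders)))
```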

 

  

 

1 hour ago, Summit said:

You keep saying the same thing over and over again. I know of no genuine objections that blind audio tests are good for testing bias. The question is whether those bias tests, and how they are conducted, are good for testing the subtle SQ differences of audio gear - gear that many people feel they need to evaluate for a long time to get to know how it sounds.

 

Interesting. As I believe I said (and you quoted me): "There are often objections raised that blind audio test results are invalid because it's too difficult to do them right." And the post you were responding to was all about the steps that professionals in sensory testing take to ensure that the test results are more statistically valid, i.e., done right.

 

Your main objection seems to be that longer-term audio evaluation is more sensitive to small differences than short-term evaluation. That's an often-repeated hypothesis, but I've yet to see any sort of objective evidence to support it. Can you cite some studies that demonstrate this increased sensitivity? By the way, your hypothesis in no way invalidates blind testing; it simply proposes a method of doing it.

 

5 hours ago, pkane2001 said:

Your main objection seems to be that longer-term audio evaluation is more sensitive to small differences than short-term evaluation. That's an often-repeated hypothesis, but I've yet to see any sort of objective evidence to support it. Can you cite some studies that demonstrate this increased sensitivity? By the way, your hypothesis in no way invalidates blind testing; it simply proposes a method of doing it.

 

 

That is correct; my problem has never been about testing SQ blind - I have done it myself a few times. The problem is the methodology commonly used in all types of tests to determine the sound quality of different gear.

 

The biggest problem is that these tests are not conducted in the way that I and many audiophiles listen to and evaluate audio gear sighted. To be able to hear subtle differences between audio equipment I need to be familiar with the room and the audio system, as well as the recordings. A blind test or a sighted A/B test is only difficult to do if the point is to achieve statistically significant proof through many fast, repeated switches of gear.

 

Our auditory memory is very short, so a test where they change gear every 5-30 seconds won't let us hear the SQ, just the change between the gear and, to some degree, their overall sound signature.

 

To properly evaluate any audio gear we need to be able to first "calibrate" to the sound and then compare it to our long-term memory of real, non-recorded sound. Echoic memory and other short-term memories are not valuable for this task. When I test two pieces of audio gear in my stereo I will compare how the bass, drums, guitar, piano, voices, etc. sound to how they normally sound in real life, and to do that I need to use my long-term memory. Even when I compare two pieces of audio gear I will use my memory of how those instruments sound compared to both the other audio gear and the references, the non-recording memories I have.

 

I believe that to understand why most audio tests are flawed we need to understand how we hear and compare sound. Here is a start.

 

“Each type of memory is tied to a particular type of brain function. Long-term memory, the class that we are most familiar with, is used to store facts, observations, and the stories of our lives. Working memory is used to hold the same kind of information for a much shorter amount of time, often just long enough for the information to be useful; for instance, working memory might hold the page number of a magazine article just long enough for you to turn to that page. Immediate memory is typically so short-lived that we don’t even think of it as memory; the brain uses immediate memory as a collecting bin, so that, for instance, when your eyes jump from point to point across a scene the individual snapshots are collected together into what seems like a smooth panorama.”

 

https://brainconnection.brainhq.com/2013/03/12/how-we-remember-and-why-we-forget/

1 hour ago, Summit said:

 

That is correct; my problem has never been about testing SQ blind - I have done it myself a few times. The problem is the methodology commonly used in all types of tests to determine the sound quality of different gear.

 

The biggest problem is that these tests are not conducted in the way that I and many audiophiles listen to and evaluate audio gear sighted. To be able to hear subtle differences between audio equipment I need to be familiar with the room and the audio system, as well as the recordings. A blind test or a sighted A/B test is only difficult to do if the point is to achieve statistically significant proof through many fast, repeated switches of gear.

 

Our auditory memory is very short, so a test where they change gear every 5-30 seconds won't let us hear the SQ, just the change between the gear and, to some degree, their overall sound signature.

 

To properly evaluate any audio gear we need to be able to first "calibrate" to the sound and then compare it to our long-term memory of real, non-recorded sound. Echoic memory and other short-term memories are not valuable for this task. When I test two pieces of audio gear in my stereo I will compare how the bass, drums, guitar, piano, voices, etc. sound to how they normally sound in real life, and to do that I need to use my long-term memory. Even when I compare two pieces of audio gear I will use my memory of how those instruments sound compared to both the other audio gear and the references, the non-recording memories I have.

 

I believe that to understand why most audio tests are flawed we need to understand how we hear and compare sound. Here is a start.

 

“Each type of memory is tied to a particular type of brain function. Long-term memory, the class that we are most familiar with, is used to store facts, observations, and the stories of our lives. Working memory is used to hold the same kind of information for a much shorter amount of time, often just long enough for the information to be useful; for instance, working memory might hold the page number of a magazine article just long enough for you to turn to that page. Immediate memory is typically so short-lived that we don’t even think of it as memory; the brain uses immediate memory as a collecting bin, so that, for instance, when your eyes jump from point to point across a scene the individual snapshots are collected together into what seems like a smooth panorama.”

 

https://brainconnection.brainhq.com/2013/03/12/how-we-remember-and-why-we-forget/

 

Well, these are all conjectures, or possible explanations for why it may work this way. And I don't necessarily disagree with anything you said here (I'm open to hearing evidence one way or the other). But I'd still like to see some studies or properly conducted blind tests that demonstrate that longer-term listening can be more sensitive than shorter-term, 8-10 second switching when evaluating minor differences. That's certainly not the way the industry conducts subjective listening tests, although I've not found a clear indication of whether that's because short-term switching is just easier to conduct and more convenient, or because echoic memory limits our ability to evaluate minor audio differences beyond a few seconds. In conversations with some audio testing professionals, they did indicate that shorter-term, quick switching was the way to detect minor SQ differences. But, again, that's just someone saying it, and what I'm looking for is objective evidence of whether it's true or not.
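
The mechanics of such a quick-switching test are easy enough to sketch, for anyone who wants to try one. This is a minimal outline, not any lab's actual protocol; play_a, play_b, and ask_listener are hypothetical placeholders for whatever playback and answer-collection mechanism is used:

```python
# Skeleton of an ABX session: on each trial X is secretly A or B, the
# listener answers, and a one-sided binomial test at the end says whether
# the hit count beats random guessing. Playback is left abstract.
import random
from scipy.stats import binomtest

def run_abx(play_a, play_b, ask_listener, n_trials=16):
    hits = 0
    for _ in range(n_trials):
        x_is_a = random.choice([True, False])  # hidden assignment of X
        play_a(); play_b()                     # listener hears labeled A and B
        (play_a if x_is_a else play_b)()       # then hears unlabeled X
        says_a = ask_listener()                # True if listener says "X was A"
        hits += (says_a == x_is_a)
    p = binomtest(hits, n_trials, 0.5, alternative='greater').pvalue
    return hits, p
```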

6 hours ago, Summit said:

Okay, before setting up and conducting a test you have to decide what to measure and how it can be measured.

 

Again, some interesting hypotheses, and I appreciate you taking the time to describe them in detail. Before doing any tests myself, I would still want to see whether there's already existing objective evidence to support or contradict these. But let's discuss your logical evidence, in the absence of objective evidence :) Before we dive in, let me address something that may be a misunderstanding:

 

The goal of this thread, and of the proper test designs listed here, is to eliminate UNRELATED biases from determining preference or discrimination ability. These biases affect both types of tests, though certainly some affect only subjective or only discrimination testing. So, yes, it's important to avoid BIASES in sensory testing that have nothing to do with the sense being tested.

 

6 hours ago, Summit said:

Sound quality is subjective in nature and is therefore very difficult to measure, because people have preferences as well as biases.


Sound quality is subjective. Absolutely. And yet, sound quality and subjective measures of it are what nearly every study presented here so far is trying to address. In fact, these demonstrate that our preferences and subjective likes are not that different from each other. When biases related to other senses are removed or controlled in the experiment, there is often a high statistical correlation between, for example, different people liking the sound quality of the same speakers, or the same headphones, or even the same type of distortion. It may be more difficult to do a proper study of subjective preferences, but it's not impossible, and there are industry-standard procedures published for doing so with audio, also cited in this thread.

 

In fact, @Archimago just completed an internet blind test on harmonic distortion that included both discrimination measures and subjective preferences. Interestingly, there was a statistically significant preference shown for a low level of harmonic distortion compared to no distortion. While I can't say it's a perfectly conducted test, it does address your objection about using different equipment, rooms, etc.: in fact, nobody in this experiment used the same equipment or the same room.

 

7 hours ago, Summit said:

The method used in all the published blind tests I have seen is maybe suited to measuring discrimination, but not to preference of sound quality or people's ability to hear it.

 

Please review all the cited studies in this thread. Nearly every one is a subjective preference study. But as part of the test, proper sensitivity and consistency have to be shown, otherwise the results can't be statistically valid. If someone can't consistently identify the same device as having certain SQ qualities, then perhaps they fail the discrimination test, even though the test is one designed to measure preferences. That's where proper statistical analysis has to be performed, and the test (such as MUSHRA, for example) is designed to detect test subjects with poor discrimination ability, or those who just guess randomly or have a hidden bias.

 

7 hours ago, Summit said:

I have presented logical "evidence" that the test method commonly used is significantly different from how audiophiles normally evaluate sound quality between audio gear. This alone should be enough to question all these sound quality tests, IMO.

 

Mmm, no. In scientific research, it's not enough to just bring up possible issues with the process. You need to demonstrate that they are indeed an issue, and this requires objective evidence. For example, the claim that longer-term evaluation is more sensitive to small differences than short switching is not at all obvious, and begs to be tested. Without evidence, it's just a conjecture, and there are thousands of those floating around on just this forum, for example.

 

7 hours ago, Summit said:

All things that can affect the result of the study/test should be presented and explained

 

Presented, explained, and substantiated by evidence. There was a software engineer working for me a long time ago who claimed the bugs in his software were caused by glitches generated by cosmic rays hitting the computer. He was serious (!) and wouldn't accept that he could've possibly made a mistake or missed something in his programming. So, yes, he had an explanation, but without evidence, it was not something I'd ever accept or even begin to consider :)

 

7 hours ago, Summit said:

I know of no one who goes to a hi-fi shop and listens to a song by changing audio gear every 5-8 seconds.

 

I'm not arguing for quick switching. I'd like to see objective evidence that long-term evaluation is more sensitive. Regardless of how audiophiles do their evaluation at the shop or at home, this is not objective evidence, it's just an anecdote.

 

I in no way want to make it sound like proper subjective (or even discrimination) testing is easy. In fact, this whole thread is about the complications that must be dealt with in such experiments.

But for those of us who actually want to understand and learn what is really going on, this is not an impossible task and, in my opinion, worth pursuing in an objective way, rather than accepting as truth what someone else said they think.


Over on the BAS (Boston Audio Society) site, they spend time talking about non-biased (listening) testing and have done quite a bit of it. Another group that does this is SMWTMS (the Southeast Michigan Woofer and Tweeter Marching Society - not making this up). Both show that sighted listening is inherently biased.

12 hours ago, Summit said:

 

Okay, before setting up and conducting a test you have to decide what to measure and how it can be measured. Sound quality is subjective in nature and is therefore very difficult to measure, because people have preferences as well as biases. Sound quality also depends on many external factors, like the quality of the room, the audio system, and the recordings, and how they are set up, etc.

 

 

Yes, decide what you want to measure ... IME many ambitious rigs are about developing various types of tonality seasoning, and then of course subjectivity is everything in the assessment of what one hears. But "sound quality" is not subjective, in my book - it is the degree to which there is no significant audible adulteration of what's on the recording by the playback chain. How one assesses it is by listening for faults in the sound, clearly evident misbehaviour of the replay.

 

To use the dreaded car analogy, 😜: most audiophiles compare by saying things like, "I prefer the MB ambience over the BMW variety." I say, "I note that car 1 develops a slightly annoying vibration while accelerating at a certain engine speed, whereas car 2 doesn't. Therefore, car 2 is the better-quality car."

On 6/28/2020 at 5:25 AM, pkane2001 said:

 

Well, these are all conjectures, or possible explanations for why it may work this way. And I don't necessarily disagree with anything you said here (I'm open to hearing evidence one way or the other). But I'd still like to see some studies or properly conducted blind tests that demonstrate that longer-term listening can be more sensitive than shorter-term, 8-10 second switching when evaluating minor differences. That's certainly not the way the industry conducts subjective listening tests, although I've not found a clear indication of whether that's because short-term switching is just easier to conduct and more convenient, or because echoic memory limits our ability to evaluate minor audio differences beyond a few seconds. In conversations with some audio testing professionals, they did indicate that shorter-term, quick switching was the way to detect minor SQ differences. But, again, that's just someone saying it, and what I'm looking for is objective evidence of whether it's true or not.

 

Hi Paul,

Your intent in establishing this thread is honourable, and your patience while attempting to enlighten naysayers such as Alex has been exemplary.

 

Unfortunately, the majority of people (not all) put their emotions and beliefs first and will often cherry-pick or alter the facts to match them. Very few take a rational approach to a subject that has feelings associated with it. Just look at the rebound in the share market, which assumes the pandemic will not affect the economy, despite the evidence that things are going to get a lot worse before they get better.

 

There is a book by author Mark Manson, "Everything Is F*cked: A Book About Hope", that explores why people adopt unhealthy beliefs and behave as they do despite the facts.

 

https://www.amazon.com.au/Unti-Manson-2/dp/0062888439

 

I would also commend those looking for a more rational approach to audio reproduction to Archimago's Musings blog and the Audio Science Review website. I find they give a good balance to the more subjective views you find here.

 

All the best,

 

Ajax


Following on from what @Summit and many others have said (myself included), it's all about the methodology. For any test or test procedure, a pivotal question is how good it is at telling you what you want to know. For good tests there are objective ways to determine this, one example being knowing the false negative rate the test produces.

 

If any test has a false negative rate of 90%, that would render it useless. If, OTOH, the false negative rate is 5%, then you can say: well, the test is telling me something, but there is a 1-in-20 chance it is getting it wrong. In either case you know objectively where you stand.

 

In all the studies cited and, AFAIK, in other blind listening studies, including Toole's, there are no definitive false negative rates declared (I genuinely stand to be corrected here). It seems to be assumed that the false negative rates are negligible ... but how do you/they know that? Definitive statements are made, such as "blind tests show ... bla bla" ... but what if the false negative rate is 90%?

 

Now, as pointed out, there are recommendations, official recommendations, made to mitigate these false negatives: take your time, avoid fatigue, drink more water, whatever. However, these are just (mostly) sensible guidelines that have not, in and of themselves, been validated with experimental evidence revealing the actual false negative rates achieved by adhering to them. Guidelines are guidelines; if you compare European vs American guidelines on various medical conditions you will find differences of opinion. Why? Because the experimental evidence is not conclusive and the jury remains out.

 

Bottom line - tell me the false negative rate of any test, otherwise I have no confidence that the test is not giving me a wrong result.
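
For a forced-choice blind test of a given length, that false negative rate can actually be computed rather than assumed: it is just the complement of statistical power. A minimal sketch for a 16-trial ABX at alpha = 0.05; the figure of a listener truly detecting the difference on 70% of trials is an assumption, purely for illustration:

```python
# False negative rate (Type II error) of an N-trial ABX test: probability
# that a listener who genuinely detects the difference with probability
# p_true still fails to reach significance. p_true = 0.7 is an assumption.
from scipy.stats import binom

n_trials, alpha, p_true = 16, 0.05, 0.7

# Smallest hit count that is significant under pure guessing (p = 0.5)
k_crit = min(k for k in range(n_trials + 1)
             if binom.sf(k - 1, n_trials, 0.5) <= alpha)

power = binom.sf(k_crit - 1, n_trials, p_true)  # P(>= k_crit | real ability)
print(f"criterion: {k_crit}/{n_trials} correct")
print(f"false negative rate = {1 - power:.2f}")
```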

14 hours ago, botrytis said:

Over on the BAS (Boston Audio Society) site, they spend time talking about non-biased (listening) testing and have done quite a bit of it. Another group that does this is SMWTMS (the Southeast Michigan Woofer and Tweeter Marching Society - not making this up). Both show that sighted listening is inherently biased.

 

Any pointers to those tests?

 

It's not hard to demonstrate that sighted testing introduces many variables into testing that have nothing to do with actual audio or hearing. Some of the findings already mentioned in this thread illustrate it quite nicely:

 

  • The hearing aid test, where subjects who were told the device used some new technology heard major improvements, while those who thought it was old technology did not (the exact same hearing aid was used)
  • The subwoofer effect: while the subwoofer was visible to the test subjects, they heard an obvious low-end extension to the sound even though it wasn't hooked up; when the sub was hidden from view, the effect disappeared
  • Speaker comparisons that showed a very high preference for a high-end speaker when sighted, even among trained audio testers, that became a very different (and consistent among test takers!) selection when the identity of the speaker was hidden

 

 
