pkane2001

Does BIAS affect audio test results?


From the chart, I see a number of suggestions to reduce bias, all of which I would presume to be required in a well-designed test:

Randomize temporal distribution.

Use blind listening tests.

Avoid multi-stimulus tests.

Avoid presenting some stimuli more often than others.
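The last two suggestions can be illustrated with a short sketch, assuming each stimulus should be presented the same number of times, in a freshly shuffled order, in each session. The function name `balanced_random_order` and its parameters are hypothetical; the article does not prescribe a specific procedure.

```python
import random

def balanced_random_order(stimuli, repeats, seed=None):
    """Build a presentation order in which every stimulus appears exactly
    `repeats` times (no stimulus is over-represented) and the order is
    shuffled (temporal position carries no information about identity)."""
    rng = random.Random(seed)  # a seed makes a session reproducible for auditing
    order = [s for s in stimuli for _ in range(repeats)]  # equal counts
    rng.shuffle(order)         # random temporal distribution
    return order

# Example: three hypothetical DUTs, each presented four times.
session = balanced_random_order(["DUT-1", "DUT-2", "DUT-3"], repeats=4, seed=1)
print(session)
```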


Kal Rubinson

Music in the Round

Senior Contributing Editor, Stereophile

pkane2001 said:

Preference testing, aka, qualitative testing: testing to see which DUT the subject prefers, which one sounds better, often ranked on a scale, say one to ten

That's the true meaning of a 'subjective', ears-only test.

* * * * * * * * * * *

The kind of real 'subjective' test that Floyd Toole and Sean Olive often write about.

Kal Rubinson said:

From the chart, I see a number of suggestions to reduce bias, all of which I would presume to be required in a well-designed test:

Randomize temporal distribution.

Use blind listening tests.

Avoid multi-stimulus tests.

Avoid presenting some stimuli more often than others.

 

Right! The article (I thought I added the link to it, but it seems to have disappeared) has a lot more detail on each of the biases and proposed ways to address them in the full text:

 

http://www.acourate.com/Download/BiasesInModernAudioQualityListeningTests.pdf

 

Speedskater said:

That's the true meaning of a 'subjective', ears-only test.

* * * * * * * * * * *

The kind of real 'subjective' test that Floyd Toole and Sean Olive often write about.

 

Subjective in this case indicates a personal preference. Toole and Olive have argued vehemently for blind subjective testing precisely because of some of the biases that sighted testing can introduce. In the list in the Zieliński et al. article, this particular bias is referred to as bias due to equipment appearance, listener expectations, preference, and emotions. They do mention Toole's speaker demonstrations of this bias, but also some other interesting studies, such as the hearing-aid study in which 33 out of 40 test subjects preferred a hearing aid labeled "digital" over exactly the same hearing aid without the label. The hearing aids were identical, and only 4 out of 40 said there was no difference:

 

F. E. Toole and S. Olive, “Hearing Is Believing vs. Believing is Hearing: Blind vs. Sighted Listening Tests, and Other Interesting Things,” presented at the 97th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 42, p. 1058 (1994 Dec.), preprint 3894

 

R. Bentler, D. Niebuhr, T. Johnson, and G. Flamme, “Impact of Digital Labeling on Outcome Measures,” Ear and Hearing, vol. 24, pp. 215–224 (2003).
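To put the hearing-aid numbers in perspective, a one-sided sign test asks how often a split at least this lopsided would occur if the label made no difference and each listener who expressed a preference were effectively flipping a coin. This sketch assumes the 4 "no difference" answers are ties to be excluded and that the remaining 3 listeners (40 - 33 - 4) preferred the unlabeled aid; that breakdown is inferred, and the paper's own statistical analysis may differ.

```python
from math import comb

def sign_test_p(successes, trials):
    """One-sided exact binomial (sign) test: probability of at least
    `successes` preferences for one option out of `trials` if both options
    were equally likely to be preferred (p = 0.5)."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials

# Assumed breakdown: 33 preferred "digital", 3 preferred the unlabeled aid,
# 4 heard no difference (ties, excluded from the sign test).
prefer_digital, prefer_other = 33, 3
p = sign_test_p(prefer_digital, prefer_digital + prefer_other)
print(f"p = {p:.2e}")
```

Under these assumptions the result is far below any conventional significance threshold, even though the two hearing aids were physically identical.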

 

pkane2001 said:

Threshold testing: testing for audibility thresholds of human hearing. This is similar to discrimination testing, but the DUT in this case is the listener with the goal of determining the limits of audibility of different signals or distortions in human perception

I don’t think that’s correct.  Pure tone threshold determination is an incredibly well studied test that is as objective as any other biologic test with which I’m familiar.  There is no difference being detected between two alternatives - what’s being tested is the threshold of a biologic response to a single stimulus.  The only device under test is the auditory system of the subject.
 

Done in properly designed and equipped environments with calibrated, certified devices, it is measurable, highly reliable, and consistently repeatable both among multiple subjects and for the same subject in repeated administrations.  It’s not of as much use as a determinant of hearing ability as most believe, but this is not relevant to the topic here. 
 

Comparative audiophile auditions aren’t in the same country, let alone the same ball park.

bluesman said:

Threshold testing: testing for audibility thresholds of human hearing. This is similar to discrimination testing, but the DUT in this case is the listener with the goal of determining the limits of audibility of different signals or distortions in human perception

 

I don’t think that’s correct.  Pure tone threshold determination is an incredibly well studied test that is as objective as any other biologic test with which I’m familiar.  There is no difference being detected between two alternatives - what’s being tested is the threshold of a biologic response to a single stimulus.  The only device under test is the auditory system of the subject.
 

Done in properly designed and equipped environments with calibrated, certified devices, it is measurable, highly reliable, and consistently repeatable both among multiple subjects and for the same subject in repeated administrations.  It’s not of as much use as a determinant of hearing ability as most believe, but this is not relevant to the topic here. 
 

Comparative audiophile auditions aren’t in the same country, let alone the same ball park.

 

I'm not sure which part was incorrect; can you please clarify?

 

The audibility threshold testing I mentioned is not just about pure-tone audibility. Far from it. Tests can be constructed to determine the audibility of any known sound characteristic or distortion (jitter, harmonic distortion, TIM, compression, phase, filter ripple, spatial perception, signal masking, etc.). If you have good references to any of those studies, they may be worth discussing as well. That is an extremely interesting subject, but possibly for another thread if it isn't related to biases.
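As an illustration of how the audibility threshold of a distortion could be estimated, here is a sketch of a 2-down/1-up adaptive staircase, a widely used psychoacoustic procedure (Levitt's transformed up-down method) that converges near the 70.7%-correct point. The simulated listener, its -50 dB "true" threshold, the 2 dB step and the stopping rule are made-up assumptions for illustration, not values from any study in this thread.

```python
import math
import random

def simulated_listener(level_db, true_threshold_db, slope_db, rng):
    """Hypothetical two-alternative forced-choice listener: guesses (50%)
    when the distortion is far below threshold, approaches 100% correct
    as the level rises above it."""
    p_correct = 0.5 + 0.5 / (1.0 + math.exp(-(level_db - true_threshold_db) / slope_db))
    return rng.random() < p_correct

def staircase_2down_1up(start_db, step_db, n_reversals, respond, n_average=6):
    """Lower the level after two consecutive correct answers, raise it after
    any error. Returns the mean level over the last `n_average` reversals,
    which estimates the ~70.7%-correct point."""
    level = start_db
    correct_in_row = 0
    direction = 0              # -1 moving down, +1 moving up, 0 not moved yet
    reversals = []
    while len(reversals) < n_reversals:
        if respond(level):
            correct_in_row += 1
            if correct_in_row < 2:
                continue       # present the same level again
            correct_in_row = 0
            new_direction = -1
        else:
            correct_in_row = 0
            new_direction = +1
        if direction != 0 and new_direction != direction:
            reversals.append(level)  # level at which the track turned around
        direction = new_direction
        level += new_direction * step_db
    last = reversals[-n_average:]
    return sum(last) / len(last)

rng = random.Random(42)
estimate = staircase_2down_1up(
    start_db=-20.0, step_db=2.0, n_reversals=12,
    respond=lambda lvl: simulated_listener(lvl, -50.0, 2.0, rng),
)
print(f"estimated threshold: {estimate:.1f} dB")
```

The adaptive design spends most trials near the threshold instead of at clearly audible or clearly inaudible levels, which keeps sessions short.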

 

 

pkane2001 said:

 

Right! The article (I thought I added the link to it, but it seems to have disappeared) has a lot more detail on each of the biases and proposed ways to address them in the full text:

 

http://www.acourate.com/Download/BiasesInModernAudioQualityListeningTests.pdf

 

 

Discussion about the listener and possible biases is in sections 2.3 and 3.2:
 

[Images: excerpts from sections 2.3 and 3.2 of the paper]

pkane2001 said:

I'm not sure which part was incorrect, can you please clarify? The audibility threshold testing I mentioned is not just about pure tone audibility.

Pure tone threshold determination is the name of the basic audiometric test that most people consider a “hearing test”.  It’s the kind of hearing testing (or what passes for testing...) that you find on the internet.  It appears that you’re using that term to mean something else - but “audibility threshold testing“ is what it is and nothing more, in the world of otology and audiology.

bluesman said:

Pure tone threshold determination is the name of the basic audiometric test that most people consider a “hearing test”.  It’s the kind of hearing testing (or what passes for testing...) that you find on the internet.  It appears that you’re using that term to mean something else - but “audibility threshold testing“ is what it is and nothing more, in the world of audiology.

 

Got it. No, in audio science this is used as a much more general term covering all audibility thresholds, not just pure-tone audiometric testing. Audiometric testing is certainly one of the tests in this category.


Here is a paper that comes up frequently in discussions about double-blind tests. Specifically, it discusses the second type of bias in Zieliński's paper (listener bias due to expectations, appearance, preference, and emotions), although it also touches on listener bias due to training.

 

F. E. Toole and S. Olive, “Hearing Is Believing vs. Believing is Hearing: Blind vs. Sighted Listening Tests, and Other Interesting Things,” presented at the 97th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 42, p. 1058 (1994 Dec.), preprint 3894

 

Forty listeners, both experienced and inexperienced, participated in both the blind and the sighted tests in this study.

 

Part of the conclusion the authors draw:

[Image: excerpt from the paper's conclusions]


The short answer is yes, BIAS will affect audio test results.

 

But it’s not so simple that these BIAS tests should therefore be seen as some kind of litmus test for SQ. All types of subjective SQ tests* also have an effect on the results, because the test methods force us to listen and come to conclusions about the quality of a product in a way that is, for many, unnatural. The tests themselves have to be done very differently from how many people normally listen to and evaluate audio gear** to be scientifically correct. IMO it’s not so much the stress as the many repetitions and the short time available to hear how something sounds, compared to listening for a longer time in a familiar system.

 

BIAS tests are good if, for example, we want to know whether a particular cable can sound different from another cable. They are not so good if we also want to know which cable’s SQ people prefer. So, in other words, BIAS tests can be used for “Discrimination testing” but not for “Preference testing”. Threshold tests are not for audio gear; they’re for testing human hearing.

 

*A/B tests, ABX tests, DBTs, blind tests, etc.

 

**A familiar system and room, and many well-known but different-sounding records.
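For the discrimination case, here is a minimal sketch of how an ABX session is commonly randomized and scored against chance. The trial count, the 0.05 criterion and all function names are illustrative assumptions, not part of any standard quoted in this thread.

```python
import random
from math import comb

def make_abx_session(n_trials, seed=None):
    """For each trial, randomly decide whether the hidden X is A or B."""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n_trials)]

def score(session, answers):
    """Number of trials where the listener identified X correctly."""
    return sum(x == a for x, a in zip(session, answers))

def guessing_p(correct, n_trials):
    """Chance of scoring at least `correct` by guessing (p = 0.5 per trial)."""
    return sum(comb(n_trials, k) for k in range(correct, n_trials + 1)) / 2 ** n_trials

def min_correct(n_trials, alpha=0.05):
    """Smallest score that guessing would reach with probability <= alpha,
    or None if the session is too short to ever reach alpha."""
    for c in range(n_trials + 1):
        if guessing_p(c, n_trials) <= alpha:
            return c
    return None

print(min_correct(16))  # 12 of 16 needed at alpha = 0.05
```

Fixing the pass mark before the session starts, rather than after seeing the score, is what keeps the experimenter's own expectations out of the result.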

Summit said:

The tests themselves have to be done very differently from how many people normally listen to and evaluate audio gear** to be scientifically correct.

 

**A familiar system and room, and many well-known but different-sounding records.

This is often repeated, but incorrect.

Summit said:

So, in other words, BIAS tests can be used for “Discrimination testing” but not for “Preference testing”. Threshold tests are not for audio gear; they’re for testing human hearing.

 

None of the studies I mentioned so far were used for discrimination testing: all were testing for listener preference. And none required short samples and fast switching during the test. As SAM said, this is an often used argument against blind testing, but perhaps that's an issue only with those who don't know how blind tests are conducted (and I don't mean just subjectivists here, objectivists often have their own misconceptions). This is why I'd like to have this discussion where actual findings and facts can be discussed and referenced instead of generally used but unsubstantiated arguments.

Summit said:

Threshold tests are not for audio gear; they’re for testing human hearing

I think you may have misunderstood me. “Pure tone auditory threshold determination” is a specific hearing test. But you can test for the threshold of audibility of anything: noise, jitter, 5th harmonic distortion, cosmic rays, or whatever else interests you. Just call it what it is, e.g., “jitter auditory threshold testing”. And you’d have to control for the subjects’ hearing acuity, since a 30 dB threshold shift at the frequency or frequencies of the test parameter would be expected to shift its threshold too.

pkane2001 said:

 

None of the studies I mentioned so far were used for discrimination testing: all were testing for listener preference. And none required short samples and fast switching during the test. As SAM said, this is an often used argument against blind testing, but perhaps that's an issue only with those who don't know how blind tests are conducted (and I don't mean just subjectivists here, objectivists often have their own misconceptions). This is why I'd like to have this discussion where actual findings and facts can be discussed and referenced instead of generally used but unsubstantiated arguments.

 

Testing for listener preference is one thing; it can be done with A/B tests, ABX tests, DBTs, blind tests, etc.

 

To test for BIAS is another thing. How would we know whether the test group, or some members of it, really liked one piece of gear over another, or whether it was because of confirmation bias?

 

All types of subjective tests have an effect on the results, not only blind tests.

 

I see that you find my reasoning to be misconceptions and unsubstantiated arguments, so I will not post here anymore.


 

Summit said:

I see that you find my reasoning to be misconceptions and unsubstantiated arguments, so I will not post here anymore.

 

Sorry, that wasn't addressed to you. It was a general statement about why I started the thread, and the misconceptions I referred to are held by both subjectivists and objectivists, as mentioned in that same statement.

 

Summit said:

To test for BIAS is another thing. How would we know whether the test group, or some members of it, really liked one piece of gear over another, or whether it was because of confirmation bias?

 

Run three tests: one where the subjects know that, say, very high-end equipment is present in the playback chain; another where they can't see the equipment; and a third where they see the equipment but it's not actually playing. If preferences change significantly in the test where the equipment's identity is known, the change is due to visual bias. That is exactly what the subwoofer test mentioned just prior to your post found, and what Olive and Toole found with speakers. Knowing the identity of the speaker changes preferences by as much as a change in any other audible characteristic of the speaker, even if the listener is experienced and trained:

 

[Image: excerpt from the Toole and Olive paper's conclusions]
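Here is a sketch of how two of those conditions (sighted vs. blind) could be compared for the same listeners, using an exact paired sign-flip permutation test on each listener's rating difference. The ratings are invented placeholder numbers for illustration only, not data from the subwoofer test or the Toole and Olive study, and enumerating all 2^n sign patterns is only practical for small panels.

```python
from itertools import product

def paired_permutation_p(sighted, blind):
    """Exact two-sided sign-flip test: under the null hypothesis that seeing
    the equipment has no effect, each listener's (sighted - blind) difference
    is equally likely to have either sign."""
    diffs = [s - b for s, b in zip(sighted, blind)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(sg * d for sg, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:   # tolerance for float rounding
            extreme += 1
    return extreme / total

# Hypothetical ratings (0-10 scale) of one product from 8 listeners.
sighted = [7.5, 8.0, 6.5, 7.0, 8.5, 7.0, 6.0, 7.5]
blind   = [6.0, 6.5, 6.0, 5.5, 7.0, 6.5, 5.5, 6.0]
print(f"p = {paired_permutation_p(sighted, blind):.4f}")
```

Pairing each listener with themselves removes between-listener differences in how the rating scale is used, so only the effect of seeing the equipment remains.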


More on biases in this audio testing book:

 

S. Bech and N. Zacharov, Perceptual Audio Evaluation: Theory, Method and Application (John Wiley & Sons, 2006)

 

The book is a good experimenter's guide to audio testing and includes a list of experimental biases. The following is a short summary; more details are in section 4.2.4 of the book:

  • Visual Bias

    The effect introduced when subjects can see the DUTs that are being tested
     

  • Perceptual Oversensitivity

    If specific attributes are being tested, it must be ensured that the subjects can actually hear those artifacts and are not overly sensitive to them

     
  • Halo Bias

    The effect that describes the influence a very positive evaluation of a stimulus on one of the rated attributes can have when the other attributes of that stimulus are evaluated (for example, the spatial properties in a speaker evaluation may have a significant effect on ratings of timbral or other, unrelated qualities)
     

  • Expectation Bias

    This effect is related to the expectations of the subject, for example, if a subject expects a certain loudspeaker to be included in the test, or if a fixed order of stimuli is used, so the subject knows (or expects) what the next stimulus will be
     

  • Cross-Modality Bias

    Some context effects are the results of input in other modalities, for example, vision and smell. So, the visual input, both the intended, which might be a part of the test, and the unintended in the surroundings, is also important in listening tests. 
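One common way to handle the fixed-order problem listed under Expectation Bias is to counterbalance the presentation order across listeners. The sketch below builds a Williams Latin square, a standard counterbalancing design (it is not something the book summary above prescribes); each row is one listener's presentation order, with conditions numbered 0 to n-1.

```python
def williams_square(n):
    """Return presentation orders (rows) for n conditions numbered 0..n-1.
    Every condition appears once in each position, and every condition is
    immediately preceded by every other condition equally often. For odd n
    the mirrored rows are appended, giving 2n orders."""
    first = [0]
    low, high = 1, n - 1
    take_low = True
    while len(first) < n:            # build 0, 1, n-1, 2, n-2, ...
        if take_low:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
        take_low = not take_low
    rows = [[(c + r) % n for c in first] for r in range(n)]
    if n % 2 == 1:
        rows += [row[::-1] for row in rows]
    return rows

for listener, order in enumerate(williams_square(4), start=1):
    print(f"listener {listener}: {order}")
```

Balancing which condition comes immediately before which also spreads carry-over effects (one stimulus coloring the impression of the next) evenly across conditions.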

 


You do know that a subwoofer in the room (that is, with a real LF driver inside of it) just plays along with the music, right?

(IIRC some LF designs even depend on this phenomenon)

 

Not that I attribute real value to it regarding the tests you refer to, Paul, but I would be careful with that one.


Lush^2      Blaxius^2      Ethernet^2     HDMI^2     XLR^2

XXHighEnd (developer)

Phasure NOS1 24/768 Async USB DAC (manufacturer)

Phasure Mach III Audio PC with Linear PSU (manufacturer)

Orelino & Orelo MKII Speakers (designer/supplier)

PeterSt said:

You do know that a subwoofer in the room (that is, with a real LF driver inside of it) just plays along with the music, right?

(IIRC some LF designs even depend on this phenomenon)

 

Not that I attribute real value to it regarding the tests you refer to, Paul, but I would be careful with that one.


The point of that report was that the subwoofer was not playing anything at all — it wasn’t hooked up and yet the listeners thought they heard more low end extension when they saw the subwoofer. That changed when the subwoofer was no longer visible.  Visual bias.


Next up is an industry standard recommendation for audio quality testing:

 

Methods for the subjective assessment of small impairments in audio systems, ITU-R BS.1116-3 (02/2015) 

 

There are a lot of interesting recommendations and basic quality standards defined in this document. This is, again, a qualitative (subjective) evaluation standard, meant to grade how much the DUTs are impaired relative to a reference, i.e., which of them sounds "better". It is used by the broadcasting and audio industries for proper evaluation of listener preferences. If you search for the recommendation number, you'll find a number of companies and third-party labs that provide testing services that adhere to this standard.

 

Obvious throughout, but not stated explicitly, is the need to control for both listener and experimenter bias. Here are some of the test-design recommendations that I found interesting:

 

Section 2: Experimental Design: 

  • Experiment should be set up in randomized fashion
  • Designed so as to not overload the listener with too many choices
  • Visual presentation should be avoided, unless important to the test
  • Test should include hidden controls (references) so that listeners can be implicitly tested for their ability to tell the difference

Section 3: Selection of listening panels

  • Expert listeners with training are to be used
  • Pre- and post-screening tests are used to eliminate listeners and their scores if determined not to be capable of detecting known audible differences (audiometric tests, previous experience, inconsistent results). 
  • Size of listening panel: data from about 20 listeners is often sufficient for statistically meaningful conclusions

Section 4: Test Method

  • The most sensitive test for small differences was found to be of the form "double-blind triple-stimulus with hidden reference". The subjects have three stimuli: A, the known reference, and B and C, whose identities are hidden. One of B and C is the hidden reference and the other is the DUT, assigned at random. The subjects grade B and C against A on a five-grade impairment scale; one of the two should be indistinguishable from A. The subjects can listen to A, B or C at any time, at their discretion, and repeat as often as necessary. No time limit is specified.
  • A familiarization or training phase is included in the test so that the subjects become very familiar with the testing method and format
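Here is a minimal sketch of the bookkeeping behind the hidden-reference method, assuming grades on BS.1116's continuous 1.0 to 5.0 impairment scale: the hidden reference and the DUT are randomly assigned to B and C, and each trial is summarized as a difference grade (DUT grade minus hidden-reference grade). The function names and the post-screening threshold in `flag_listener` are hypothetical simplifications, not the recommendation's exact procedure.

```python
import random

def make_trial(rng):
    """Button A is always the open reference; the hidden reference and the
    DUT are randomly placed behind B and C."""
    if rng.random() < 0.5:
        return {"A": "ref", "B": "ref", "C": "dut"}
    return {"A": "ref", "B": "dut", "C": "ref"}

def diff_grade(trial, grade_b, grade_c):
    """DUT grade minus hidden-reference grade. Negative: the DUT was heard as
    impaired. Positive: the listener graded the hidden reference lower."""
    grade_of = {trial["B"]: grade_b, trial["C"]: grade_c}
    return grade_of["dut"] - grade_of["ref"]

def flag_listener(diff_grades, max_positive_fraction=0.15):
    """Hypothetical post-screening rule: flag a listener who grades the
    hidden reference below the DUT in too many trials."""
    positives = sum(1 for d in diff_grades if d > 0)
    return positives / len(diff_grades) > max_positive_fraction

rng = random.Random(3)
trial = make_trial(rng)
# Suppose the listener rated B as 5.0 (imperceptible) and C as 4.2.
print(trial, diff_grade(trial, 5.0, 4.2))
```

Because the listener never knows which button hides the reference, grading the reference down is itself a measurable error, which is what makes the hidden controls and post-screening in sections 2 and 3 possible.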

Section 5.4: Advanced sound system. Specifies the attributes the subjects use to grade the quality of the system:

  • Basic audio quality
  • Timbral quality broken into two properties:
    • sound colour, e.g. brightness, tone colour, coloration, clarity, hardness, equalization, or richness
    • sound homogeneity, e.g. stability, sharpness, realism, fidelity and dynamics
  • Localization quality: this attribute can be separated into horizontal localization quality, vertical localization quality and distance localization quality
  • Environment quality: spatial impression, envelopment, ambience, diffusivity, or spatial directional surround effects

There's a lot more to this document. I'll post some additional tidbits (like the minimum quality of the playback system) a bit later.

 

 

 

pkane2001 said:

Next up is an industry standard recommendation for audio quality testing:

 

Methods for the subjective assessment of small impairments in audio systems, ITU-R BS.1116-3 (02/2015) 

 

There's a lot more to this document. I'll post some additional tidbits (like the minimum quality of the playback system) a bit later.

 

Playback system characteristics are also defined for testing. Some of the interesting ones (to me):

  • Speakers
    • Frequency response between 40 Hz and 16 kHz should be within a tolerance of 4 dB, measured at 0°
    • Frequency responses of the various loudspeakers should be matched to better than 1 dB between 250 Hz and 2 kHz
    • Nonlinear distortion should be less than -30 dB (3%) for frequencies < 250 Hz and -40 dB (1%) for f >= 250 Hz
    • Decay time, measured to about 0.37 (1/e) of the peak value, should be < 5/f seconds
    • Time delay between channels should be less than 100 μs
    • Dynamic range > 108 dB RMS
    • Acoustic noise level < 10 dBA
  • Headphones
    • Time delay between channels less than 20μs
  • Room specifications (dimensions, reverb time)
  • Listening arrangement for stereo:
    [Image: stereo listening arrangement diagram]
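The distortion limits above are given both in dB and in percent. The conversion is 20·log10 of the amplitude ratio, so -40 dB is exactly 1% and -30 dB is about 3.2%. The small sketch below does that conversion and checks the decay-time criterion, reading it as "time to fall to 1/e of the peak must be under 5/f seconds". The helper names and example numbers are hypothetical.

```python
import math

def db_to_percent(db):
    """Convert a level relative to the fundamental (dB) to a percentage of it."""
    return 100.0 * 10 ** (db / 20.0)

def percent_to_db(percent):
    """Inverse conversion: percentage of the fundamental to dB."""
    return 20.0 * math.log10(percent / 100.0)

def meets_decay_limit(decay_time_s, freq_hz, factor=5.0):
    """Decay to 1/e (about 0.37) of the peak must take less than factor/f seconds."""
    return decay_time_s < factor / freq_hz

print(f"-30 dB = {db_to_percent(-30):.2f}%")   # about 3.16%
print(f"-40 dB = {db_to_percent(-40):.2f}%")   # 1.00%
print(meets_decay_limit(0.08, 50.0))           # limit at 50 Hz is 0.1 s
```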

There are a ton more details on room setup and listening arrangements, test conditions, statistical analysis, scoring, testing of listeners, etc. Examples of the training phase and of instructions to listeners are also provided to ensure consistency.

 


A chapter from an introductory statistics course, discussing experimental design and controls. It's not directly related to bias or audio, but it makes an important distinction between observational studies, experiments, and anecdotal evidence, something relevant to a lot of discussions in audio (https://www.pitt.edu/~nancyp/stat-0200/slides/bpX1textfieldslecture03-designingstudies.pdf)

 

[Image: slide from the linked lecture]
