
Does BIAS affect audio test results?



On 6/17/2020 at 10:32 PM, pkane2001 said:

This head-fi thread doesn't rise to the level of a real study (or meta-study),

 

Agreed (premise 1) ^^^

 

 

On 6/17/2020 at 10:32 PM, pkane2001 said:

but summarizes a lot of the blind tests and attempts at blind tests, especially ABX, that have been tried.

 

Agreed (premise 2) ^^^

 

On 6/17/2020 at 10:32 PM, pkane2001 said:

 

 These mostly demonstrate that the audiophile-reported huge differences between cables, DACs, amps, and various hi-res formats are not so huge or obvious when proper bias controls are put in place:

 

 

Conclusion ^^^?

See premise 1

5 minutes ago, pkane2001 said:

 

Of course everyone's guilty of bias.

 

Maybe, at least potentially

 

 

5 minutes ago, pkane2001 said:

That's what the studies so far clearly illustrate.

 

 

The 'studies' (like the Head-Fi thread or opinion piece you linked) don't clearly demonstrate anything beyond opinion, IMO... and biased conclusions.

 

 

5 minutes ago, pkane2001 said:

Despite this, there is a way to move forward, and that's one of the main reasons for this thread:

 

see post 64

 

5 minutes ago, pkane2001 said:

 

1. Help recognize that our preferences are often affected by things that have nothing to do with sound, things that are often subconscious and not under our control

 

Sometimes, if borne out by actual real experiments

 

5 minutes ago, pkane2001 said:

2. Figure out a way to remove as many biases as possible from affecting the outcome of a preference test

 

Yes, if done in an unbiased way that is methodologically valid.

2 minutes ago, pkane2001 said:

Read the individual tests and their results.

 

None that I read seemed convincing to me. I am happy to reconsider on a case by case basis.

 

 

2 minutes ago, pkane2001 said:

 

The conclusion was not mine, it's a summary of the comments in the post I shared.

 

Who cares, apart from the individual forum posters?

On 6/17/2020 at 10:32 PM, pkane2001 said:

This head-fi thread doesn't rise to the level of a real study (or meta-study),

Just now, pkane2001 said:

 

Considering that I didn't claim anything about their validity,

 

Then why mention them?

 

Just now, pkane2001 said:

But perhaps you can comment on the other 15+ (published and peer-reviewed) studies posted in this thread that demonstrate similar findings?

 

Pick any one you like and I'm happy to discuss.

3 minutes ago, pkane2001 said:

 

Why? Because they are all actual data points

 

Yet you said they were not necessarily valid. Are we supposed to be swayed by invalid data points?

 

3 minutes ago, pkane2001 said:

These results are easily predicted by the other, more serious and controlled studies already cited, so they seem to fit into the overall pattern of this thread: biases have a strong effect on preferences in uncontrolled listening testing.

 

If easy, pick one and show us how easy it is

1 minute ago, The Computer Audiophile said:

If you can provide objective information, it's welcome in this thread. We have a complete forum for all other information and I encourage you to use it. You can even create your own thread and post a link to it in this one. 

 

I invited Paul to pick a study, any study that he cited. He did not. As there were over 15 cited, I was offering that he should choose rather than have me cherry-pick. Again, I do not see how this offer to discuss bias and cited studies fits into *your* description of a "complete forum for all other information".

 

The OP wants me out, so as said, I will respect his wishes.

  • 2 weeks later...

Following on from what @Summit and many others have said (myself included), it's all about the methodology. For any test or test procedure, a pivotal question is how good it is at telling you what you want to know. For good tests there are objective ways to determine this, one example being the false negative rate the test produces.

 

If any test has a false negative rate of 90%, that would render it useless. If, OTOH, the false negative rate is 5%, then you can say: well, the test is telling me something, but there is a 1 in 20 chance it is getting it wrong. In either case you know objectively where you stand.

 

In all the studies cited, and AFAIK other blind listening studies, including Toole's, there are no definitive false negative rates declared (I genuinely stand to be corrected here). It seems to be assumed that the false negative rates are negligible... but how do you/they know that? Definitive statements are made, such as "blind tests show...", but what if the false negative rate is 90%?

 

Now, as pointed out, there are recommendations, official recommendations, made to mitigate these false negatives: take your time, avoid fatigue, drink more water, whatever. However, these are just (mostly) sensible guidelines that have not themselves been validated with experimental evidence revealing the actual false negative rates achieved by adhering to them. Guidelines are guidelines; if you compare European vs American guidelines on various medical conditions you will find differences of opinion. Why? Because the experimental evidence is not conclusive and the jury remains out.

 

Bottom line: tell me the false negative rate of any test, otherwise I have no confidence the test is not giving me a wrong result.
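To make this concrete, here is a minimal sketch (my own illustration with made-up numbers, not taken from any cited study) of how a false negative rate could actually be computed for a hypothetical 16-trial ABX protocol, assuming we somehow knew a listener's true hit rate:

```python
from math import comb

def prob_at_least(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical ABX protocol: 16 trials, "pass" = at least 12 correct,
# which holds a pure guesser's pass (false positive) rate to about 3.8%.
N, PASS_AT = 16, 12

false_positive = prob_at_least(N, PASS_AT, 0.5)

# A listener who genuinely hears a subtle difference, answering 60% of
# trials correctly on average, still fails this protocol most of the time:
false_negative = 1 - prob_at_least(N, PASS_AT, 0.6)

print(f"false positive rate: {false_positive:.1%}")                     # ~3.8%
print(f"false negative rate (p = 0.6 listener): {false_negative:.1%}")  # ~83%
```

The sketch also shows why the number is so hard to come by: the false negative rate depends on the listener's true ability, which is unknown in practice, and that is exactly why it needs to be measured rather than assumed negligible.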

3 hours ago, pkane2001 said:

 

This seems to be your main objection to blind testing, from what I gather. But if you read the studies and proposed testing methodologies, you'll see how hidden controls and pre-testing are used to determine false negatives. The testing recommendations include eliminating scores from those testers who show no ability to discriminate, or who discriminate randomly. Testing on a larger sample will eliminate the variables related to an individual's performance on that day or at that time, or whatever other circumstances are specific to a person.

 

For example, the MUSHRA testing recommendation from ITU-R includes the following (there's more to each section; I cut out the statistical analysis that's part of the recommendation):

 

[Two images: excerpts from the ITU-R MUSHRA recommendation]

 

Toole's findings related to errors caused by sighted evaluation compared to blind, even with trained listeners:

 

[Image: excerpt from Toole's findings on errors in sighted vs. blind evaluation]

 

 

 

 Hi Paul,

Any false result, a false positive or a false negative, is a concern, but false negatives are certainly relevant when looking at failures to identify a condition/situation. It is a complex area of statistics (for me), but you need to know sensitivity figures and specificity figures, calculate the true positives, true negatives, false positives and false negatives, and look at positive and negative predictive values, sometimes expressed in the form of 2x2 tables. I mentioned this in my first ever post on AS (CA).
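For anyone unfamiliar with that 2x2 arithmetic, here is a minimal sketch with made-up counts (purely for illustration, not from any cited study):

```python
# Rows of the 2x2 table: whether a real audible difference exists;
# columns: the test's verdict. All counts are invented for illustration.
true_pos  = 30   # real difference, test says "different"
false_neg = 70   # real difference, test says "same" (missed detections)
false_pos = 5    # no difference,   test says "different"
true_neg  = 95   # no difference,   test says "same"

sensitivity = true_pos / (true_pos + false_neg)           # 0.30
specificity = true_neg / (true_neg + false_pos)           # 0.95
false_negative_rate = false_neg / (true_pos + false_neg)  # 0.70 = 1 - sensitivity
ppv = true_pos / (true_pos + false_pos)  # P(real difference | verdict "different")
npv = true_neg / (true_neg + false_neg)  # P(no difference   | verdict "same")

print(f"sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
print(f"false negative rate {false_negative_rate:.2f}, PPV {ppv:.2f}, NPV {npv:.2f}")
```

With these illustrative numbers a "same" verdict is correct only about 58% of the time (the NPV), even though the test almost never produces a false positive; that asymmetry is exactly what a declared false negative rate would expose.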

 

Neither of the examples cited actually specifies false negative rates as actual, objective measurements.

 

I do have the whole ITU paper, and it is at least more rigorous in identifying good-quality listeners, but this is not the same as stating the actual false negative rate of the test procedure for experienced or any other listeners.

 

There are just assertions in the Toole description cited, not any actual figures for false negatives. I have this paper and many/most other Toole publications collected over the years, and nowhere have I seen published false negative rates. He just seems to assume they are negligible.

 

FWIW, the Toole publications tend to hold that experienced and inexperienced listeners are both good 'discriminators' of difference, with experienced listeners just more efficient, needing fewer trials to reach a statistically significant result. I also believe this conclusion is flawed (an understatement), but (if true) it would tend to invalidate the ITU screening procedure to some extent.

 

What is required is actual, objective measurement figures for the test at hand, not descriptions of how someone has tried to mitigate false results; you need actual measurements of false results expressed as a percentage (the "anchor" test does not give you the false negative rate).


Hi Paul. I would agree with you that the ITU recommendations are a step in the right direction in attempting to reduce errors. Arguably the best we have got so far.

 

ITU-R BS.1534 (cited in post #102), "Multi Stimulus test with Hidden Reference and Anchor (MUSHRA)", is a "Method for the subjective assessment of intermediate quality level of audio systems". It is "not intended for assessment of small [audio] impairments". The paper does not mention "false negatives", and grading an anchor signal on its five-interval quality scale relative to its peers does not substitute for, or remove the need for, a stated false negative rate.

 

 

ITU-R BS.1116-3 (2015) covers "Methods for the subjective assessment of small impairments in audio systems" and is therefore putatively more appropriate to audiophile listening tests. Kudos, because it does state some limitations, like "perceptual objective assessment methods are being developed for testing the sound quality of sound systems", and "further studies [are needed] of the characteristics of listening rooms and reproduction devices for the advanced sound system".

 

In addition to those caveats, it is also worth mentioning that nowhere in that paper is the term "false negatives" mentioned or offered as a measure. In relation to "screening tests" they talk about uncovering "tendencies", but there is no quantitative measure offered or suggested in that section. They sensibly advise "caution" given that "subjects have different sensitivities to different artefacts". Anchors are not used unless all systems are found transparent, and then as a means to see how (allegedly) good the subjects are at discerning known differences. Again, that should not be confused with yielding a false negative rate for the test at hand. My oversimplified interpretation, FWIW, is: 'hey, these guys do not have golden ears, let's get a bunch that do'.
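For illustration only, here is a minimal sketch of the kind of binomial screening such recommendations allude to; this is my own construction, not the procedure specified in either ITU paper:

```python
from math import comb

def p_value_guessing(n_trials: int, n_correct: int) -> float:
    """One-sided P(X >= n_correct) under pure guessing, X ~ Binomial(n, 0.5)."""
    return sum(comb(n_trials, i) for i in range(n_correct, n_trials + 1)) / 2**n_trials

# Hypothetical screening run on known ("anchor"-style) differences:
# keep a subject only if their score is unlikely under pure guessing.
subjects = {"A": (20, 17), "B": (20, 12), "C": (20, 10)}  # (trials, correct)

for name, (n, k) in subjects.items():
    p = p_value_guessing(n, k)
    verdict = "keep" if p < 0.05 else "screen out"
    print(f"subject {name}: {k}/{n} correct, p = {p:.3f} -> {verdict}")

# The asymmetry complained about above: this controls the screen's false
# positive rate (admitting a guesser), but it does not measure the false
# negative rate of the main test the surviving subjects then take.
```

Which is the point being made: screening tells you who can hear known differences, not how often the test at hand misses real ones.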

7 hours ago, pkane2001 said:

@Summit proposed one (long-term vs short-term evaluation). This can be turned into a proper blind test, although it's not going to be easy to conduct if it demands a week or a month to be spent with each DUT at a time. It could take years 

 

Definitely something along the lines that Summit and others have suggested, with listening conditions more representative of real-world experience.

 

I am also sure you are aware of Thomas Lund from Genelec and his "Slow Listening" work, and his AES papers on slow listening, including "On Human Perceptual Bandwidth and Slow Listening" (I have them if needed). I am not pushing it as a solution, just as something in that direction that needs to be explored further.

 

I absolutely agree that it could take years; it may also be costly and would probably require considerable research expertise.

 

36 minutes ago, pkane2001 said:

 

Thomas Lund's AES paper has no experimental data, so no evidence, just conjectures. In the paper, slow listening is defined as "hours" not days or weeks or months, so should be testable in much less than a year :)

 

Here's an AES presentation by James Johnston, an audio research scientist and author of many papers and presentations:

 

http://www.aes-media.org/sections/pnw/ppt/jj/highlevelnobg.ppt

 

A few slides that apply to this (thread) discussion:

[Three slides from the linked presentation]

 

Lund's material basically suggests much longer listening time intervals, as others have also suggested. He talks about "dedicated listening for 8 hours per day" and "In case what is tested for is unfamiliar, slow listening could take as long as it would for the subject to learn a new language, maybe more." Importantly, I am not saying he is right any more than Toole or anyone else is right, just that it is a very different approach.

 

AES paper also attached

 

----

JJ Johnston first came onto my radar when Wes Phillips interviewed him at AT&T quite some years ago regarding PSR (Perceptual Soundfield Reconstruction), which he was apparently working on. The article said: "Forget everything you've heard about multichannel. Compared to PSR, it's all a joke -- from Quad all the way up to five-channel SACD. Oh, Ambisonics works well enough, if you're willing to put up with a listening room that looks like a tornado hit a music store and a single-person-head-in-a-vice sweet spot. But PSR does all that with a minimal amount of equipment -- and does it better, to boot."

[Attachment: AES Time for Slow Listening.pdf]

12 hours ago, pkane2001 said:

 

Thanks for the paper. Perhaps Lund has done some research in this area, but unlike Toole's papers (and his book), Lund's contains no experimental data, no analysis, no evidence to substantiate his claims. But I'm not aware of where Toole has studied (or even just recommended) the length of the evaluation, so I don't know if they'd disagree.

 

 

J_J is an interesting character, who's been doing audio research for quite a long time. His findings often contradict the 'common sense' beliefs of audiophiles. I've had a few discussions with him on ASR regarding the audibility of distortions and on other topics. Here's a collection of presentations and papers that I found very educational in my research:

 

http://www.aes.org/sections/pnw/jj.htm

 

Also, some interesting recordings of talks (including by JJ) worth watching and hearing:

http://www.aes-media.org/sections/pnw/pnwrecaps/index.htm

 

A good presentation on perception of audio: [embedded video]

 

Hi Paul

thanks for the links on JJ.

 

Re Toole vs Lund.

I have not seen experimental evidence provided by Lund, but I may have missed it. My point was not that he is right or wrong, but that he provides certain arguably factual information and, as you call it, conjectures, that may need to be accounted for in any hypothesis you or I or anyone might form on our topic of interest. It may be that we decide to dismiss his (non-experimental) evidence and/or conjectures, and provide good reasons to do so. Good hypotheses are well reasoned and do not bias the inclusion or exclusion of information, but seek all potentially relevant information. As Lund has two AES papers related to the topic, I simply offered them as potentially relevant.

 

Toole has many publications, and yes, there is experimental evidence presented. The question, therefore, is whether we accept this as wholly correct, shut up shop, and be done with it. A great many people would, I believe, say YES: the guy has unequivocally "proven" this or that. Trouble is, a great many say NO, I'm not buying that.

 

I won't go back into the same loop here, but essentially Toole, and many others, predicate their experimental evidence on blind listening tests as used in certain methodologies. I have explained the issues and controversies around this. Suffice it to say that every conclusion rests on the validity of these tests and their outcome measures. Logically, if you are 100% on board with those outcomes, it is game over... and, also logically, the great debate will continue unresolved.

 

I hope that makes some sense.

 

 

  • 3 weeks later...
2 hours ago, Chris987654321 said:

It’s a really sad time in audio history, with high tech dominating the discussion, mainly by utilizing the double-blind test in favor of selling tech to maintain salaries worthy of the engineers’ higher educations. Proclaiming that biases are at the root of enjoyment, so we must eliminate them to accurately know something, is short-sighted.

 

The blind test, now that it has been used much more frequently for over a decade, I’ve come to realize is foolish, as it engages only the ear, nothing else. It is difficult to imagine a situation where only the ear should decide quality. Speech communication, perhaps? Hearing aids? There I see an argument. Music is neither. It is ENTERTAINMENT.

 

In the area of entertainment, it is very important to stimulate as many senses as possible, and mainly the 6th sense... the sense of engagement.
 

In addition to sound, stimulating the visual cortex and the sense of physical touch can trigger impressions that engage the user and make them enjoy the experience more. Or turn them off, if both the visuals and the physical object appear to be of poor quality.
 

As you can see, we’ve been going down a foolish path, one that offers little enjoyable benefit. Maybe a poor person can convince themselves they now have the best? There’s a benefit, I guess. What I’ve seen is a lot of unwarranted and unneeded criticism toward more expensive equipment, using the blind test as a means to disrupt and sell a greater quantity of lower-quality products. The Apple iPod and later the iPhone come to mind, since the original poor-quality compressed audio paid for their rescue. And they also had a hand in much of the double-blind science worship, coincidentally.
 

As you can see... the money of a company worth $1 trillion has an interest in audio. They are getting better; I don’t really care. Someday I won’t be surprised if Apple wants to sell audiophile gear, and magically all the science will be against double-blind testing. Currently, if someone wants to waste $40k on snake oil, that’s their right. And they probably are enjoying the experience more than anyone.
 


 

 

Hi Chris

and welcome! Your inaugural post raises a number of issues that resonate with me and others, but divide the audio world into two halves.

 

There is no doubt that bias affects audio tests. The real debate is over what follows: how much, when, whether it matters, and how to deal with it if it does.

 

Blind listening tests have their limitations, and I think you have touched on some. There will be no agreement, though, as some see them as the already-established gold-standard measure with no possible downside other than exposing fools, which is to say there is no downside in their minds. Not all "objectivists" are on that particular merry-go-round; some are just after the truth, wherever it falls. The irony is that bias can sometimes get in the way of the truth for all of us.

 

That said, as you said, this is about enjoying music, not about finding a vaccine for Covid. If it "engages" you and touches you with its beauty, that's all that matters; science and engineering take a back seat. The problem is that without science and engineering, making the music sound more beautiful and engaging through bits of hardware and software becomes problematic, some would say impossible.

 

My spin is that there has to be a balance, the subjective and the objective (note I did not say subjectivist or objectivist) working hand in glove. People will focus on one to the detriment of the other, and bias will shift that focus this way or that.

