Expectation Bias

manisandher · September 28, 2022

2 hours ago, pkane2001 said:

Properly done DBTs are the gold standard for bias-controlled testing and are the main thing used as evidence of anything related to perception.

Yes, but when they give 'inconvenient' results, they're ignored.

Remember the red/blue pill thread? As a reminder:

1. I was convinced that I could hear differences between bit-identical replay (two different buffer settings in the playback software)

2. I was confident that I could demonstrate what I was hearing

3. I set up a blind test (essentially double blind) - Mans controlled the replay from my home study, two doors and a corridor away from the listening room, where I sat

We did 3 tests in total - the first two were non-ABX, and the third ABX. Why were the first two non-ABX? Because I was so confident that I'd be able to correctly identify A and B, that I wouldn't need an AB reference before the X. I was totally wrong.

We captured the digital input into the DAC in real time during the test. All 54 samples (about 15 seconds each in length) were shown to be bit-identical.

In any event, here are the results:

The first non-ABX was a disaster - I'd never done any listening tests before. The second was better - I was getting used to the test and how to differentiate between A and B. The ABX was stellar - 1% probability of guessing (way beyond the generally accepted 5% threshold).

I think any reasonable person would conclude that there's definitely something going on here. Or at the very least, that there's something worth exploring further. But all the hardcore objectivists could come up with was "the ABX result must have been a fluke". Yeah, right.

I set up a (double) blind listening test having never been involved in one before. I agreed that the results would be published on this site for everyone to see, irrespective of the outcome. I essentially put my neck on the line. I achieved 9/10 in the ABX (1% probability by guessing alone), and was told that it must have been a fluke.
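For anyone who wants to check the "1% by guessing" figure, here is a minimal sketch of the arithmetic. It assumes each ABX trial is an independent 50/50 guess and asks how often pure guessing would score at least 9 out of 10. The function name `abx_p_value` is just an illustrative label, not something from the original test.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by guessing alone.

    Assumes independent trials with a 50% chance of a correct answer each
    (one-sided binomial tail).
    """
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# 9 or 10 correct out of 10: (10 + 1) / 1024
print(f"{abx_p_value(9, 10):.4f}")  # 0.0107
```

That works out to 11 chances in 1024, or roughly 1%, which matches the figure quoted for the ABX run.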

And the objectivists wonder why audiophiles are reluctant to undertake DBTs to demonstrate that they really do hear the things they say they do???

Mani.

botrytis · September 28, 2022

I appreciate that Mani. That is why, in those instances one should sample more. The reason being to determine if three was an outlier or just part of a trend.

pkane2001 · September 28, 2022

1 minute ago, manisandher said:

Yes, but when they give 'inconvenient' results, they're ignored.

Remember the red/blue pill thread? As a reminder:

1. I was convinced that I could hear differences between bit-identical replay (two different buffer settings in the playback software)

2. I was confident that I could demonstrate what I was hearing

3. I set up a blind test (essentially double blind) - Mans controlled the replay from my home study, two doors and a corridor away from the listening room, where I sat

We did 3 tests in total - the first two were non-ABX, and the third ABX. Why were the first two non-ABX? Because I was so confident that I'd be able to correctly identify A and B, that I wouldn't need an AB reference before the X. I was totally wrong.

We captured the digital input into the DAC in real time during the test. All 54 samples (about 15 seconds each in length) were shown to be bit-identical.

In any event, here are the results:

The first non-ABX was a disaster - I'd never done any listening tests before. The second was better - I was getting used to the test and how to differentiate between A and B. The ABX was stellar - 1% probability of guessing (way beyond the generally accepted 5% threshold).

I think any reasonable person would conclude that there's definitely something going on here. Or at the very least, that there's something worth exploring further. But all the hardcore objectivists could come up with was "the ABX result must have been a fluke". Yeah, right.

I set up a (double) blind listening test having never been involved in one before. I agreed that the results would be published on this site for everyone to see, irrespective of the outcome. I essentially put my neck on the line. I achieved 9/10 in the ABX (1% probability by guessing alone), and was told that it must have been a fluke.

And the objectivists wonder why audiophiles are reluctant to undertake DBTs to demonstrate that they really do hear the things they say they do???

Mani.

Mani, we've had this discussion repeated many times over the years. You didn't have a protocol designed or agreed to. Between you and Mans, you didn't work out the details of what was being tested, why, or how or what result would be acceptable as "proof". These are rookie mistakes, and I understand how it all happened and don't blame you or Mans -- you're not professional scientists doing perception studies.

But you, like some others in this thread, seem to happily ignore my actual statements. So, I'll repeat (and this applies to the test conducted by you and Mans):

Quote

Audio enthusiast-done DBTs are not going to rise to the level of a scientific study. That's not surprising or unexpected. Once again, the value here is in proving things to yourself rather than to someone else.

musicjunkie917 · September 28, 2022

1 hour ago, pkane2001 said:

Is it naive to expect some validity to the claims made about often very expensive pieces of equipment?

What are these claims you think they are making? Are these claims invalidated by measurements? Otherwise, it is all subjective.

The Computer Audiophile · September 28, 2022

11 minutes ago, botrytis said:

I appreciate that Mani. That is why, in those instances one should sample more. The reason being to determine if three was an outlier or just part of a trend.

Sample more, or until you get the result you want? 🤣

pkane2001 · September 28, 2022

2 minutes ago, musicjunkie917 said:

What are these claims you think they are making? Are these claims invalidated by measurements? Otherwise, it is all subjective.

But of course! Personally I do both, measurements and DBTs. This allows me to correlate my preferences and audibility thresholds with device measurements.

musicjunkie917 · September 28, 2022

21 minutes ago, pkane2001 said:

But of course! Personally I do both, measurements and DBTs. This allows me to correlate my preferences and audibility thresholds with device measurements.

Except the things that we measure have very little correlation to sound quality. Again, you are being naive.....

pkane2001 · September 28, 2022

4 minutes ago, musicjunkie917 said:

Except the things that we measure have very little correlation to sound quality. Again, you are being naive.....

Curious how you've determined this. Please present your findings.

botrytis · September 28, 2022

32 minutes ago, The Computer Audiophile said:

Sample more, or until you get the result you want? 🤣

No, to get enough data to show some trends. The results shown did not show any trends; that is why larger sample sizes help in this regard. It is basic statistics and sampling theory.
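To put rough numbers on the sample-size point, here is a small sketch of how many correct answers a guess-only listener would need to beat at the usual 5% threshold, for a trial count fixed in advance. It assumes independent 50/50 trials. The helper names are hypothetical, and it says nothing about deciding to add more trials after seeing the results, which is a separate problem.

```python
from math import comb

def guess_p_value(correct: int, trials: int) -> float:
    """One-sided binomial tail: P(at least `correct` right) when guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def min_correct(trials: int, alpha: float = 0.05):
    """Smallest score whose guessing probability is at or below `alpha`.

    Returns None when even a perfect score cannot reach `alpha`
    (too few trials).
    """
    for c in range(trials + 1):
        if guess_p_value(c, trials) <= alpha:
            return c
    return None

print(10, min_correct(10))  # 10 9
print(20, min_correct(20))  # 20 15
```

With 10 trials the bar is 9 correct; with 20 trials it is 15, so a longer run leaves more room for a few misses.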

manisandher · September 28, 2022

11 minutes ago, botrytis said:

The results shown did not show any trends...

Ahem...

How much clearer could the trend be?

12 minutes ago, botrytis said:

... that is why larger sample sizes help in this regard.

Have you ever been involved in a listening test? I listened to 54 samples in total... of the same 15 second music clip. The concentration required was exhausting. I think continuing the test would have been pointless - exhaustion would have invalidated the results.

Mani.

The Computer Audiophile · September 28, 2022

17 minutes ago, botrytis said:

No, to get enough data to show some trends. The results shown did not show any trends; that is why larger sample sizes help in this regard. It is basic statistics and sampling theory.

@pkane2001 says a sample size of one is good 🙂

botrytis · September 28, 2022

Just now, The Computer Audiophile said:

@pkane2001 says a sample size of one is good 🙂

Depends on what/who is testing. 😇

pkane2001 · September 28, 2022

5 minutes ago, The Computer Audiophile said:

@pkane2001 says a sample size of one is good 🙂

It's a perfect sample size for the audience of one.

manisandher · September 28, 2022

You know, what really strikes me is the total lack of curiosity. I really expected people to be interested in exploring the results further, but they just weren't.

To my mind, there's nothing particularly strange going on here. We were feeding an spdif signal into a crappy DAC. It seems perfectly reasonable to me that changing the buffer setting in the software player might be changing the noise profile reaching the DAC down the spdif cable. It might have been interesting to explore this hypothesis further, but we never even got to the hypothesis stage... it was all just a fluke.

The whole of science relies on intuition and curiosity, which seems to be seriously lacking in the objectivist audio community.

Mani.

musicjunkie917 · September 28, 2022

34 minutes ago, pkane2001 said:

Curious how you've determined this. Please present your findings.

What measurements do manufacturers provide that readily correspond to sound quality??? Do you think that an amp that measures at .0001 THD automatically sounds better than one that measures with .01 THD?

pkane2001 · September 28, 2022

9 minutes ago, manisandher said:

You know, what really strikes me is the total lack of curiosity. I really expected people to be interested in exploring the results further, but they just weren't.

To my mind, there's nothing particularly strange going on here. We were feeding an spdif signal into a crappy DAC. It seems perfectly reasonable to me that changing the buffer setting in the software player might be changing the noise profile reaching the DAC down the spdif cable. It might have been interesting to explore this hypothesis further, but we never even got to the hypothesis stage... it was all just a fluke.

The whole of science relies on intuition and curiosity, which seems to be seriously lacking in the objectivist audio community.

Mani.

It really was a poorly designed test, Mani. It was not clear until weeks after what was being tested, and even then, the buffer setting was not documented or well defined by PeterSt as to what it controlled or what effect it had on the digital signal. As you may recall, I actually spent quite a bit of time on the analysis at that time, but there were problems with recordings and different versions were generated after the test was completed.

pkane2001 · September 28, 2022

7 minutes ago, musicjunkie917 said:

What measurements do manufacturers provide that readily correspond to sound quality??? Do you think that an amp that measures at .0001 THD automatically sounds better than one that measures with .01 THD?

I actually know this answer for various distortion levels and signals for myself, because I spent the time testing and studying this. Do you?

manisandher · September 28, 2022

12 minutes ago, pkane2001 said:

It really was a poorly designed test, Mani. It was not clear until weeks after what was being tested, and even then, the buffer setting was not documented or well defined by PeterSt as to what it controlled or what effect it had on the digital signal. As you may recall, I actually spent quite a bit of time on the analysis at that time, but there were problems with recordings and different versions were generated after the test was completed.

Paul, you're conflating two things here.

The first is the listening test. All we were testing for was whether two bit-identical playback means could sound audibly different. The mechanism at play was irrelevant. It could have been anything, as long as we could show that things remained bit-identical at all times, which we did.

The second is the mechanism at play. Here I agree with you. It was difficult to figure out exactly what was happening. But that's exactly why it would have been interesting to explore things further. I even stated that I would have been prepared to repeat the test, but this time with a clear hypothesis. And I would have done this... if people had been more curious about the result of the initial test, and not dismissed it out of hand as a fluke.

Mani.

pkane2001 · September 28, 2022

2 minutes ago, manisandher said:

Paul, you're conflating two things here.

The first is the listening test. All we were testing for was whether two bit-identical playback means could sound audibly different. The mechanism at play was irrelevant. It could have been anything, as long as we could show that things remained bit-identical at all times, which we did.

The second is the mechanism at play. Here I agree with you. It was difficult to figure out exactly what was happening. But that's exactly why it would have been interesting to explore things further. I even stated that I would have been prepared to repeat the test, but this time with a clear hypothesis. And I would have done this... if people had been more curious about the result of the initial test, and not dismissed it out of hand as a fluke.

Mani.

The test hypothesis going into the test was that USB bit-identical signals will be indistinguishable. The test was for something completely different, where transmission jitter could have easily been the issue. Which is why a proper protocol design is important.

I’m personally not very curious about audibility of jitter. I know at what levels it becomes audible for me. But since jitter was not measured in your test, there is no way to confirm or deny this after the test.

musicjunkie917 · September 28, 2022

30 minutes ago, pkane2001 said:

I actually know this answer for various distortion levels and signals for myself, because I spent the time testing and studying this. Do you?

Of course I know the answer. The answer is that you cannot tell how good an amp sounds based on THD. There are amps that measure great and sound terrible and there are amps that measure not nearly as good that sound great. Humans can't hear distortion much below 1%. So there is no way you can hear the difference between .0001, .001, and .01 THD.
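For scale, the THD figures being thrown around can be put on a decibel footing. This is only a sketch: it assumes the quoted numbers are percentages (the posts don't say), and `thd_percent_to_db` is an illustrative name. It converts the figures and makes no claim about what is or isn't audible.

```python
import math

def thd_percent_to_db(thd_percent: float) -> float:
    """Convert a THD figure given in percent to dB relative to the fundamental.

    Assumption: the figure is an amplitude ratio expressed in percent,
    so dB = 20 * log10(percent / 100).
    """
    return 20.0 * math.log10(thd_percent / 100.0)

for pct in (1.0, 0.01, 0.001, 0.0001):
    print(f"{pct}% THD is about {thd_percent_to_db(pct):.0f} dB")
```

On that assumption, each factor of ten in the percentage figure is a 20 dB step, so 1% sits around -40 dB and 0.0001% around -120 dB.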

manisandher · September 28, 2022

2 minutes ago, pkane2001 said:

The test hypothesis going into the test was that USB bit-identical signals will be indistinguishable.

No.

I had listed a number of things that we could try, including different USB cables. Mans wanted to capture the digital input into the DAC in real time during the test (correctly, in my opinion). I only had the means to do this with spdif, so we went with software buffer settings instead. This was agreed upon a couple of days before the test.

Today, I'd be able to do it with USB easily.

7 minutes ago, pkane2001 said:

... transmission jitter could have easily been the issue.

Agreed. But never followed up on because the results of the listening test were not accepted.

pkane2001 · September 28, 2022

29 minutes ago, musicjunkie917 said:

Of course I know the answer. The answer is that you cannot tell how good an amp sounds based on THD. There are amps that measure great and sound terrible and there are amps that measure not nearly as good that sound great. Humans can't hear distortion much below 1%. So there is no way you can hear the difference between .0001, .001, and .01 THD.

How did you come by this answer, was my question. You make some pretty strong claims, but I doubt you understand them. THD is a very simple measure that is quick to look at but not enough to judge anything about the "sound of an amp", if there is any such thing.

pkane2001 · September 28, 2022

32 minutes ago, manisandher said:

No.

I had listed a number of things that we could try, including different USB cables. Mans wanted to capture the digital input into the DAC in real time during the test (correctly, in my opinion). I only had the means to do this with spdif, so we went with software buffer settings instead. This was agreed upon a couple of days before the test.

Today, I'd be able to do it with USB easily.

Agreed. But never followed up on because the results of the listening test were not accepted.

Certainly all the discussion on the forums prior to your tests was all about USB. The test design would, by necessity, be different for SPDIF. Neither you nor Mans were thinking about how to test for this properly. A digital recording of SPDIF signal wouldn't reveal jitter and so wouldn't be very useful if that was a likely source of error.

As to you doing it with USB easily, I doubt it. There can be some pathological cases where this might happen due to various tells (clicks, delays on start up, initial distortion, actual bit losses in transmission, etc.) and possibly due to poor USB interface design causing ground loops carrying PC-generated noise into the DAC. But otherwise, I really don't think so.

That your test results were accepted by others or not isn't what's important. You trying to figure out what you heard and why is much more interesting. Assuming you are curious, you'd try to find the answer by doing more tests, coming up with a hypothesis on what caused the difference, and then testing for it. I know I would.

manisandher · September 29, 2022

8 hours ago, pkane2001 said:

A digital recording of SPDIF signal wouldn't reveal jitter and so wouldn't be very useful if that was a likely source of error.

Again, you're conflating two things. The listening test was only to determine if differences in bit-identical playback were audible. The digital recording of the spdif signal was necessary to show that the differences (in this case buffer settings) remained bit-identical at all times.
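As a rough illustration of what a "bit-identical" check on two captures involves, here is a minimal sketch. It assumes the captures are already trimmed and time-aligned and are available as raw PCM bytes. The function names and the synthetic data are made up for the example and are not the tooling used in the actual test. A check like this only compares sample values, so it cannot show timing effects such as jitter.

```python
import hashlib
from typing import Optional

def sha256_hex(pcm: bytes) -> str:
    """Digest of a capture, handy for comparing many clips at a glance."""
    return hashlib.sha256(pcm).hexdigest()

def first_mismatch(a: bytes, b: bytes) -> Optional[int]:
    """Byte offset of the first difference, or None if the captures match."""
    if a == b:
        return None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No differing byte in the overlap: one capture is a prefix of the other.
    return min(len(a), len(b))

# Synthetic stand-ins for two aligned captures (hypothetical data).
capture_a = bytes(range(256)) * 4
capture_b = bytes(capture_a)
capture_c = capture_a[:100] + b"\x00" + capture_a[101:]

print(first_mismatch(capture_a, capture_b))  # None
print(first_mismatch(capture_a, capture_c))  # 100
print(sha256_hex(capture_a) == sha256_hex(capture_b))  # True
```

In practice the fiddly part is the alignment: the two captures have to start on the same sample before a byte-for-byte comparison means anything.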

8 hours ago, pkane2001 said:

As to you doing it with USB easily, I doubt it.

No, I meant I can easily capture the digital input into a USB DAC in real time, to show that things remain bit-identical (because I have DACs with USB inputs and spdif outputs). So, repeating the test using USB instead of spdif would now be possible.

I have a bunch of USB cables here, and am confident I hear differences between them. However, I would say that there is zero correlation between price and SQ. One of my favourite-sounding ones is a 1.5m USB-certified cable that I paid a few pounds for many years ago. The most expensive I have here I paid a stupid amount for (in my more gullible days), and sounds terrible... IMO.

Now, how's that for expectation bias?

8 hours ago, pkane2001 said:

There can be some pathological cases where this might happen due to various tells (clicks, delays on start up, initial distortion, actual bit losses in transmission, etc.) and possibly due to poor USB interface design causing ground loops carrying PC-generated noise into the DAC. But otherwise, I really don't think so.

I have a number of modern, well-measuring DACs here (Okto dac8 PRO, SMSL DO200, RME ADI-2 Pro, MOTU UltraLite-mk5). I'm certain I hear differences between USB cables with them all.

8 hours ago, pkane2001 said:

That your test results were accepted by others or not isn't what's important. You trying to figure out what you heard and why is much more interesting. Assuming you are curious, you'd try to find the answer by doing more tests, coming up with a hypothesis on what caused the difference, and then testing for it. I know I would.

Yep.

I haven't stopped thinking about these things over these last few years. I've come up with a method that I'm hoping will show differences at the analogue output of the DAC with bit-identical changes upstream. But it's going to take a lot of time and effort to do, which I just haven't been able to find to date.

If the method works and differences are indeed detectable, I'd then like to start exploring possible mechanisms at play, and perhaps setting up more listening tests.

I'm not a manufacturer, a dealer or a reviewer. I have nothing to gain from this endeavour... other than to satisfy my curiosity 😉.

firedog · September 29, 2022

10 hours ago, manisandher said:

The whole of science relies on intuition and curiosity

Sort of. It's a necessary but not sufficient condition.

People who are totally unscientific also have intuition and curiosity.

Science is a method for finding out repeatable objective results and producing hypotheses that get us closer to the truth about the natural world.

That's why eliminating all bias from testing and discovering something that's repeatable is important.

Your "experiment" is interesting, but is only a first step.

But it shows why this stuff is almost never done in audio. The comments about experiment design and the need to do more are spot on. It's difficult and expensive to do right, and the incentive really isn't there. There's little academic interest, and commercial interest is mostly against it.
