
Who's afraid of DBTs



So what've we established so far by way of double blind testing and the scientific method?

 

- I asked a few days ago for any studies anyone had establishing the efficacy of DBT for audio, and in particular discussing false negatives. I haven't had any time the last few days and have just briefly glanced at this thread, but I don't think anyone's come up with such a study. (Please alert me if someone's come up with such a citation and I missed it.) So as of the moment, we can't say any investigation based on DBT is scientific if the object is to determine whether various types of audible differences exist. (As I've noted before, it appears small loudness differences are relatively easy to detect, though not necessarily *as* loudness differences; often they're detected as "better," not "louder." It appears the same may be true for various levels of certain types of distortion, e.g., some people's preference for NOS DACs fed by Redbook resolution files. Or perhaps it's a dislike for the interpolation filtering in OS DACs. Or perhaps it's both, or neither. Hard to tell without some form of reliable scientific testing.)

 

- Why isn't DBT "good enough" for audio if it's plenty scientific for, e.g., drug studies? Here's why: No "false negative" problem with drug tests. There are objective measures to determine whether the drug is working. The problem DBTs in drug testing guard against is the placebo effect, which is a false positive effect, not a false negative.

 

- I can think of various experimental protocols that may help determine a false negative rate for DBT in audio, involving independent sighted preference testing as a control. Would be delighted to discuss these further when I have more time.

 

- What we've heard about human auditory memory seems to me at least to militate strongly against the efficacy of DBT for audio. More than a two second difference and the accuracy of memory starts to decay? If we're limited to two second selections with no time between for comparison, then effectively we're talking about test tones rather than music. No chance to listen to a singer's phrasing, for example. Hardly a chance to even recognize instruments. Essentially, what I gather from the discussion of DBT and auditory memory is, the more like actually listening to music the test experience is, the less effective a DBT comparison should be. Am I correct or incorrect here?

 

- Something else I want to talk more about when I have time: I laid out the results of my sighted tests of USB cables. My preferences followed neither price nor appearance of the cables. Dennis said this was still expectation bias, because I expected differences and so the results came out different, just random. Within the Audioquest line, my preferences did follow price, so I suppose the explanation would be that my expectations were non-random in that regard. But do you see what's happening? If my preferences had followed price and appearance, they could have been attributed to expectations. They mostly didn't, so what to attribute that to? Expectations. And the one aspect in which they did follow price? Why, expectations of course. What we call this in law is a hypothesis that "proves too much"; or in philosophy of science terms, as Bill Scott has talked about, it's non-falsifiable, i.e., non-scientific. And what's the reason we might want DBT in audio, despite its potential disadvantages? To guard against expectation bias. But if preferences for at least some people some of the time are not based on expectation bias, then what problem is DBT potentially solving? In other words, if I can save my $10k or $12k or whatever it is with sighted listening, and enjoy the shopping/listening experience as well, then why not?

 

- Bottom line: So far as I'm aware (happy to be made aware, though), there's no reliable proof that DBT is a scientific method of comparing audio components. It can guard against positive expectation bias (expecting to hear that the more expensive or better looking component is better), if that's a problem. It cannot guard against negative expectation bias (expecting to hear no difference). There's been no reliable study, again so far as I'm aware, of false negative rates with DBT. Accepting the auditory memory discussion that's taken place here, that would seem to indicate the more like the actual experience of listening to music a DBT or any other listening test is, the less reliable it is. Frankly, since I don't find myself running out and buying expensive stuff for the faceplates, I don't see why I shouldn't just enjoy myself when doing comparisons - it's part of the fun of the hobby. No, it's not scientifically reliable, but then neither is DBT as far as we can tell. We'd all like scientific reliability in our equipment comparisons, but I'm afraid we haven't yet achieved it in audio beyond actual physical measurements; and it appears the audible impact of at least some of those measurements will have to remain up for discussion for the time being.
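To make "false negative rate" concrete for a forced-choice blind test such as ABX, here is a minimal Python sketch. It rests on assumptions that do not come from any study cited in this thread: the listener has a fixed per-trial probability `p_true` of answering correctly, and trials are independent (fatigue and memory effects may well violate both). The function names and the numbers 16 and 0.7 are purely illustrative.

```python
from math import comb

def tail_prob(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_score(n, alpha=0.05):
    """Smallest score whose probability under pure guessing is <= alpha."""
    for k in range(n + 1):
        if tail_prob(n, k, 0.5) <= alpha:
            return k
    return None  # no score can reach significance with this few trials

def power(n, p_true, alpha=0.05):
    """Chance that a listener with hit rate p_true reaches the critical score."""
    k = critical_score(n, alpha)
    return 0.0 if k is None else tail_prob(n, k, p_true)

n_trials = 16   # hypothetical test length
p_true = 0.7    # hypothetical: listener is right 70% of the time

needed = critical_score(n_trials)
detect = power(n_trials, p_true)
false_negative_rate = 1 - detect

print(f"score needed for p <= 0.05: {needed}/{n_trials}")
print(f"chance a real difference is detected: {detect:.2f}")
print(f"false negative rate: {false_negative_rate:.2f}")
```

Under these assumptions a listener who genuinely hears the difference 70% of the time would still fail a 16-trial test more often than not (detection chance of about 0.45). That is a property of the model and the trial count, not evidence about any real component comparison.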

One never knows, do one? - Fats Waller

The fairest thing we can experience is the mysterious. It is the fundamental emotion which stands at the cradle of true art and true science. - Einstein

Computer, Audirvana -> optical Ethernet to Fitlet3 -> Fibbr Alpha Optical USB -> iFi NEO iDSD DAC -> Apollon Audio 1ET400A Mini (Purifi based) -> Vandersteen 3A Signature.

Link to comment

Jud, Wow. One quite crucial response from my hand :

 

In other words, if I can save my $10k or $12k or whatever it is

 

It was $8K.

 

Thread can be closed now.

Lush^3-e      Lush^2      Blaxius^2.5      Ethernet^3     HDMI^2     XLR^2

XXHighEnd (developer)

Phasure NOS1 24/768 Async USB DAC (manufacturer)

Phasure Mach III Audio PC with Linear PSU (manufacturer)

Orelino & Orelo MKII Speakers (designer/supplier)

Link to comment
Essentially, what I gather from the discussion of DBT and auditory memory is, the more like actually listening to music the test experience is, the less effective a DBT comparison should be. Am I correct or incorrect here?

 

Incorrect. This is because once the differences are pointed out in advance, they are the easiest to concentrate on *and* to remember. And if someone thinks this is therefore expectation bias, then he is only theorizing.

Which all this is anyway, if you ask me.

 

Now who is taking on this $1K bet that not only I can do it, but you, no matter who you are, just the same. But be here.

Make it $8K if you're ready to really lose some money.

Lush^3-e      Lush^2      Blaxius^2.5      Ethernet^3     HDMI^2     XLR^2

XXHighEnd (developer)

Phasure NOS1 24/768 Async USB DAC (manufacturer)

Phasure Mach III Audio PC with Linear PSU (manufacturer)

Orelino & Orelo MKII Speakers (designer/supplier)

Link to comment
if I can save my $10k or $12k or whatever it is with sighted listening, and enjoy the shopping/listening experience as well, then why not?

 

Because if your wife finds out that this is for shopping pleasure only, she will spend it on shoes.

Bet again ?

Lush^3-e      Lush^2      Blaxius^2.5      Ethernet^3     HDMI^2     XLR^2

XXHighEnd (developer)

Phasure NOS1 24/768 Async USB DAC (manufacturer)

Phasure Mach III Audio PC with Linear PSU (manufacturer)

Orelino & Orelo MKII Speakers (designer/supplier)

Link to comment
Jud, Wow. One quite crucial response from my hand :

 

 

 

It was $8K.

 

Thread can be closed now.

 

Surely we're approaching a time when the great and good of these forums can put out a statement of their current understanding of whether people are hearing sound differences or not, then we can just refer to and/or build on that statement to hopefully move the hobby forward rather than having endless debates about measurements and DBT. If the science isn't there yet, then the balance of probability is.

 

If the objectivists care so much about the HiFi manufacturers ripping off the public, then why aren't they campaigning and hassling them to stop? No anecdotes from them about that, are there?

There is no harm in doubt and skepticism, for it is through these that new discoveries are made. Richard P Feynman

 

http://mqnplayer.blogspot.co.uk/

Link to comment
- I can think of various experimental protocols that may help determine a false negative rate for DBT in audio, involving independent sighted preference testing as a control. Would be delighted to discuss these further when I have more time.

 

So someone gets the theories correct ? I really don't see why that would be needed.

So look at yourself; If you listen to the NOS1a for a while (which you did) you have a clear verdict. Your USB cable example, same thing. The verdict is very explicit.

 

So what are you actually doing ? try to prove to others via other means (DBT) that you are right with your verdicts ?

What a STUPID discussion.

 

The whole thing emerges because of a bunch that don't hear differences anywhere anyway. And now they require YOUR proof.

 

Discussion for the discussing. That is this. Nothing better to do. Me no different. ;)

Lush^3-e      Lush^2      Blaxius^2.5      Ethernet^3     HDMI^2     XLR^2

XXHighEnd (developer)

Phasure NOS1 24/768 Async USB DAC (manufacturer)

Phasure Mach III Audio PC with Linear PSU (manufacturer)

Orelino & Orelo MKII Speakers (designer/supplier)

Link to comment
Me no different. ;)

 

Loved this, and +1 from me, my friend.

One never knows, do one? - Fats Waller

The fairest thing we can experience is the mysterious. It is the fundamental emotion which stands at the cradle of true art and true science. - Einstein

Computer, Audirvana -> optical Ethernet to Fitlet3 -> Fibbr Alpha Optical USB -> iFi NEO iDSD DAC -> Apollon Audio 1ET400A Mini (Purifi based) -> Vandersteen 3A Signature.

Link to comment
So someone gets the theories correct ? I really don't see why that would be needed.

So look at yourself; If you listen to the NOS1a for a while (which you did) you have a clear verdict. Your USB cable example, same thing. The verdict is very explicit.

 

So what are you actually doing ? try to prove to others via other means (DBT) that you are right with your verdicts ?

What a STUPID discussion.

 

The whole thing emerges because of a bunch that don't hear differences anywhere anyway. And now they require YOUR proof.

 

Discussion for the discussing. That is this. Nothing better to do. Me no different. ;)

 

Yeah, why am I bothering? Here's why:

 

I think Dennis, trithio, mayhem, etc., would dearly love to have some fairly scientific means of parsing audible differences in equipment. So would I. And so would you of course, for the sake of being able to more efficiently improve your designs, and your own and your customers' listening experiences. In other words, we're all wanting to get to the same place.

 

There are certainly disagreements about how to get there. My contributions in the thread are simply my thoughts to say I don't think DBT has got us there yet, so let's keep searching. Some will agree, some won't; some will seriously consider, some won't; and that's fine.

One never knows, do one? - Fats Waller

The fairest thing we can experience is the mysterious. It is the fundamental emotion which stands at the cradle of true art and true science. - Einstein

Computer, Audirvana -> optical Ethernet to Fitlet3 -> Fibbr Alpha Optical USB -> iFi NEO iDSD DAC -> Apollon Audio 1ET400A Mini (Purifi based) -> Vandersteen 3A Signature.

Link to comment

Jud, I only hope that you did not interpret my "Wow" as sarcasm. Which of course could easily follow from those couple of following posts and "Thread can be closed". On the contrary, I meant the "Wow" because at least to me your last post came across as a serious recap with merits. That's why the "Thread can be closed", really. It is only that you posted a few more things to examine and from those I may think "what a waste of time".

 

Do that for trithio ? finding a high bridge is the better idea.

Dennis ? no problem with that.

mayhem ? have some efficiency from that same bridge you already found.

 

This is nothing personal, but some are open for things and others are not at all. I still will never care, but those who are not open will not progress a bit. And a bit could already help. But I can't decide for that/them and the only thing which may disturb is that you now are the fool. Nothing about agree to disagree. So one half are fools and the other is something else.

Which is whom ?

Lush^3-e      Lush^2      Blaxius^2.5      Ethernet^3     HDMI^2     XLR^2

XXHighEnd (developer)

Phasure NOS1 24/768 Async USB DAC (manufacturer)

Phasure Mach III Audio PC with Linear PSU (manufacturer)

Orelino & Orelo MKII Speakers (designer/supplier)

Link to comment
Yeah, why am I bothering? Here's why:

 

I think Dennis, trithio, mayhem, etc., would dearly love to have some fairly scientific means of parsing audible differences in equipment. So would I. And so would you of course, for the sake of being able to more efficiently improve your designs, and your own and your customers' listening experiences. In other words, we're all wanting to get to the same place.

 

There are certainly disagreements about how to get there. My contributions in the thread are simply my thoughts to say I don't think DBT has got us there yet, so let's keep searching. Some will agree, some won't; some will seriously consider, some won't; and that's fine.

 

Not sure where you see the problem with controlled tests but I am not aware of any other than funny smear campaigns. We do not need to establish anything here, DBTs are just another scientific tool which works in audio same as everywhere. That is, it works good. Or as good as you set it up.

Also not sure where is this stuff about proving things 'for audio' coming from. No one needs to do that cause there is nothing special about audio. And science and its metods work everywhere the same anyway. That's the beauty of it. Not many people see it but nobody will convince those.

Other than that thanks for the few nice and detailed posts.

Link to comment

Wow (again) ... was that you ??

 

There may be a difference with audio compared to many or all other phenomena. That is, when we try to prove things from a distance - or try to tell each other about it - we cannot verify it. By contrast, I could tell you to apply such and so cable, this and that grounding, and now watch pixels moving on a 4 meter wide screen. If the setup is the same, you should see them move when I see that happening over here. There's no placebo in order now. And oh, this, while for drugs there again could be. If I tell you that Dummy Pill A helps, it may be because I say so. Or Real Pill B may explicitly NOT help because I say so, being an untrustworthy person, which you know.

Lush^3-e      Lush^2      Blaxius^2.5      Ethernet^3     HDMI^2     XLR^2

XXHighEnd (developer)

Phasure NOS1 24/768 Async USB DAC (manufacturer)

Phasure Mach III Audio PC with Linear PSU (manufacturer)

Orelino & Orelo MKII Speakers (designer/supplier)

Link to comment

Jud, I mostly agree with you but would actually phrase it differently - in well run DBTs there are internal hidden controls to uncover false negatives, it's just that most DBTs are not well run.

 

Here's extracts from "Methods for the subjective assessment of small impairments in audio systems" ITU-R BS.1116-2 (2014)

 

A major consideration is the inclusion of appropriate control conditions. Typically, control conditions include the presentation of unimpaired audio materials, introduced in ways that are unpredictable to the subjects. It is the differences between judgement of these control stimuli and the potentially impaired ones that allows one to conclude that the grades are actual assessments of the impairments.

 

So checking for false negatives is part of the recommended procedures for this sort of assessment.

 

Why are such negative controls needed - because perceptual testing is a very difficult thing to get right & it is recognised that fatigue & loss of focus quickly set in when one is repeatedly listening to the same piece/snippet of audio through many repetitions:

 

This is recognised again in the recommendations:

A grading session should not last for more than 20-30 min, although the self-paced character of trials advocated here will introduce uncontrolled variability among subjects. Experience suggests that no more than 10 to 15 trials per session should be scheduled to achieve the desired session length. Subject fatigue may become a major factor which would seriously interfere with the validity of judgements. To avoid this, rest periods equal to a duration no less than the session length should be scheduled between successive sessions for each subject.

 

My summary is that DBTs are for research labs & the scientific community, those who have the funding & expertise to conduct such tests. All other blind test NULL results are invalid as we don't know what percentage of the NULL results are false negatives due to fatigue, loss of focus, negative expectation bias, or game playing.

 

A good example of DBT results I've recently seen - Arny Krueger's results from a recent ABX test (BTW, he claims to be the father of ABX testing :)

Let me put up the Foobar ABX results & see if anyone spots the problem with them (no cheating - if you already know)?

 

The playback transducers were the highly regarded Audio Technica ATH-M50 headphones.

 

The ABX log is as follows:

 

foo_abx 2.0 beta 4 report

foobar2000 v1.3.5

2015-01-06 21:04:53

 

File A: 15 Haydn_ String Quartet In D, Op_ 76, No_ 5 - Finale - Presto + cues 256kbps.mp3

SHA1: f24d8c506ae5d38fd7d3a8e7700ee8595cd5e025

File B: 15 Haydn_ String Quartet In D, Op_ 76, No_ 5 - Finale - Presto + cues.wav

SHA1: 961320fa0baa1983130304bed02df943a32cfe25

 

Output:

DS : Primary Sound Driver

 

21:04:53 : Test started.

21:05:18 : 00/01

21:05:39 : 01/02

21:06:39 : 02/03

21:06:45 : 03/04

21:06:47 : 04/05

21:06:50 : 04/06

21:06:54 : 04/07

21:06:56 : 05/08

21:06:58 : 06/09

21:06:59 : 07/10

21:07:01 : 07/11

21:07:04 : 07/12

21:07:05 : 08/13

21:07:08 : 08/14

21:07:10 : 08/15

21:07:31 : 08/16

21:07:31 : Test finished.

 

----------

Total: 8/16

Probability that you were guessing: 59.8%

 

-- signature --

b54eb2a632d09ae60dbb1c13774d4152ee32f110
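The "Probability that you were guessing" line in the log above is consistent with a one-sided binomial tail: the chance of scoring at least 8 out of 16 by coin-flipping. That reading is an assumption about how the figure is derived, not something taken from the foo_abx documentation, and `guess_probability` is a name made up for this sketch.

```python
from math import comb

def guess_probability(correct, trials):
    """Chance of getting at least `correct` right out of `trials` by pure guessing (p = 0.5)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2**trials

p = guess_probability(8, 16)
print(f"{p:.1%}")  # 59.8%
```

The same function shows how steeply the figure falls as the score rises, e.g. `guess_probability(12, 16)` is below 5%.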

Link to comment
Not sure where you see the problem with controlled tests but I am not aware of any other than funny smear campaigns. We do not need to establish anything here, DBTs are just another scientific tool which works in audio same as everywhere. That is, it works good. Or as good as you set it up.

Also not sure where is this stuff about proving things 'for audio' coming from. No one needs to do that cause there is nothing special about audio. And science and its metods work everywhere the same anyway. That's the beauty of it. Not many people see it but nobody will convince those.

Other than that thanks for the few nice and detailed posts.

No, this is the usual wrong assumption made by those who do casual DBTs - take a tool, DBT, designed for testing in one field & suggest that it is universally applicable to all fields, while ignoring the necessary procedures to ensure the validity of the results when trying to apply it in the field of perceptual testing.

 

Another hackneyed phrase "there is nothing special about audio." It's not audio that needs to be considered - it's perceptual testing. So do you want to rephrase that to read "there is nothing special about perceptual testing"?

Link to comment
because perceptual testing is a very difficult thing to get right & it is recognised that fatigue & loss of focus quickly set in when one is repeatedly listening to the same piece/snippet of audio through many repetitions:

 

With 2 seconds of it only ?

 

To avoid this, rest periods equal to a duration no less than the session length should be scheduled between successive sessions for each subject.

 

Shoot. That's 2 seconds of rest only !

Oh wait, can be longer.

 

Anyway, you know I am kidding of course. But someone is going to debunk this already because we should rather be listening to test signals of 2 seconds (I'll be fair - 4 seconds max).

Lush^3-e      Lush^2      Blaxius^2.5      Ethernet^3     HDMI^2     XLR^2

XXHighEnd (developer)

Phasure NOS1 24/768 Async USB DAC (manufacturer)

Phasure Mach III Audio PC with Linear PSU (manufacturer)

Orelino & Orelo MKII Speakers (designer/supplier)

Link to comment

If the objectivists care so much about the HiFi manufacturers ripping off the public, then why aren't they campaigning and hassling them to stop? No anecdotes from them about that, are there?

 

No need......consumers are shutting them out all on their own. Hard enough now to find a B&M store.

Link to comment
Yeah, why am I bothering? Here's why:

 

I think Dennis, trithio, mayhem, etc., would dearly love to have some fairly scientific means of parsing audible differences in equipment. So would I. And so would you of course, for the sake of being able to more efficiently improve your designs, and your own and your customers' listening experiences. In other words, we're all wanting to get to the same place.

 

There are certainly disagreements about how to get there. My contributions in the thread are simply my thoughts to say I don't think DBT has got us there yet, so let's keep searching. Some will agree, some won't; some will seriously consider, some won't; and that's fine.

 

Agreed.

 

And there's no motivating force for the creation and operation of DBTs. It's A LOT of work.

Link to comment
Let me put up the Foobar ABX results & see if anyone spots the problem with them (no cheating - if you already know)?

 

No clue, as I have no experience with Foobar's ABX tool. But if I see it right that the times listed there are the start times (or end times) then I for sure wouldn't listen for various lengths of times. And if the "04/" indicates track (or test) #4 etc., then I also wouldn't repeat something under way. I might repeat all, or maybe a few afterwards (if allowed) but not in between. Or, repeat *all* for the same number.

 

Another issue would be that the 00 is listened to for a relatively long time. The next occasions are not, with the suggestion that I had picked my favorite part from track/test 00, while not giving the others that same chance (listening to a few secs only).

 

But maybe I'm completely off and didn't understand.
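For anyone who wants to look at the timing question in numbers, here is a small Python sketch that works out the seconds between consecutive entries of the ABX log quoted earlier in the thread. It assumes each `NN/MM` entry is a running score (correct answers / trials so far), which is consistent with the final "Total: 8/16", and that each timestamp marks when an answer was logged. Both are assumptions about the log format, and `trial_intervals` is a made-up helper name.

```python
from datetime import datetime

# Timestamps and running scores copied from the ABX log quoted earlier in the thread.
log = """\
21:04:53 : Test started.
21:05:18 : 00/01
21:05:39 : 01/02
21:06:39 : 02/03
21:06:45 : 03/04
21:06:47 : 04/05
21:06:50 : 04/06
21:06:54 : 04/07
21:06:56 : 05/08
21:06:58 : 06/09
21:06:59 : 07/10
21:07:01 : 07/11
21:07:04 : 07/12
21:07:05 : 08/13
21:07:08 : 08/14
21:07:10 : 08/15
21:07:31 : 08/16
"""

def trial_intervals(text):
    """Return (trial_number, seconds_since_previous_entry, answered_correctly) tuples."""
    rows = []
    prev_time = None
    prev_correct = 0
    for line in text.splitlines():
        stamp, _, rest = line.partition(" : ")
        t = datetime.strptime(stamp, "%H:%M:%S")
        if "/" in rest:  # a trial entry such as "04/06"
            correct, total = (int(x) for x in rest.split("/"))
            seconds = int((t - prev_time).total_seconds())
            rows.append((total, seconds, correct > prev_correct))
            prev_correct = correct
        prev_time = t
    return rows

rows = trial_intervals(log)
for number, seconds, ok in rows:
    print(f"trial {number:2d}: {seconds:3d} s  {'correct' if ok else 'wrong'}")

# Trials answered within a few seconds of the previous one (threshold is arbitrary).
fast = [number for number, seconds, _ in rows if seconds <= 4]
print(f"{len(fast)} of {len(rows)} trials were answered within 4 s of the previous entry")
```

The 4-second threshold is only there to make the pattern visible; what it says about how the test was taken is left to the reader.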

Lush^3-e      Lush^2      Blaxius^2.5      Ethernet^3     HDMI^2     XLR^2

XXHighEnd (developer)

Phasure NOS1 24/768 Async USB DAC (manufacturer)

Phasure Mach III Audio PC with Linear PSU (manufacturer)

Orelino & Orelo MKII Speakers (designer/supplier)

Link to comment
With 2 seconds of it only ?

 

 

 

Shoot. That's 2 seconds of rest only !

Oh wait, can be longer.

 

Anyway, you know I am kidding of course. But someone is going to debunk this already because we should rather be listening to test signals of 2 seconds (I'll be fair - 4 seconds max).

 

I think short auditory echoic memory is something like 5 secs? Whatever, it is short & is the premise that underlies A/B testing.

Link to comment
No, this is the usual wrong assumption made by those who do casual DBTs - take a tool, DBT, designed for testing in one field & suggest that it is universally applicable to all fields, while ignoring the necessary procedures to ensure the validity of the results when trying to apply it in the field of perceptual testing..

 

Another hackneyed phrase "there is nothing special about audio." It's not audio that needs to be considered - it's perceptual testing. So do you want to rephrase that to read "there is nothing special about perceptual testing"?

 

Nope. But thanks for the offer. As mentioned before it's just a tool. May be used right or wrong like any other. In audio same as everywhere else.

Link to comment
Not sure where you see the problem with controlled tests but I am not aware of any other than funny smear campaigns.

 

Then perhaps you should read Jud's just previous 'detailed post' (post #226). And not just the first line or two, or a quick scan, but the entire post, attempting comprehension.

 

 

where is this stuff about proving things 'for audio' coming from. No one needs to do that cause there is nothing special about audio. And science and its metods work everywhere the same anyway.

 

Yeah, science is so cool that it works the same on chemistry and cognition, gravity and electromagnetics, quarks and black holes. Ha !

 

You say you are so hard-core science based (and everyone else here are fools), maybe it's time for you to actually learn some real science (maybe some spelling too). I don't recall you posting anything even slightly technical or 'scientific' for all your supposed reverence for it.

 

A good place to start would be to be quiet and listen, even ask questions, but certainly not to act like an annoying know-it-all, when it is obvious you are not.

 

 

If not, there is always the H2 Audio forum, where your dreaded 'anecdotes' are banned outright. You would probably fit right in, and be very happy there :)

Link to comment
I think short auditory echoic memory is something like 5 secs? Whatever, it is short & is the premise that underlies A/B testing.

 

Oh ? But then it turns out that I didn't understand the merit of your first post about this in the first place. Well, can happen.

 

A grading session should not last for more than 20-30 min, although the self-paced character of trials advocated here will introduce uncontrolled variability among subjects. Experience suggests that no more than 10 to 15 trials per session should be scheduled to achieve the desired session length.

 

What I saw in this is 30 minutes max of a session and 15 trials max per such a session. To me this tells 2 minutes per "trial" which is a bit more than 5 seconds.

 

??

 

But if you were making fun of it all to begin with, I tend to understand that post.
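The arithmetic behind the "2 minutes per trial" reading can be laid out as a tiny Python sketch. It uses only the two ranges quoted from the recommendation (20-30 minutes per session, 10 to 15 trials per session). Treating session time as evenly divided among trials is a simplifying assumption, since the trials are described as self-paced.

```python
session_minutes = (20, 30)  # session length range quoted from the recommendation
trials = (10, 15)           # trials per session quoted from the recommendation

# Average time available per trial if the session is split evenly.
shortest = session_minutes[0] / trials[1]    # short session, many trials
longest = session_minutes[1] / trials[0]     # long session, few trials
upper_case = session_minutes[1] / trials[1]  # 30 minutes, 15 trials

print(f"per trial: {shortest:.1f} to {longest:.1f} minutes")
print(f"30 min / 15 trials = {upper_case:.0f} minutes")
```

Either way the per-trial budget comes out in minutes, not seconds, though a single trial may of course contain many short A/B switches.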

Lush^3-e      Lush^2      Blaxius^2.5      Ethernet^3     HDMI^2     XLR^2

XXHighEnd (developer)

Phasure NOS1 24/768 Async USB DAC (manufacturer)

Phasure Mach III Audio PC with Linear PSU (manufacturer)

Orelino & Orelo MKII Speakers (designer/supplier)

Link to comment
quarks and black holes. :(

 

Wait. This isn't funny. trithio talked about black wholes earlier on.

Now you've done it.

 

At least I do read. ;)

Lush^3-e      Lush^2      Blaxius^2.5      Ethernet^3     HDMI^2     XLR^2

XXHighEnd (developer)

Phasure NOS1 24/768 Async USB DAC (manufacturer)

Phasure Mach III Audio PC with Linear PSU (manufacturer)

Orelino & Orelo MKII Speakers (designer/supplier)

Link to comment
Nope. But thanks for the offer. As mentioned before it's just a tool. May be used right or wrong like any other.

No, you are minimising what I'm saying - it's a tool that should only be used for perceptual testing by research labs & those who have expertise in that area. What you are failing to acknowledge is that 99.9999% of the time it is used incorrectly in home-based DBTs.

Link to comment
