
On 6/22/2020 at 11:19 AM, cat6man said:

I personally define objective proof in audio as requiring measurements that confirm a hypothesis.  I do not include blind or double-blind testing as 'objective' (your mileage may vary), so let's continue on my version of a possibly 'objective' answer. Any engineers or scientists here?

As a tenured professor at a university medical center, my colleagues and I rely on DBT as a valid research tool.  Well designed, properly powered, double-blinded clinical trials have been the basis for many major scientific achievements that have saved countless lives.  If the p value on a well chosen, appropriately powered test of the delta between a placebo or control cohort and the active study cohort is 0.02, a difference that large would arise purely by random chance only 2% of the time if the intervention had no effect. I don’t understand how you can dismiss this as not being objective.
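To pin down what that number formally means (a standard textbook definition, nothing specific to any one trial): under the null hypothesis H0 of "no treatment effect", the p value is the probability of seeing a result at least as extreme as the one actually observed:

```latex
p \;=\; \Pr\!\left( T \ge t_{\mathrm{obs}} \,\middle|\, H_{0} \right)
```

So p = 0.02 bounds how often chance alone would produce the observed delta; strictly speaking it is not the probability that the treatment works, which is why the design around the number matters so much.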

 

The problem with most amateur DBT is how it’s done, not the principle behind it.

5 hours ago, bluesman said:

 

The problem with most amateur DBT is how it’s done, not the principle behind it.

 

DBT proponents seem to see this as a feature and not a bug.  Their desired outcome is guaranteed and made bulletproof (so they think) because properly conducted DBT has been demonstrated to be a valid research tool.  Valid questions about how the test was conducted are ignored.  That's why I've said that those who press for amateur DBTs are actually the ones behaving like snake oil salesmen.


Digital:  Innuos Zenith Mk3 > Shunyata Sigma USB > Chord Hugo M-Scaler > Wireworld Gold Startlight > OPTO DX > Shunyata Alpha S/PDIF > Chord Hugo TT2 

Amp & Speakers:  Spectral DMA-150mk2 > Aerial 10T

Foundation: Stillpoints Ultra, Shunyata Denali power conditioner, Shunyata Alpha power cords, Shunyata Alpha interconnect, Shunyata Sigma Ethernet, MIT Matrix HD60 speaker cables, ASC isothermal tube traps

2 hours ago, kennyb123 said:

Valid questions about how the test was conducted

 

Links please...

27 minutes ago, bluesman said:

Maybe they just don't know enough about it to do it right (or don't realize that it's not a valid methodology for answering the question being asked). You're talking about a fairly sophisticated methodology that requires specialized knowledge.  When you encounter a specious application of it, you'd be of much more help identifying the flaws in it than simply dismissing it.

 

Name calling reduces you to the level of those you're criticizing.  As they say at the airport, if you see something... say something.  But you're only part of the solution if what you say is meaningful and constructive.  Otherwise, you're part of the problem.


False accusations reduce one to an even lower level.  Are you accusing me of not pointing out the flaws?  
 

I agree that name calling is bad, but that’s different from employing a metaphor to illustrate the hypocrisy in holding certain positions.  The snake oil salesman pitching pseudo-scientific arguments in favor of a product that he knows isn’t going to produce positive results is very much akin to someone putting forth a blind test methodology one knows won’t produce positive results.


Digital:  Innuos Zenith Mk3 > Shunyata Sigma USB > Chord Hugo M-Scaler > Wireworld Gold Startlight > OPTO DX > Shunyata Alpha S/PDIF > Chord Hugo TT2 

Amp & Speakers:  Spectral DMA-150mk2 > Aerial 10T

Foundation: Stillpoints Ultra, Shunyata Denali power conditioner, Shunyata Alpha power cords, Shunyata Alpha interconnect, Shunyata Sigma Ethernet, MIT Matrix HD60 speaker cables, ASC isothermal tube traps

11 hours ago, bluesman said:

As a tenured professor at a university medical center, my colleagues and I rely on DBT as a valid research tool.  Well designed, properly powered, double-blinded clinical trials have been the basis for many major scientific achievements that have saved countless lives.  If the p value on a well chosen, appropriately powered test of the delta between a placebo or control cohort and the active study cohort is 0.02, a difference that large would arise purely by random chance only 2% of the time if the intervention had no effect. I don’t understand how you can dismiss this as not being objective.

 

The problem with most amateur DBT is how it’s done, not the principle behind it.

 

Of course DBT is a valid and immensely useful scientific method, and

I agree that most amateur DBTs are relatively useless.

 

However, in most(?) cases at the medical center, I'd guess that you have some (what I'll call) objective output measure such as blood pressure, survival rate, visual acuity, or heart ejection fraction... i.e. things you can measure.

 

Compare that with 'the soundstage is wider' or 'it sounds more real' or 'I hear more breath on the vocals'.

 

I never meant to imply that DBT is not valid, but rather that with objective/measurable criteria (versus subjective opinion, even if gathered in large controlled numbers) it is much clearer what is going on.

I assume DBT in psychological research gets a bit messier?

2 hours ago, cat6man said:

Of course DBT is a valid and immensely useful scientific method, and

I agree that most amateur DBTs are relatively useless.

 

However, in most(?) cases at the medical center, I'd guess that you have some (what I'll call) objective output measure such as blood pressure, survival rate, visual acuity, or heart ejection fraction... i.e. things you can measure.

 

Yes, I agree, and it is a point that IMO is often overlooked.

 

The best outcome measures pass what I have referred to previously as objective 'test of test' parameters.

 

Also, those tests are ideally out of the subject's direct control, influence, or even knowledge. Ideally, subjects don't have a clue what is being tested, like a blood measure they have never even heard of, let alone something that tests their skill or potentially their reputation. Things, however, cannot always be ideal.

 

Without test-of-test parameters, however (I won't go into detail here), one is basically using an uncalibrated tool. Audio perceptual testing is interesting in this context, adding a layer of difficulty because you have to develop a methodology that in essence uses a perceptual test to test perception. There is no gold standard against which we can 'calibrate', the way a blood test serving as a surrogate marker for a particular cancer can be checked against gold-standard biopsy results.
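For concreteness, the 'test of test' parameters I keep referring to are just the standard ones, and each requires a gold standard reference before it can be computed (textbook definitions, nothing audio-specific):

```latex
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}
```

where TP/FP/TN/FN are true/false positives/negatives judged against the gold standard. Without that reference there is nothing to put in the denominators, which is exactly the bind a perceptual test of perception is in.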

 

Even with well done medical research, confounders often lead to less than conclusive results, and this often has nothing to do with blinding. Just look at the number of peer reviewed publications on the role of statins in heart disease, or on the place of coronary CT calcium scores; what you might expect to be fairly straightforward is anything but.

 

Eliminating bias is not the issue; it is how you do it. Methodology is key. Blinding does not automatically make a methodology valid, and indeed you cannot guarantee that it does not actually increase false negatives through other means, such as interdependent variables. To know this you would need to know the false negative rates, and that is difficult without a gold standard reference for calibration.

 

 

 


Sound Minds Mind Sound

 

 

11 hours ago, kennyb123 said:

 

DBT proponents seem to see this as a feature and not a bug.  Their desired outcome is guaranteed and made bulletproof (so they think) because properly conducted DBT has been demonstrated to be a valid research tool.  Valid questions about how the test was conducted are ignored.  That's why I've said that those who press for amateur DBTs are actually the ones behaving like snake oil salesmen.

 

I agree it seems to be assumed by some to be a fairly simple and infallible procedure, or at least a highly validated one, and it can even be used by amateurs as a weapon to push a particular view. It is just a test tool, and as alluded to above, like all tools, one needs to know how good the tool is at telling you what you think it is telling you, in the context and setting of the whole test methodology... I think we are heading OT.


Sound Minds Mind Sound

 

 

3 hours ago, Audiophile Neuroscience said:

 

I agree it seems to be assumed by some to be a fairly simple and infallible procedure, or at least a highly validated one, and it can even be used by amateurs as a weapon to push a particular view. It is just a test tool, and as alluded to above, like all tools, one needs to know how good the tool is at telling you what you think it is telling you, in the context and setting of the whole test methodology... I think we are heading OT.

 

All things being equal, DBTs are better than sighted testing, unless you are testing for confirmation bias or a placebo effect. Whether DBTs are infallible is not in question, at least in my mind: of course they can be done wrong, and the results may not be accurate. It's important to understand the limitations and follow a proper test design.

 

But it's audio malpractice, IMHO, to deny that DBTs can, with proper controls, produce a much more accurate, repeatable, and reproducible result than sighted testing. And such controls are not voodoo or a mystery; they are studied, reported, and well documented.

 

A sighted test cannot be done right: there's no way to eliminate or separate the effect of biases related to other senses and preconceived ideas and knowledge. A DBT can be done wrong, but when properly planned and conducted, it is a scientific instrument that can be measured and calibrated.
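To make "proper controls" concrete, here is a minimal sketch of one such control, randomized presentation with hidden same/same pairs (the device names and trial counts are invented for illustration; this is not anyone's published protocol):

```python
import random

def blind_schedule(n_real=20, n_control=10, seed=None):
    """Build a randomized same/different listening schedule.

    'real' trials present two different devices in random order;
    'control' trials secretly present the same device twice, so any
    difference reported on them measures bias, not hearing.
    """
    rng = random.Random(seed)
    kinds = ['real'] * n_real + ['control'] * n_control
    rng.shuffle(kinds)                       # listener can't predict trial type
    schedule = []
    for kind in kinds:
        if kind == 'real':
            pair = ['device_A', 'device_B']
            rng.shuffle(pair)                # presentation order randomized too
        else:
            same = rng.choice(['device_A', 'device_B'])
            pair = [same, same]              # correct answer is "same"
        schedule.append({'kind': kind, 'pair': pair})
    return schedule

# The proctor holds the schedule; the listener only hears the pairs.
trials = blind_schedule(seed=1)
```

The false-alarm rate on the control pairs is one of the measurements that lets you calibrate the test in the sense above.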

 

8 hours ago, cat6man said:

However, in most(?) cases at the medical center, I'd guess that you have some (what I'll call) objective output measure such as blood pressure, survival rate, visual acuity, or heart ejection fraction... i.e. things you can measure.

 

Compare that with 'the soundstage is wider' or 'it sounds more real' or 'I hear more breath on the vocals'.

The critical factor in choosing a test is whether it can answer the question being asked.  As you point out, the question asked by audiophiles is often unanswerable and therefore untestable as asked.  It’s not possible to determine with blinded testing whether a given cable improves SQ - for a start, one audiophile’s smooth is another’s muddy. But it is possible to determine whether there’s a consistent difference between two cables that subjects can identify, across enough well done blinded listening trials, at a rate high enough to be statistically significant.  Correctly distinguishing one alternative from another at the 95% confidence level in a well designed and well conducted DBT is objective.  And this can be useful info.
 

The “best” questions for any study are objectively measurable, as you point out.  But many healthcare decisions are made on the basis of parameters that can be as vague and nebulous as “how real it sounds”, e.g. sense of well-being, intensity of pain, quality-adjusted life years, patient satisfaction, and likelihood of recommending.  Picking and using good tests to get valid, repeatable results depends on what question is asked, how it’s asked, and in what form an answer is sought.  And measurement systems are less reliable than we believe.  Even “simple” blood pressure measurement is not simple, e.g. 3 consecutive readings 5 minutes apart can vary widely.

 

Other factors ignored by those who oversimplify DBT include consistency among multiple raters and consistency of the same rater over time.  If a subject correctly identifies A or B in 95% of presentations in one trial session but averages 53% over 5 identical sessions, that 95% session is not representative - it’s random chance.  Controls are also needed.  Present the same alternative to raters multiple times and see if they think they perceive differences that aren’t there.
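To put rough numbers on that example (a minimal sketch; the counts are hypothetical, and SciPy's exact binomial test stands in for whatever analysis a real study would use):

```python
from scipy.stats import binomtest

# Session 1: 19 of 20 presentations identified correctly (95%)
single = binomtest(19, n=20, p=0.5, alternative='greater')
print(f"single session: p = {single.pvalue:.5f}")   # ~2e-05, looks decisive

# Pooled: 53 of 100 correct across 5 identical sessions (53%)
pooled = binomtest(53, n=100, p=0.5, alternative='greater')
print(f"pooled sessions: p = {pooled.pvalue:.2f}")  # ~0.31, consistent with guessing
```

The pooled result, not the best single session, is what characterizes the rater.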

32 minutes ago, bluesman said:

Present the same alternative to raters multiple times and see if they think they perceive differences that aren’t there.

To bring this back OT, well done DBT could help determine if network platforms and components make a discernible difference in SQ.

14 minutes ago, bluesman said:

To bring this back OT, well done DBT could help determine if network platforms and components make a discernible difference in SQ.

I suppose we would also have to qualify that statement: well done DBT could help determine if network platforms and components make a discernible difference in SQ to those tested.

 

There will always be people who, rightly or not, believe they are outliers or different from the test group, and so the results don't ring true for them.


Founder of Audiophile Style

Announcing Polestar | Quick Community Reviews and Ratings

22 minutes ago, The Computer Audiophile said:

There will always be people who, rightly or not, believe they are outliers or different from the test group, and so the results don't ring true for them.

And there really are a few clinging to each end of the bell-shaped curve.  But most of us are like most of us. Only in Lake Wobegon are the children all above average.

4 minutes ago, bluesman said:

And there really are a few clinging to each end of the bell-shaped curve.  But most of us are like most of us. Only in Lake Wobegon are the children all above average.

Agree. Plus, I live close to Lake Wobegon :~)


Founder of Audiophile Style

Announcing Polestar | Quick Community Reviews and Ratings

5 minutes ago, The Computer Audiophile said:

Agree. Plus, I live close to Lake Wobegon :~)

You’re also not that far from Frostbite Falls.  You may even have met some of our audiophile colleagues who studied statistics at Wassamatta U.

2 hours ago, ray-dude said:

I bring this up in this thread because SFPs are falling into that same pattern for me.  This is a class of component where I'm finding I have to ignore my initial experience and preference, and gauge my mood after extended listening.  Do I want to listen to music longer?  Am I able to focus on work more or less when listening to music with a particular SFP pair?

 

Like my user experience example above, once I identify a long term preference, I then look for what short term identifiable characteristic is a "tell" for what that long term experience will be, then look for components that have more or less of that tell.  I'm starting to find that "tell", but I still have to lean into extended listening.

 

Very well said.  For me, spending more time listening to an SFP has told me more.  Listening to music with varying levels of recording quality reveals what might be missed in a quick shootout.  Also, the Pepsi and Coke example is very apt, as something that might have sounded great after a shootout could very well drive you mad after extended listening.


Digital:  Innuos Zenith Mk3 > Shunyata Sigma USB > Chord Hugo M-Scaler > Wireworld Gold Startlight > OPTO DX > Shunyata Alpha S/PDIF > Chord Hugo TT2 

Amp & Speakers:  Spectral DMA-150mk2 > Aerial 10T

Foundation: Stillpoints Ultra, Shunyata Denali power conditioner, Shunyata Alpha power cords, Shunyata Alpha interconnect, Shunyata Sigma Ethernet, MIT Matrix HD60 speaker cables, ASC isothermal tube traps

1 hour ago, cat6man said:

So let me put down a friendly challenge.  In the interest of being 'objective' and not 'subjective' in this sub-forum, how about a sub-sub-group interested in getting to the bottom of this technically and not just acting like my old 'arrogant PhD' colleagues who already knew all the answers (but had oversimplified the situation and therefore hadn't formulated the problem accurately)?

 

This is the fun stuff!!  If PhDs are allowed in, count me in!

 

(FYI, I put my hypothesis out there in part 1 of my Extreme review...reference voltage, ground plane, and reference timing are the father/son/holy ghost of digital audio, I think, and everything always seems to come back to those fundamentals)

5 hours ago, ray-dude said:

Like my user experience example above, once I identify a long term preference, I then look for what short term identifiable characteristic is a "tell" for what that long term experience will be, then look for components that have more or less of that tell.  I'm starting to find that "tell", but I still have to lean into extended listening

 

Would this include a setup with two SFP+ modules where someone other than the listener could turn off a transceiver? That is, you could have it in the rack for an extended period of time? Say 3-6 months?

12 hours ago, bluesman said:

The critical factor in choosing a test is whether it can answer the question being asked.  As you point out, the question asked by audiophiles is often unanswerable and therefore untestable as asked.  It’s not possible to determine with blinded testing whether a given cable improves SQ - for a start, one audiophile’s smooth is another’s muddy. But it is possible to determine whether there’s a consistent difference between two cables that subjects can identify, across enough well done blinded listening trials, at a rate high enough to be statistically significant.  Correctly distinguishing one alternative from another at the 95% confidence level in a well designed and well conducted DBT is objective.  And this can be useful info.
 

The “best” questions for any study are objectively measurable, as you point out.  But many healthcare decisions are made on the basis of parameters that can be as vague and nebulous as “how real it sounds”, e.g. sense of well-being, intensity of pain, quality-adjusted life years, patient satisfaction, and likelihood of recommending.  Picking and using good tests to get valid, repeatable results depends on what question is asked, how it’s asked, and in what form an answer is sought.  And measurement systems are less reliable than we believe.  Even “simple” blood pressure measurement is not simple, e.g. 3 consecutive readings 5 minutes apart can vary widely.

 

Other factors ignored by those who oversimplify DBT include consistency among multiple raters and consistency of the same rater over time.  If a subject correctly identifies A or B in 95% of presentations in one trial session but averages 53% over 5 identical sessions, that 95% session is not representative - it’s random chance.  Controls are also needed.  Present the same alternative to raters multiple times and see if they think they perceive differences that aren’t there.

 

I mostly agree, but I have a couple of comments.

 

In the absence of objective direct outcome measures, correctly identifying a perceptual difference to a statistically significant degree is indeed useful and objective information. As you point out, p=.05 is the 'usual' figure nominated, but some would like a bit better than this depending on the situation. @manisandher in the Red Pill/Blue Pill thread scored p=0.1, a 99% consistency, i.e. probability that his perception was not a product of random chance or guesswork. But without direct objective measures the problems of inductive reasoning are amplified - there is no inductive law to prove a hypothesis. Hence that thread was very long and hotly debated - basically the result was challenged, by some, based on supposed methodological issues. So while "well designed and well conducted DBT" is a prerequisite, achieving same is not always, and I would suggest often not, without challenge. A bit like audio, it's 'all in the implementation'.

 

You touched on intra-observer false positives. These are important, but so are intra-observer false negatives. IMO false negatives in general are really more the issue in the context of blind listening tests. It has to do with the whole "blind test it or it didn't happen" claim. If a perception disappears on a blind listening test, is it because the perception never existed (beyond biased imagination), or because the test has failed in some way, i.e. it is a false negative? It's a rhetorical question, as seemingly most have preconceived, unshakable positions.

 

Whatever the case, if you don't know the false positive or false negative rates of the test procedure in the first place (together with the other test-of-test parameters), you are working in the dark. The blind leading the blinded. It is analogous to using an uncalibrated tool. At the very least, one needs to accept certain assumptions of validity about the test procedure. I think more light is shed on the situation if the subject passes the test with flying colors, but in either case it comes down to acceptance of the methodology.
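As a minimal sketch of how large those false negatives can be (hypothetical numbers, with an exact binomial calculation standing in for a full power analysis), take a listener who genuinely hears a difference 70% of the time, tested over 16 blind trials at the usual 5% level:

```python
from scipy.stats import binom

n, alpha = 16, 0.05            # 16 blind trials, 5% significance level
# smallest score that beats guessing (p0 = 0.5) at this alpha
k_crit = min(k for k in range(n + 1) if binom.sf(k - 1, n, 0.5) <= alpha)
# chance that a genuine 70%-correct listener reaches that score
power = binom.sf(k_crit - 1, n, 0.7)             # sf(k-1) = P(X >= k)
print(f"pass mark: {k_crit}/{n}")                # 12/16
print(f"false-negative rate: {1 - power:.2f}")   # ~0.55
```

A real but modest ability fails such a short test more often than not, so a null result says very little by itself.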

 

Not directed at you, but asking how we know what we know is not just philosophical musing; it strikes at the scientific method. Whether we are assessing SFP modules or considering the credibility of reviewers on this very forum, there will always be claims that "if it wasn't blind tested, it didn't happen". In the absence of proof one way or the other, I remain open but skeptical about reported observations.


Sound Minds Mind Sound

 

 

7 hours ago, cat6man said:

 

So let me put down a friendly challenge.  In the interest of being 'objective' and not 'subjective' in this sub-forum, how about a sub-sub-group interested in getting to the bottom of this technically and not just acting like my old 'arrogant PhD' colleagues who already knew all the answers (but had oversimplified the situation and therefore hadn't formulated the problem accurately)?

 

This will be my last shot at seeing whether anyone else is interested in approaching this in a manner similar to my current way of thinking.  I'm not interested in debating the philosophy of testing and will not reply to such.  Anyone want to try to figure out what is going on here?

 

 

Go for it ... 🙂.

 

The overall answer for "what is going on" is that electrical noise from a variety of sources internal and external to the rig impacts the analogue areas of the replay chain, just enough to be audible - this was true 3 decades ago, and is just as true right now. It doesn't matter that the music player is "right over there, way, way away from the sensitive stuff!!" ... nasty stuff gets around with the greatest of ease, and your challenge, should you choose to accept it 😝, is to track down and nail every last one of the pathways by which interference mechanisms degrade SQ.

 

The precise, technical explanation for what's going on in a particular setup would be handy to know - but ultimately far less important than knocking the relevant interference pathways on the head ... 😉.


Frank

 

http://artofaudioconjuring.blogspot.com/

 

 

Over and out.


 

39 minutes ago, Audiophile Neuroscience said:

As you point out, p=.05 is the 'usual' figure nominated, but some would like a bit better than this depending on the situation. @manisandher in the Red Pill/Blue Pill thread scored p=0.1, a 99% consistency, i.e. probability that his perception was not a product of random chance or guesswork.

A p value of 0.1 corresponds to 90%, not 99%; 99% would require p = 0.01.

