
Golden ears vs super scopes



12 hours ago, SoundAndMotion said:

Hi Paul,

I'm reminded of our 2 week long PM exchange in Sept. 2018, which morphed from audibility testing to a discussion of HRTFs. 

Of course I agree with you; otherwise accepting money for my work would be fraud. I'll add that the complexity does not come from difficult-to-understand concepts and principles (they're actually not difficult), but from the sheer number of variables that influence the result, thereby producing a set of results. Obtaining a "complete set" is prohibitively tedious, but thankfully mostly unnecessary. It is important, though, to use an appropriate value for your specific goals.
 

I didn't reread our entire exchange, but I found a couple of early quotes that may relate here. I said: "My first question is: what is your goal?" Tailoring an answer to your specific goals and background is much more efficient. On the other hand, with about 15 min. of prep, I could deliver 1 or 2 hour-long lectures for a group with comparable backgrounds (colleagues, grad, undergrad, 4th grade, kindergarten - I've done all of these), but I won't transcribe such a thing now. We could have a very efficient back-and-forth by phone or FaceTime chat, but...
A couple quick tips:
Finding *the* value for *the* threshold of audibility is meaningless. There is no single value; it depends on the type of stimulus (e.g. sine, noise, music, etc.), the method used (e.g. rising, falling, adjustment, 2AFC, etc.), whether it is absolute or differential, and it must account for significant intra- and inter-subject variability. You wrote: "I believe that 'average' results often reported for others don't apply to me, so I tend to want to try to measure things for myself, with me as a test subject." Trying to test all variables results in a combinatorial explosion.
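As a concrete illustration of one of the method families mentioned above (a "transformed up-down" staircase, a cousin of the rising/falling procedures), here is a rough simulation sketch. The logistic listener model and every parameter value are my own placeholders, not anything from SAM's lab:

```python
import math
import random

def staircase_2down1up(true_threshold, start_level=40.0, step=2.0, n_reversals=8):
    """Simulate a 2-down-1-up adaptive staircase, which converges near the
    ~70.7%-correct point of the psychometric function, not an on/off threshold.

    The simulated listener answers a 2AFC trial correctly with probability
    given by a toy logistic function, floored at 50% to model pure guessing.
    """
    level = start_level
    streak = 0            # consecutive correct answers
    last_dir = 0          # -1 = last step was down, +1 = up, 0 = no step yet
    reversals = []
    while len(reversals) < n_reversals:
        p_correct = 0.5 + 0.5 / (1.0 + math.exp(-(level - true_threshold)))
        if random.random() < p_correct:
            streak += 1
            if streak == 2:               # two correct in a row -> make it harder
                streak = 0
                if last_dir == +1:        # direction change = a reversal
                    reversals.append(level)
                last_dir = -1
                level -= step
        else:                             # one wrong -> make it easier
            streak = 0
            if last_dir == -1:
                reversals.append(level)
            last_dir = +1
            level += step
    return sum(reversals) / len(reversals)  # average reversal level = estimate
```

Running this with a simulated threshold of 20 yields estimates scattered around 20, which shows the point above in miniature: the "threshold" you get is a property of the procedure (step size, stopping rule, convergence percentage) as much as of the listener.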

 

One important misconception is that a sensory-perceptual threshold is binary: you hear it or you don't. It is statistical, not single-valued, and follows a sigmoid-shaped curve of response probability versus stimulus level, called a psychometric function:

[Attached image: "Thresholds" - plot of psychometric functions]

 

The cyan line is not correct. You can make it "look" correct if you choose a very large range for the stimulus level scale, but that would be deceptive. BTW, interesting things happen during the rising part of the curve. 
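To make the sigmoid concrete, here is a minimal sketch of a logistic psychometric function with a guess rate (floor) and lapse rate (ceiling). The parameter values are arbitrary placeholders of mine, not anyone's fitted data:

```python
import math

def psychometric(level, threshold, slope=1.0, guess=0.0, lapse=0.02):
    """Probability of a 'yes'/'correct' response as a function of stimulus level.

    A logistic core is scaled between the guess rate (floor, e.g. 0.5 for 2AFC)
    and 1 - lapse (ceiling); 'threshold' is the level at the midpoint of that
    range - a point on a smooth curve, not an on/off switch.
    """
    core = 1.0 / (1.0 + math.exp(-slope * (level - threshold)))
    return guess + (1.0 - guess - lapse) * core
```

Note that at the nominal threshold the listener still responds correctly only about half the time (between floor and ceiling), which is exactly why a single "the threshold" number is underspecified without the criterion attached.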

 

Let me know if I can help more...

 

EDIT: I misstated an opinion in a post yesterday that you address in your first paragraph above. There are things that can be considered inaudible, if you are careful with the assumptions made. It is safe to say that both those with normal hearing and those with freakishly, aberrantly good hearing cannot hear a tone at -16 dB SPL (about 3 µPa).
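That pressure figure follows from the standard dB SPL reference of 20 µPa; a quick conversion sketch (the -16 dB case comes out to roughly 3.2 µPa, consistent with the "3 µPa" above):

```python
def dbspl_to_pascals(db_spl):
    """Convert a level in dB SPL to RMS sound pressure in pascals.

    dB SPL is defined relative to the standard reference pressure of 20 µPa,
    so pressure = 20e-6 * 10^(dB/20).
    """
    return 20e-6 * 10 ** (db_spl / 20)
```

As sanity checks: 0 dB SPL gives back 20 µPa by definition, and 94 dB SPL lands very close to 1 Pa (the usual calibrator level).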

 

Hi SAM,

 

Appreciate the continued conversation! While some of the details have changed since our conversation a few years ago, my goal generally remains the same.

 

Simply stated, I'm after the most realistic audio reproduction possible. The number of variables is large, which is why I've been taking the long, scenic route to get there: exploring, but also trying to narrow down the things I need to look into. I learn better through this sort of random walk, where I discover things that may or may not be relevant, often going off on tangents, but I'm a firm believer that these will give me a better understanding in the long run, or at least a different perspective.

 

Testing for audibility is, of course, a large part of this journey. And perhaps that's where you can help. To simplify the discussion, let me just propose a list of questions that I've formed over the last few years. I've seen some studies with attempted answers, but nothing I can accept as definitive. I've done a large number of ABX tests, but I'm always looking for better ways to test. Sorry to dump all of these on you, and if a voice or video chat works better, we can do that, as well :)

 

1. Are there any test protocols that you feel work best for minor impairment detection/discrimination testing? ABX, triangle, paired comparison, MUSHRA, etc.? Any new ways that have been proposed recently? As an example, someone suggested playing the two DUTs in stereo (one in the left ear, the other in the right). I didn't find this test especially good for discrimination, but perhaps it could be good for detecting phase anomalies.

 

2. Are discrimination tests best conducted with very short snippets and fast switching, or with long-term evaluation? (Long-term here means anything over 10 seconds; sometimes weeks are recommended by our subjective brethren.)

 

3. Is there a fatigue issue in fast-switching tests that must be dealt with, and what is the 'threshold' at which it sets in (number of repetitions, length of time)?

 

4. Are qualitative tests better than the binary discrimination ones in detecting differences? 

 

5. Are there other variables in such testing that we audiophiles either don't know about or ignore, but that can have a large influence on the outcome? I know there are a huge number of variables - can you list, say, the top 5 that have the most influence on the result?

 

Looking forward to your answers!

6 hours ago, SoundAndMotion said:

Hi Paul,

For a handful of reasons, I'm not in the "right space" to thoroughly and thoughtfully answer your post in written form. I've sent a PM to see if a voice chat might work. Perhaps you or I can/will record it and create a nice post. I'm totally with you on taking the scenic path. That is the path that I and every colleague/friend of mine has taken. The shortcut is overrated: very often muddy and full of bugs, thorns, poop and poison ivy.

But just quickly: I asked "what is your goal?" and you say testing for audibility, with the ultimate goal of the most realistic audio reproduction possible. I was unclear. What exactly, specifically, is your goal? For example, "I want to test my own absolute threshold for SINAD, created and varied using this specific method..." is an entirely different goal from "I want to find the range of estimates among 'normal' humans for the angular width of the soundstage using a specific musical sample." And my answers below would need to take that goal into account.

 

What exactly is my goal? Hard to define it beyond "the most realistic sound reproduction possible" :)

 

My path has been to eliminate the obvious errors in reproduction first. As we already discussed, there's never going to be 100% perfect analog reproduction, so by necessity we must deal with some lower threshold to know when an error can be safely ignored: audible thresholds for THD, IMD, FR, jitter, TIM, etc. I'm not so much interested in the absolute lowest possible threshold. If there are a couple of people on this earth who can hear THD at the 0.0000001% level, I'm not all that concerned, as long as I can only hear it down to 0.1% 😄

 

Quote

1 - It really, really depends... the idea @Jud used of separating samples to each ear is a trade-off (see @sandyk's response above). I participated in his experiment, but found it to be one creative step forward and two back.

 

Yes, Jud convinced me to add this as a testing method in DeltaWave. To tell you the truth, I didn't find that it was an improvement over standard A/B/X, but it seemed like an interesting idea worth investigating. Maybe this test would be more sensitive to large phase differences.

 

Quote

2 - Depends on the variable you choose to measure. snippet time and switch times are not the same. Simple detection, like audibility of measured specs (THD, FR, etc.), probably requires echoic memory, so a short switch time. But it is easy to come up with variables/tests that would require longer listening times and greater immunity to long switch times.

 

Common wisdom here is that echoic memory is short, so it is impossible to compare A to B once one no longer has the details of the sound in short-term memory. So, does a long-term listening test actually produce usable results - results that identify differences consistently? Anecdotally, through my own testing, I find shorter snippets and short switch times to be much more sensitive than long-term comparison. But I also don't always find the differences consciously. I've passed a number of blind tests by forming an overall impression over a short, 10-second comparison between snippets, rather than trying to consciously decide whether they are different. I failed the same test when trying to consciously make that distinction. So, can a long-term test be more sensitive while remaining accurate?
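Whatever the switching style, interpreting a run of blind trials comes down to the same exact binomial check: how likely is a given score by pure guessing? A quick sketch, assuming independent trials with a 50% guess rate per ABX trial:

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided exact binomial p-value for an ABX run: the probability of
    getting at least `correct` right answers out of `trials` by guessing
    alone (p = 0.5 per trial, trials independent)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials
```

For example, 14/16 correct gives p ≈ 0.002 (very unlikely by chance), while 8/16 is exactly what guessing produces on average, so it says nothing either way.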

 

Quote

3 - There is a fatigue issue for all tests: avoid it! There is no threshold that I know of; it is equivalent to the question of how long people can pay attention. If a test is boring, people must fight not to "drift away". But similarly, if a task is pleasurable, people must fight to focus on the task and not just sit happily listening. I always require that the subject announce they are ready for the next trial (allowing anywhere from zero seconds to a minute of self-controlled break, to gather themselves). I force a few-minute (5+) break every 15 minutes. I stop for at least half a day after about 1-1.5 hours of testing. My experiments are "lots of fun" for the first 1 or 2 dozen trials, but I usually need 100-300 trials, so it gets boring and I must query and push for alertness and attention.

 

That's great info! The other part I meant to ask about: does our memory and learning ability stop us from being able to distinguish A from B after a number of repetitions? In other words, does our mind fill in the gaps and details and smooth out the differences by learning what to expect? If so, perhaps switching frequently between different A/B snippets would be a way to avoid that.

 

Quote

4 - "in detecting differences" No, because of your stated goal. If I can easily detect the difference, but my preference depends on the snippet of source material (stimulus), I may answer in an apparently random fashion. It is a confound. 

 

That makes sense, also.

 

Quote

5 - My top 4, for those forgotten, misunderstood or taken for granted...
    Instructions (depending on goal; super-important for me).
    Subject selection. 
    Method, including what statistics will be used: know before start.
    Stimulus selection.   
    Others are also important, but are often thought of: bias control, calibration of all measures, subject-comfort...

 

Excellent! These all make sense. While I do a lot of testing on myself :) I have also been involved in a few internet blind tests. It's always a challenge to set these up correctly and to make sure the results are meaningful and interpreted the right way. The worst part is the lack of any control over how the test itself is conducted.

 

 


SAM (@SoundAndMotion) and I spent a good hour chatting on Skype yesterday about testing, audio thresholds, and experiments. A number of my questions were answered, and I really enjoyed the conversation. In-person (even virtually) works so much better than exchanging forum posts! :) Looking forward to future discussions. Thank you, SAM!

