red or blue pill - Part II?

manisandher · January 10, 2020

Many of you will be familiar with the test that @mansr and I conducted almost two years ago:

Over a long period of listening, I became certain that I could hear consistent differences between various bit-identical playback means (streaming vs. local playback, various bit-identical settings in playback software, etc.), and was confident that I could demonstrate this. So, I invited Mans up to my place to help me test my hypothesis.

I scored 9/10 in the blind ABX (1% probability of achieving this result through guessing alone), proving my hypothesis correct... or so I thought.

Many objectivists here remained sceptical, some even suggesting that I had achieved the result through pure luck!

So...

If there were to be a retest, what would need to be done differently for those of you who remained sceptical to have more confidence in the results?

Mani.

pkane2001 · January 10, 2020

2 hours ago, manisandher said:

So...

If there were to be a retest, what would need to be done differently for those of you who remained sceptical to have more confidence in the results?

Mani.

Hi Mani,

Here are some thoughts:

1. Test something that's more of a common concern. A setting in Phasure XXX software is too obscure and not of much interest to anyone who doesn't use this software (I'd say the majority here).

2. Run more tests. You had a few low-scoring attempts before getting 9/10 (I'm aware of the reasons you gave, but an explanation of why a test failed is usually not acceptable in lieu of a positive result). Two consecutive 9/10 tests or 8/10 would be much more convincing. If you do run multiple tests, the results of all of those tests should be included in the calculation, good or bad.

3. Use better-vetted equipment to record digital and analog feeds -- I was confused by the provided files when trying to analyze them, there were some issues with one ADC and different issues with another, etc. It wasn't clear to me which recordings were accurate representation of what was tested (if any).

4. Document and review the test procedure with us before the test, not after.

5. Some photos/video recording of the session might be helpful in reviewing. The blinding procedure should be carefully considered and any possible tells eliminated (not accusing you of cheating, but there are often tells that have nothing to do with the actual audio, like a small click, or a slightly longer delay before playing, etc.)

I'll probably think of something else later

mansr · January 10, 2020

1 hour ago, pkane2001 said:

1. Test something that's more of a common concern. A setting in Phasure XXX software is too obscure and not of much interest to anyone who doesn't use this software (I'd say the majority here).

My main concern with this is that PeterSt is, to put it mildly, extremely coy about what this setting does. We have no idea whatsoever what is going on. Furthermore, I was initially told the comparison would be between playback of the same file from local vs network storage and/or Tidal. Only on arrival at Mani's house did I find out that he had changed his mind.

1 hour ago, pkane2001 said:

2. Run more tests. You had a few low-scoring attempts before getting 9/10 (I'm aware of the reasons you gave, but an explanation of why a test failed is usually not acceptable in lieu of a positive result). Two consecutive 9/10 tests or 8/10 would be much more convincing. If you do run multiple tests, the results of all of those tests should be included in the calculation, good or bad.

Very much this. Although both the music selection and the test protocol were altered, the overall impression is much weaker than some would make it out to be.

1 hour ago, pkane2001 said:

3. Use better-vetted equipment to record digital and analog feeds -- I was confused by the provided files when trying to analyze them, there were some issues with one ADC and different issues with another, etc. It wasn't clear to me which recordings were accurate representation of what was tested (if any).

The equipment was dreadful. I offered to bring my own audio interface and run it in pass-through mode between the DAC and amp. I was told this wouldn't work for vague reasons. Instead, we were limited to recording some samples afterwards using an ADC that turned out to have some issues.

Then there's the general environment of it all. The playback computer and DAC were sitting in a cluttered basement with god knows what electrical interference. The DAC itself was a naked PCB nailed to a wooden plank. This was connected to the monoblock amps (on the ground floor) with 10 metres or so of coax (RG-59 or similar).

For any future tests I'd insist on containing all the equipment in a somewhat controlled setting (not necessarily a shielded room). Then I'd record the output of the DAC and/or the amps during the listening test rather than separately.

1 hour ago, pkane2001 said:

4. Document and review the test procedure with us before the test, not after.

+1

1 hour ago, pkane2001 said:

5. Some photos/video recording of the session might be helpful in reviewing. The blinding procedure should be carefully considered and any possible tells eliminated (not accusing you of cheating, but there are often tells that have nothing to do with the actual audio, like a small click, or a slightly longer delay before playing, etc.)

Mani had the opportunity to cheat, but I really don't think he did. Still, I'd remove that temptation in any rerun.

As for tells, the software had a huge delay (more than 10 seconds) with one of the settings. Since I was unprepared for this, the best I could do was try to mask it by waiting a randomish time before pressing play on each iteration. The timestamps of the recordings didn't show any pattern obvious at a glance, but these things can be rather insidious. I'd prefer to have the entire thing automated.

All that said, at this point, this is little more than a thought experiment. The "believers" will never be swayed by any test, and I have a hard time imagining any of them going through the rigours it would take to produce compelling evidence in their favour.

Ralf11 · January 10, 2020

There needs to be a Methods section that is clearly separated from the Results/Discussion sections

Archimago · January 10, 2020

18 minutes ago, mansr said:

...

The equipment was dreadful. I offered to bring my own audio interface and run it in pass-through mode between the DAC and amp. I was told this wouldn't work for vague reasons. Instead, we were limited to recording some samples afterwards using an ADC that turned out to have some issues.

Then there's the general environment of it all. The playback computer and DAC were sitting in a cluttered basement with god knows what electrical interference. The DAC itself was a naked PCB nailed to a wooden plank. This was connected to the monoblock amps (on the ground floor) with 10 metres or so of coax (RG-59 or similar).

For any future tests I'd insist on containing all the equipment in a somewhat controlled setting (not necessarily a shielded room). Then I'd record the output of the DAC and/or the amps during the listening test rather than separately.

...

Wowzers. Pictures are definitely mandatory for the reader to understand what the heck is going on over there!

mansr · January 10, 2020

4 minutes ago, Archimago said:

Wowzers. Pictures are definitely mandatory for the reader to understand what the heck is going on over there!

Being there wasn't sufficient. Pictures couldn't convey the half of it.

STC · January 10, 2020

5 hours ago, manisandher said:

If there were to be a retest, what would need to be done differently for those of you who remained sceptical to have more confidence in the results?

1) Randomly pick 10 tracks from different albums. Good quality recording but hopefully not something you are familiar or previously used by you.

2) Play each track only once at a randomly chosen setting. Do not play the same track again with other settings.

I don’t believe whatever difference you hear would not be identifiable when heard in isolation without comparisons.

Ralf11 · January 11, 2020

rather than using a tricky Peter DAC, how about trying an RME next time?

Others of interest would be an el cheapo DAC (Topping) and one made by Bruno Putzeys...

That way you'd be able to tell if someone was able to tell the files apart across a spectrum of DACs

DAC vs. DAC would be a good test as well

fas42 · January 11, 2020

38 minutes ago, Archimago said:

The ability to replicate - whether it proves or disproves ultimately - is important when doing any kind of scientific "study" which I presume was the intent from the beginning???

Obviously there were enough major concerns and uncontrolled variables that a more thorough investigation seems fair, right?

Yes, replication will indeed need to play a part. Uncontrolled variables have to be completely eradicated, but this is the hard part ... I did a post some time ago. derived from Bob Katz?, that a truly useful ABX requires a great deal of effort to set up.

fas42 · January 11, 2020

38 minutes ago, pkane2001 said:

Intense desire to disprove is what scientific review process is based on, Frank. This is what allows for a proper review of new, unproven claims. Imagine the process where everyone is always willing to accept any new claim without questioning. But, then again, I think you might prefer it that way

So you can say, hand over heart, that the world of scientific research has never suffered from "tribal behaviour" ... 😜 ?

acg · January 11, 2020

25 minutes ago, Ralf11 said:

rather than using a tricky Peter DAC, how about trying an RME next time?

Others of interest would be an el cheapo DAC (Topping) and one made by Bruno Putzeys...

That way you'd be able to tell if someone was able to tell the files apart across a spectrum of DACs

DAC vs. DAC would be a good test as well

Peter's dac has not been used previously...what makes you think that it has?

fas42 · January 11, 2020

58 minutes ago, STC said:

1) Randomly pick 10 tracks from different albums. Good quality recording but hopefully not something you are familiar or previously used by you.

Bad move. You should know the recording intimately, every tiny crevice of it. Which means that you can trigger on the slightest variation of some obscure aspect of it.

58 minutes ago, STC said:

2) Play each track only once at a randomly chosen setting. Do not play the same track again with other settings.

I don’t believe whatever difference you hear would not be identifiable when heard in isolation without comparisons.

The listening brain will work hard to 'synchronise' slightly different versions of a recording - it knows that it's "the same track". So, you need to do things which thwart that innate listening behaviour ...

STC · January 11, 2020

12 minutes ago, fas42 said:

Bad move. You should know the recording intimately, every tiny crevice of it. Which means that you can trigger on the slightest variation of some obscure aspect of it.

The listening brain will work hard to 'synchronise' slightly different versions of a recording - it knows that it's "the same track". So, you need to do things which thwart that innate listening behaviour ...

Either way. One to find the minimum standard for high fidelity and the other is prove a difference exist. I am just a small person after high fidelity music reproduction. Ignore me.

fas42 · January 11, 2020

A system when good enough "always works" - the "zone" that you're in personally is irrelevant - someone playing the real piano really well in your lounge never sounds 'wrong', no matter what mood you're in.

But to do a serious test, well, you need to be at complete ease with the situation.

fas42 · January 11, 2020

2 minutes ago, STC said:

Either way. One to find the minimum standard for high fidelity and the other is prove a difference exist. I am just a small person after high fidelity music reproduction. Ignore me.

There, there ... if you want a hug, just ask ...

STC · January 11, 2020

5 minutes ago, fas42 said:

There, there ... if you want a hug, just ask ...

I would rather run straight into the bush 🔥.

manisandher · January 11, 2020

Hi Paul,

14 hours ago, pkane2001 said:

1. Test something that's more of a common concern. A setting in Phasure XXX software is too obscure and not of much interest to anyone who doesn't use this software (I'd say the majority here).

Yes, any retest would use streaming vs. local playback using Roon.

14 hours ago, pkane2001 said:

2. Run more tests. You had a few low-scoring attempts before getting 9/10 (I'm aware of the reasons you gave, but an explanation of why a test failed is usually not acceptable in lieu of a positive result). Two consecutive 9/10 tests or 8/10 would be much more convincing. If you do run multiple tests, the results of all of those tests should be included in the calculation, good or bad.

I reckon a 10-run ABX would take around 10-12 minutes. I'd go for 3 of these in total, with a small break between each. This would give a total sample size of 30. Would you agree that a score of >25/30 could be taken as a positive?

14 hours ago, pkane2001 said:

3. Use better-vetted equipment to record digital and analog feeds -- I was confused by the provided files when trying to analyze them, there were some issues with one ADC and different issues with another, etc. It wasn't clear to me which recordings were accurate representation of what was tested (if any).

There was no issue with the digital feed - it proved that the DAC received bit-identical signals during the ABX, where I scored 9/10.

I used a Tascam DA-3000 ADC for the initial analysis of the analogue output of the DAC. Here are its specs:

How much better would the ADC need to be to be acceptable?

14 hours ago, pkane2001 said:

4. Document and review the test procedure with us before the test, not after.

Absolutely, and hence this thread.

14 hours ago, pkane2001 said:

5. Some photos/video recording of the session might be helpful in reviewing.

I'd video the entire test in both the listening and control rooms. (Capturing the sound with a decent microphone in the listening room might prove useful during analysis too.)

Mani.

manisandher · January 11, 2020

13 hours ago, mansr said:

... I was initially told the comparison would be between playback of the same file from local vs network storage and/or Tidal. Only on arrival at Mani's house did I find out that he had changed his mind.

You don't half talk bollocks sometimes.

Over 3 weeks before you came up:

On 3/2/2018 at 11:48 AM, mansr said:

The hypothesis is that identical files played from different storage media sound different. Presumably, the 5-10 seconds is the time it takes to stop the software player and start playing a different file.

On 3/2/2018 at 1:06 PM, manisandher said:

That's a specific case of the more general hypothesis: a file played* back bit-identically can sound different.

Specific cases then include:

- different storage media

- different digital cables (spdif, USB, etc)

- different software player configurations (buffers, etc)

I'm considering which of these would be best for the ABX, and am leaning towards different software player configs.

And then, the day before you came up, I sent you this PM:

"Hi Mans,

This is the procedure I'd like to use tomorrow:

1. take a quick listen together

- I'd like to demonstrate a few things to you and get your initial thoughts

2. conduct the A/B/X

- I've chosen the track and the bit-identical changes we'll use in the playback software

- you'll be sitting in my office, controlling playback from there, and I'll be sitting in the listening room

- we'll have the Tascam set to auto-record, sitting in the basement next to the audio PC and DAC, capturing the digital output of the audio PC in real time

3. ensure that the digital captures are identical

- I have Audacity and MusicScope here

- if you have other software you'd rather use feel free to bring your laptop along with you

4. capture analogue outputs (test track plus tones)

a. directly from the DAC, using the Tascam

b. from the speakers, using my Earthworks microphone and portable Korg recorder

5. analyse analogue outputs

- we could either attempt to do this here or you could take the files away with you

Obviously, there's no need to go on to step 3 unless you're convinced there really are audible differences (either because you hear them too, or because I manage to demonstrate that this is the case in the A/B/X).

It'd be great if we can get through all of this tomorrow, but really only need to get through 1 and 2 as a must.

See you tomorrow!

Cheers,

Mani."

Your reply:

"Sounds like a plan."

Mani.

manisandher · January 11, 2020

12 hours ago, Archimago said:

Obviously there were enough major concerns and uncontrolled variables that a more thorough investigation seems fair, right?

I find it useful to consider this in two parts:

1. the ABX listening test itself

The specific DAC, amps and speakers used for the ABX, and where they're situated, etc. are all totally irrelevant. It's a red herring on Mans's part. The only thing that matters is showing that the DAC received bit-identical signals when replaying A and B. The digital feed to the DAC was captured throughout the ABX, and it was proven that the DAC had indeed received bit-identical signals throughout. And yet I heard consistent differences between A and B, as shown by my 9/10 score.

2. the analysis of the analogue output of the DAC

This proved difficult. I've posted the specs of the ADC used in the original analysis, which show nothing untoward. But on analysing the analogue captures, the ADC proved unsatisfactory.

Mani.

pkane2001 · January 11, 2020

3 hours ago, manisandher said:

I reckon a 10-run ABX would take around 10-12 minutes. I'd go for 3 of these in total, with a small break between each. This would give a total sample size of 30. Would you agree that a score of >25/30 could be taken as a positive?

Yes, 25/30 is an excellent result.

3 hours ago, manisandher said:

There was no issue with the digital feed - it proved that the DAC received bit-identical signals during the ABX, where I scored 9/10.

I used a Tascam DA-3000 ADC for the initial analysis of the analogue output of the DAC. Here are its specs:

As I recall, the digital feed recordings had issues with not synchronizing right away. Some number of samples were different at the beginning. Since these were the main record of what had actually transpired, I would have to assume that the two captures were not bit perfect, at least at the start of each track. Again, an explanation of what went wrong isn't really a substitute for the real capture being bit-perfect

Tascam THD+N of 0.003% can be a lot better. I have a simple pro Apogee interface (paid about $350 for it used) that does about 10x better (0.0003%). I think Benchmark or RME equipment was mentioned before in this thread, and these do 8-10dB better than my little Apogee. The point is to have as little distortion added to the recording as possible by the test equipment. Things like jitter and phase distortions need to be looked at, as well, since these can create audible differences when large enough. A few simple loop-back recordings of DAC/ADC playing music for 60 seconds would give us enough to analyze to see how the equipment behaves before the test.

With DeltaWave software, I've compared now a large number of DAC/ADC loop-back recordings from various sources. The better equipment produces a null of -90dB or better. The recordings your sessions produced generated, at best, a -50dB null. That's a really poor conversion quality compared to some of the better equipment out there.

semente · January 11, 2020

17 hours ago, mansr said:

I offered to bring my own audio interface and run it in pass-through mode between the DAC and amp. I was told this wouldn't work for vague reasons.

I think that Mani was right to refuse this.

Arpiben · January 11, 2020

Hi @manisandher,

In principle, ABX testing and analogue recording (ADC) should be performed simultaneously.

Measurement plane can be located at loudspeakers input, DAC's output (via passive splitter) or both of them.

Obviously the recording items (ADC / cables/ etc) will interact.

My point is to compare data under same conditions.

The fact that there is an audibility challenge doesn't help. Question is are you eager to ABX under 'recording' test bench?

If not, IMO we are just wasting time and not progressing. My priority is to eventually find different signatures among data nothing else.

pkane2001 · January 11, 2020

3 hours ago, manisandher said:

I find it useful to consider this in two parts:

1. the ABX listening test itself

The specific DAC, amps and speakers used for the ABX, and where they're situated, etc. are all totally irrelevant. It's a red herring on Mans's part. The only thing that matters is showing that the DAC received bit-identical signals when replaying A and B. The digital feed to the DAC was captured throughout the ABX, and it was proven that the DAC had indeed received bit-identical signals throughout. And yet I heard consistent differences between A and B, as shown by my 9/10 score.

2. the analysis of the analogue output of the DAC

This proved difficult. I've posted the specs of the ADC used in the original analysis, which show nothing untoward. But on analysing the analogue captures, the ADC proved unsatisfactory.

Mani.

Mani, there are some things that need to be considered in (1) as well. Bit-perfect transmission is not sufficient if the interface (SPDIF) also carries the clock signal used to drive the DAC output. Large amount of jitter on the interface can cause timing errors. For this reason, using a USB or another asynchronous connection to the DAC would be a much better choice, IMO.

semente · January 11, 2020

4 hours ago, manisandher said:

I reckon a 10-run ABX would take around 10-12 minutes. I'd go for 3 of these in total, with a small break between each.

Did you get torture-endurance training in the special forces?

manisandher · January 11, 2020

8 minutes ago, Arpiben said:

In principle, ABX testing and analogue recording (ADC) should be performed simultaneously.

Hi,

Yes, I can see why this would beneficial. But during the previous ABX, I only had one capturing device to hand. We could have used it to either capture 1) the digital input into the DAC or 2) the analogue output from the DAC, in real-time during the ABX. It was absolutely critical to determine that the digital input remained bit-identical for both A and B (else the ABX would have been invalid), so we prioritised 1) over 2).

In a retest, I think it would be great if we could use two capturing devices, one for digital and one for analogue captures in real-time during the ABX. Even better if I could use a single clock to synchronise them.

So, I need two ADCs with great specs and wordclock inputs/outputs...

Mani.

red or blue pill - Part II?

Recommended Posts

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Create an account or sign in to comment

Create an account

Sign in