
Julf's Blog


Listening test results


Julf

Here are the results of the listening test. Unfortunately I only got 6 submissions before I decided to stop collecting entries, so it is not really a statistically significant sample, and I would not draw any profound conclusions from the results.

 

As has been pointed out, anyone handy with a program such as Audacity could have cheated by looking at the spectrum plots, and just looking at file sizes would already have provided a lot of guidance. Fortunately the results seem to indicate that the submissions were based on honest listening.

 

What is very interesting is that the full set of files has been downloaded 103 times, and the first track 147 times, but only 6 people actually submitted results. What can we conclude from that? That people are shy and are afraid to make a fool of themselves? Or that they really didn't hear a difference? No way to tell.

 

I would also like to state that when I put together the test, I had no strong position with regard to hi-res. I originally found my way to CA because of my interest in hi-res, and having gotten burned by HDtracks, I found the "Audiophile Downloads" and "Music Analysis - Objective & Subjective" forums extremely useful. I have spent a fair bit of money to support HDtracks, eclassical, B&W Society of Sound, 2L, iTrax, Linn and others.

 

I can definitely hear a difference between some of the hi-res tracks and red book. Or at least I think I can. But I can't tell if that is from different mastering. Or it could even be all in my head.

 

A number of people have stated that the a cappella material doesn't really allow for the benefit of hi-res to be heard, as it doesn't have any significant high frequency content, and that is definitely a valid point. On the other hand, at least one person stated that unaccompanied, natural human voice is the best test material.

 

I also want to make clear that this is a test of listening material as it is mostly presented to the buying public - it starts out being recorded at 96/24, and it then gets downconverted to whatever distribution format is needed. It is not a comparison of recording formats. As the physical listening format is 96/24 on all of the tracks, it also doesn't allow a DAC optimized for lower resolutions to shine - so it is also not a comparison of how a certain DAC handles hi-res vs lower resolution.

 

Anyway, let's move to an explanation of the differences between the tracks. One of the files was of course the original track in 96/24. In this case that was track C. As a control, I then took a copy of the original, converted it from FLAC to WAV and back 300 times, and then copied it over the network between my desktop computer and a server in a very electrically noisy machine room 100 times back and forth. This copy was track G.
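
Because FLAC is lossless and file copies are bit-exact, a control like track G should decode to exactly the same samples as track C. A minimal sketch of how that could be checked, assuming both tracks have first been decoded to WAV with the same tool (the file names in the usage comment are hypothetical, and this is not necessarily how the test files were verified):

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_content(path_a, path_b):
    """True if the two files are byte-for-byte identical."""
    return file_digest(path_a) == file_digest(path_b)

# Hypothetical usage, after decoding both tracks to WAV:
# same_content("track_C.wav", "track_G.wav")
```

Comparing the decoded WAV files rather than the FLAC files avoids false mismatches caused by differing encoder settings or metadata.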

 

My next step was converting the file to 16 bit and then upconverting it back to 24 bit (effectively leaving the bottom 8 bits as zeroes). This resulted in track E.
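
In terms of sample values, the effect of this round trip is that the lowest 8 bits of every 24-bit sample end up as zero. A minimal sketch on plain integers, assuming simple truncation without dither (the actual conversion settings used are not stated here, so this is only an illustration):

```python
def to_16_and_back(sample_24):
    """Drop the lowest 8 bits of a signed 24-bit sample, then pad back to 24 bits."""
    sample_16 = sample_24 >> 8   # arithmetic shift: 24-bit value -> 16-bit value
    return sample_16 << 8        # back to the 24-bit range, low 8 bits are zero

def low_bits_all_zero(samples_24):
    """True if every sample has its lowest 8 bits cleared."""
    return all(s & 0xFF == 0 for s in samples_24)
```

Running `low_bits_all_zero` over the samples of a file would be a direct way to spot this kind of padding.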

 

If I were to do the test again, I would add in a small amount of noise at the 23-24 bit level, as the FLAC encoding is very clever about not storing those redundant zero bits - thus the file size was much smaller than the original. This shows that a simple way to detect a very crude 16-to-24-bit upconversion is to compare the size of the FLAC file with the corresponding WAV file. If the WAV file is roughly 2 times as large as the FLAC, the material has real 24-bit content (but perhaps just noise), but if the WAV file is 4-6 times the size of the FLAC file, the material is clearly 16 bit.
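
The size comparison can be sketched in a few lines of Python. The function names are made up for illustration, and the exact cutoffs (2.5 and 4.0) are assumptions chosen to bracket the "roughly 2 times" and "4-6 times" figures above, so treat the result as a rough hint rather than proof:

```python
import os

def wav_to_flac_ratio(wav_path, flac_path):
    """Size of the WAV file divided by the size of the matching FLAC file."""
    return os.path.getsize(wav_path) / os.path.getsize(flac_path)

def classify_ratio(ratio):
    """Rough guess at the real bit depth from the WAV/FLAC size ratio."""
    if ratio >= 4.0:   # assumed cutoff for the "4-6 times" case
        return "likely 16-bit content padded to 24 bits"
    if ratio <= 2.5:   # assumed cutoff for the "roughly 2 times" case
        return "likely real 24-bit content (possibly just noise)"
    return "inconclusive"
```

Ratios between the two cutoffs are deliberately reported as inconclusive, since compression ratios also depend on the music itself.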

 

Next I took the original file and downsampled it (using sox) to 48/24, and then added a small amount of filtered white noise (at the -120 dB level, so definitely inaudible) to make the spectrogram of the file show at least some high frequency content. The 48/24 file was track D.

 

Then I threw in a classic calibration test - I made a copy of the 48/24 track and amplified it by +1 dB. This was track H.
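
For a sense of scale, decibel figures such as the +1 dB gain here and the -120 dB noise level mentioned earlier can be converted to linear amplitude ratios. A small sketch, assuming the -120 dB figure is relative to full scale:

```python
def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)

gain = db_to_amplitude_ratio(1.0)            # amplitude factor for +1 dB
noise_level = db_to_amplitude_ratio(-120.0)  # fraction of full scale at -120 dB

print(f"+1 dB   -> amplitude x {gain:.3f}")
print(f"-120 dB -> {noise_level:.1e} of full scale")
```

So +1 dB corresponds to an amplitude increase of roughly 12%, while -120 dB is about one millionth of full scale.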

 

Next I did the same 24-to-16-to-24-bit downgrade as with the 96 kHz file, resulting in the 48/16 track B.

 

Then another test - just to verify that adding the tiny bit of noise to mask the lack of HF content didn't distort the test, I included a 48/16 version of the track *without* the noise. This was also a bit of a check against cheating - if the version without noise did significantly worse than the one with the fake HF content added, it would indicate use of analysis. So track I is 48/16 without any artificial noise.

 

Track A is a 44.1/16 file, produced using the same methods.

 

Track F is another control point - it is an mp3 version, produced using lame with the "insane" quality preset, decoded with mpg123, and converted into a FLAC. So basically a FLAC recording of an mp3 file.

 

Out of the 6 people responding, one provided no numerical assessments, and one did not give them for all tracks, but based on the numbers I got, here are the averages per track:

 

[Chart: average score per track]

 

I have arranged the tracks in rough quality order, starting with the mp3 on the left and ending with the original file to the right. Ideally we should see the points line up as a rising line from the bottom left corner up to the upper right corner.

 

Two points stand out - track D (48/24), which for some reason got the lowest average, and track H (the same 48/24 as track D, but 1 dB louder), which got the highest. This illustrates how important it is to match loudness *exactly* when comparing two components or recordings - 1 dB of loudness was the only difference between the track that got the best overall rating and the one that got the worst.

 

We also see the somewhat surprising fact that the mp3 version was the one that got the second highest score.
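
As a cross-check, the per-track averages can be recomputed from the individual numerical scores listed in the per-track sections below. A minimal sketch (the score lists are copied from those sections; the variable names are just for illustration):

```python
from statistics import mean

# Numerical scores per track, from the submitters who gave a number.
scores = {
    "A": [4, 6, 3, 7.5, 4],
    "B": [8, 1, 6, 6.5, 5],
    "C": [4, 9, 7, 6],
    "D": [5, 2, 6, 2],
    "E": [4, 5, 8, 7],
    "F": [7, 7, 4, 9],
    "G": [5, 8, 5, 8],
    "H": [6, 10, 10, 6.5, 10],
    "I": [7, 4, 7, 3],
}

averages = {track: mean(values) for track, values in scores.items()}

# Tracks sorted from highest to lowest average score.
ranking = sorted(averages, key=averages.get, reverse=True)

for track in ranking:
    print(f"Track {track}: {averages[track]:.2f}")
```

With these numbers, track H comes out on top, the mp3 (track F) second, and track D last, matching the observations above.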

 

Here are all the numerical assessments in a scatter plot:

 

[Scatter plot: all numerical scores per track]

 

Again, we would expect to see the points line up as a rising line from the bottom left corner up to the upper right corner.

 

 

Now we get to the comments and assessments for each track. I have replaced the names of the submitters with the ICAO/ITU phonetic alphabet names of the last letters of the alphabet. Only "Uniform", "Victor" and "Whiskey" provided verbal comments. "Whiskey" provided two separate responses; I have included both, separated by a slash ("/"). "Uniform" also provided two separate sets of comments, one based on listening through the MacBook Pro headphone jack in 24/96, the other through a 16/48 DAC.

 

Track A - 44.1 kHz / 16 bit, average: 4.9

 

"Uniform": 4

"Like the piece, not sure about the recording" (through 16/48 DAC)

"Hear some collisions here – sound in general very gritty and glassy (both)" (through MacBook Pro 24/96 headphone jack)

 

"Victor": 6

There’s decent harmonic richness and clarity. Voices are distinct but slightly flat and cold.

Not as good as track H. Seems like red book CD. The deeper I listen into this track the better it sounds.

"Whiskey":

Emphasized small S-es. Notice that this could happen in anything, but I noticed it as unnatural / Strange

 

"X-ray": 3

"Yankee": 7.5

"Zulu": 4

 

 

 

Track B - 48 kHz / 16 bit, average: 5.3

 

"Uniform": 8

"Somehow sounds softer than the first" (through 16/48 DAC)

"Rounds off some of the grit – more palatable (not sure what that means)" (through MacBook Pro 24/96 headphone jack)

 

"Victor": 1

Similar to track A but seems a bit worse; kind of mp3 like. Differences from A are not significant and may be more about which sins of omission are less offensive than which is better. There is a light metallic edge to the harmonics. Not as engaging as track H.

 

"Whiskey":

Seems to sound more comfortable (relative to A) / Normal

 

"X-ray": 6

"Yankee": 6.5

"Zulu": 5

 

 

Track C - 96 kHz / 24 bit, average: 6.5

 

"Uniform": 4

"Good bit of high “glassiness” - not sure what that means" (through 16/48 DAC)

"Not really drawn in by this one – want to stop listening" (through MacBook Pro 24/96 headphone jack)

 

"Victor":

Sounds rather generic. Nothing is grabbing my attention. Voices difficult to distinguish.

 

"Whiskey":

More spacious. More natural (after the fact ... this could well be "the one") / Flanging

 

"X-ray": 9

"Yankee": 7

"Zulu": 6

 

 

Track D - 48 kHz / 24 bit, average: 3.75

 

"Uniform": 5

"Sounds very similar to A" (through 16/48 DAC)

"Both gritty and glassy – similar to A" (through MacBook Pro 24/96 headphone jack)

 

"Victor":

Reminiscent of track A. Voices are more distinct. Harmonics are richer.

Voices have a more instrumental tone; more engaging than track A.

 

"Whiskey":

Sounds strange / Flanging

 

"X-ray": 2

"Yankee": 6

"Zulu": 2

 

 

Track E - 96 kHz / 16 bit

 

"Uniform": 4

"Didn't like it very much – not sure why" (through 16/48 DAC)

"Didn't like it very much – not sure why" (through MacBook Pro 24/96 headphone jack)

 

"Victor":

Seems somewhat limited and flat. Not bad sounding but not quite free and engaging.

 

"Whiskey":

No S-es? I listened to this one after the happening once again. Then I noted: OK, more normal S-es here / Too metallish

 

"X-ray": 5

"Yankee": 8

"Zulu": 7

Average: 6

 

 

Track F - mp3 (lame --preset insane, i.e. 320 kbps CBR)

 

"Uniform": 7

"Pretty good" (through 16/48 DAC)

"Like it quite a bit, even if the treble is a bit “glassy” sounding – actually

sounds “good” on this one?" (through MacBook Pro 24/96 headphone jack)

 

"Victor":

Harmonics seem a bit muddy. Some of the voices seem a bit artificial.

There’s a metallic edge to upper register voices. Voices seem one-dimensional.

 

"Whiskey":

S-es / Strange

 

"X-ray": 7

"Yankee": 4

"Zulu": 9

Average: 6.75

 

 

Track G - 96 kHz / 24 bit, converted flac-wav-flac 300 times, copied between computers 100 times

 

"Uniform": 5

"Very much like C, I hear “glassiness”" (through 16/48 DAC)

"Both chunky and glassy :/" (through MacBook Pro 24/96 headphone jack)

 

"Victor":

After listening to track H it’s hard for anything else to compare more favorably. In comparison to H this sounds a bit more dynamically restricted. Less sense of pace and presence. Sounds a bit compressed and forced.

 

"Whiskey":

Wrong S-es / Bad

 

"X-ray": 8

"Yankee": 5

"Zulu": 8

Average: 6.5

 

 

Track H - 48 kHz / 24 bit + 1 dB extra gain

 

"Uniform": 6

"Kind of sounds “gritty” - again, not sure I like it" (through 16/48 DAC)

"Hearing a bit of dissonance (collisions, as I said for A)" (through MacBook Pro 24/96 headphone jack)

 

"Victor": 10

Captivating. Voices have more ease and together the voices sound more instrumental. There is better presence, pace and clarity.

Hall dynamics come across better. Much more natural decay. This track is clearly better.

 

"Whiskey":

Sounds strange. S-es buzz. (I noticed the "buzz" earlier on, but didn't write that down so I don't know anymore where it was) / Wrong

 

"X-ray": 10

"Yankee": 6.5

"Zulu": 10

Average: 8.5

 

 

Track I - 48 kHz / 16 bit, "raw" resample, no masking noise added

 

"Uniform": 7

"Treble a bit rolled-off (e.g., no glassiness), but I like it in this recording" (through 16/48 DAC)

"Now hearing a bit of glassiness in this one" (through MacBook Pro 24/96 headphone jack)

 

"Victor":

Flat in comparison to H. Harshness around S sounds. Overall slight nasal quality. The music seems a bit forced. Lacking ease.

 

"Whiskey":

Too high pitched S-es; furthermore quite normal / Rather normal, but edgy

"X-ray": 4

"Yankee": 7

"Zulu": 3

Average: 5.25

41 Comments





First off, thanks for putting up with us all. You've shown a lot of patience through this process.

 

 

 

Second, the results are quite interesting. I rated H best, but also wondered in my email to you if the volume had been boosted. Ha: called it! All the rest I am surprised at and I am shocked (horrified!) at how highly I rated the mp3. Of course, I don't blame myself but rather my equipment which is pretty decent on the analog side but only entry-level on the digital side. Nothing to do with my ears, of course.

 

 

 

Thanks again.

Link to comment

"I rated H best, but also wondered in my email to you if the volume had been boosted. Ha: called it!"

 

 

 

Indeed!

 

 

 

"I am shocked (horrified!) at how highly I rated the mp3"

 

 

 

I wouldn't be so horrified - the "insane" lame preset produces surprisingly good results, and MP3 was designed to fool your ears after all...

Link to comment

I didn't submit my score due to the earlier-than-expected cutoff date, but I was also really surprised by the results. My top 3 choices were H, D and F! My bottom 3 were B, E, I.

 

 

 

It seems that I am more sensitive to bit depth than to sampling rate, but it is really surprising that I rated the mp3 version so highly.

Link to comment

Blind testing often gets a bad rap among audiophiles for not being sensitive enough. Yet things like loudness differences between tracks get clearly picked out at very small difference levels. Other testing with larger groups of people shows that even 0.2 dB will get singled out. Many people will not consciously notice that level difference unless cued into it and asked to compare and make sure. Some don't even then.

 

 

 

Some friends think I am overly picky about level matching etc. when attempting to compare equipment or cables. But turning it down or off, making the swap, running the volume back up to somewhere close to where it was, and then adjusting a little to suit will totally invalidate any such comparison. Differences would have to be large indeed not to be swamped by a minor volume difference.

 

 

 

Another oft heard complaint is you need to listen over longer periods to know what you think of the sound. But external noise levels can change enough from day to day, and almost certainly will change enough from day to night it will invalidate comparisons like that. The classic example is a TV turned down low for late night watching, though still plenty intelligible. Turn it on the next afternoon and the sound is hardly heard, because the ambient noise level has changed in the daytime by ten or more decibels.

 

 

 

Anyway, good attempt at this test Julf. I am a little surprised you didn't get more than 6 responses though I always expected you would get no more than 1 of 4 from those downloaded.

Link to comment

"I am shocked (horrified!) at how highly I rated the mp3"

 

 

 

I wouldn't be so horrified - the "insane" lame preset produces surprisingly good results, and MP3 was designed to fool your ears after all...

 

 

 

I gave the mp3 a "7" (4th highest preference) in the original run with the Carbon USB cable, then ranked it worst of all in the second run with the Coffee USB. Obviously the change in USB cables made a huge difference in my system! ;-)

Link to comment

I agree with you about blind testing - it is not a perfect tool, but it is the best we have, and it is actually surprisingly good.

 

 

 

"I am a little surprised you didn't get more than 6 responses though I always expected you would get no more than 1 of 4 from those downloaded."

 

 

 

Yes, something around 25 submissions was what I expected too - but then cutting it short left out people who thought they had ample time to do the test.

Link to comment

Another oft heard complaint is you need to listen over longer periods to know what you think of the sound. But external noise levels can change enough from day to day, and almost certainly will change enough from day to night it will invalidate comparisons like that.

 

 

 

May depend on what one is listening for. For example, if you are trying to determine if a new component has a sound of its own, so by definition it is not accurate, you may want to listen to a variety of music over a period of time to see whether the tell-tale signs arise: boredom, "listening fatigue," etc. Comparing two musical selections or components at various times may be helpful precisely because if you only compare once it may be at a time when the environment is not the best.

 

 

 

Agreed that listening to a single component/selection at various times over a long period is much more susceptible to extraneous variations.

 

 

 

I have read that "aural memory" may not be much longer than 30 seconds, so I try to include at least some comparisons where components/selections are switched quickly. For what may be similar reasons, the manufacturer of my favorite cables recommends listening to no more than 1 minute of a musical selection before switching to the component/selection to be compared. (Nevertheless, I think there is also something to be said for listening to a selection all the way through, to try to get an idea of the emotions the artist is trying to convey at various points, and how well the equipment/recording renders these.)

 

 

 

Also, with respect to PeterSt's very worthwhile caution that once you hear something notable in selection A, you will surely hear it in selection B, whether or not you would have noticed it in selection B if you'd played it first: I always go back and forth between components or musical selections to compare them, if I have the time, for precisely the reason that Peter mentions.

Link to comment

Being the one that rated the MP3 as the lowest ...

 

 

 

Okay, it's me..

 

 

 

I have listened for like 1.5 hrs and would not be surprised if I made totally unexpected observations, but my list resembles what you may expect from the specs, with F getting the lowest rating and E, A and C the best. I did hear differences, but I would also completely agree that there is a matter of taste in it. And since you can't shut down your brain (at least I can't but many on this forum can ;-), different people may have different experiences while listening to these tracks.

 

 

 

-Edgar

Link to comment

Hi Jud,

 

 

 

**"...I have read that "aural memory" may not be much longer than 30 seconds..."**

 

 

 

I've read things along these lines too (some worse) and they always leave me puzzled.

 

 

 

I wonder how it is that I know it is my mother at the other end of a phone call, without having to ask who it is. And I (as I'm sure a great many of us can) hear subtle changes in the voice, as compared with memory, to the point where I can identify those occasions when she might have a cold. If aural memory was so short as some would claim, I'd think I wouldn't be able to tell who it was, much less recognize little changes in the voice compared with my memory of it.

 

 

 

That example, for me, cuts a Grand Canyon sized hole in the argument against aural memory. But there are other, smaller examples.

 

 

 

Most experienced guitarists I know can listen to a few seconds of a guitar and tell you right off whether it is a Les Paul or a Stratocaster.

 

 

 

And some audio gear has such a distinct signature, an experienced listener (who has heard a lot of gear) can name the brand, if not the model, by listening alone.

 

 

 

At a meeting of the local audio society not too long ago, I arrived a few minutes late and was immediately put in the "hot seat" and asked to describe what I heard. I had no knowledge of what was being demonstrated and in fact, it turned out to be something other than what I commented on. After listening for about 30 seconds, I said I didn't know what the "test" was about but felt I was listening to a CD-R recorded at a high speed. (I find there is a distinct, or rather indistinct quality to the sound, a peculiar out-of-focus sense to the ambience and any traces of reverberation in a recording.) It turned out to be a CD-R burned, if I recall correctly, at 30x.

 

Perhaps a coincidence or "lucky guess" but that's what I heard.

 

 

 

Different microphones, too, have a specific "sound". As do, to my ears, certain highly touted DACs. And cables, and everything else I've spent any time listening to.

 

 

 

I firmly believe aural memory, like any other sort of memory, can, with "exercise", last a long, long time.

 

 

 

Sorry to all for the slightly off-topic post but I believe there is some relevance to the discussion at hand, i.e. to comparing formats.

 

 

 

Best regards,

 

Barry

 

www.soundkeeperrecordings.com

 

www.barrydiamentaudio.com

Link to comment

And I (as I'm sure a great many of us can) hear subtle changes in the voice, as compared with memory, to the point where I can identify those occasions when she might have a cold. If aural memory was so short as some would claim, I'd think I wouldn't be able to tell who it was, much less recognize little changes in the voice compared with my memory of it.

 

 

 

That example, for me, cuts a Grand Canyon sized hole in the argument against aural memory. But there are other, smaller examples.

 

 

 

And some audio gear has such a distinct signature, an experienced listener (who has heard a lot of gear) can name the brand, if not the model, by listening alone.

 

 

 

Or the player. If it's a Hendrix song I've never heard before, I can tell you within a few notes (though it's possible Stevie Ray Vaughan or, less likely, Robin Trower, might fool me for a little while).

 

 

 

So what you've said is certainly something to consider.

 

 

 

But...

 

 

 

Don't know whether you saw the 60 Minutes segment on "face blindness" this past Sunday. There are people who absolutely cannot recognize the faces of friends and family without cues like hairstyle (or voice - and if the voice comes from a friend or family member who's changed hairstyles, there's cognitive dissonance). To demonstrate how this feels to the reporter, a researcher showed her pictures of faces turned upside down, with hair covered. She could not recognize any of them, including her own daughter(!), simply because the orientation was changed. Eye color, the shape and size of the face and all the constituent parts, was still there, it just didn't add up to Famous Actor, or Daughter.

 

 

 

So the case may well be similar with sounds - though there are sounds we'd recognize anywhere, this does not necessarily mean that we have the sort of instant, effortless recall and assessment of changes for all sounds that we do for sounds of family (Mom's voice) or familiar "friends" (Jimi's guitar).

Link to comment

Hi Jud,

 

 

 

I suppose if the sounds were played backwards, it would be somewhat more difficult. But I'd bet you would recognize "Axis: Bold As Love" within seconds, even backwards.

 

 

 

Didn't see the TV show. I would question a researcher who used upside-down faces to "test" facial recognition on anyone other than a professional hand-stander (or head-stander). ;-} That said, I understand that in this case, they wanted to demonstrate a sense of un-familiarity.

 

 

 

The idea of "face blindness" reminds me of all those TV shows where someone wears sunglasses and their family members don't recognize them.

 

 

 

Best regards,

 

Barry

 

www.soundkeeperrecordings.com

 

www.barrydiamentaudio.com

Link to comment

That said, I understand that in this case, they wanted to demonstrate a sense of un-familiarity.

 

 

 

There is in fact a specific part of the brain devoted to facial recognition. Of course that part of the brain is accustomed to seeing faces in their familiar right-side-up orientation. So what was demonstrated, quite specifically, is what the brain *other than* the facial recognition area (the area which "face-blind" people lack or which has some deficit) does with the features of faces. The answer is, very surprisingly to people who don't understand how this specific brain function works, not much.

 

 

 

That's why I'd be cautious (not necessarily doubtful, but cautious) about applying our experience with memory of familiar sounds and music to memory of possibly unfamiliar musical sounds, just as we've now learned one cannot apply the experience of identifying familiar faces to the identification of those faces' constituent parts in a slightly unfamiliar orientation.

Link to comment

Hi Jud,

 

 

 

With unfamiliar sounds, I believe the experience of the listener comes into play as a significant factor.

 

 

 

Best regards,

 

Barry

 

www.soundkeeperrecordings.com

 

www.barrydiamentaudio.com

Link to comment

Ah, Grasshopper: For the truly experienced listener, *all* sounds are familiar.

 

 

 

OK, OK - I know, I've gone Too Far again. ;-)

Link to comment

http://readinginthebrain.pagesperso-orange.fr/intro.htm

 

 

 

http://www.amazon.com/Reading-Brain-Science-Read-ebook/dp/B002SR2Q2I

 

 

 

"Reading in the Brain" by Stanislas Dehaene

 

 

 

Describes current knowledge about how reading takes place in the human brain. May not be everyone's cup of tea, but I find it well done and fascinating. Explains the processing done by the brain to read and parts of the brain's normal functions that are recruited for something synthetic like reading as communication.

 

 

 

Now the relevance here is the difference in testing perception thresholds and perception (Hendrix's guitar style, or Les Paul vs Strat or recognizing your daughter's voice on the phone). The aural memory said to be very short in audio testing is a different animal than the memory that lets one recognize types of sound. I don't know how to easily summarize it here. But there really is no conflict in those ideas.

 

 

 

Recognition of sound must also go through multi-layered processing similar to that of reading. On the other hand you cannot recognize something below the threshold of perceptibility. It never makes it further along the brain's processing path to be processed in a useful manner. Even the generic brain processing that goes along with the physical filtering of the ear/ear drum etc. almost surely precedes processing further along a chain of neurons that will be involved in recognition in other more complex ways.

Link to comment

Now that I'm back home and have taken a few minutes to analyze my ratings, I see that, other than that $#%&@ louder track and the mp3 track, I selected all the higher frequency versions as the best sounding regardless of bit depth. Weird. Popular wisdom says that the bit depth is more important. Hummm. Double hummm.

Link to comment



