Listening test results
Here are the results of the listening test. Unfortunately I only got 6 submissions before I decided to stop collecting entries, so the sample is too small to be statistically meaningful, and I would not draw any profound conclusions from the results.
As has been pointed out, anyone handy with a program such as Audacity could have cheated by looking at the spectrum plots, and just looking at file sizes would already have provided a lot of guidance. Fortunately the results seem to indicate that the submissions were based on honest listening.
What is very interesting is that the full set of files has been downloaded 103 times, and the first track 147 times, but only 6 people actually submitted results. What can we conclude from that? That people are shy and are afraid to make a fool of themselves? Or that they really didn't hear a difference? No way to tell.
I would also like to state that when I put together the test, I had no strong position with regard to hi-res. I originally found my way to CA because of my interest in hi-res, and having gotten burned by HDtracks, I found the "Audiophile Downloads" and "Music Analysis - Objective & Subjective" forums extremely useful. I have spent a fair bit of money to support HDtracks, eclassical, B&W Society of Sound, 2L, iTrax, Linn and others.
I can definitely hear a difference between some of the hi-res tracks and red book. Or at least I think I can. But I can't tell if that is from different mastering. Or it could even be all in my head.
A number of people have stated that the a cappella material doesn't really allow for the benefit of hi-res to be heard, as it doesn't have any significant high-frequency content, and that is definitely a valid point. On the other hand, at least one person stated that unaccompanied, natural human voice is the best test material.
I also want to make clear that this is a test of listening material as it is mostly presented to the buying public - it starts out being recorded at 96/24, and then gets downconverted to whatever distribution format is needed. It is not a comparison of recording formats. As the physical listening format is 96/24 on all of the tracks, it also doesn't allow a DAC optimized for lower resolutions to shine - so it is also not a comparison of how a certain DAC handles hi-res vs lower resolution.
Anyway, let's move to an explanation of the differences between the tracks. One of the files was of course the original track in 96/24. In this case that was track C. As a control, I then took a copy of the original, converted it from FLAC to WAV and back 300 times, and then copied it over the network between my desktop computer and a server in a very electrically noisy machine room 100 times back and forth. This copy was track G.
My next step was converting the file to 16 bit and then upconverting it back to 24 bit (effectively leaving the bottom 8 bits as zeroes). This resulted in track E.
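As a rough illustration of what this round trip does to the sample values, here is a minimal Python sketch. It assumes plain truncation with no dither or noise shaping (the actual conversion settings are not stated above), and the function name is made up for the example.

```python
def roundtrip_24_16_24(sample_24: int) -> int:
    """Reduce a signed 24-bit sample to 16 bits, then pad it back to 24 bits.

    Assumption: plain truncation, no dither or noise shaping.
    """
    sample_16 = sample_24 >> 8  # drop the 8 least significant bits (sign is preserved)
    return sample_16 << 8       # pad back to 24 bits; the low 8 bits are now zero


samples = [0x123456, -0x123456, 255, -1]
converted = [roundtrip_24_16_24(s) for s in samples]

for before, after in zip(samples, converted):
    print(f"{before:>9} -> {after:>9}  (low byte: {after & 0xFF})")
```

Every converted value has a low byte of zero, which is exactly the redundancy a lossless encoder can exploit.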
If I were to do the test again, I would add a small amount of noise at the 23-24 bit level, as the FLAC encoder is very clever about not storing those redundant zero bits, so the file was much smaller than the original. This shows that a simple way to detect a very crude 16-to-24-bit upconversion is to compare the size of the FLAC file with the corresponding WAV file. If the WAV file is roughly 2 times as large as the FLAC, the material has real 24-bit content (though perhaps just noise), but if the WAV file is 4-6 times the size of the FLAC file, the material is clearly 16 bit.
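The size comparison can be sketched in a few lines of Python. The thresholds follow the rule of thumb above; the 2.5 cut-off for "roughly 2 times", the "inconclusive" band in between, and all function names are assumptions made for this example.

```python
import os


def wav_to_flac_ratio(wav_bytes: int, flac_bytes: int) -> float:
    """How many times larger the WAV file is than the FLAC file."""
    return wav_bytes / flac_bytes


def guess_bit_depth(ratio: float) -> str:
    """Apply the rule of thumb from the text (cut-off values are assumptions)."""
    if ratio >= 4.0:
        return "probably 16-bit content padded to 24 bits"
    if ratio <= 2.5:
        return "probably real 24-bit content (possibly just noise)"
    return "inconclusive"


def guess_from_files(wav_path: str, flac_path: str) -> str:
    """Same check, reading the sizes of two files on disk."""
    ratio = wav_to_flac_ratio(os.path.getsize(wav_path), os.path.getsize(flac_path))
    return guess_bit_depth(ratio)


print(guess_bit_depth(wav_to_flac_ratio(100_000_000, 50_000_000)))
print(guess_bit_depth(wav_to_flac_ratio(100_000_000, 20_000_000)))
```

This is only a heuristic: how well FLAC compresses also depends on the music itself, so a ratio in between says very little.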
Next I took the original file and downsampled it (using sox) to 48/24, and then added a small amount of filtered white noise (at the -120 dB level, so definitely inaudible) to make the spectrogram of the file show at least some high frequency content. The 48/24 file was track D.
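To put the -120 dB figure in perspective, the following Python sketch converts it to a linear amplitude. It assumes the level is meant relative to digital full scale and refers to amplitude rather than power; the names are illustrative only.

```python
FULL_SCALE_24 = 2 ** 23  # largest magnitude of a signed 24-bit sample
STEP_16_IN_24 = 2 ** 8   # one 16-bit step expressed in 24-bit steps


def db_to_amplitude(db: float) -> float:
    """Convert a level in dB to a linear amplitude ratio (20 dB per factor of 10)."""
    return 10 ** (db / 20)


noise_ratio = db_to_amplitude(-120)        # one millionth of full scale
noise_steps = noise_ratio * FULL_SCALE_24  # about 8.4 steps of a 24-bit sample

print(f"amplitude ratio: {noise_ratio:.1e}")
print(f"24-bit steps:    {noise_steps:.1f}")
print(f"below one 16-bit step: {noise_steps < STEP_16_IN_24}")
```

Under these assumptions the noise occupies only the bottom three or four bits of a 24-bit sample, far below the smallest step a 16-bit file can represent.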
Then I threw in a classic calibration test - I made a copy of the 48/24 track and amplified it by +1 dB. This was track H.
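For reference, a gain in dB translates to a simple multiplication of every sample. The sketch below is only an illustration with a made-up function name, and it ignores clipping, which would matter if the source already peaked within 1 dB of full scale.

```python
def apply_gain_db(samples, gain_db):
    """Scale samples by a gain given in dB (amplitude convention, no clipping check)."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


factor_1db = 10 ** (1 / 20)
print(f"+1 dB multiplies the amplitude by about {factor_1db:.3f}")
print(apply_gain_db([0.0, 0.5, -0.25], 1.0))
```

So the louder track differs from its twin by only about 12 % in amplitude.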
Next I did the same 24-to-16-to-24-bit downgrade as with the 96 kHz file, resulting in the 48/16 track B.
Then another test - just to verify that adding the tiny bit of noise to mask the lack of HF content didn't distort the test, I included a 48/16 version of the track *without* the noise. This was also a bit of a check against cheating - if the version without noise did significantly worse than the one with the fake HF content added, it would indicate use of analysis. So track I is 48/16 without any artificial noise.
Track A is a 44/16 file, produced using the same methods.
Track F is another control point - it is an mp3 version, produced using lame with the "insane" quality preset, decoded with mpg123, and converted into a FLAC. So basically a FLAC recording of an mp3 file.
Out of the 6 people responding, one provided no numerical assessments, and one did not give them for all tracks, but based on the numbers I got, here are the averages per track:
I have arranged the tracks in rough quality order, starting with the mp3 on the left and ending with the original file to the right. Ideally we should see the points line up as a rising line from the bottom left corner up to the upper right corner.
Two points stand out - track D (48/24), which for some reason got the lowest average total, and track H (the same 48/24 as track D, but 1 dB louder). This illustrates how important it is to match loudness *exactly* when comparing two components or recordings - 1 dB of difference in loudness was the only difference between the one that got the best overall rating, and the one that got the worst.
We also see the somewhat surprising fact that the mp3 version was the one that got the second highest score.
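The averages can be recomputed from the individual scores listed per track further down. The short Python sketch below does that and ranks the tracks; the scores are copied from the per-track listings, and only submitters who gave a number for a track are counted.

```python
from statistics import mean

# Numerical scores per track, copied from the per-track listings
scores = {
    "A": [4, 6, 3, 7.5, 4],
    "B": [8, 1, 6, 6.5, 5],
    "C": [4, 9, 7, 6],
    "D": [5, 2, 6, 2],
    "E": [4, 5, 8, 7],
    "F": [7, 7, 4, 9],
    "G": [5, 8, 5, 8],
    "H": [6, 10, 10, 6.5, 10],
    "I": [7, 4, 7, 3],
}

averages = {track: mean(values) for track, values in scores.items()}
ranking = sorted(averages, key=averages.get, reverse=True)

for track in ranking:
    print(f"Track {track}: {averages[track]:.2f} ({len(scores[track])} scores)")
```

With only four or five scores per track, a single outlier moves an average by a point or more, so the ranking should be read with that in mind.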
Here are all the numerical assessments in a scatter plot:
Again, we would expect to see the points line up as a rising line from the bottom left corner up to the upper right corner.
Now we get to the comments and assessments for each track. I have replaced the names of the submitters with the names, in the ICAO/ITU phonetic alphabet, of the last letters of the alphabet. Only "Uniform", "Victor" and "Whiskey" provided verbal comments. "Whiskey" provided two separate responses; I have included both, separated by a slash ("/"). "Uniform" also provided two separate sets of comments, one based on listening through the MacBook Pro headphone jack in 24/96, the other through a 16/48 DAC.
Track A - 44.1 kHz / 16 bit, average: 4.9
"Uniform": 4
"Like the piece, not sure about the recording" (through 16/48 DAC)
"Hear some collisions here – sound in general very gritty and glassy (both)" (through MacBook Pro 24/96 headphone jack)
"Victor": 6
There’s decent harmonic richness and clarity. Voices are distinct but slightly flat and cold.
Not as good as track H. Seems like red book CD. The deeper I listen into this track to better it sounds.
"Whiskey":
Emphasized small S-es. Notice that this could happen in anything, but I noticed it as unnatural / Strange
"X-ray": 3
"Yankee": 7.5
"Zulu": 4
Track B - 48 kHz / 16 bit, average: 5.3
"Uniform": 8
"Somehow sounds softer than the first" (through 16/48 DAC)
"Rounds off some of the grit – more palatable (not sure what that means)" (through MacBook Pro 24/96 headphone jack)
"Victor": 1
Similar to track A but seems a bit worse; kind of mp3 like. Differences from A are not significant and may be more about which sins of omission are less offensive than which is better. There is a light metallic edge to the harmonics. Not as engaging as track H.
"Whiskey":
Seems to sound more comfortable (relative to A) / Normal
"X-ray": 6
"Yankee": 6.5
"Zulu": 5
Track C - 96 kHz / 24 bit, average: 6.5
"Uniform": 4
"Good bit of high “glassiness” - not sure what that means" (through 16/48 DAC)
"Not really drawn in by this one – want to stop listening" (through MacBook Pro 24/96 headphone jack)
"Victor":
Sounds rather generic. Nothing is grabbing my attention. Voices difficult to distinguish.
"Whiskey":
More spatious. More natural (after the fact ... this could well be "the one") / Flanging
"X-ray": 9
"Yankee": 7
"Zulu": 6
Track D - 48 kHz / 24 bit, average: 3.75
"Uniform": 5
"Sounds very similar to A" (through 16/48 DAC)
"Both gritty and glassy – similar to A" (through MacBook Pro 24/96 headphone jack)
"Victor":
Reminiscent of track A. Voices are more distinct. Harmonics are richer.
Voices have a more instrumental tone; more engaging than track A.
"Whiskey":
Sounds strange / Flanging
"X-ray": 2
"Yankee": 6
"Zulu": 2
Track E - 96 kHz / 16 bit, average: 6
"Uniform": 4
"Didn't like it very much – not sure why" (through 16/48 DAC)
"Didn't like it very much – not sure why" (through MacBook Pro 24/96 headphone jack)
"Victor":
Seems somewhat limited and flat. Not bad sounding but not quite free and engaging.
"Whiskey":
No S-es ? I listened to this one after the happening once again. Then I noted : OK, more normal S-es here / Too metallish
"X-ray": 5
"Yankee": 8
"Zulu": 7
Track F - mp3 (lame --preset insane, i.e. 320 kbps CBR), average: 6.75
"Uniform": 7
"Pretty good" (through 16/48 DAC)
"Like it quite a bit, even if the treble is a bit “glassy” sounding – actually sounds “good” on this one?" (through MacBook Pro 24/96 headphone jack)
"Victor":
Harmonics seem a bit muddy. Some of the voices seem a bit artificial.
There’s a metallic edge to upper register voices. Voices seem one-dimensional.
"Whiskey":
S-es / Strange
"X-ray": 7
"Yankee": 4
"Zulu": 9
Track G - 96 kHz / 24 bit, converted flac-wav-flac 300 times, copied between computers 100 times, average: 6.5
"Uniform": 5
"Very much like C, I hear “glassiness”" (through 16/48 DAC)
"Both chunky and glassy :/" (through MacBook Pro 24/96 headphone jack)
"Victor":
After listening to track H it’s hard for anything else to compare more favorably. In comparison to H this sounds a bit more dynamically restricted. Less sense of pace and presence. Sounds a bit compressed and forced.
"Whiskey":
Wrong S-es / Bad
"X-ray": 8
"Yankee": 5
"Zulu": 8
Track H - 48 kHz / 24 bit + 1 dB extra gain, average: 8.5
"Uniform": 6
"Kind of sounds “gritty” - again, not sure I like it" (through 16/48 DAC)
"Hearing a bit of dissonance (collisions, as I said for A)" (through MacBook Pro 24/96 headphone jack)
"Victor": 10
Captivating. Voices have more ease and together the voice sound more instrumental. There is better presence, pace and clarity.
Hall dynamics come across better. Much more natural decay. This track is clearly better.
"Whiskey":
Sounds strange. S-es buzz. (I noticed the "buzz" earlier on, but didn't write that down so I don't know anymore where it was) / Wrong
"X-ray": 10
"Yankee": 6.5
"Zulu": 10
Track I - 48 kHz / 16 bit, "raw" resample, no masking noise added, average: 5.25
"Uniform": 7
"Treble a bit rolled-off (e.g., no glassiness), but I like it in this recording" (through 16/48 DAC)
"Now hearing a bit of glassiness in this one" (through MacBook Pro 24/96 headphone jack)
"Victor":
Flat in comparison to H. Harshness around S sounds. Overall slight nasal quality. The music seems a bit forced. Lacking ease.
"Whiskey":
Too high pitched S-es; furthermore quite normal / Rather normal, but edgy
"X-ray": 4
"Yankee": 7
"Zulu": 3