
Measuring objectively real differences in files with identical checksums


wgscott


Chris

 It is highly unfair to permit members of this group to attack other members, without them having the right of reply

I should have the ability to at least correct blatant lies.

Alex

 

How a Digital Audio file sounds, or a Digital Video file looks, is governed to a large extent by the Power Supply area. All that Identical Checksums gives is the possibility of REGENERATING the file to close to that of the original file.

PROFILE UPDATED 13-11-2020

Link to comment
5 minutes ago, sandyk said:

Chris

 It is highly unfair to permit members of this group to attack other members, without them having the right of reply

I should have the ability to at least correct blatant lies.

Alex

Leave this thread. Nobody here wants to hear from you and your name isn't even mentioned. 

Founder of Audiophile Style | My Audio Systems

Link to comment

Measuring objectively real differences in files with identical checksums

 

An identical checksum does not necessarily mean two files are the same bit for bit.

 

For example, an MD5 checksum is a fixed-length 128-bit value, so two files longer than 128 bits may share the same checksum. The chance of that happening by accident is low (slightly higher than 1/2^128).
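The counting argument above can be made concrete with a small sketch. It is only an illustration: it shrinks MD5 to its first 2 bytes (a made-up 16-bit "checksum", not how anyone uses MD5) so that a collision turns up almost immediately. The helper names `truncated_md5` and `find_collision` are hypothetical.

```python
import hashlib
from itertools import count


def truncated_md5(data: bytes, n_bytes: int = 2) -> bytes:
    """Return only the first n_bytes of the MD5 digest (a deliberately tiny hash)."""
    return hashlib.md5(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 2) -> tuple[bytes, bytes]:
    """Hash b"0", b"1", b"2", ... until two inputs share a truncated digest."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        msg = str(i).encode()
        tag = truncated_md5(msg, n_bytes)
        if tag in seen:
            # Two different inputs, same short digest: a collision.
            return seen[tag], msg
        seen[tag] = msg


first, second = find_collision()
print(first, second, truncated_md5(first).hex())
```

With only 2^16 possible digests, a collision is guaranteed within 65,537 distinct inputs. The full 128-bit MD5 behaves the same way in principle, just with a vastly larger space.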

 

The following example files are created deliberately.

 

The WAV files a.wav and b.wav in the attached SameMD5Wav.zip share the same MD5 checksum, 161b403cd39f1f3f6cffb64b4dd8ccb9, but their waveforms differ. The difference at the beginning is easy to see.

 

C:\tmp>CertUtil -hashfile a.wav MD5
MD5 hash of a.wav:
161b403cd39f1f3f6cffb64b4dd8ccb9

C:\tmp>CertUtil -hashfile b.wav MD5
MD5 hash of b.wav:
161b403cd39f1f3f6cffb64b4dd8ccb9
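A direct byte-for-byte comparison sidesteps the hash entirely. The following is a minimal sketch, not the method used to produce the files; the function name `differing_offsets` and its `limit` parameter are assumptions for illustration.

```python
from itertools import zip_longest


def differing_offsets(path_a: str, path_b: str, limit: int = 16) -> list[int]:
    """Return up to `limit` byte offsets at which the two files differ."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        data_a, data_b = fa.read(), fb.read()
    offsets = []
    # zip_longest pads the shorter file with None, so a length mismatch also counts.
    for offset, (x, y) in enumerate(zip_longest(data_a, data_b)):
        if x != y:
            offsets.append(offset)
            if len(offsets) >= limit:
                break
    return offsets
```

Calling `differing_offsets("a.wav", "b.wav")` on the attached files should return a non-empty list if they really differ. An empty list means the two files are identical bit for bit.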

 

 

[Image: waveform of a.wav]

[Image: waveform of b.wav]

 

SameMD5Wav.zip

 

Sunday programmer since 1985

Developer of PlayPcmWin

Link to comment
1 hour ago, yamamoto2002 said:

 

An identical checksum does not necessarily mean two files are the same bit for bit.

 

 

There is a simple solution to this concern.  All you have to do is cross-check with multiple hashing algorithms to guard against a false positive from a collision due to weakness in one of the algorithms.  One can have high confidence in bit-for-bit equivalence if multiple hashing algorithms agree.

 


$ openssl md5 *.wav
MD5 (a.wav) = 161b403cd39f1f3f6cffb64b4dd8ccb9
MD5 (b.wav) = 161b403cd39f1f3f6cffb64b4dd8ccb9


$ openssl sha256 *.wav
SHA256(a.wav)= b4c9cba57fbe8150943b0d6c361af76bd48ba128f606be295f3a8cbbbba93af4
SHA256(b.wav)= cb6b1a94718fbb53b28ad99f9fc35a9c9d9f7534485c1605ac329bb04e471399

 

$ openssl md4 *.wav
MD4(a.wav)= 8de0b4d206a36321a439c9c150231a73
MD4(b.wav)= d78d9a340f322ba5749565db1fb0068f

Here we see that both MD4 and SHA256 reveal that the files are not equal despite the hashing collision with MD5.
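The same cross-check can be scripted. This is a rough sketch using Python's standard `hashlib`; the function names `file_digest` and `cross_check` are made up for illustration. MD4 is left out of the default list because some newer OpenSSL builds may not expose it to `hashlib`.

```python
import hashlib


def file_digest(path: str, algorithm: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large audio files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cross_check(path_a: str, path_b: str,
                algorithms: tuple[str, ...] = ("md5", "sha1", "sha256")) -> dict[str, bool]:
    """Map each algorithm name to True if both files give the same digest."""
    return {
        name: file_digest(path_a, name) == file_digest(path_b, name)
        for name in algorithms
    }
```

For the two files above, `cross_check("a.wav", "b.wav")` would be expected to report a match for MD5 and a mismatch for SHA-256, in line with the openssl output. Any single mismatch is enough to show the files are not equal.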

Link to comment
5 hours ago, jabbr said:

I tried to listen for differences in sound between two separate but identical files, but couldn’t hear any. Then I realized that my file system (ZFS) automagically maps identical files to the same location (actually it does block-level deduplication), so I can’t even test this.

 

Well then, that "version" stored on your file system could be the absolute best or absolute worst copy of the music...

 

Which is it based on the theory of how this is supposed to work? Or is it all just "different" and equal with no relative judgment possible!?

 

Archimago's Musings: A "more objective" take for the Rational Audiophile.

Beyond mere fidelity, into immersion and realism.

:nomqa: R.I.P. MQA 2014-2023: Hyped product thanks to uneducated, uncritical advocates & captured press.

 

 

Link to comment

About 20 years ago, an issue of the local computer magazine 月刊アスキー (Monthly ASCII) carried an article about a sound difference between identical DAW project files stored on different hard drives.

One drive was an IDE HDD and the other a SCSI HDD, and a music composer said the SCSI HDD sounded better.
In the next issue the cause of the difference was found: the IDE HDD's slow transfer speed was making the sound stutter :)


Link to comment
6 hours ago, Archimago said:

 

Well then, that "version" stored on your file system could be the absolute best or absolute worst copy of the music...

 

Which is it based on the theory of how this is supposed to work? Or is it all just "different" and equal with no relative judgment possible!?


In all seriousness files stored on media could certainly, no, will have different levels of readout noise. That could be audible. However, the readout noise is not preserved with file copy. In the case of ZFS, the data is also mirrored, and read into RAM cache. In my case, from RAM cache, it is transmitted across network so whatever is going on the NAS is entirely electrically isolated from the streaming endpoint. My network is 10G fiber and assuming my switch meets the compliance testing standard, there is extremely little noise in the bits. Of course there is zero common mode noise ;) 

 

Whether the copy stored on my magnetic media is the absolute best or worst, as long as the bits are readable, the bits sent to my DAC almost certainly do not reflect this noise (we have no reason to invoke entanglement 😂), so this issue is moot for me.

Custom room treatments for headphone users.

Link to comment
2 minutes ago, jabbr said:


It’s really quite easy if you have the equipment. Make two copies on a CD and then magnify and look at the pits. They will be different. Or measure the magnetization, again different. 
 

At a fine level all digital bits are represented by analog processes and the analog is different from bit to bit.

 

(of course digital systems are explicitly designed to deal with this)

 

I prefer to use a tunneling microscope. Makes the differences much easier to see ;)

 

Link to comment
Just now, plissken said:

 

I'm sure the 7nm lithography they use to make ICs matters too. Best get one cherry-picked from the center of the wafer.


Snark aside: 

 

1) are there differences in bit-identical files?: yes, always

2) are the differences meaningful for SQ in my system?: no, never 


Link to comment
4 minutes ago, jabbr said:


Snark aside: 

 

1) are there differences in bit-identical files?: yes, always

2) are the differences meaningful for SQ in my system?: no, never 

 

It's not meant as snark. Manufacturers will bin parts from the center of the wafer versus the edge and sell them under specialty SKUs.

 

Will that mean differences for audio? No. Never. In no one's system.

 

It would be absurd knowing what we know to suggest otherwise. Both of us know better. 

Link to comment
