
Help! How to sort duplicate files?


STC

Recommended Posts

9 hours ago, ddetaey said:

If your file names are exactly the same (not the included metadata), then you can use a tool like Easyduplicate

e.g. https://www.easyduplicatefinder.com/

Other similar programs exist, but again these utilities only look at the filename.

 

JRiver could sort identical files based on file name and size. The problem is that when I took the two smallest drives and checked for duplicates, it gave me a result of 7913 files. I found this odd, as I expected the number of duplicate files to be even. It turned out that I have two or three copies in the same folder. The safest option is to check each file and delete the extras, which is what I am doing now. This process is time-consuming, as I open them in File Explorer and compare both. Due to my past sins, I have two formats of the same file in one folder, or two different folders holding two different formats of the same file.
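
A minimal Python sketch of the name-and-size matching described above, assuming plain folders on disk rather than a JRiver library. The function names and the case-insensitive comparison of names are illustrative assumptions, not JRiver's actual behaviour. The histogram shows why the total need not be even: a group can hold three or more copies.

```python
import os
from collections import Counter, defaultdict


def group_by_name_and_size(roots):
    """Map (lowercased file name, size in bytes) -> list of full paths.

    Only keys that occur more than once are returned.
    """
    groups = defaultdict(list)
    for root in roots:
        for folder, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(folder, name)
                try:
                    size = os.path.getsize(path)
                except OSError:
                    continue  # unreadable file: skip rather than guess
                groups[(name.lower(), size)].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


def copies_histogram(dupes):
    """Count how many groups have 2 copies, 3 copies, and so on."""
    return Counter(len(paths) for paths in dupes.values())
```

Calling `copies_histogram(group_by_name_and_size([...]))` with the two drive folders gives a quick view of how many files exist twice, three times, and so on, before anything is deleted. Same name and size is only a hint that two files are equal, not proof.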

 

Thanks for the suggestion.

 

1 hour ago, Ralf11 said:

Filter by something that will exclude all other files besides the music ones

 

Then, sort the result on something that differs between the 'good' group and the 'bad' group

 

Last, select & delete

 

Do this outside of Jriver using a file utility.

 

I thought of going along this line, but it is still not so easy. I created a new folder and copied the two folders into it. After about two hours of moving about 1.8TB of files, I got a message saying "x" number of files with the same name existed. Can I safely "ignore" and delete the two folders? There is no way I could verify that the computer transferred them correctly.

 

Link to comment
3 hours ago, STC said:

After about two hours of moving about 1.8TB of files, I got a message saying "x" number of files with the same name existed.

 

Ha !

You must first test this a little to the merits of your own choice (you define the rules), but:

 

Such a message comes along with something like "the new one is larger" or "the new one is newer" or "both are equal". That is, when you copy one file it works like this. With several I actually don't know. But let's say you don't even care about this much because it is just music files (is it ?). Then indeed you could answer "skip" and "do this for all next 20000 etc. files". Now you'll have a single unique set.

You can copy to this same folder until you processed everything. Next delete all the source folders.
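
A rough Python sketch of this "skip and merge" idea, for anyone who would rather not trust the Explorer dialog blindly. It assumes ordinary folders; the function name `merge_into` and the three result lists are hypothetical. Unlike a plain "skip", it records whether a skipped file really matches the one already in the target, and it re-reads every copied file to verify it.

```python
import filecmp
import os
import shutil


def merge_into(source_root, target_root):
    """Copy files into target_root, skipping relative paths that already exist.

    Returns (copied, skipped_same, skipped_different) as lists of relative paths.
    Nothing in source_root is modified or deleted.
    """
    copied, skipped_same, skipped_different = [], [], []
    for folder, _dirs, files in os.walk(source_root):
        for name in files:
            src = os.path.join(folder, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(target_root, rel)
            if os.path.exists(dst):
                # Skip, but note whether the contents are byte-identical.
                if filecmp.cmp(src, dst, shallow=False):
                    skipped_same.append(rel)
                else:
                    skipped_different.append(rel)
                continue
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also keeps timestamps
            # Verify the copy byte for byte before trusting it.
            if not filecmp.cmp(src, dst, shallow=False):
                raise IOError(f"copy verification failed: {rel}")
            copied.append(rel)
    return copied, skipped_same, skipped_different
```

Only when `skipped_different` comes back empty is it reasonable to treat the source folder as fully represented in the target. Anything listed there has the same name but different content and deserves a look before the source is deleted.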

 

N.b.: XXHighEnd contains formal functionality for the very same, but now within a bandwidth of albums being the same or different (like a remaster would be different from an original). The algorithms to do this are crazily complex and with a too small bandwidth all is different (it also depends on ripping and (e.g.) EAC CD Drive settings). A too wide bandwidth implies all is the same.

This functionality is named Compare Albums. But although you are allowed to check this out a little, it won't work anyway without a license because the demo version is limited to 100 albums of output for selections/lists. And remember, it works at the level of albums, which means that each album is contained in its own folder.

 

STC02.thumb.png.3b8dfbe80a1b3fd6051ec4205d943684.png

 

In this case the 417 albums in G:\RS2\ will be compared against C:\Galleries, this latter being your net output folder. In my case 54K albums are in there, and the comparison will take maybe 5 minutes. After that the net result list of differences can automatically be copied (added) to the output folder. You can apply selections of what to copy as well. Or save the result lists to continue with it later. Etc. etc. etc.

Lush^3-e      Lush^2      Blaxius^2.5      Ethernet^3     HDMI^2     XLR^2

XXHighEnd (developer)

Phasure NOS1 24/768 Async USB DAC (manufacturer)

Phasure Mach III Audio PC with Linear PSU (manufacturer)

Orelino & Orelo MKII Speakers (designer/supplier)

Link to comment
3 hours ago, Ralf11 said:

you have backups, right?

 

they are known to be good, right?

 

if not ----> do that first

 

 

The bad news is none of the backups are identical. :( 

 

 

2 hours ago, PeterSt said:

 

But let's say you don't even care about this much because it is just music files (is it ?).

 

 

Yes. I also have photos, but I use ACDSee software, which is a little easier since I can see two photos side by side before deleting them. No wonder audio is so complicated.

 

 

2 hours ago, PeterSt said:

 

 

Then indeed you could answer "skip" and "do this for all next 20000 etc. files". Now you'll have a single unique set.

You can copy to this same folder until you processed everything. Next delete all the source folders.

 

 

I did something like this, but in File Explorer one folder contained the actual names of the tracks while in another folder they were just numbered as track 1, 2 and so on. In JRiver, it showed the full names of the tracks. Both folders are the same and I have no idea why the actual file names are different. I believe JRiver updated them, or I could have manually updated them, but I am still trying to figure out how the actual file names got changed.
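
When the names differ but the files are supposedly the same, comparing content instead of names is one way to check. Below is a minimal Python sketch that groups files by size and then by SHA-256 hash. The function names are made up for illustration. One caveat: two rips whose embedded tags were edited differently are no longer byte-identical, so this only finds exact copies.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_content(roots):
    """Return lists of paths whose bytes are identical, whatever their names."""
    by_size = defaultdict(list)
    for root in roots:
        for folder, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(folder, name)
                try:
                    by_size[os.path.getsize(path)].append(path)
                except OSError:
                    continue  # unreadable file: leave it out
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have an identical twin
        for path in paths:
            by_hash[(size, file_digest(path))].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Hashing only the files that share a size keeps the run time down on large collections, since most files are ruled out without being read.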

 

In another case, I have four backup copies of one album which are all identical. In all of them track 5 was missing. When I tried to locate the CD I couldn't find it, and when I compared the album art, it had nothing to do with the tracks in the album. I believe the track file was filed in a different folder under an "unassigned" folder by JRiver or iTunes, which I used to rip the CDs. There are about 200 of them named track 5, and I am now forced to listen to them with Shazam to identify the tracks and possibly the album.

 

 

2 hours ago, PeterSt said:

 

In this case the 417 albums in G:\RS2\ will be compared against C:\Galleries, this latter being your net output folder. In my case 54K albums are in there, and the comparison will take maybe 5 minutes. After that the net result list of differences can automatically be copied (added) to the output folder. You can apply selections of what to copy as well. Or save the result lists to continue with it later. Etc. etc. etc.

 

 

This is maybe what I need. Thanx.

Link to comment
48 minutes ago, STC said:

This is maybe what I need. Thanx.

 

OK, from there on let's see it as a challenge and maybe use this thread to work it out. There's lots of tips and tricks and most of it is undocumented. Example: for each item listed there's a plethora of sub functions like 

 

STC03.thumb.png.c4f3d44f5807e03df8b499389271a186.pngSTC04.thumb.png.7610370514ae0c28e8539e5025ed023a.png

 

which seems to suit your quest to determine whatever happened in there (JRiver). What literally showed above is that first "Gallery" meta data was made of it (this is not necessary at all) which allows you to list all in one flat list. Thus, the output folder I talked about is now not about copying xx TB of data to get it there, but is simple meta data (including pictures if there) which hardly takes time to create (it's all 1K files).

All functionality works upon the Gallery Meta Data equally to the original file data (and the context menu above shows you some).

 

What worries me is that you talk about track data and not really albums, plus the "fact" that JRiver stores the coverart in separate folders which nothing can deal with except for JR itself (but I suppose something can be made for it). Thus, while XXHighEnd operates at normal folder structures (there is no database whatsoever), it also expects that (which would be normal), but for JR it is not stored like that. At least it is an option not to store it like that, AFAIK.

 

STC05.thumb.png.388290c2c63d4641f7900eb56221fda0.png

 

All works with Track data just the same (see above), but this does not imply it will be useful to you. I mean, you can't use it to find equal tracks and the like. On an other note things like this can be done at the album level (it is the most used function by myself):

 

STC06a.thumb.png.b5942094ae924155b40fbd4422ab64c0.pngSTC06b.thumb.png.03e24679ed833f565ba29200ff59fc54.png

 

which relatively easily lets you determine useful versions, with also means to tag albums for later revisiting (see the green little "Reject" box in the second). Or, what I am working on right now (actually quite finished) is this:

 

STC07.thumb.png.fca3ccaee67247d42b7b62002289c270.png

STC07a.thumb.png.a7440b9d6962909a90ad2277e9ef9774.pngSTC07b.thumb.png.0d06c4660cb64dc2128b2aa87c286ba5.png

 

which is an alternative explorer (for tablet usage) and with which you can tag each folder up to the album level with colors and border thickness.

While this is a super fast tagging means, it interacts with all the normal functionality and for example clicking the larger part of the button brings us here again:

 

STC07c.thumb.png.0c20c639639cfb1c84168b430ba7a338.png

 

and from there we can apply the copying and comparing and everything. The power is quite infinite and actually exactly made for the task you put yourself to. If it were albums ...

 

Btw, not trying to sell anything (I just spent more for $/time than the license costs) and no obligations either. It's just that I know what all this was made for and I am largely using it myself only. And that's a bit of a waste ...

 

 


Link to comment
2 hours ago, PeterSt said:

 

OK, from there on let's see it as a challenge and maybe use this thread to work it out. There's lots of tips and tricks and most of it is undocumented. Example: for each item listed there's a plethora of sub functions like 

 

STC03.thumb.png.c4f3d44f5807e03df8b499389271a186.pngSTC04.thumb.png.7610370514ae0c28e8539e5025ed023a.png

 

which seems to suit your quest to determine whatever happened in there (JRiver). What literally showed above is that first "Gallery" meta data was made of it (this is not necessary at all) which allows you to list all in one flat list. Thus, the output folder I talked about is now not about copying xx TB of data to get it there, but is simple meta data (including pictures if there) which hardly takes time to create (it's all 1K files).

All functionality works upon the Gallery Meta Data equally to the original file data (and the context menu above shows you some).

 

What worries me is that you talk about track data and not really albums, plus the "fact" that JRiver stores the coverart in separate folders which nothing can deal with except for JR itself (but I suppose something can be made for it). Thus, while XXHighEnd operates at normal folder structures (there is no database whatsoever), it also expects that (which would be normal), but for JR it is not stored like that. At least it is an option not to store it like that, AFAIK.

 

STC05.thumb.png.388290c2c63d4641f7900eb56221fda0.png

 

All works with Track data just the same (see above), but this does not imply it will be useful to you. I mean, you can't use it to find equal tracks and the like. On an other note things like this can be done at the album level (it is the most used function by myself):

 

STC06a.thumb.png.b5942094ae924155b40fbd4422ab64c0.pngSTC06b.thumb.png.03e24679ed833f565ba29200ff59fc54.png

 

which relatively easily lets you determine useful versions, with also means to tag albums for later revisiting (see the green little "Reject" box in the second). Or, what I am working on right now (actually quite finished) is this:

 

STC07.thumb.png.fca3ccaee67247d42b7b62002289c270.png

STC07a.thumb.png.a7440b9d6962909a90ad2277e9ef9774.pngSTC07b.thumb.png.0d06c4660cb64dc2128b2aa87c286ba5.png

 

which is an alternative explorer (for tablet usage) and with which you can tag each folder up to the album level with colors and border thickness.

While this is a super fast tagging means, it interacts with all the normal functionality and for example clicking the larger part of the button brings us here again:

 

STC07c.thumb.png.0c20c639639cfb1c84168b430ba7a338.png

 

and from there we can apply the copying and comparing and everything. The power is quite infinite and actually exactly made for the task you put yourself to. If it were albums ...

 

Btw, not trying to sell anything (I just spent more for $/time than the license costs) and no obligations either. It's just that I know what all this was made for and I am largely using it myself only. And that's a bit of a waste ...

 

 

 

Peter, this is impressive. I like how it shows a snapshot of all the album in the "tracks compare" interface.

 

My problem with trying out new software is the difficulties to learn how to configure them for my purpose. I need virtual cables such as Reroute to interface with the Reaper. Me just too old to learn something new all over again. Hope you understand why I am slow to try XXHighEnd.

 

Thank you for sharing.

Link to comment

Thank you for the kind words.

 

2 minutes ago, STC said:

I need virtual cables such as Reroute to interface with the Reaper.

 

Maybe a dumb question:

We are not talking about playback, right ? So wouldn't it just need references to your file locations ? (volumes with drive letters, or references like \\Server\ShareName\) 

 

No problem if it is difficult to understand all the way, at this moment.

If it is about playback then I myself might have difficulty with pointing you in the right direction. But I'd say it isn't about playback (or at least not now).

 

But as I said, no obligations anywhere, also not because I spent some time with explaining (in this thread). That was my choice ... It is only that I think I understand your relatively huge problem and some automation for that purpose could be able to help out.

But if you need baby steps, don't hesitate ... B|

 

 


Link to comment
1 hour ago, PeterSt said:

We are not talking about playback, right ?

 

Actually, I stream from JRiver to Reaper. The only process Jriver does is up or down sample the files to 24/96.

 

1 hour ago, PeterSt said:

 

So wouldn't it just need references to your file locations ? (volumes with drive letters, or references like \\Server\ShareName\) 

 

JRiver does list the location in a column, although not as elegantly as yours. But going through 60K files one by one is too taxing. For example, Nils Lofgren's Acoustic Live is on 4 different drives. For some weird reason, track 1, "You", is not listed as a duplicate. I have checked the metadata and all the information is similar to the other tracks, and yet it refuses to show up. My worry about automating this without fully understanding what it does to the files is that I could accidentally erase some files without knowing. I spotted this discrepancy, but how about others I do not know of?

 

I think this problem started because of the ripping setting, where it put the artist folder first and then the album. Basically, I do not know much about computers.

 

 

1 hour ago, PeterSt said:

 

....

But if you need baby steps, don't hesitate ... B|

 

 

 

I am still crawling. Will pester you when I am ready to take the baby steps. ;) 

 

8 minutes ago, fragoulisnaval said:

do you use by any chance apple lossless? You could use iTunes to do the sorting for you quite easily...

 

I ripped most of my files using iTunes but stopped using it when I started listening to DSD files. As far as I know iTunes couldn't play DSD (at least that was the situation about 6 or 7 years ago), so I wonder how iTunes could sort the DSD files.

 

Thank you for the suggestion.

Link to comment
3 hours ago, STC said:

Me just too old to learn something new all over again. Hope you understand why I am slow to try XXHighEnd.

 

 

I see a couple solutions here. None appetizing in the face of every logical file and metadata structure being afflicted on top of many many duplicates.

 

First is realizing you are effectively going to spend as much or more time on a less elegant solution, with a very high likelihood of creating multiple other problems, than you would learning Peter's software and slowly coaxing out all the knots you've tied. 

 

Second, you are not an eager 20 y/o who could harness the skills to sort this out through hard work, ingenuity, and the effective remove wages add. Someone who could write a program or three to reduce the background clutter.  Then set to work understanding and directly fixing the problem while sourcing fresh copies of lost cause albums.  

 

In either case, a HDD of sufficient capacity to combine all music in one place may be fortuitous. Moving files using rsync or similar, Windows transfer of large archives is simply not an advisable solution, would be a step in the right direction. Which makes me believe pursuing the second option above is going to be more enjoyable for you by far. In at least equal amounts as settling on a set of software that leaves your data in a serviceable condition will be for whoever is charged with fixing/maintaining it. :)

 

 

Link to comment
3 hours ago, rando said:

I see a couple solutions here. None appetizing in the face of every logical file and metadata structure being afflicted on top of many many duplicates.

 

Already there you at least lost me. So that was pretty fast.

 

3 hours ago, rando said:

Which makes me believe pursuing the second option

 

Make that the fifth. I couldn't even see one ?

 

3 hours ago, rando said:

In at least equal amounts as settling on a set of software that leaves your data in a serviceable condition will be for whoever is charged with fixing/maintaining it. :)

 

Where did you pick up this kind of language ? :cool:


Link to comment
1 hour ago, PeterSt said:

 

Already there you at least lost me. So that was pretty fast.

 

 

Make that the fifth. I couldn't even see one ?

 

 

Where did you pick up this kind of language ? :cool:

 

Don't make me throw a dictionary at you, Peter!  x-D

 

My apologies for using complex sentences. He should pursue a long term solution using either his own time or that of a younger person better conditioned to the task of efficiently correcting his database errors. He made a giant mess with no extremely simple solution. 

 

 

Link to comment
9 minutes ago, mansr said:

The problem isn't that your sentences are complex. It is that they are void of meaning owing to a complete lack of proper grammatical structure.

 

I rather enjoy Rando’s writing style. Reminds me a bit of impressionistic art, where there are a lot of brushstrokes that seem random and unstructured from close up, and yet coalesce into a coherent picture if not looked at too closely. I had no problem figuring out what he was saying as long as I didn’t focus on any individual word or phrase ;)

Link to comment
4 hours ago, rando said:

He should pursue a long term solution using either his own time or that of a younger person better conditioned to the task of efficiently correcting his database errors. He made a giant mess with no extremely simple solution. 

 

The giant mess happened because three of my drives failed within a week or so.

 

I started ripping my CDs to a laptop and was using that during my initial transition to computer audio years ago. Then I bought a NAS and copied those files to it. When it was about to reach its capacity, I bought another drive and copied the files from the first NAS and also from my laptop. As I progressed to a higher level, I found that fetching DSD files from the NAS caused some clicks, although they were connected via Cat6 cable. I then transferred all three drives to a local dedicated PC and created a backup copy of the 1st local HDD.

 

So now I have 5 drives, each slightly modified because I either edited or rearranged the tracks so that they reflect exactly how my CD tracks were arranged. About a month ago my laptop's HDD failed, although the SMART test said the drive was in healthy condition. Since my laptop contained most of my reference recordings and the DSD files, I copied the backup of that drive to my 1st local HDD......and that was when I realized the data were not identical. During copying, some file names were truncated, and I still do not know why the tracks in one folder have different names although they are identical files.

 

To make matters worse, Flickr told me to pay up or remove the photos. It was a few hectic days downloading close to 1TB of videos and photos to the NAS, which may have taxed my laptop and two of the NAS drives and caused them to fail at the same time. I could access the drive of one 3TB NAS by booting from Linux, and then I decided to copy the files to the local HDD just to be on the safe side. I still need to find a way to access the older NAS, where Linux couldn't read all the folders. Apparently, the old Mybooklive used some older OS. That's another problem but not crucial, as I believe I have all the data on the other drives.

 

This is how the giant mess was created. No one expected 3 drives to fail at the same time. :( 

Link to comment

I just called stop on a rather extensive examination of the Havemeyer collection to eat.  Your comparison elicited a small laugh.  Those French en plein-air daubs are fairly well priceless nowadays. Especially compared to some rather precise and accepted safe choices others put up as personal statements. ;)

 

3 hours ago, mansr said:

The problem isn't that your sentences are complex. It is that they are void of meaning owing to a complete lack of proper grammatical structure.

 

Call me suspicious, but your study and usage of language wasn't shoved down a very narrow channel whose passage was dependent on staying firmly within the lines, quite early on, was it?  

 

@STC Sorry to see your thread broken up with arguments. If you find yourself in the weeds on a technical problem that is interrupting your listening enjoyment. Halt until you can equip yourself with the proper tools or place it into the lap of someone who does possess them.  

 

This honestly sounds like a Winter project that will give you chance to guard against future occurrences. Something that should be approached with patience and planning instead of being rushed.

 

Online storage is a painful process many here fear having to face retrieval from. 

Link to comment
  • 2 weeks later...

Hi ST

You might have luck using the smartlist method using -[Filename]=[] ~dup=[Filename] or possibly ~dup=[File Size],[Name]

 

Not sure what's going on: are they real duplicates (actual files) or just dupes showing in the JRiver library?

 

Something similar has happened to me but I can't recall why. Maybe the drive showed up with a different letter like E one time and F another, appearing to be two different drives. There are various search tools in MC to locate all files on say E and delete them (making sure you have the real ones backed up).

 

If no luck suggest maybe post at the Interact forum

Cheers

Sound Minds Mind Sound

 

 

Link to comment
6 hours ago, Audiophile Neuroscience said:

You might have luck using the smartlist method using -[Filename]=[] ~dup=[Filename] or possibly ~dup=[File Size],[Name]

 

Hi David, 

Nice to see you here after a long time.

 

Yes,

I am using ~dup=[File Size],[Name] and have already cleaned one drive. I have about 1400 files to clear on another drive and just over 60K on yet another drive. Once I clear the 1400 files, the 60K should be rather easy, as I can delete them in one go based on the drive letters.
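
Outside JRiver, the "delete in one go based on the drive letter" step could be sketched in Python roughly as below. This is only an illustration under assumptions: `groups` is a list of duplicate groups (each a list of paths) produced by some earlier comparison, and `deletion_candidates` is a hypothetical helper. It lists files rather than deleting them, and it never lists a file unless another copy survives outside the chosen drive or folder.

```python
import os


def _is_under(path, root):
    """True if path lies inside root (root is already normalised)."""
    path = os.path.normcase(os.path.abspath(path))
    try:
        return os.path.commonpath([path, root]) == root
    except ValueError:
        return False  # e.g. different drive letters on Windows


def deletion_candidates(groups, expendable_root):
    """List copies under expendable_root that also exist somewhere else.

    groups: iterable of lists of paths, one list per set of duplicates.
    Nothing is deleted here; the caller decides what to do with the list.
    """
    root = os.path.normcase(os.path.abspath(expendable_root))
    to_delete = []
    for paths in groups:
        inside = [p for p in paths if _is_under(p, root)]
        outside = [p for p in paths if not _is_under(p, root)]
        if outside:  # at least one copy survives elsewhere
            to_delete.extend(inside)
    return to_delete
```

A group whose copies all sit on the expendable drive is left alone, which guards against wiping the last remaining copy of a track.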

 

Link to comment
17 minutes ago, STC said:

 

Hi David, 

Nice to see you here after a long time.

 

Thanks ST

17 minutes ago, STC said:

Yes,

I am using ~dup=[File Size],[Name] and have already cleaned one drive. I have about 1400 files to clear on another drive and just over 60K on yet another drive. Once I clear the 1400 files, the 60K should be rather easy, as I can delete them in one go based on the drive letters.

 

 

Yes.

 

The only other solution I can think of that's maybe easier is to simply restore a previous library where there were no dups, then run Re-import to bring it up to date.

Sound Minds Mind Sound

 

 

Link to comment

I may have missed it, but I don't see any mention of simple software like Similarity, AudioDedup, and AllDup, all of which are freeware that seem to work very well. Another that looks great (although I haven't tried it) is mp3 Duplicate Finder (which does work with FLAC etc despite its name). This one will easily let you move duplicate files into another folder, so you can preserve them until you confirm that you haven't mistakenly culled files you want to keep.
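
The "move instead of delete" idea can also be done with a few lines of Python, sketched here under assumptions: `quarantine` is a made-up helper name, and the layout of the quarantine folder (one subfolder per drive, then the original path) is just one possible choice.

```python
import os
import shutil


def quarantine(paths, quarantine_root):
    """Move files into quarantine_root, keeping their original folder layout.

    Returns a list of (original_path, new_path) pairs so a move can be undone.
    """
    moved = []
    for src in paths:
        drive, tail = os.path.splitdrive(os.path.abspath(src))
        # "E:" becomes folder "E"; paths without a drive letter go under "root".
        drive_folder = drive.replace(":", "").strip("\\/") or "root"
        dst = os.path.join(quarantine_root, drive_folder, tail.lstrip("\\/"))
        if os.path.exists(dst):
            raise FileExistsError(dst)  # never overwrite a quarantined file
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.move(src, dst)
        moved.append((src, dst))
    return moved
```

Keeping the returned pairs (for example in a text file) makes it possible to put everything back if a file turns out not to be a duplicate after all.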

 

If the problem is related to duplicate files that have different tags (and it only takes one to cause this), you can also identify them with a good tag editor by searching for all the files with one of the tags that do match.

Link to comment
