Tidying up your photo/song collection
Over the years, through hard disk changes, backups, new computers and OS installs, I've ended up with a pretty untidy photo collection scattered across several disks.
I wanted to build a clean and tidy library on my main desktop using F-Spot. Unfortunately the version I'm using does not yet have a duplicate detection feature (it seems to be available in the latest version, though). So I ended up with many photos repeated in my new library after importing from different backup copies.
I thought I could write a simple script to find files with a matching md5sum, regardless of filename, so I could remove the photos that appear more than once. But I was glad to see that somebody had done it first.
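For illustration, here is a minimal Python sketch of the idea. It is not the script I found, just my own rough take on it; the function names `file_md5` and `find_duplicates` are made up for this example. It walks a directory tree, computes the MD5 of every regular file and reports the groups of files that share the same hash.

```python
import hashlib
import os
import sys
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each MD5 digest to the files under root that share it.

    Only digests seen more than once are returned. Symlinks are skipped
    so the same file is not counted twice.
    """
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_hash[file_md5(path)].append(path)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print each group of duplicates; nothing is deleted.
    for digest, paths in find_duplicates(sys.argv[1]).items():
        print(digest)
        for path in paths:
            print("    " + path)
```

The sketch only lists the duplicates. Deciding which copy to keep is left to you, which feels safer for a photo library.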
Of course the same idea could be applied to MP3 files or any other type of file. The only drawback, if you want to be picky, is that md5sum covers the whole contents of a file, so a change in, let's say, the EXIF data of a JPEG file will make the file look different even though it may contain a duplicate image. The same holds for two identical MP3 audio files with different ID3 tags. A better mousetrap would ignore the metadata in the files.
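As a rough sketch of what that better mouse trap could look like for MP3 files, the Python snippet below hashes only the bytes between the tags. It assumes the common layout: an optional ID3v2 tag at the very start of the file and an optional 128-byte ID3v1 tag at the very end. The names `strip_id3` and `audio_md5` are hypothetical, and the code does not handle other tag formats (APE tags, for example) or anything unusual, so treat it as a starting point rather than a finished tool.

```python
import hashlib


def strip_id3(data):
    """Return MP3 bytes without a leading ID3v2 tag and a trailing ID3v1 tag."""
    start, end = 0, len(data)
    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, then a 4-byte
    # "synchsafe" size (7 useful bits per byte) that excludes the header.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # flag bit for an extra 10-byte footer
            start += 10
    # ID3v1 tag: the last 128 bytes of the file, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]


def audio_md5(path):
    """Return the hex MD5 of an MP3 file's contents, ignoring ID3 tags."""
    with open(path, "rb") as handle:
        return hashlib.md5(strip_id3(handle.read())).hexdigest()
```

Two copies of the same song with different tags should then give the same `audio_md5`, while plain md5sum would see them as different files. Doing the same for the EXIF data in JPEG files would need a bit more parsing, or an image library.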