Programming in perl
My goal here is to write apps in perl that are actually useful to the
end user. I've found the biggest obstacle is the distribution of
dependencies, but slowly things are changing. Debian users have
managed to package findimagedupes, for example.
[2002/02/06 23:55] PixiePlus now
supports similar image finding using an algorithm based on mine, and for
those unable to run a current version of KDE, gqview will also find your similar
images, albeit using a different algorithm whose results I haven't compared
with my own. Both are FAR faster than findimagedupes and, I would say, both
make it obsolete. If someone else would like to continue its development
for web or other non-GUI purposes (this means you, Debian maintainers ;) ),
by all means feel free, but consider my itch scratched.
[2001/09/20 08:30] I've been getting a lot of 'bug reports' suddenly
about an uninitialized value at line 212 causing findimagedupes to not
work. Apparently the API for ImageMagick has changed since the
version I'm running. The "Ping" method no longer returns a comma
separated list, but only one value. Happily it just happens to be the
one value I use, so if you replace line 206:
($width, $height, $size, $format) = split(',', $image->Ping($file));
with
$format = $image->Ping($file);
it should work again. I see no reason to update findimagedupes until
it breaks (i.e. until I'm running a distribution that includes the new
version of ImageMagick; I don't install libs outside of the RPM system
if I can avoid it), so if someone wants to take it over temporarily or
permanently please drop me a line.
[2001/03/03 10:05] Markus Schoder has contributed finddupes.cpp, GPL'ed source code for a C++
based version of my horribly slow compare routine. In his testing on
a directory of 35,000 images, it was about 300 times faster than
findimagedupes' perl implementation. It's included here for everyone
who has experienced the speed problem. I'll
probably integrate it into the next release somehow.
You can compile this by going
g++ -O3 finddupes.cpp -o finddupes
(or download this gzipped executable, built on Mandrake
7.2) and run it like so:
finddupes .95 <imagedupes-db.txt
where .95 is the desired threshold, and .9 is the default. Thanks, Markus!
[2001/02/11 21:00] Version
0.1.3 released with fixes and performance enhancements from Paul
Cassella and Max Stekelenburg, as well as bugfixes to make it work
with Linux-Mandrake 7.2 and a new "GUI mode" (not an actual GUI, but
it produces output that ought to be easier for a GUI to use).
[2000/10/01 15:30] findimagedupes performs a rough "visual diff" on two or more
images. This command line program will scan two pictures (or a whole
tree of pictures) and determine if there are any that look alike. It
uses a simple algorithm, hopefully documented well in the code, to
reduce every picture to a 16x16x1 bitmap, and counts the bits that
differ between each pair. It's something like 98% accurate when used
on typical image subjects.
Text or other graffiti added to pictures will usually not confuse
the program, but if you take a lot of very similar pictures (like
sunsets or webcam grabs) they will probably turn up as false positives.
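As a rough illustration of the bit-counting comparison described above, here is a minimal perl sketch. It is not the code from findimagedupes: the similarity() helper, the way fingerprints are packed into 32-byte strings, and the way the threshold is applied are all assumptions made for the example, and the two fingerprints are made up rather than computed from real images.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fraction of matching bits between two 256-bit
# (16x16x1) fingerprints, each packed into a 32-byte string.
sub similarity {
    my ($fp_a, $fp_b) = @_;
    die "fingerprints must be 32 bytes each\n"
        unless length($fp_a) == 32 && length($fp_b) == 32;

    # String XOR leaves a 1 bit wherever the fingerprints disagree;
    # unpack's "%32b*" checksum then counts those 1 bits.
    my $differing = unpack('%32b*', $fp_a ^ $fp_b);
    return 1 - $differing / 256;
}

# Two made-up fingerprints that differ in exactly 8 of their 256 bits.
my $fp_a = "\xFF" x 16 . "\x00" x 16;
my $fp_b = "\xFF" x 15 . "\x00" x 17;

# Assumption: the threshold is a fraction of matching bits (0.90 = 90%).
my $threshold = 0.90;
my $score     = similarity($fp_a, $fp_b);
printf "similarity %.4f: %s\n", $score,
    $score >= $threshold ? 'probable duplicates' : 'different';
```

Comparing every pair this way is quadratic in the number of images, which is one reason a big directory gets slow.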
NEW (20010211): Download findimagedupes 0.1.3.
Download findimagedupes 0.1.2.
NEW (20010218): Download updated Debian Sid package (0.1.3-1) kindly contributed by Guenter Bechly.
findimagedupes [options] [<file1> <file2>]
  -rescan          = rescan fingerprints of all files in directory
  -f <file>        = use <file> as image fingerprint database
  -d <dir>         = scan <dir> instead of current directory
  -t <num>         = use <num> as threshold% of similarity (default 90)
  -v <program>     = launch <program> (in bg) to view each set of dupes
  -c <file>        = create GQView collection <file>.gqv of duplicates
  <file1> <file2>  = diff just those two files, using -v if present
                     (other options ignored if files are specified)
  -p               = only valid when files specified; prints the
                     hex of the actual fingerprint of each file.
  -g               = GUI mode: produce only machine-friendly output.
findimagedupes requires:
- perl - as with everything on this page
- ImageMagick - library for manipulating images
- PerlMagick (Image::Magick) - Perl interface to above
- pwd, find, sort, tput (curses), file
  (i.e. if this works right under NT I'd be surprised)
- A bunch of pictures of which you've totally lost control
- (optional) GQView - to manage collections of duplicate images visually
[2000/05/19 23:30] CD::Info is a very primitive perl module right now, but
basically if you're running Linux and put a data CD in the drive, it
will let you get the CD title (CD::Info::cdtitle()) or other info,
which for now basically means the number of tracks (%info =
CD::Info::cdinfo()). I guess I'll submit it to CPAN if I ever
make it do more, like navigate multisession CD's (which I myself never
If anyone objects to me using a new perl module namespace (CD), please
provide a suggestion. I really have no idea what existing category
this would fit under. It is currently OS specific, but there's no
Linux category and anyway I hope it won't be OS specific forever.
[2001/03/01 21:55] Due to PerlQt apparently being
broken under Qt 2.2 and KDE 2.x, and due to my own inability to debug
Perl bindings against C++ libraries, kcdfind is pretty much dead at
this point. I'm looking at alternatives, such as writing a converter
to migrate existing cdcat files to another cataloging program (and
patching that program to use the cd label when it exists) or writing a
new gtk-based front end to cdfind. Sorry for any inconvenience this
may cause the 2 other users of kcdfind ;)
[2000/05/20 18:41] Kudla's CD Finder is a PerlQt CD catalog
app that also has a command-line version. It is still in its
early stages and should be considered unstable, though on my machine
it works great. ;)
[2000/05/22 23:30] You can download version 0.10 which includes both
kcdfind and cdfind. Here is a screenshot as of this evening.
Basically it does the same sort of thing every other cd catalog
program does (scan CD-ROM's, save info on all the files, let you
search for files) but no Linux-based CD cataloger that I could find
would use the CD title, which my old (windoze based) cataloger relies
on (it really saves a lot of confusion and typing.) Oh yeah, at the
moment it probably only works under Linux, because CD::Info does ioctl
stuff that I assume is not portable. If you can help me out with that
please let me know!
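One possibly more portable way to get the title of a data CD, sketched here under stated assumptions, is to read the ISO 9660 volume label straight from the device or an image file instead of using ioctl calls. This is not how CD::Info works; the iso9660_label() helper is hypothetical, it only handles plain ISO 9660 data discs, and the device path is whatever you pass on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of CD::Info): read the volume label from
# an ISO 9660 device or image file without any ioctl calls.
sub iso9660_label {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "can't open $path: $!\n";

    # The primary volume descriptor sits in sector 16 (2048-byte sectors).
    seek($fh, 16 * 2048, 0) or die "can't seek in $path: $!\n";
    my $got = read($fh, my $pvd, 2048);
    close($fh);
    return unless defined $got && $got == 2048;

    # Byte 0 is the descriptor type (1 = primary), bytes 1-5 are "CD001".
    return unless substr($pvd, 0, 6) eq "\x{01}CD001";

    # The volume identifier is 32 bytes at offset 40, padded with spaces.
    my $label = substr($pvd, 40, 32);
    $label =~ s/\s+\z//;
    return $label;
}

# Only runs when a device or image file is named on the command line.
my $device = shift @ARGV;
if (defined $device) {
    my $label = iso9660_label($device);
    print defined $label ? "CD title: $label\n" : "no ISO 9660 label found\n";
}
```

Saved under a name of your choosing (say cdlabel.pl), it would be run as "perl cdlabel.pl /dev/cdrom" or pointed at an .iso file; the device name varies between systems.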
Note: Before running either kcdfind or cdfind for the first time,
type "touch cdcat" in the directory you're running it from. I'll fix
this in the next release.
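Until then, the same step can be done from perl. The sketch below is only a guess at what such a fix might look like, not code from kcdfind or cdfind; ensure_catalog() is a hypothetical helper, and it assumes an empty file is all the catalog needs, as the "touch cdcat" workaround suggests.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: make sure the catalog file exists, like "touch"
# would, without truncating a catalog that is already there.
sub ensure_catalog {
    my ($path) = @_;
    return 0 if -e $path;    # already there, leave it alone
    open(my $fh, '>>', $path) or die "can't create $path: $!\n";
    close($fh);
    return 1;                # created a new, empty catalog
}

# The catalog name comes from the command line, e.g. "cdcat".
if (@ARGV) {
    my $created = ensure_catalog($ARGV[0]);
    print $created ? "created $ARGV[0]\n" : "$ARGV[0] already exists\n";
}
```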
Both kcdfind and cdfind need:
- CD::Info - my CD info module.
- DBI - Perl database interface.
I use Mandrake 7.0, and this came in RPM form on the installation CD.
You should install it from your distribution's CD if you can.
- DBD::CSV - CSV file driver.
This allows the program to save information in a database without
needing you to set up a database server. As a result the program is
slower when you have a big database, but the security hassles of
running a database server aren't there.
- SQL::Statement - required by DBD::CSV.
- Text::CSV - also required by DBD::CSV.
If you only want to use the command line cdfind, you don't need:
- PerlQt - Perl interface to Qt widget set.
- Qt - if you have KDE, you should have this already.
Rob's perl programming
page, March 2001, firstname.lastname@example.org