Librarium Whitehat
"Inveniam viam aut faciam" : I will either find a way, or I shall make one


Spammers are nothing if not adaptable. One of their newer tricks is image spam: embedding the text of the message in a graphic in the email. Normal text checkers will of course not find anything, so something new is needed. If you are using spamassassin, it is possible to implement an OCR-based scanning mechanism, so it can "view" graphics and decide whether they are spam or not. Let's take a look at how..

Well, you need linux first off. Then a recent copy of spamassassin (>=3.1.14), some perl modules, netPBM, Giflib, ocrad, Gifsicle, gocr and finally fuzzyocr. Looks kinda daunting, don't it? It really is not that bad. Let's start with the perl stuff..
# perl -MCPAN -e shell
cpan> install Digest::MD5 String::Approx Time::HiRes Log::Agent MLDBM::Sync
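
Once the CPAN shell is done, it can be worth confirming that every module actually loads before moving on. The following is only a rough sketch: check_perl_modules is a made-up helper name, and it relies on nothing more than perl exiting non-zero when a module given with -M cannot be loaded.

```shell
#!/bin/sh
# check_perl_modules is a hypothetical helper, not part of FuzzyOcr.
# It prints one line per module and returns 1 if any module is missing.
check_perl_modules() {
    rc=0
    for mod in "$@"; do
        # "perl -MSome::Module -e 1" fails if the module cannot be loaded
        if perl -M"$mod" -e 1 2>/dev/null; then
            echo "ok      $mod"
        else
            echo "missing $mod"
            rc=1
        fi
    done
    return "$rc"
}

# The module list is the one installed from the CPAN shell above
check_perl_modules Digest::MD5 String::Approx Time::HiRes Log::Agent MLDBM::Sync \
    || echo "install the missing modules from the CPAN shell"
```

Anything reported as missing can be installed again from the CPAN shell before you carry on.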

then let's do netPBM (get here)..
tar -xzvf ./netpbm-10.26.37.tgz
cd netpbm-10.26.37
./configure    # interactive, answer its questions
make package pkgdir=/usr/lib/netpbm
cd ..

then let's build both GIF libraries, giflib and libungif (get here)..
tar -xzvf ./giflib-4.1.4.tar.gz
cd giflib-4.1.4
./configure
make install
cd ..

tar -xzvf libungif-4.1.4.tar.gz
cd libungif-4.1.4
./configure
make install
cd ..

now ocrad (get here)..
tar -xjvf ./ocrad-0.16.tar.bz2
cd ocrad-0.16
./configure
make install
cd ..

now gifsicle (get here)..
tar -xvzf ./gifsicle-1.46.tar.gz
cd gifsicle-1.46
./configure
make install
cd ..

(almost there) then gocr (get here). Unpack it and cd into its directory as before, then..
./configure --with-netpbm=/usr/lib/netpbm
make install
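
With the helper programs built, a quick check that they can be found on the PATH may save some head scratching later. This is a sketch under assumptions: check_tools is a made-up helper, and the list of commands (ocrad, gocr, gifsicle, plus giftext from giflib and giftopnm from netPBM) is my guess at a representative set, not something fuzzyocr mandates here.

```shell
#!/bin/sh
# check_tools is a hypothetical helper: it prints every command that
# cannot be found on PATH and returns 1 if at least one is missing.
check_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Assumed tool names; adjust the list to what your install provides
check_tools ocrad gocr gifsicle giftext giftopnm \
    || echo "check PATH or the install prefix of the tools listed above"
```

If the netPBM programs show up as not found, keep in mind that they were packaged under /usr/lib/netpbm, which may not be on your PATH.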

lastly we do fuzzyocr (get here)..
tar -xzvf fuzzyocr-3.5.1-devel.tar.gz
cd FuzzyOcr-3.5.1/

Well we need to get spamassassin to use fuzzyocr so..
cp -r ./ ./FuzzyOcr.scansets ./FuzzyOcr.preps  ./ ./FuzzyOcr /etc/mail/spamassassin
cp ./FuzzyOcr.words /etc/mail/spamassassin

This will copy the needed files to the spamassassin directory, so when you restart spamassassin it will use the new checks. But before you restart, let's finish configuring fuzzyocr: edit /etc/mail/spamassassin/
- Make sure that you specify a writable file as the logfile. The log level can be set with the focr_verbose option.
- Make sure that you specify a correct file as the global wordlist.
- Set the following, putting in the correct paths for your system..
focr_enable_image_hashing 2
focr_db_hash <full_path_to_file>
focr_db_safe <full_path_to_file>
focr_db_max_days <number_of_days>
focr_threshold 0.12
focr_base_score 2
focr_add_score 0.375
focr_wrongctype_score 0.7
focr_wrongext_score 0.7
focr_corrupt_score 1.5
focr_corrupt_unfixable_score 2.5
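
Before restarting, it is probably wise to let spamassassin check the configuration, since a typo in one of the settings above could otherwise only show up once mail is flowing. The sketch below wraps spamassassin --lint in a small helper. The names lint_config and SA_BIN are made up for this example; SA_BIN just lets you point at a spamassassin binary that is not on the PATH.

```shell
#!/bin/sh
# lint_config is a hypothetical helper around "spamassassin --lint".
# SA_BIN is an assumed override for the location of the spamassassin binary.
lint_config() {
    if "${SA_BIN:-spamassassin}" --lint; then
        echo "config OK"
    else
        echo "config has errors, fix them before restarting"
        return 1
    fi
}

lint_config || echo "not restarting spamassassin yet"
```

If the lint run complains, go back over the fuzzyocr settings before you restart anything.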

Lastly, restart spamassassin and then watch the mail log file. You should start seeing the new checks happening.

Final Words
Checking for image spam is a very good thing to do these days, but it is a resource-intensive process. Once it is running, monitor your server to see if it can handle the load, if the checker needs to be tweaked more, or even if it needs to be removed. Remember, having the image spam checker crash your email server is no good. So, as always, have fun and learn.