To improve spam filtering with spamassassin, you need to be able to adjust the score of the tests that fire most often on the mail that slipped through the filter. To do so, move all the spam missed by spamassassin (which therefore lands in your INBOX) into a dedicated folder of your mailbox, then run a small script that analyses the lot:
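#!/usr/bin/perl
use strict;
use warnings;
use Email::Simple;

# Directory holding the missed spam, one message per file (Maildir layout)
my $dir = shift @ARGV or die "Usage: $0 <directory>\n";

opendir(my $dh, $dir) or die "Cannot open $dir: $!\n";
my @files = grep { !/^\./ } readdir($dh);    # skip dot files
closedir($dh);

local $/ = undef;    # slurp mode: read each file in one go
my %occ;             # test name => number of messages that hit it
my $cnt = 0;         # number of messages carrying an X-Spam-Status header

foreach my $file (@files) {
    # Prefix the directory so the script works from any working directory
    open(my $fh, '<', "$dir/$file") or next;
    my $content = <$fh>;
    close($fh);

    my $mail = Email::Simple->new($content);
    my $spam = $mail->header("X-Spam-Status");

    # Skip messages without the header; stop the capture before the
    # next "key=value" field (e.g. autolearn=...) or at the end of the header
    next unless defined $spam && $spam =~ m/tests=(.*?)(?:\s+\w+=|\s*\z)/s;

    $occ{$_}++ foreach split /\s*,\s*/, $1;
    $cnt++;
}

# Least frequent tests first, most frequent last
foreach my $test (sort { $occ{$a} <=> $occ{$b} } keys %occ) {
    printf "%s: %.2f\n", $test, $occ{$test} * 100 / $cnt;
}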
Now run the script, passing the directory as its argument, for example:
cd ~dani/Maildir/.INBOX.missed_spam/cur
perl ~dani/parse_sa_tests.pl ./
The result should look something like this (tests that fired, from least to most frequent, with the percentage of spam messages each one hit):
HTML_IMAGE_RATIO_06: 3.85
URIBL_GREY: 3.85
LOTS_OF_MONEY: 3.85
DKIM_ADSP_CUSTOM_MED: 3.85
SUBJECT_NEEDS_ENCODING: 3.85
URIBL_BLACK: 3.85
HTML_SHORT_LINK_IMG_2: 3.85
PYZOR_CHECK: 3.85
SPF_NEUTRAL: 3.85
MONEY_FORM: 3.85
MIME_HTML_MOSTLY: 3.85
T_FILL_THIS_FORM_SHORT: 3.85
DATE_IN_PAST_12_24: 3.85
DNS_FROM_AHBL_RHSBL: 3.85
MIME_BASE64_BLANKS: 3.85
UNPARSEABLE_RELAY: 3.85
SUBJ_ALL_CAPS: 3.85
SUBJ_ILLEGAL_CHARS: 3.85
SPF_PASS: 3.85
REMOVE_BEFORE_LINK: 3.85
URIBL_JP_SURBL: 3.85
BAYES_50: 3.85
FILL_THIS_FORM: 3.85
FREEMAIL_REPLYTO: 3.85
HTML_IMAGE_ONLY_16: 3.85
BAYES_05: 3.85
HTML_IMAGE_RATIO_08: 3.85
FREEMAIL_FROM: 7.69
T_REMOTE_IMAGE: 7.69
MIME_QP_LONG_LINE: 7.69
FROM_EXCESS_BASE64: 7.69
DATE_IN_PAST_06_12: 7.69
HTML_IMAGE_RATIO_02: 7.69
BAYES_99: 11.54
T_DKIM_INVALID: 11.54
MSGID_FROM_MTA_HEADER: 11.54
DKIM_VALID_AU: 15.38
SPF_FAIL: 19.23
DKIM_VALID: 23.08
HTML_MIME_NO_HTML_TAG: 38.46
T_KHOP_FOREIGN_CLICK: 38.46
MIME_HTML_ONLY: 42.31
DKIM_SIGNED: 42.31
SPF_SOFTFAIL: 61.54
RCVD_IN_DNSWL_MED: 96.15
HTML_MESSAGE: 96.15
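Once the most frequent tests are known, their scores can be overridden in the spamassassin configuration. The fragment below is only a sketch: the file path (/etc/mail/spamassassin/local.cf is a common location, but it depends on your installation) and the score values are assumptions chosen for illustration, not recommendations. In the sample output above, RCVD_IN_DNSWL_MED fires on 96.15% of the missed spam; it is a whitelist test that lowers the score by default, so neutralising it is one plausible adjustment.

```
# /etc/mail/spamassassin/local.cf (path is an assumption, adjust to your setup)

# Stop the DNSWL whitelist hit from lowering the score (illustrative value)
score RCVD_IN_DNSWL_MED 0

# Give more weight to a test that fires on many missed spams (illustrative value)
score SPF_SOFTFAIL 2.0
```

After editing, running spamassassin --lint checks the configuration for syntax errors. If you use the spamd daemon, it needs a restart to pick up the change.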