Now appearing nightly in the Mozilla nightly builds, the open source community is proud to present a very special feature: naive Bayesian spam filtering!
And you’re probably wondering why I’m excited about something as boring-sounding as that. Don’t worry. I’m no less sane than I was yesterday and I’ll prove it.
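Bayesian filtering is a method of pattern recognition built on Bayes’ Rule, a formula from probability theory. You tell it what is spam and what isn’t, and over time it learns which words tend to show up in each kind of message and uses them to recognize spam on its own. Click here for an explanation of what it is and why it works.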
Its main selling point is that when implemented properly and trained thoroughly, Bayesian filtering is very effective at identifying spam and produces nearly zero false positives.
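For the curious, here is a rough sketch of the idea in Python. It is not Mozilla’s code, and I’m not claiming it matches what the nightly build does internally. The class name, the tokenizer, and the add-one smoothing are all my own assumptions, chosen to keep the example short. It counts how often each word appears in spam and in non-spam, then applies Bayes’ Rule (with the "naive" assumption that words are independent) to estimate how likely a new message is to be spam.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class NaiveBayesFilter:
    """Illustrative naive Bayes spam scorer (hypothetical, not Mozilla's)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message as 'spam' or 'ham' (non-spam)."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return the estimated probability (0.0 to 1.0) that text is spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        words = tokenize(text)
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: how common this kind of message is (add-one smoothed).
            score = math.log(
                (self.message_counts[label] + 1) / (total_messages + 2)
            )
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Likelihood of each word given the label. Add-one smoothing
                # keeps unseen words from zeroing out the whole product.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Turn the two log scores into a probability, clamping the
        # difference so math.exp cannot overflow on long messages.
        diff = log_scores["ham"] - log_scores["spam"]
        diff = max(min(diff, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("win free money now", "spam")
    f.train("free credit card offer", "spam")
    f.train("lunch meeting tomorrow at noon", "ham")
    f.train("please review the budget report", "ham")
    print(f.spam_probability("free money offer"))
    print(f.spam_probability("budget meeting tomorrow"))
```

A mail client would compare that number against a cutoff and mark anything above it as spam. Setting the cutoff high is one way to keep false positives rare, at the cost of letting more spam through while the filter is still learning.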
So I excitedly downloaded and ran the Nov. 14 Mozilla nightly build. The filtering doesn’t presently filter; it only marks messages as spam or non-spam. That’s OK, I can sort them and then zap them myself for a while. I trained it on about 1,400 non-spam messages (I only had a few dozen spams). It doesn’t identify much spam yet, but I’ve had zero false positives. It recognizes my most incessant spammer, the Smartmall Success Group (Kevin Butthead, take your Amway-meets-ecommerce scheme and stick it. I’m much more interested in joining the mafia.) and it’s starting to recognize unsolicited credit card spam.
Spam normally irritates me. Really irritates me. But now it’s a game. I look forward to spam coming in to see if Mozilla recognizes it. And it’s encouraging to watch it learn and get better. I’m going to win this battle. Within a month, I expect the time I waste deleting spam and making sure I didn’t delete anything important will be free for something else. Like answering legitimate mail from people I’ve never heard of.
Some people argue this filtering belongs on the server, but not everyone is willing to filter spam on the server. My employer never will (because many of my employer’s departments engage in questionable e-mail practices themselves) and I’d be shocked if my ISP ever did. I can set up my own mail server, but this is a lot easier. It’s probably a lot easier for you too, even if you’re one of the half-dozen or so experienced Unix sysadmins who regularly read these pages.
If you’re like me and you have 1,000+ e-mail messages squirreled away somewhere, and you don’t mind playing with alpha-level code (which you don’t if you’re running Windows, since Microsoft is in the habit of shipping alpha code and charging you hundreds of dollars for the privilege of alpha- and beta-testing it for them), go get this thing. Start training it. And watch the spam go bye-bye.
And if you’re better than me about cleaning out your inbox, get it anyway. It’ll just take you longer to train it.