Increase your breast size in weeks!
Yeah, obviously I’m real interested in that kind of e-mail. I’m sure you are too. Even if you happen not to be male.

I read an article on Slashdot on Friday about an interesting approach to spam. Essentially, it uses artificial intelligence. You get two delete buttons: One that says, “Delete this spam!” and one that says, “Delete this, but it’s not spam.” Based on your answers, it figures out what’s spam and what isn’t. For example, nobody I correspond with regularly has ever talked to me about breast size. Male or female. Strange how none of the women I correspond with ever bring that up, isn’t it? Come to think of it, that’s almost as strange as how none of the men I correspond with do.

The problem with spam is that people tend to define it a little bit differently. It’s kind of like trying to define pornography. To some people, the Sports Illustrated swimsuit issue is pornography. To others, the swimsuit issues of 20 years ago aren’t, but recent ones are.

Fortunately, it’s easier to analyze text than photographs.

Words like “toner” and “breast” and “sex” and “sexy” appear in spam a lot. Words like “bring” are a lot less likely to show up in spam, but very likely to appear in personal correspondence. So, based on how many spam-likely words and spam-unlikely words appear in a given piece of mail, the filter decides whether that piece of mail is spam.
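For the curious, here is a rough sketch of that word-counting idea in Python. It's a toy illustration, not the researcher's actual code: the `WordFilter` class, its method names, and the sample messages are all made up for this example, and it assumes a simple naive Bayes-style score with add-one smoothing and no prior on how much of your mail is spam. The two `train` calls stand in for the two delete buttons.

```python
import math
import re
from collections import Counter


class WordFilter:
    """Toy word-frequency spam filter (hypothetical, for illustration only)."""

    def __init__(self):
        self.spam_words = Counter()  # word counts seen in mail marked as spam
        self.ham_words = Counter()   # word counts seen in mail marked as not spam
        self.spam_total = 0
        self.ham_total = 0

    @staticmethod
    def tokenize(text):
        # Lowercase the text and split it into runs of letters and apostrophes.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, is_spam):
        # Plays the role of the two delete buttons: the user labels each message.
        words = self.tokenize(text)
        if is_spam:
            self.spam_words.update(words)
            self.spam_total += len(words)
        else:
            self.ham_words.update(words)
            self.ham_total += len(words)

    def score(self, text):
        # Sum of per-word log-odds; a positive total means "looks more like spam".
        # Add-one smoothing keeps never-seen words from producing log(0).
        vocab = len(set(self.spam_words) | set(self.ham_words)) + 1
        total = 0.0
        for word in self.tokenize(text):
            p_spam = (self.spam_words[word] + 1) / (self.spam_total + vocab)
            p_ham = (self.ham_words[word] + 1) / (self.ham_total + vocab)
            total += math.log(p_spam / p_ham)
        return total

    def is_spam(self, text):
        return self.score(text) > 0


mail_filter = WordFilter()
mail_filter.train("Increase your breast size in weeks", is_spam=True)
mail_filter.train("Cheap toner, sexy deals", is_spam=True)
mail_filter.train("Can you bring the notes to lunch", is_spam=False)
mail_filter.train("Please bring your laptop on Friday", is_spam=False)

print(mail_filter.is_spam("cheap toner for sale"))
print(mail_filter.is_spam("please bring lunch"))
```

With only four training messages this is barely a filter at all. The point is just that every click of a delete button nudges the word counts, so the scores drift toward your own definition of spam.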

I like this because it could conceivably have some applications beyond spam. I hate spam, but I hate mail forwards nearly as much. I believe there are precisely 33 of those cutesy mail forwards and send-backs out there, and I’ve seen them all several times, but people continue to insist on sending them to me. Usually with the preface, “I know you hate these and I usually don’t send you stuff like this, but…” Never mind the likelihood that since literally thousands of people have my e-mail address, someone’s already sent it to me.

Some people love those kinds of things, so filtering them for those people wouldn’t necessarily be appropriate (unless you’re that person’s boss and you want them to stop wasting time). But if I could filter them, I’d have a whole lot more time.

The researcher quoted on Slashdot claims only 5 of 1,000 pieces of spam get through, with zero false positives. Very nice.

So I can’t wait for a mail client to become available that uses Bayes classification (the technology used here). You’re probably asking where you can see this in action. I wish I knew.

Meanwhile, though, someone mentioned Cloudmark, a free service which appears to use checksums to identify spam, maintaining a large distributed P2P database of checksums. They claim 75% accuracy.
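I don't know how Cloudmark actually computes or shares its checksums, so take this as a guess at the general idea rather than a description of their system. In this Python sketch, the `SharedSpamDatabase` class is hypothetical, a local set stands in for the distributed database, and the checksum is a plain SHA-256 of the message body after lowercasing and collapsing whitespace.

```python
import hashlib


def checksum(body):
    # Normalize case and whitespace so trivial reformatting yields the same digest.
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SharedSpamDatabase:
    """Hypothetical stand-in for a shared database of spam checksums."""

    def __init__(self):
        self._known = set()

    def report(self, body):
        # A user flags a message as spam; only its checksum is stored.
        self._known.add(checksum(body))

    def is_known_spam(self, body):
        return checksum(body) in self._known


db = SharedSpamDatabase()
db.report("Increase your breast size in WEEKS!")

print(db.is_known_spam("increase your  breast size in weeks!"))
print(db.is_known_spam("Increase your breast size in days!"))
```

An exact-match scheme like this only catches a message someone has already reported, and changing a single word is enough to slip past it. That would fit with accuracy figures well short of 100%, though a real service presumably does something smarter than this.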

It was closer to 50% for me when I tried it on my work e-mail, but I reported each piece of missed spam, so that might help it in the future. The more people who use it, the better it’ll get. Individual Bayes classification is better, since it’s based on what I don’t want to read, which might vary slightly from what the masses don’t want to read. Still, Cloudmark is better than nothing. It saves me some time and lowers my blood pressure.

If you have the misfortune of using Outlook for e-mail, give Cloudmark a look. For once, Outlook will do something good for you. Since it’s free, I don’t expect it to be around forever, but we might as well use it while we’ve got it.