Promising ways to fight spam

Increase your breast size in weeks!
Yeah, obviously I’m real interested in that kind of e-mail. I’m sure you are too. Even if you happen not to be male.

I read an article on Slashdot on Friday about an interesting approach to spam. Essentially, it uses artificial intelligence. You get two delete buttons: One that says, “Delete this spam!” and one that says, “Delete this, but it’s not spam.” Based on your answers, it figures out what’s spam and what isn’t. For example, nobody I correspond with regularly has ever talked to me about breast size. Male or female. Strange how none of the women I correspond with ever bring that up, isn’t it? Come to think of it, that’s almost as strange as how none of the men I correspond with do.

The problem with spam is that people tend to define it a little bit differently. It’s kind of like trying to define pornography. To some people, the Sports Illustrated swimsuit issue is pornography. To others, the swimsuit issues of 20 years ago aren’t, but recent ones are.

Fortunately, it’s easier to analyze text than photographs.

Words like “toner” and “breast” and “sex” and “sexy” appear in spam a lot. Words like “bring” rarely show up in spam but are very likely to appear in personal correspondence. So, based on how many spam-heavy words and how many spam-unlikely words appear in a given piece of mail, the filter determines whether it is spam.
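Here is roughly what that looks like in code. This is a minimal naive Bayes sketch in Python, not the researcher's actual filter. The tokenizer, the add-one smoothing, the choice to score only the words a message contains, and the `NaiveBayesFilter` class and its method names are all my own assumptions. `train()` stands in for the two delete buttons, and `spam_probability()` combines the per-word evidence into a single number.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesFilter:
    """Toy naive Bayes spam filter trained by two 'delete' buttons."""

    def __init__(self):
        # Per-label counts of how many training messages contained each word.
        self.word_counts = {True: Counter(), False: Counter()}  # True means spam
        self.message_counts = {True: 0, False: 0}

    def train(self, text, is_spam):
        # "Delete this spam!" -> is_spam=True; "Delete this, but it's not spam." -> False
        self.word_counts[is_spam].update(set(tokenize(text)))
        self.message_counts[is_spam] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total = self.message_counts[True] + self.message_counts[False]
        log_scores = {}
        for label in (True, False):
            n = self.message_counts[label]
            # Prior: share of training messages with this label, add-one smoothed.
            score = math.log((n + 1) / (total + 2))
            for word in set(tokenize(text)):
                if word not in vocab:
                    continue  # a word never seen in training carries no evidence
                # P(message contains word | label), add-one smoothed.
                score += math.log((self.word_counts[label][word] + 1) / (n + 2))
            log_scores[label] = score
        # P(spam | words) = 1 / (1 + exp(log P(ham, words) - log P(spam, words))),
        # computed so that math.exp never overflows.
        diff = log_scores[False] - log_scores[True]
        if diff > 0:
            e = math.exp(-diff)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(diff))


spam_filter = NaiveBayesFilter()
spam_filter.train("Increase your breast size in weeks!", is_spam=True)
spam_filter.train("Cheap toner cartridges, sexy deals", is_spam=True)
spam_filter.train("Can you bring the notes to lunch?", is_spam=False)
spam_filter.train("I will bring the slides on Friday", is_spam=False)

print(spam_filter.spam_probability("breast enlargement in weeks"))  # about 0.89
print(spam_filter.spam_probability("bring your notes"))             # about 0.25
```

Where to draw the line is a separate decision. A real filter, trained on thousands of messages rather than four, would likely want a cutoff well above 0.5 to keep false positives near zero.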

I like this because it could conceivably have some applications beyond spam. I hate spam, but I hate mail forwards nearly as much. I believe there are precisely 33 of those cutesy mail forwards and send-backs out there, and I’ve seen them all several times, but people continue to insist on sending them to me. Usually with the preface, “I know you hate these and I usually don’t send you stuff like this, but…” Never mind the likelihood that since literally thousands of people have my e-mail address, someone’s already sent it to me.

Some people love those kinds of things, so filtering them for those people wouldn’t necessarily be appropriate (unless you’re that person’s boss and you want them to stop wasting time). But if I could filter them, I’d have a whole lot more time.

The researcher quoted on Slashdot claims only 5 of 1,000 pieces of spam get through, with zero false positives. Very nice.

So I can’t wait for a mail client that uses Bayes classification (the technology used here) to become available. You’re probably asking where you can see this in action. I wish I knew.

Meanwhile, though, someone mentioned Cloudmark, a free service that appears to identify spam by checksum, using a large, distributed P2P database of checksums. They claim 75% accuracy.
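I don’t know exactly how Cloudmark computes its checksums, so treat this as a guess at the general idea rather than their method. The sketch assumes an exact SHA-1 hash of the message body after collapsing whitespace and case, with a plain Python set standing in for the distributed database; `fingerprint` and `SharedChecksumDB` are hypothetical names. It shows why reporting matters: a message only gets caught once somebody has reported a matching copy.

```python
import hashlib
import re


def fingerprint(body):
    """SHA-1 checksum of a message body after collapsing whitespace and case.

    Normalizing first means re-wrapped or re-cased copies share a checksum;
    any real change to the text (even one word) produces a different one.
    """
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class SharedChecksumDB:
    """In-memory stand-in for a shared database of reported spam checksums."""

    def __init__(self):
        self.reported = set()

    def report_spam(self, body):
        # One user's report protects everyone who checks the same message later.
        self.reported.add(fingerprint(body))

    def is_known_spam(self, body):
        return fingerprint(body) in self.reported


db = SharedChecksumDB()
spam = "Increase your breast size   in weeks!"
print(db.is_known_spam(spam))  # False: nobody has reported it yet
db.report_spam(spam)
print(db.is_known_spam("increase your BREAST size in weeks!"))  # True
print(db.is_known_spam("Increase your breast size in days!"))   # False
```

The weakness shows in the last line: change one word per copy and an exact checksum no longer matches, so a real service would probably want something fuzzier than a plain hash.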

It was closer to 50% for me when I tried it on my work e-mail, but I reported each piece of missed spam, so that might help it in the future. The more people who use it, the better it’ll get. Individual Bayes classification is better, since it’s based on what I don’t want to read, which might vary slightly from what the masses don’t want to read, but it’s better than nothing. It saves me some time and lowers my blood pressure.

If you have the misfortune of using Outlook for e-mail, give Cloudmark a look. For once, Outlook will do something good for you. Since it’s free, I don’t expect it to be around forever, but we might as well use it while we’ve got it.

7 thoughts on “Promising ways to fight spam”

  • August 17, 2002 at 12:42 am

    I think your first client to use the AI spam filtering is coming out August 24 from Apple. If I recall what I’ve heard correctly, the built-in Mail.app in OS X 10.2 uses this technology.

  • August 17, 2002 at 1:02 am

    I came across a white paper not too long ago, published by Microsoft in collaboration with an industry heavyweight (IBM?) and university researchers (names escape me at the moment). No, this wasn’t a Microsoft PR whitepaper; this was actually hard research into using Bayesian filters to block spam. As I recall, the paper was published several years ago, so I’d expect that MS will continue current trends and eradicate third-party efforts by including similar heuristic filtering in a version of Outlook coming soon. See ya, Cloudmark.

    Oh, and Dave: I’ve got a new chicken breast recipe for you. 🙂

  • August 17, 2002 at 1:19 pm

    Dave,

    You don’t talk about laser printer “toner” cartridges much, but most of your fans feel that you usually keep a “breast” of most current computer hardware and software…

  • August 17, 2002 at 1:48 pm

    A package I’m testing now (while “suffering” under Outlook) is called “IHateSpam” from Sunbelt Software. It seems to have pretty good mechanisms for giving it a white list of senders (it will scan your address book and then also scan folders for senders). You can add whole domains.

    So far it’s nailed all the spam I’ve tested it with; I don’t get huge amounts, but it is starting to get annoying. My work account gets more, so I can always test against it. Actually, I could test against our Exchange admin account, which gets all the mail sent to people who are gone; that would really test it out.

    So far, having used this for a day, I’m finding that it stops about 90% of the spam at home and work. It’s trapped a couple of my newsletters, which I quickly told it “Are not spam” and added to my white list for the future.

    The cost is $19.95 right now (normally $29.95 to purchase), then $15/year after the first year. The maintenance fee is worthwhile, since the program uses a database kept on one of their servers to help spot spam.

    Sunbelt has a very good reputation as a supplier of tools to network admins. This is one of the first tools they’ve created which might appeal to a wider audience, although they are clearly going the direction where a company might buy a license for it and roll it out on all their desktops.

    This program is MUCH better than a plain keyword-based system. We have one of those at work; we use a product called Mailwatch. Each word is assigned a weight, and we set a threshold to block a message if its keywords add up to more than a certain amount. It doesn’t work; if we set the threshold low, I spend my day letting through legit messages talking about things like screws and about people named Dick. But we use Mailwatch because it spots incoming viruses and strips them, and we block all incoming executable attachments.

  • August 20, 2002 at 9:33 am

    Check out http://www.garyarnold.com/projects.php#bayespam, a Perl filter written for qmail. The Perl could possibly be modified to run on any e-mail server; however, I would not know how to do that. Just thought the link would be useful.

  • August 23, 2002 at 1:41 pm

    Thanks for the link. I’m half-tempted to set up qmail on a machine just so I can try it. http://qmailtheeasyway.com is your friend…

  • November 20, 2003 at 1:12 pm

    Can you e-mail me enough information on how you eradicate spam?
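One of the comments above describes the Mailwatch approach: each word gets a weight, and a message is blocked when its keywords add up to more than a threshold. A toy version shows why that misfires. I don’t know Mailwatch’s actual word list or scoring rules; the weights and the `keyword_score` and `is_blocked` functions below are made up for illustration.

```python
import re

# Made-up weights for illustration; a commercial product ships its own list.
KEYWORD_WEIGHTS = {"sex": 5, "breast": 4, "dick": 4, "screw": 3, "free": 2}


def keyword_score(text, weights=KEYWORD_WEIGHTS):
    """Add up the weight of every listed keyword occurrence in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(weights.get(word, 0) for word in words)


def is_blocked(text, threshold, weights=KEYWORD_WEIGHTS):
    """Block a message once its keywords add up to more than the threshold."""
    return keyword_score(text, weights) > threshold


legit = "Dick says the screw supplier called back"
spam = "Hot sex tonight"
print(keyword_score(legit), keyword_score(spam))  # 7 5
for threshold in (4, 5, 7):
    print(threshold, is_blocked(legit, threshold), is_blocked(spam, threshold))
```

With these weights the legitimate message scores 7 and the spam scores 5, so a threshold of 4 blocks both, 5 or 6 blocks only the real mail, and 7 or more lets both through. A Bayesian filter should do better here, since a word like “screw” would earn its score from your own mail rather than from a fixed list.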

