I have seen the future, and it works!

Last Updated on April 16, 2017 by Dave Farquhar

Now appearing nightly, in the nightly Mozilla builds, the Open Source community is very proud to present a very special feature: Naive Bayesian spam filtering!
And you’re probably wondering why I’m excited about something as boring-sounding as that. Don’t worry. I’m no less sane than I was yesterday and I’ll prove it.

Naive Bayesian filtering applies Bayes’ Rule, a formula from probability theory, to pattern recognition. You tell it which messages are spam and which aren’t, and over time it learns to recognize the difference. Click here for an explanation of what it is and why it works.

Its main selling point is that when implemented properly and trained thoroughly, Bayesian filtering is very effective at identifying spam and produces nearly zero false positives.
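For the curious, here is a minimal sketch of the idea in Python. It is not Mozilla's code, and I'm making assumptions throughout: the class name `NaiveBayesFilter`, the crude tokenizer, and the add-one smoothing are all hypothetical choices for illustration. The filter counts how often each word shows up in messages you've marked as spam or non-spam, then uses Bayes' Rule to estimate the probability that a new message is spam.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (a crude assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical toy filter for illustration; real filters are more elaborate."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}  # word counts per label
        self.messages = {"spam": 0, "ham": 0}                # message counts per label

    def train(self, text, label):
        """Record one message that the user has marked as 'spam' or 'ham'."""
        self.counts[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | words) with Bayes' Rule, treating words as independent."""
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        total_messages = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: share of training messages with this label (add-one smoothed).
            log_score[label] = math.log(
                (self.messages[label] + 1) / (total_messages + 2)
            )
            total_words = sum(self.counts[label].values())
            for word in tokenize(text):
                # Likelihood of each word under this label (add-one smoothed so
                # unseen words never produce a zero probability).
                log_score[label] += math.log(
                    (self.counts[label][word] + 1) / (total_words + vocab_size + 1)
                )
        # Work in log space to avoid underflow, then normalize the two scores.
        top = max(log_score.values())
        spam = math.exp(log_score["spam"] - top)
        ham = math.exp(log_score["ham"] - top)
        return spam / (spam + ham)


demo = NaiveBayesFilter()
demo.train("limited time offer credit card approved", "spam")
demo.train("make money fast credit card", "spam")
demo.train("meeting notes for the project attached", "ham")
demo.train("lunch tomorrow with the project team", "ham")
print(demo.spam_probability("credit card offer"))
print(demo.spam_probability("project meeting tomorrow"))
```

With only four training messages the numbers mean little, but the first message should score well above 0.5 and the second well below it. The more mail you train it on, the better the word counts reflect your own inbox, which is why a big archive of saved mail helps.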

So I excitedly downloaded and ran the Nov. 14 Mozilla nightly build. The filtering doesn’t presently filter; it only marks the messages as spam and non-spam. That’s OK, I can sort them and then zap them myself for a while. I trained it on about 1,400 non-spam messages (I only had a few dozen spams). It doesn’t identify much spam yet, but I’ve had zero false positives. It recognizes my most incessant spammer, the Smartmall Success Group (Kevin Butthead, take your Amway-meets-ecommerce scheme and stick it. I’m much more interested in joining the mafia.) and it’s starting to recognize unsolicited credit card spam.

Spam normally irritates me. Really irritates me. But now it’s a game. I look forward to spam coming in to see if Mozilla recognizes it. And it’s encouraging to watch it learn and get better. I’m going to win this battle. Within a month, I expect the time I now waste deleting spam and making sure I didn’t delete anything important will be free for something else. Like answering legitimate mail from people I’ve never heard of.

Some people argue this filtering belongs on the server, but not everyone is willing to filter spam on the server. My employer never will (because many of my employer’s departments engage in questionable e-mail practices themselves) and I’d be shocked if my ISP ever did. I can set up my own mail server, but this is a lot easier. It’s probably a lot easier for you too, even if you’re one of the half-dozen or so experienced Unix sysadmins who regularly read these pages.

If you’re like me and you have 1,000+ e-mail messages squirreled away somewhere, and you don’t mind playing with alpha-level code (which you don’t if you’re running Windows, since Microsoft is in the habit of shipping alpha code and charging you hundreds of dollars for the privilege of alpha- and beta-testing it for them), go get this thing. Start training it. And watch the spam go bye-bye.

And if you’re better than me about cleaning out your inbox, get it anyway. It’ll just take you longer to train it.

15 thoughts on “I have seen the future, and it works!”

  • November 15, 2002 at 7:39 pm

    For some reason, I agree with the whole concept of getting interested with dealing with spam. I’ve been running spamassassin since the version 1.0 days so only 2-3 get through a day but still….

    I’ll give this a try – I’ve got 9 months worth of spam saved up (all in the lovely procmail-filtered spamassassin found spam folder) and new developments are always interesting.

    I’ll try it running concurrently with spamassassin.

  • November 19, 2002 at 11:48 pm

    if you are using pop3 mail, http://popfile.sourceforge.net

    a platform independent naive Bayes spam filter that works with ANY pop mail client, on any platform (including windows, and anything else you can get a perl for.)

    Good stuff.

    Rick

  • September 17, 2003 at 11:51 pm

    Hey Dave, you idiot…..

    The Smartmall Success group does not spam anyone. You signed up at my smartmall site and received email from an autoresponder. That is the ONLY way you could receive email from us. If you no longer wished to receive the email, you should have followed the instructions and unsubscribed.

    Learn to follow directions and stop whining.

  • September 18, 2003 at 2:33 pm

    Kevin is obviously a tyro spammer, and is plainly devoid of some cranial matter, much less common sense. This would explain his use of a Yahoo mail account, as opposed to having mail in his own domain, such as KevinButthead.com.

    Only a fool would respond to Kevin’s request to “unsubscribe” from a site to which they never subscribed in the first place. Perhaps you have missed a dose of your medication, Kevin? I, of course, do not have to worry about spam, since my mail is filtered beautifully by my Microsoft Exchange server. Regardless, you stultify yourself with your comment.

  • September 18, 2003 at 3:14 pm

    No Jacques, you are the one that is “plainly devoid of some cranial matter, much less common sense”. In fact, it is senseless for you to assume anything without any of the facts. You are obviously uneducated in how an autoresponder works. Once a person unsubscribes, that is it. Period. Spam is not tolerated. Abuse of the autoresponder results in termination of your account as well as your account with the mall program. I do not send spam. I don’t like receiving it either.

    Once again, you can only receive email if you subscribed at the site.

    There is nothing wrong with using a Yahoo mail account. You should wake up to the fact that Yahoo has a filtering system which sends spam to a bulk mail file. Apparently you should use JacqASS.com.

  • September 18, 2003 at 6:32 pm

    Well Kevin, I can tell you exactly three web sites I’ve visited and, as you slimeballs like to put it, opted in to receive, um, offers and stuff. That would be, if you must know, Compgeeks.com, Directron.com, and Softwareandstuff.com. The handful of times I have “unsubscribed” from spammers’ lists, I, like most people I know, got more spam.

    Now maybe someone did visit your site and punch my e-mail address in. I don’t know or, frankly, care. All I know is it wasn’t me. Not that it bothers me anymore–your pyramid schemes just get filtered along with the limited-time offers to enlarge various parts of my anatomy. If you are legit, maybe you should have some kind of confirmation. But that’s not my problem anymore. I must say, since I started using Bayes filters, I haven’t even thought of you.

    Normally I’d delete your comment, but I think I’ll let it stand. It shows your true colors. I’ll have to ask the girl down the hall who works in marketing, but I don’t think starting correspondence off with “Hey [name], you idiot,” is good for business. Just a hunch I have.

  • September 19, 2003 at 12:44 am

    I suppose then, Kevin, that it is unwise to doubt the sincerity of one who spends 20 minutes per day working at home while “the money just pours in”? David may be many things, but even he would not willingly become a pawn in foolish schemes. I must confess that given the choice between believing a Scot or one who touts the spam-filtering capabilities of plebeian, free email but denies his manifest complicity in the spam – oh, pardon, the “someone must have signed you up for unsolicited email” – problem, I must believe the Scot.

    Being obviously deficient in recognizing the bounds of reasonable discourse and rational thought, your desperate attempt to gain money through scurrilous means is not unexpected. You have my best wishes that you will not starve waiting for all of that free money you’ve “earned”.

  • September 19, 2003 at 2:15 am

    Hello Dave,

    I will preface my post by stating I hope to discuss this matter without the name calling. I came across your site while searching at Yahoo to check on the ranking of my Smartmall ad. I saw the following statement next to your headline:

    “It recognizes my most incessant spammer, the Smartmall Success Group ([name] Butthead, take your Amway-meets-ecommerce scheme and stick it”

    Upon reading your initial post in this thread, your readers will clearly see that you started the name calling, not me. I just responded to you in a similar fashion.

    I do NOT actually work for smartmall. I signed up at another member’s site. I then formed a group of members that signed up at my site.

    The following info may provide an explanation as to why people receive spam from some mlm affiliates.

    Smartmall has an advertising “co-op” which provides participants with leads (people that sign up at our sites). Some members who used the co-op received spam complaints. I discussed your complaint with a member today and he is certain you were a “co-reg” lead (a lead/person whose info is simply uploaded to the dbase through the co-op). I became aware of co-reg leads a few months ago, but that was after I stopped using the co-op. I also called corporate and was told that they did not use co-reg leads.

    When I received someone’s info, they automatically received a series of messages from my autoresponder. I received only two spam complaints. However, I was able to prove that they DID sign up at my site because my autoresponder recorded their IP address, date and time they opted in. This was not the case for co-op placements. I stopped using the co-op because very few actually upgraded. Plus, my own advertising produced a much better upgrade ratio.

    Some companies use bulk traffic from online gambling and free sweepstakes sites such as this company: http://cashbreak.com/advertise.asp

    Here is a similar co-op:
    http://www.emmleads.com

    This is not a pyramid scheme. Pyramid schemes are programs that have downlines without a product. They are also illegal. If this was indeed a pyramid scheme, Smartmall would have been put out of business.

    If you don’t like the program, that is fine. That is why I advertised on search engines. Only those that are interested signed up. However, I want to assure you that I didn’t know I was spamming anyone. I will be bringing this to the company’s attention again.

  • September 19, 2003 at 9:58 am

    Hello Kevin,

    Yes, I did initially call you a butthead. I’m not too worried about what people will think about that. Most people, when they start getting unsolicited e-mail, say or think worse things about the sender or person claiming to be the sender.

    The fact is, your mail started showing up at my work account. I’ve always used that account for work-related correspondence and the occasional mail to close friends. There’s no way I could have visited your Web site from work, because we use a filter (Websense) that blocks our employees from visiting such sites. And when I want to make some extra money, I write a magazine article or ask around until I find someone who needs a computer repaired. If I truly wanted to run my own business I wouldn’t need anybody else’s co-op program in order to do it.

    Honestly, I don’t know when the e-mail from you started and if or when it stopped. This particular post is nearly a year old.

    I work in the service industry and occasionally clients will start a name-calling war or level an accusation of sabotage or something else. I look into whatever potential issue there was and move on. I appreciate you looking into whether you might have been unintentionally sending spam. Your name-calling doesn’t bother me. My readers (and they number in four digits) know I can follow directions, and they know Jacques Pierre….de la Stenche isn’t a nice person.

    One of my good friends was recently victimized by a professional spammer, so I know that happens. You could be one of those. Or you could be an Alan Ralsky. Your initial response definitely suggested the latter. Your most recent response suggests the former.

    But whatever the case, suffice it to say I’m not interested in your business and if you are e-mailing me you’re wasting your time, because that mail’s getting filtered and I never see it, and I haven’t seen any of it since sometime last November. Honestly, up until yesterday when Jacques…de la Stenche’s alter ego brought this thread to my attention, I didn’t even remember you.

  • September 19, 2003 at 10:55 am

    Kevin, if “co-reg lead” is just another term for “someone who had their info uploaded”, Dave’s issue is still unaddressed. If I can sign someone else up for the Smartmall service WITHOUT a confirmation process (i.e. send an email to the potential participant to verify that THEY signed up), then Smartmall is, at a minimum, negligent. If this is the case, they need to change their policies. In any case, I know Dave would never sign up for anything that, frankly, comes across as a get-rich-quick, “siphon off the profits while sitting at home in your underwear drinking beer” swarm marketing scheme at best, and a spam farm at worst.

    As for the name calling, didn’t your mother ever ask you “if Johnny decided to jump over a cliff, would you do it, too”? Not much of a defense there, dude. Granted, I don’t think Dave would have used that tack if a) he thought “Kevin” really existed (see “smells like a spam farm” above) or b) he expected you to actually visit his site. Still, as in your last comment, your side of the story comes across much better when you don’t sound like an angry telemarketer.

  • September 19, 2003 at 6:17 pm

    Dave,

    I do share the frustration towards spammers. So, in hindsight, I can actually understand the butthead comment. But I am not a professional spammer. Until yesterday, I was certain you were someone that merely forgot he signed up. There are those who waste their time signing up for various programs and cry spam later down the road. It is a two month campaign so you haven’t received any more email from my autoresponder since last year.

    I will say that I have learned a lot about spam and spammers over the last couple of days. I’m sure I will get a lot of “I told you so’s” from members who complained about the co-op. Due to the recent spam laws, some of this info is important for anyone who uses a lead source.

    I use proautoresponder. They have banned certain lead companies due to spam complaints.
    There is no way a professional spammer could use this autoresponder and get away with it. An autoresponder will usually have the company’s name on or near the unsubscribe button.
    Professional spammers use their own AR software. You will usually see something along these lines at the bottom of their email:

    “unsubscribe from this mailing list: click here
    or send a blank to: Email address@ send you more spam.com”
    OR…
    “Reply to this message with the word “remove” in the subject line.”
    …..And we know what happens should you follow those directions……
    This is not the case with autoresponder companies I have come across.
    Proautoresponder has a strict anti-spam policy:
    http://proautoresponder.com/spamplcy.htm

    Steve: Your analogy would be appropriate for someone that blindly follows someone’s actions. But most people will not reply politely if you call them names and tell them to stick it. But again, I understand why Dave did it.

    We will have to agree to disagree on the name calling and move on.

    The email that Dave received contained my name and phone number. I even state that I AM a real person and ask the enrollee to contact me. I understand that Dave did not read the email since he felt I was spamming him.

    I never promote anything as “get rich quick”. I usually shy away from these kind of programs. But I joined because I shop online and I saw it as a shoppers club combined with MLM. People can shop and save without paying a fee or working the business side at all. Don’t assume that I am someone that enjoys taking advantage of people.

    Smartmall recently sent an update stating that they improved the quality of the co-op. I wouldn’t know because I no longer use it.
    Unfortunately, anyone who uses co-reg leads and follows up with them, unknowingly spams. Members who purchase these leads are led to believe that these are indeed people that sign up at their site. But in reality, they are wasting their money on people who never signed up and in some cases, bogus names.

    There were a couple of members that considered a class action lawsuit because of what they were sold through the co-op.

  • September 21, 2003 at 2:23 am

    Hello Dave,

    I am a friend of Kevin’s and like many other unsuspecting marketers, I also purchased co-reg leads. After some serious hate email, I cancelled my order!

    Kevin already has a successful downline, so I’m sure he doesn’t care that you didn’t join. The best part of marketing online is that we only have to correspond with interested prospects.

    Steve, You failed to see that it was Kevin who ceased with the name calling. Even after Dave continued with the slimeball remark. And dude, get real… You can’t go around calling people names and expect them to take the moral high ground. I can understand both sides.

    …Angry telemarketer? That is worse than butthead! lol!

    Anyway, I agree that it is very important that marketers understand about co-reg leads. Especially since there are severe penalties for people sending unsolicited email, which include heavy fines.

    I enjoy your site Dave. Keep up the good work!

  • September 21, 2003 at 7:35 pm

    Kevin, while this setup may be perfectly legal, you’ve got a huge mountain to climb with people like Dave and me. We inherently distrust any place that a) claims income streams from getting your friends to join that place and b) has a whiff of “spaminess”. Obviously, Smartmall matches on both counts. But Dave’s problem wasn’t so much with the service, just that he didn’t want anything to do with it, but *someone* wanted to bother him. And I’m sorry if you don’t think of a place that promises six-figure incomes (potentially, of course) with minimal effort as “get-rich-quick”. I just don’t go for it. Too many similarities to fishy things. Good luck with it, though.

    Ron, yes, calling people names doesn’t help when you’re trying to have a polite conversation, and adds nothing logically. I won’t defend Dave’s slimeball comment, even though I know why he said it. But I think my assessment of the *tone of Kevin’s initial post* – note, not Kevin himself – was accurate. Not to nitpick, but there *is* a difference. And as for moral high ground, well, morals aren’t conditional. Unless you’re the President, of course…

  • September 22, 2003 at 1:47 am

    Hey, no need to be sorry Steve. After all, I am succeeding.

    There are too many companies out there that are exactly as you describe. Smartmall does not promise six-figure incomes. And I always tell people that it DOES take effort.
    I was curious and skeptical at first, so I joined as a free member. I was fortunate enough to have signed up at someone’s site who was succeeding.

    People can shop online from their own site. The stores in the mall include major retailers such as Wal-mart, Disney, Dell, Target, Sony, etc. They do NOT have to upgrade and work the business side. It is basically a shoppers club combined with MLM. I occasionally buy things at the mall and usually get some good deals.

    The internet has made succeeding a lot easier and faster. You no longer have to pester your family and friends to join. Only people who ARE interested sign themselves up at my site. Some even upgrade with little or no follow up just like I did. I was surprised to find that people that came across my site called me long-distance to inquire about the program. If people don’t like what they see, that’s all right because others will. Because of the advantages of the internet, there are incentives available now that weren’t even thought of a few years ago.

    Compensation plans are a LOT fairer now too.

    The spaminess part of Smartmall (the company co-op) was unnecessary and it is the only part that I find questionable. I’m not sure exactly where they get the leads. They mentioned that they improved the quality of the leads in a recent newsletter. But as I stated above, I no longer use the co-op.

    The success I was having from advertising on search engines prompted me to start my own advertising co-op for my group. People first sign up at my gateway site and then they have to take the additional step of signing up again at either my site or one of my members’ sites. This system weeds out people who are uninterested.

    Therefore, the only other way Dave could have received my email campaign is if he were one of the co-op placements. And he obviously was.

    I didn’t know anything about Dave till I came across this site. And no, I didn’t want to bother him or anyone else for that matter. I have my hands full with people who ARE joining. A lot of people signed up at my site. I don’t have the time nor the desire to bother anyone who is uninterested as that would be counter-productive.
