Using your logs to help track down spammers and trolls

It seems like lately we’ve been talking more on this site about trolls and spam and other troublemakers than about anything else. I might as well document how I went about tracking down two recent incidents to see if they were related.

WordPress and b2 store the IP address the comment came from, as well as the comment and other information. The fastest way to get the IP address, assuming you haven’t already deleted the offensive comment(s), is to go straight to your SQL database.

mysql -p
[enter the root password]
use b2database;
select * from b2comments where comment_post_id = 819;

Substitute the number of your post for 819, of course. The poster’s IP address is the sixth field.

If your blogging software records little other than the date and time of the message, you’ll have to rely on your Apache logs. On my server, the logs are at /var/log/apache, stored in files with names like access.log, access.log.1, and access.log.2.gz. They are archived weekly, with anything older than two weeks compressed using gzip.

All of b2’s comments are posted using a file called b2comments.post.php. So one command can turn up all the comments posted on my blog in the past week:

grep -F b2comments.post.php /var/log/apache/access.log

You can narrow it down by piping it through grep a bit more. For instance, I knew the offending comment was posted on 10 November at 7:38 pm.

grep -F b2comments.post.php /var/log/apache/access.log | grep -F 10/Nov/2003
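
If you know the minute the comment was posted, you can narrow things further and print just the addresses. What follows is only a rough sketch: it assumes your logs are in Apache's combined format, where the client address is the first field on each line, and the function name comment_ips is something I made up for illustration.

```shell
# comment_ips LOGFILE TIMESTAMP_PREFIX
# Hypothetical helper: list the unique client IPs that requested
# b2comments.post.php at a time starting with the given prefix,
# for example 10/Nov/2003:19:38.
# Assumes the client address is the first space-separated field.
comment_ips() {
    grep -F 'b2comments.post.php' "$1" \
        | grep -F "[$2" \
        | awk '{ print $1 }' \
        | sort -u
}
```

You would call it as comment_ips /var/log/apache/access.log 10/Nov/2003:19:38. The -F flag makes grep treat the search text as a plain string, so the dots and the opening bracket are matched literally.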

Here’s one of my recent troublemakers:

24.26.166.154 - - [10/Nov/2003:19:38:28 -0600] "POST /b2comments.post.php HTTP/1.1" 302 5 "https://dfarq.homeip.net/index.php?p=819&c=1" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031007 Firebird/0.7"

This line reveals quite a bit: Besides his IP address, it also tells his operating system and web browser.
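
If the long lines are hard to read, you can have awk break them into labeled pieces. This is a sketch that assumes the combined log format, where the request, the referring page, and the browser string are each wrapped in double quotes; show_fields is a hypothetical name, not a standard tool.

```shell
# show_fields: read combined-format log lines on stdin and print the
# interesting parts of each one with a label.
# Splitting on double quotes gives: $2 = request, $4 = referer, $6 = browser.
show_fields() {
    awk -F'"' '{
        split($1, head, " ")   # head[1] = client IP, head[4] = "[date:time"
        print "ip: " head[1]
        print "when: " substr(head[4], 2)
        print "request: " $2
        print "referer: " $4
        print "browser: " $6
    }'
}
```

For example, grep -F 24.26.166.154 /var/log/apache/access.log | show_fields prints five labeled lines for every request from that address.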

Armed with his IP address, you can hunt around and see what else your troublemaker’s been up to.

grep -F 24.26.166.154 /var/log/apache/access.log
zcat /var/log/apache/access.log.2.gz | grep -F 24.26.166.154
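
Rather than typing one command per log file, you could wrap the search in a small function that walks through the current log and every rotated copy. Treat this as a sketch: find_ip is a name I invented, the default directory is just where my logs happen to live, and it assumes the client address is the first field on each line.

```shell
# find_ip IP [LOGDIR]
# Hypothetical helper: print every log line whose client address is
# exactly IP, checking access.log and all of its rotated copies.
# LOGDIR defaults to /var/log/apache (an assumption; adjust for your server).
find_ip() {
    ip=$1
    dir=${2:-/var/log/apache}
    for f in "$dir"/access.log*; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        case $f in
            *.gz) gzip -cd "$f" ;;       # compressed archive
            *)    cat "$f" ;;            # plain text log
        esac | awk -v ip="$ip" '$1 == ip'
    done
}
```

Comparing the whole first field means 24.26.166.154 will not also match 124.26.166.154. The files are read in filename order, so the newest log comes first and the output is not strictly chronological.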

The earliest entry you can find for a particular IP address will tell where the person came from. In one recent case, the person started off with an MSN search looking for information about an exotic airplane. In another, it was a Google search looking for the words “Microsoft Works low memory.”
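
To pull out that first entry without scrolling, something like this should do. It is a sketch that assumes combined-format logs, where the referring page is the second quoted string on the line, and first_referer is a hypothetical name.

```shell
# first_referer IP LOGFILE
# Hypothetical helper: print the timestamp and referring page of the
# first request from IP in LOGFILE, then stop reading.
# Pass - as LOGFILE to read from standard input.
first_referer() {
    awk -F'"' -v ip="$1" '
        { split($1, head, " ") }         # head[1] = client IP, head[4] = "[date:time"
        head[1] == ip {
            print substr(head[4], 2) " " $4
            exit
        }
    ' "$2"
}
```

Run it against the oldest log that mentions the address. For a compressed log, pipe it in: zcat /var/log/apache/access.log.2.gz | first_referer 24.26.166.154 -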

You can infer a few things from where a user originally came from and the operating system and web browser the person is using. Someone running the most recent Mozilla Firebird on Linux and searching with Google is likely a more sophisticated computer user than someone running a common version of Windows and the version of IE that was supplied with it and searching with MSN.

You can find out other things about individual IP addresses, aside from the clues in your logs. Visit ARIN to find out who owns the IP address. Most ARIN records include contact information, if you need to file a complaint.

Visit Geobytes.com IP Locator to map the IP address to a geographic region. I used the IP locator to determine that the guy looking for the airplane was in Brooklyn, and the Microsoft guy was in Minneapolis.

Also according to my Apache logs, the guy in Brooklyn was running IE 6 on Windows XP. The guy in Minneapolis was running Mozilla Firebird 0.7 on Linux. (Ironic, considering he was looking for Microsoft information.) It won’t hold up in a court of law, but the geographic distance and differing usage habits give at least some indication it’s two different people.
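
If you want to put two suspects side by side, a quick tally of which browser strings each address sent can help. This is a sketch under the same assumption as before (combined-format logs, with the browser string as the third quoted field), and agents_by_ip is a made-up name.

```shell
# agents_by_ip: read combined-format log lines on stdin and print each
# distinct pairing of client IP and browser string, preceded by the
# number of requests that used it.
agents_by_ip() {
    awk -F'"' '{ split($1, head, " "); print head[1] "\t" $6 }' \
        | sort \
        | uniq -c
}
```

For example, grep -F -e 24.26.166.154 -e ANOTHER.IP.GOES.HERE /var/log/apache/access.log | agents_by_ip, substituting the second address you are curious about. Two addresses that share one unusual browser string are worth a closer look; two with completely different strings point the other way.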

What to expect around here

I’m still not recovered, but I expect to be on my way. The doc put me on some prescription meds. Which reminds me: The mafia, er, my health insurance company seems to have changed prescription providers YET AGAIN, and I missed my card in the mail. What is this, flavor-of-the-week?

It’s incredibly messed up when it’s easier to get your new license plates than it is to get a bottle of Amoxicillin.

So I’m torqued off right now.

As far as the recurring problems with spammy comments and trolls, I’m fed up with it. I appreciate the people like Dustin Cook and, yes, that arrogant French aristocrat, for telling the most recent one to shove off. But that’s not a permanent solution.

I’m looking at another piece of software that can be set to require commenters to be registered users–if you want to comment, you’ve got to give a username and password. I hate that. I really do. I don’t want people to have to go through the hassle. I don’t want people wondering what else will happen with their e-mail addresses, which I will require. (The answer is, nothing, because I hate spam more than I hate taxes, but the general public doesn’t know that.) Unfortunately, it seems to be the only way to reduce the trolls and stop the spam.

As far as Railroad Tycoon 3, due to my recent sickness I’ve only been able to play two short games. It’s not a radical departure from Railroad Tycoon 2. The economics are a bit different (and far more realistic) and the graphics are a whole lot better. I can safely say I recommend it. They set the requirements at 400 MHz, 128 MB of RAM, and a 16-meg AGP video card. I played on a 366 MHz machine with 128 megs and a 16-meg Radeon 7000 video card. It was acceptable. You could probably get by with a 300 MHz machine with the same memory and video card, but there’ll be times when you’ll want more horsepower. 500-600 MHz would definitely be more comfortable.

Brightmail, plus voice recognition

Brightmail update. I promised an update earlier (or at least I implied one) on Brightmail, the free (for private use) spam filtering service at www.brightmail.com. They’ll of course gladly sell your business spam filtering tools–that’s the point of their free service: Get you hooked, so you go tell your boss about it and they get some business.

At any rate, early on it was awful, making me wonder if the volume of spam it blocked was worth the trouble of signing up and then reconfiguring my mail client. Lately, however, it’s gotten much better. Last week it saved me from deleting e-mail offering me a free pager, how to find out anything about anyone, Viagra, making $2-$300 a day, making what I’m worth (whatever that is), FWD: Check this out!!!, attention homeowners!, and some cable-stealing scheme. (It sends you a weekly summary, just in case it deleted something legit. The forward sounded like it could have been, but it wasn’t from anyone I know. It came from some guy named Dave Yaprak, who, as a spammer, should be forced by the rest of us Daves to cease using our name because he’s proven himself unworthy of such a cool name.) During that same time frame, two spams got through: One telling me I can double my money in three months by investing in the yen, and another offering to sell me 15 million e-mail addresses. So it blocked eight of the ten, or 80%, and gave me something to write about. Good deal.

While I still think it’s too early to deem Brightmail a must-have (I’ve been using it for just under a month now) it does seem to be more effective, and a lot less trouble, than any other anti-spam measure I’ve taken in the past. SpamCop does very little good; and as soon as I talked about Bounce Spam Mail ridding me of that blasted used computer broker that invaded Thompson’s site, they sent me something. It’ll be interesting to see if Brightmail finally rids me of them (I never opt out of spam, because that’s honest-to-goodness verification that I read the account, which makes my address even more valuable to sell to others).

“I never feed trolls and I don’t read spam.” –Weird Al Yankovic

———-

From: “Frank McPherson”

Subject: Recognition software

I don’t do much with voice recognition, but I certainly work with a lot of handwriting recognition. I think that with any recognition software the user eventually changes their style to accommodate the software. That is, the software may perform at 95% recognition and then the user changes what they do to get the remaining 5%. That’s how you get to a higher rate over a period of time.

Frank McPherson, MCSE

———-

You’re certainly right about that. And in the case of a writer with voice recognition, that change isn’t necessarily a good thing. I guess the test is whether an editor notices the difference. I know what my writing is supposed to “sound” like, and if it’s different, I’m not happy.

I just got back from a three-hour editing session, with someone who has a very different philosophy and style of editing, so I’m very much tuned in to the ways of writers at the moment. Anything that damages the integrity of the author’s original thought is a problem, from my point of view.

Maybe I’m just too creative or too perfectionistic for my own good. But we’ll see how the Dragon pans out–I’m very willing to give it a shot.