Finding and blocking an abusive host from your Apache log

My web site slowed to a crawl last night, my CPU usage soared to 100%, and my built-in security measures weren’t helping. I ended up having to do some old-school Linux sysadmin work to find and stop the host responsible.

I haven’t been an everyday sysadmin since 2009. But every once in a while I can still come off the bench and do this stuff.

A VPS in New York had hit my site 85,433 times since my Apache logs last rolled over. I blocked him with an iptables rule.

Finding the bad guy

My web server keeps its Apache logs in /var/log/apache2/other_vhosts_access.log. So I did this:

awk '{print $2}' /var/log/apache2/other_vhosts_access.log | sort | uniq -c | sort -n

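You may have to adjust the $2 depending on where your server puts IP addresses in the log. In a plain combined-format log the IP is usually the first field, so it would be $1. You’ll also need to change the filename to your server’s Apache log file location. The output is one line per IP address with its request count in front, sorted so the heaviest hitters land at the bottom.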
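If you only care about the heaviest hitters, the same pipeline can be wrapped in a small function. This is just a sketch: top_talkers is a name I made up for illustration, and the field number and row limit are parameters so you can adapt it to your own log layout.

```shell
#!/bin/sh
# top_talkers is a hypothetical helper, not a standard tool.
# Usage: top_talkers LOGFILE [FIELD] [LIMIT]
#   FIELD  whitespace-separated field holding the client IP (default 2)
#   LIMIT  number of rows to print (default 10)
top_talkers() {
    # Pull out the IP column, count identical addresses,
    # then sort numerically with the busiest address first.
    awk -v f="${2:-2}" '{ print $f }' "$1" | sort | uniq -c | sort -rn | head -n "${3:-10}"
}
```

Something like top_talkers /var/log/apache2/other_vhosts_access.log 2 5 would then print the five busiest addresses, biggest first.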

I found a few web spiders, but some VPS in New York had accessed my site 85,433 times since my logs last rolled over. Umm, no. Yahoo’s web spider had accessed my site a mere 6,000 times in the same time period. 85,433 is unreasonable.

Blocking the bad guy with iptables

So I blocked him temporarily:

iptables -A INPUT -s x.x.x.x -j DROP

Substitute your bad guy’s IP address for x.x.x.x.
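Since I only meant this as a temporary block, it helps to know how to look at the rule and take it back out. Here is a minimal sketch. The function names and the DRY_RUN switch are my own invention for illustration; the iptables options themselves (-A, -D, -L, -n, --line-numbers) are standard. Real runs need root.

```shell
#!/bin/sh
# Hypothetical wrappers around iptables. With DRY_RUN=1 the command is
# printed instead of executed, so you can check it first.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Append a DROP rule for one source address.
block_ip() { run iptables -A INPUT -s "$1" -j DROP; }

# Delete that exact rule again once the abuse stops.
unblock_ip() { run iptables -D INPUT -s "$1" -j DROP; }

# Show the INPUT chain numerically, with rule numbers.
list_input_rules() { run iptables -L INPUT -n --line-numbers; }
```

With DRY_RUN=1 set, block_ip x.x.x.x just prints the command so you can eyeball it before running it for real.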

That dropped my CPU usage back down to the 20 percent range.

I’ll lose this rule the next time I reboot. This is only the second time I’ve noticed something like this happening to me, so I wouldn’t say it happens frequently.

How I traced the bad guy to a VPS in New York

Always check an IP address out before you block it, to make sure it’s not actually something you need. If you’re getting tons of traffic from something random like this, or an ISP, you probably want to block it.

Most of my semi-high-traffic IPs were bots controlled by search engines. Yahoo was crawling the site pretty hard around the same time as the bad guy, but I want that traffic. I worked pretty hard to get search engines to index me, especially the engines not named Google.

So I visited ARIN whois and punched in the IP address of my obnoxious little friend. It was registered to a VPS provider in New York. A VPS is a Linux box for rent in someone’s datacenter. You pay some money, and you get an account on a server that’s somewhat isolated from the other subscribers. Most people use them to host websites cheaply. Realistically, there’s no legitimate reason for a VPS to be accessing my site. This dude wasn’t reading my site with Lynx, if you know what I mean. I don’t think he was collecting my content either, because he downloaded everything on the site 20 times.
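The same lookup works from the command line if the whois client is installed. A rough sketch: whois_summary is a made-up helper that trims the output down to the ownership lines. The field labels are the ones ARIN usually prints; other registries label things differently, so treat the pattern as an assumption.

```shell
#!/bin/sh
# whois_summary is a hypothetical filter: it reads whois output on stdin
# and keeps only the lines that say who owns the address block.
# The labels below are assumed from typical ARIN output.
whois_summary() {
    grep -E -i '^(OrgName|NetRange|CIDR|Country):'
}

# Usage (needs the whois client and network access):
#   whois x.x.x.x | whois_summary
```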

I’ll probably never know his motive. The effect was that pages that normally load in two seconds took 12. So I put in the iptables rule, and my server quit responding to him. For all he knows, he knocked me over. I’m fine with him thinking that. As soon as I blocked him, my CPU usage dropped below 20 percent and stayed there.

You do want to check out an IP before you block it, though. Besides Yahoo, I had a bunch of traffic from Automattic, makers of several of my critical WordPress plugins. Blocking that traffic inadvertently wouldn’t be good for the user experience.
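One way to do part of that check from the log itself is to look at the user agents an address sends. This sketch assumes the log layout used earlier (client IP in field 2) and that the user agent is the third quoted string on each line, after the request and the referer. agents_for_ip is a made-up name, and user agents can be forged, so this is a hint rather than proof.

```shell
#!/bin/sh
# agents_for_ip is a hypothetical helper.
# Usage: agents_for_ip LOGFILE IP
# Assumes the client IP is field 2 and the user agent is the sixth piece
# of the line when it is split on double quotes.
agents_for_ip() {
    awk -v ip="$2" '$2 == ip' "$1" |
        awk -F'"' '{ print $6 }' |
        sort | uniq -c | sort -rn
}
```

Running agents_for_ip /var/log/apache2/other_vhosts_access.log x.x.x.x lists each user agent that address used, with a count in front.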
