Optimizing dynamic Linux webservers

Linux + Apache + MySQL + PHP (LAMP) provides an outstanding foundation for building a web server for essentially the cost of your time. And the advantages over static pages are fairly obvious: Just look at this web site. Users can log in and post comments without me doing anything, and content on any page can change programmatically. In my site’s case, links to my most popular pages appear on the front page, and as their popularity changes, the links change.

The downside? Remember the days when people bragged about how their 66 MHz 486 was a perfectly good web server? Kiss those goodbye. For that matter, your old Pentium-120 or even your Pentium II-450 may not be good enough either. Unless you know these secrets…

First, the simple stuff. About a year and a half ago, I talked about programs that optimize HTML by removing extraneous tags and even give you a leg up on translating to cascading style sheets (CSS). That’s a starting point.

Graphics are another problem. People want lots of them, and digital cameras tend to pad them with extraneous bloat. Edit them in Photoshop or another popular image editor–which you undoubtedly will–and you’ll likely add another layer of bloat. I talked about Optimizing Web graphics back in May 2002.

But what can you do on the server itself?

First, regardless of what you’re using, you should be running mod_gzip to compress your web server’s output. It works with virtually all modern web browsers, and the browsers that don’t support it negotiate with the server to get uncompressed output. My 45K front page becomes 6K when compressed, which is better than a seven-fold reduction in size. Suddenly my 128-kilobit uplink becomes more than half of a T1.

I’ve read in several places that it takes less CPU time to compress content and send it than it does to send uncompressed content. On my P2-450, that definitely seems to be the case.

Unfortunately, mod_gzip is one of the most poorly documented Unix programs I’ve ever seen. I complained about this nearly three years ago, and the situation seems little improved.

A simple apt-get install libapache-mod-gzip in Debian doesn’t do the trick. You have to search /etc/apache/httpd.conf for the line that begins LoadModule gzip_module and uncomment it, and then you have to add a few more lines. The lines that enabled mod_gzip on TurboLinux didn’t save me this time–for one thing, they didn’t handle PHP output. For another, they didn’t seem to do anything at all on my Debian box.
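Once uncommented, that line ends up looking something like this (the module path below is a guess on my part; use whatever path your package actually installed):

LoadModule gzip_module /usr/lib/apache/1.3/mod_gzip.so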

Charlie Sebold to the rescue. He provided the following lines that worked for him on his Debian box, and they also worked for me:

# mod_gzip settings

mod_gzip_on Yes
mod_gzip_can_negotiate Yes
mod_gzip_add_header_count Yes
mod_gzip_minimum_file_size 400
mod_gzip_maximum_file_size 0
mod_gzip_temp_dir /tmp
mod_gzip_keep_workfiles No
mod_gzip_maximum_inmem_size 100000
mod_gzip_dechunk Yes

mod_gzip_item_include handler proxy-server
mod_gzip_item_include handler cgi-script

mod_gzip_item_include mime ^text/.*
mod_gzip_item_include mime ^application/postscript$
mod_gzip_item_include mime ^application/ms.*$
mod_gzip_item_include mime ^application/vnd.*$
mod_gzip_item_exclude mime ^application/x-javascript$
mod_gzip_item_exclude mime ^image/.*$
mod_gzip_item_include mime httpd/unix-directory
mod_gzip_item_include file .htm$
mod_gzip_item_include file .html$
mod_gzip_item_include file .php$
mod_gzip_item_include file .phtml$
mod_gzip_item_exclude file .css$

Gzipping anything below 400 bytes is pointless because of overhead, and gzipping CSS and JavaScript files breaks Netscape 4 part of the time.

Most of the examples I found online didn’t work for me. Charlie said he had to fiddle a long time to come up with those. They may or may not work for you. I hope they do. Of course, there may be room for tweaking, depending on the nature of your site, but if they work, they’re a good starting point.
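One quick way to verify that compression is actually happening, assuming you have curl installed, is to request a page with a gzip Accept-Encoding header and look for Content-Encoding: gzip in the response headers:

# ask for gzipped output; show only the response headers
curl -s -H "Accept-Encoding: gzip" -D - -o /dev/null http://localhost/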

Second, you can use a PHP accelerator. PHP is an interpreted language, which means that every time you run a PHP script, your server first has to parse and compile the source code before it can execute it. That compilation step can take longer than actually running the script. PHP accelerators serve as a just-in-time compiler: they compile the script once and hold a copy in memory, so the next time someone accesses the page, the pre-compiled script runs. The result can sometimes be a tenfold increase in speed.

There are lots of them out there, but I settled on the ionCube PHP Accelerator (PHPA) because installation is a matter of downloading the appropriate pre-compiled binary, dumping it somewhere (I chose /usr/local/lib but you can put it anywhere you want), and adding a line to php.ini (in /etc/php4/apache on my Debian box):

zend_extension="/usr/local/lib/php_accelerator_1.3.3r2.so"

Restart Apache, and suddenly PHP scripts execute up to 10 times faster.
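On Debian, the restart itself is a one-liner (the init script’s name may vary with your Apache version):

/etc/init.d/apache restart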

PHPA isn’t open source and it isn’t Free Software. Turck MMCache is, so if you prefer GPL, you can use it.

With mod_gzip and PHPA in place and working, my web server’s CPU usage rarely goes above 25 percent. Without them, three simultaneous requests from the outside world could saturate my CPU.

With them, my site still isn’t quite as fast as it was in 2000 when it was just serving up static HTML, but it’s awfully close. And it’s doing a lot more work.


Confessions of a SQL 7 junkie

My name is Dave, and I’m a Microsoft junkie. So are the people I hang out with every day at work. We’re all junkies. We’re addicted to the glamor drug of Microsoft SQL Server 7.
I’m still trying to recover from the nightmare that is Microsoft SQL Server.

You see, I have a problem. My employer and most of its clients rely heavily on SQL Server. SQL Server is a touchy beast. We have some servers running completely unpatched SQL Server 7, for fear of breaking a client’s application. No, I absolutely will not tell you who my employer is or who those clients are.

That makes us, in Microsoft’s eyes, socialism-loving pinko Commies, since we won’t migrate to SQL 2000. Unfortunately, SQL 2000 isn’t completely compatible with SQL 7. So we’re forced into being pinko Commies.

Part of the reason SQL Slammer hit so hard was the touchiness of the service packs and hotfixes, and part of it was the difficulty of installing them. The hotfix that would have prevented SQL Slammer requires you to manually copy over 20 files, mercifully spread out over only two directories. But it takes time and it’s easy to make a mistake. So Microsoft released a SQL 2000 patch with a nice, graphical installer. But the pinko Commies like me who still use SQL 7 have to copy files by hand.

Now, SQL 7 isn’t vulnerable to SQL Slammer, but it has plenty of security flaws of its own. And there’s one thing that history has taught us about viruses. Every time a new virus hits, a game of one-upmanship ensues. Similar viruses incorporating new twists appear quickly. And eventually a virus combining a multitude of techniques using known exploits appears. A SQL Slammer derivative that hits SQL 7 in one way or another is only a question of time.

Someone asked me why we can’t just leave everything unpatched and beef up security. The problem is that while our firewall is fine and it protects us from the outside, it doesn’t do anything for us on the inside. So the instant some vendor or contractor comes in and plugs an infected laptop into our network–and it’s a question of when, not if–we’re sunk. Can we take measures to keep anyone from plugging outside machines into our network? Yes. We can maintain a list of MAC addresses for inside equipment and configure our servers not to give IP addresses to anything else. But that’s obstructive. The accounting department is already supremely annoyed with us because we have a firewall at all. Getting more oppressive when there’s even just one other option isn’t a good move. People in the United States love freedom and they get annoyed when it’s taken away, even in cases that are completely justifiable like an employer blocking access to porn sites. But in a society where sysadmins have to explain that an employer’s property rights trump any given individual’s right to use work equipment for the purpose of seeing Pamela Anderson naked, one must be picky about what battles one chooses to fight.

In a moment of frustration, after unsuccessfully patching one server and breaking it to the point where SQL wouldn’t run at all anymore, I pointed out that one can apply every security patch available for Debian Linux the instant it comes out with two commands, with total downtime measured in seconds, if not fractions of a second. And the likelihood of breaking something is very slight, because the Debian security people are anal-retentive about backward compatibility. The person listening didn’t like that statement. There’s a lot more software available for Windows, he said. I wondered aloud, later, what the benefit of building an enterprise on something so fragile would be. Jesus’ parable of building a house on rock rather than on sand came to mind. I didn’t bring it up. I wasn’t sure it would be welcome.
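For the record, those two commands, which fetch the updated package lists and then install whatever fixes apply:

apt-get update
apt-get upgrade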

But I think I’ll keep on fighting that battle. Keeping up on Microsoft security patches is becoming a full-time job. I don’t know if we can afford a full-time employee who does nothing but read Microsoft security bulletins and regression-test patches to make sure they can be safely deployed. I also don’t know who would want that job. But we are quickly reaching the point where we are powerless and our lives are becoming unmanageable.

Such is the life of the sysadmin. It’s a little bit of a rush to come into crisis situations, and a lot of my clients know that when they see me, there’s something major going on because they only see me a couple of times a year. In the relatively glamor-less life of a sysadmin, those times are about as glamorous as it gets. And for a time, it can be fun. But when the hours get long and not everyone’s eager to cooperate, it gets pretty draining.

Open a blg file in Windows

In some versions of Windows, the usual method of viewing a file–double-clicking it–doesn’t work for BLG files. If that’s the case on your system, there’s another way to open a BLG file in Windows.

This works in current versions of Windows and all the way back to Windows 2000.


Running a Web site without static IP with Linux and DynDNS

I run this Web site without a static IP address. I registered an address at DynDNS.org which, as long as I keep it updated, keeps me on the ‘Net.
In the past I’ve used a Windows-based program to keep my address updated. But the hard drive in that Windows box took leave of its life a few days ago. Somehow my IP address didn’t change for a few days, but then my DSL modem fell off the ‘Net.

Then I found setup instructions for Debian and DynDNS, which solved that problem. There’s a DynDNS client in Debian now, which that document explains, so now my Web server can keep itself online without any help from a Windows box and without me writing any nasty code.
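For anyone who wants the short version, here’s a rough sketch. I’m assuming the client in question is ddclient; the login, password, and hostname below are placeholders:

apt-get install ddclient

# /etc/ddclient.conf
protocol=dyndns2
use=web, web=checkip.dyndns.org
server=members.dyndns.org
login=yourusername
password=yourpassword
yourname.dyndns.org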

Now, I haven’t tested this theory, but I suspect one could use DynDNS plus DHCP or PPPoE to run a Web site with a registered domain name without paying the extra monthly fee for a static IP address. The trick would be to set up your registered name’s DNS record as a CNAME to your DynDNS name.

Setting up the DNS records is left as an exercise to the reader, mostly because my understanding of it is good enough for me to do it myself, but not to explain it–when I’ve tried in the past, all I’ve succeeded in doing was confusing both of us.
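For the brave, though, here’s the general idea. Assuming your registered domain is example.com and your DynDNS name is yourname.dyndns.org (both placeholders), the relevant line in a BIND zone file would look roughly like this. One caveat I’m sure of: the DNS spec doesn’t allow a CNAME on the bare domain itself, only on a name under it, such as www.

www IN CNAME yourname.dyndns.org.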

Network infrastructure for a small office

We talked earlier this week about servers, and undoubtedly some more questions will come up, but let’s go ahead and talk about small-office network infrastructure.
Cable and DSL modems are affordable enough that any small office within the service area of either ought to get one. For the cost of three dialup accounts, you can have Internet service that’s fast enough to be worth having.

I’ve talked a lot about sharing a broadband connection with Freesco, and while I like Freesco, in an office environment I recommend you get an appliance such as those offered by Linksys, US Robotics, D-Link, Netgear, Siemens, and a host of other companies. There are several simple reasons for this: The devices take up less space, they run cooler, there’s no need to wait for them to boot up after a power failure or an accidental unplugging, and being solid state, they’re theoretically more reliable than a recycled Pentium-75. Plus, they’re very fast and easy to set up (we’re talking five minutes in most cases) and very cheap–under $50. When I last checked, CompUSA’s house-brand router/switch was running $39. It’s hard to find a 5-port switch for much less than that. Since you’ll probably use those switch ports for something anyway, the $10-$20 extra you pay to get broadband connection sharing and a DHCP server is more than worth your time.

My boss swears that when he replaced his Linksys combo router/100-megabit switch with a much pricier Cisco combo router/10-megabit switch, the Cisco was faster, not only upstream, but also on the local network. I don’t doubt it, but you can’t buy Cisco gear at the local office supply store for $49.

For my money, I’d prefer to get a 24-port 3Com or Intel switch and plug it into a broadband sharing device, but you’ll pay a lot more for commercial-grade 3Com or Intel gear. The cheap smallish switches you’ll see in the ads in the Sunday papers will work OK, but their reliability won’t be as high. Keep a spare on hand if you go with the cheap stuff.

What about wireless? Wireless can save you lots of time and money by not having to run CAT5 all over the place–assuming your building isn’t already wired–and your laptop users will love having a network connection anywhere they go. But security is an issue. At the very least, change your SSID from the factory default, turn on WEP (check your manual if it isn’t obvious how to do it), and hard-code your access point(s) to accept only the MAC addresses of the cards your company owns (again, check your manual). Even that isn’t necessarily enough to keep a determined wardriver out of your network. Cisco does the best job of providing decent security, but, again, you can’t buy Cisco gear at your local Staples. Also, to make it easier on yourself, make sure your first access point and your first couple of cards are the same brand. With some work, the variety pack will usually work together. Like-branded stuff always will. When you’re doing your initial setup, you want the first few steps to go as smoothly as possible.

I’d go so far as to turn off DHCP on the wireless segment. Most wardrivers can probably figure out your network topology and gateway and find a DNS server on their own. But why make life easier for them? Some won’t know how to do that, and that’ll keep them out. The sophisticated wardriver may decide it’s too much trouble and go find a friendlier network.
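To put that in perspective, here’s roughly what a legitimate machine on a DHCP-less segment has to be configured with by hand. The addresses are hypothetical; substitute your own:

# static settings a wardriver would otherwise get for free from DHCP
ifconfig eth1 192.168.1.50 netmask 255.255.255.0
route add default gw 192.168.1.1
echo "nameserver 192.168.1.1" > /etc/resolv.conf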

Why worry about wireless security? A wardriver may or may not be interested in your LAN. But that’s one concern. And while I don’t care if someone mooches some bandwidth off my LAN to go read USA Today, and I’d only be slightly annoyed if he used it to go download the newest version of Debian, I do care if someone uses my wireless network to send spam to 250,000 of his closest friends, or if he uses my wireless network to visit a bunch of child porn or warez sites.

Enough about that. Let’s talk about how to wire everything. First off, if you use a switched 100-megabit network, you can just wire everything together and not give much thought to anything. But if you’re using hubs or wireless to connect your desktops, be sure to put your servers on 100-megabit switch ports. The servers can then talk to each other at full speed if and when that’s necessary. And a switch port allows them to talk at full speed to a number of slower desktop PCs at once. The speed difference can be noticeable.

The low-end server

Here’s a good question: What should a small operation do when it gets fed up with its network and is tempted to just chuck it all and start over?
Well, my advice is to start over. But I don’t agree that starting over requires one to chuck everything.

We’ll start with the server. Chances are, these days, you need one. If you’re doing Web and e-mail, you absolutely need one. But to a lot of people, servers are a mystical black box that costs more money than a desktop PC but runs a similar operating system. And that’s all they know.

Here’s what you need to know: A corporate server is built to stricter tolerances than a desktop PC and sometimes uses higher-quality parts (common examples are ServerWorks chipsets instead of Intel chipsets, SCSI instead of IDE, and error-correcting memory instead of the cheap nonparity stuff). You also often get niceties like hot-swap drive cages, which allow you to add or replace hard drives without powering down or opening the case.

They’re generally also better tested, and you can get a support contract on them. If you’re running an enterprise with hundreds or thousands of people relying on your server, you should buy server-grade stuff, and building your own server or repurposing a desktop PC as a server ought to be grounds for dismissal. The money you save isn’t worth it–you’ll pay more in downtime.

But a dozen people won’t hit a server very hard. This Web site runs on a Dell OptiPlex Pentium II/450 workstation. A workstation is a notch above a desktop PC but a notch below a server in the pecking order. The biggest difference between my OptiPlex and the PC that was probably sitting on your desk at work a year or two ago is that my OptiPlex has a SCSI hard drive and a 3Com NIC onboard.

A small office can very safely and comfortably take a reasonably powerful name-brand PC that’s no longer optimal for someone’s desk (due to an aging CPU) and turn it into a server. A Pentium II-350 or faster, outfitted with 256 MB of RAM, a SCSI host adapter and a nice SCSI hard drive, and a 3Com or Intel 100-megabit Ethernet card will make a fine server for a couple of dozen people. (My employer still has a handful of 200 MHz Pentium Pro servers on its network, serving a couple hundred people in some cases.)

This server gets hit about as hard as a typical small business or church office server would. So far this month I’ve been getting between 500 and 550 visitors per day. I’ve served about 600 megabytes’ worth of data. My average CPU usage over that time period is in the single digits. The biggest bottleneck in this server is its 7200-rpm SCSI disk. A second disk dedicated to its database could potentially speed it up. But it’s tolerable.

Hot-swappable hard drives are nice to have, but in an office of a dozen people, the 5-10 minutes it takes to power down, open the case, swap drives, close the case back up, and boot again probably doesn’t justify the cost.

A business or church office that wanted to be overly cautious could buy the least expensive server it can find from a reputable manufacturer (HP/Compaq, Dell, IBM). But when you do that, you’re paying for a lot of power that’s going to sit there unused most of the time. The 450 MHz CPU in this box is really more than I need.

Jeremy Hendrickson e-mailed me asking whether his church should buy a new server, and whether it really needed two or three servers, since he was talking about setting up a Samba server for file serving, Apache for Web serving, and a mail server. Running file and Web services on the same box won’t be much of a problem. A dozen people just won’t hit the server that hard. Just make sure you buy a lot of disk space, though most of that disk space will go to file serving. The database that holds all of the content on this site is only a few megabytes in size. Compressed, it fits on a floppy disk with lots of room to spare. Yes, I could realistically do nightly backups of my Web server’s database on floppies. If floppies were at all reliable, that is.
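Here’s a minimal sketch of the kind of nightly dump I mean, assuming a MySQL database named mydb (the name and path are placeholders):

# dump the database and compress it, datestamped
mysqldump --opt mydb | gzip > /var/backups/mydb-$(date +%Y%m%d).sql.gz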

I flip-flop on whether e-mail belongs on the same server. The security vulnerabilities of Web servers and mail servers are a bit different and it would be nice to isolate them. But I’m a lot more comfortable about a Linux box running both being exposed on the ‘Net than I am a Windows box running one or the other. If I had two boxes, and could afford to be paranoid, I’d use two.

Jeremy said his church had a P3-733 and a P2-450, both Dells, due for retirement. I’d make the P3 into a file/print/Web server and the P2 into a mail server and spend the money budgeted for a new server or servers to buy lots of disk space and a nice tape backup drive, since they’d get lots of use out of both of those. A new $1200 server would just buy lots of CPU power that’ll sit idle most of the time and you’d still have to buy disks.

As far as concerns about the reliability of reusing older systems go, the things that tend to wear out on older PCs are the hard drive and the operating system. Windows deteriorates over time. Server operating systems tend not to have this problem, and Linux is even more immune to it than Microsoft’s server operating systems. So that’s not really a concern.

Hard disks do wear out. I read a suggestion not long ago that IDE hard disks should be replaced every 3 years whether they seem to need it or not. That’s a little extreme, but I’ve found it’s hard to coax much more than four years out of an IDE disk. Dropping a new SCSI disk or two or three into an old workstation before turning it into a server should be considered mandatory. SCSI disks give better performance in multiuser situations, and are generally designed to run for five years. In most cases, the rest of the PC also has several years left in it.

Later this week, we’ll talk about Internet connectivity and workstations.

Optimizing Web graphics

Gatermann told me about a piece of freeware he found on one of my favorite sites, tinyapps.org, called JPG Cleaner. It strips out the thumbnails and other metadata that editing programs and digital cameras put in your graphics, none of which is necessary for your Web browser to render them. Sometimes it saves you 20K, and sometimes it saves you 16 bytes. Still, it’s worth doing, because more often than not it saves you something halfway significant.
That’s great, but I don’t want to be tied to Windows, so I went looking for a similar Linux program. There isn’t much out there. All I was able to find was a command-line program, written in 1996, called jpegoptim. I downloaded the source, but didn’t have the headers to compile it. I went digging and found that someone built an RPM for it back in 1997, but Red Hat never officially adopted it. I guess it’s just too special-purpose. The RPM is still floating around; I found it on a Japanese site. If that ever goes away, just do a Google search for jpegoptim-1.1-0.i386.rpm.

I used the Debian utility alien to convert the RPM to a Debian package. It’s just a 12K binary, so there’s nothing to installing it. If you prefer SuSE or TurboLinux or Mandrake or Caldera, the RPM will install just fine for you. And Debian users can convert it, no problem.
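In case you haven’t used alien before, the conversion and install are two commands, run as root (the exact name of the resulting .deb may differ):

alien --to-deb jpegoptim-1.1-0.i386.rpm
dpkg -i jpegoptim_1.1-1_i386.deb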

Jpegoptim actually goes a step further than JPG Cleaner. Aside from discarding all that metadata in the header, its main claim is that it optimizes the Huffman tables that make up the image data itself, reducing the image in size without affecting its quality at all. The difference varies; I ran it on several megabytes’ worth of graphics, and found that on images that still had all those headers, it frequently shaved 20-35K from their size. On images that didn’t have all the extra baggage (including some that I’d optimized with JPG Cleaner), it reduced the file size by another 1.5-3 percent. That’s not a huge amount, but on a 3K image, that’s 45-90 bytes. On a Web page that has lots of small images, those bytes add up. Your modem-based users will notice it.

And jpegoptim will also let you do the standard JPEG optimization, where you set the file quality to a numeric value between 1 and 100; the higher the number, the truer the result stays to the original. Some image editors don’t let you adjust the quality in a very fine-grained manner. I’ve found that a quality level of 70 is almost always perfectly acceptable.

So, to try to get something for nothing, change into an image directory and type this:

jpegoptim -t *

And the program will see what it can save you. Don’t worry if you get a negative number; if the “optimized” file ends up actually being bigger, it’ll discard the results.

To lower the quality and potentially save even more, do this:

jpegoptim -m70 -t *

And once again, it’ll tell you what it saves you. (The program always optimizes the Huffman tables, so there’s no need to do multiple steps.) Be sure to eyeball the results if you play with quality, and back up the originals.
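If you want to run it over a whole tree of images rather than one directory at a time, find and xargs will do the trick:

# optimize every .jpg under the current directory, recursively
find . -name "*.jpg" | xargs jpegoptim -t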

Commercial programs that claim to do what these programs do cost anywhere from $50 to $100. That a free alternative remains this obscure is criminal. Go get it and take advantage of it.

Also, don’t forget the general rules of file formats. GIF is the most backward-compatible, but it’s encumbered by patents and limited to 256 colors. It’s good for line drawings and cartoons, because it’s a lossless format (it only compresses the data, it doesn’t change it).

PNG is the successor to GIF, sporting better compression and support for 24-bit color. Like GIF, it’s lossless, so it’s good for line drawings, cartoons, and photographs that require every detail to be preserved. Unfortunately, not all browsers support PNG.

JPEG has the best compression, because it’s lossy: it looks for details that it can discard to make the image compress better. The problem is that when you edit JPEGs, especially if you convert them between formats, you run into generation loss. Since JPEG is lossy, line drawings and cartoons generally look really bad in JPEG format. Photographs, which usually have a lot of subtle detail, survive JPEG’s onslaught much better. The advantage of JPEG is that the file sizes are much smaller. But you should always examine a JPEG before putting it on the Web; blindly compressing your pictures with high compression settings can lead to hideous results. There’s not much point in squeezing an image down to 1.5K when the result is something no one wants to look at.

Cheap network hardware

Steve DeLassus reminded me that NICs are dirt cheap at Buy.com right now. A Netgear FA311 runs $10.50 after rebate. (Hint: these cards use the natsemi module in Linux, and yes, you have to have a pretty recent distribution to have that module, though you can certainly download the source and compile it if you want.)
A Netgear 4-port 100-meg hub runs about 35 bucks. A Netgear 5-port 10/100 switch runs about 40. Very nice. Pricing at mwave.com is very similar.

If you prefer a tier-1 NIC, you can pick up Intel cards for $19 at Directron.com. Or if $10.50 will break you, you can get a generic RealTek-based card from Directron for $9.50 (it uses the rtl8139 module; 8139too will work as well, but the former is better). Be aware that the RealTek 8139 is anything but a high-end chip, and generic 8139s ought to be considered tier-3 cards. But if you’re on a budget and need something that’ll work with Linux, no questions asked, it’ll do.
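If your distribution doesn’t detect the card automatically, loading the driver by hand on a 2.4 kernel looks like this; substitute rtl8139 or 8139too for the RealTek cards, and the second line makes Debian load the module at every boot:

modprobe natsemi
echo natsemi >> /etc/modules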

Cheap cables? Directron’s got 7-footers for 3 bucks. Your choice of a 14′ or 25′ is 5 bucks. Pricing at Newegg.com is even a little lower.

I built my first home network in late 1998. I bought a SOHOware kit that included a 4-port 10-meg hub, a pair of 25′ cables, and a pair of 10/100 PCI NICs with a DEC Tulip knockoff chipset. I was pretty proud of myself for finding it for less than $100. That hub fell over dead within a few months. Now for that price you can have first-tier stuff.

I’m out of here for a couple of days. I’ve sent Steve DeLassus some stuff that he can post while I’m gone, so things shouldn’t be too different around here. Unless Steve decides he wants to write something, that is, in which case you’ll just see a marked increase in quality that day…

Well, and you won’t see immediate responses to comments from me.