Tag Archives: unix commands

Increase the speed of your Web pages

There are commercial utilities that will optimize your HTML and your images, cutting the size down so your stuff loads faster and you save bandwidth. But I like free.
I found free.

Back in the day, I told you about two programs, one for Windows and one for Unix, that will crunch down your JPEGs by eliminating metadata that’s useless to Web browsers. The Unix program will also optimize the Huffman tables and optionally resample the JPEG into a lossier image, which can net you tremendous savings but might also lower image quality unacceptably.

Yesterday I stumbled across a program on Freshmeat called htmlcrunch that strips extraneous whitespace from HTML and XML files. Optionally, it will also remove comments. The program works in DOS (including under a command prompt in Windows 9x/NT/2000/XP, and it knows how to handle long filenames) or Unix.

It’s not advertised as such, but I suspect it ought to also work on PHP and ASP files.

How much it will save you depends on your coding style, of course. If you tend to put each tag on its own line with lots of pretty indentation, like they teach in computer science classes, it will probably save you a ton. If you code HTML like me, it’ll save you somewhat less. If you use a WYSIWYG editor, it’ll probably save you a fair bit.

It works well in conjunction with other tools. If you use a WYSIWYG editor, I suggest you first run the code through HTML Tidy. Unlike htmlcrunch, HTML Tidy actually interprets the HTML and removes some troublesome information. In some cases HTML Tidy will add characters, but that’s usually a good thing; its changes improve browser compatibility. And if you feed HTML Tidy a bunch of broken HTML, it’ll fix it for you.
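A quick sketch of a Tidy pass, using a deliberately broken page made up for the example (the filenames are placeholders; -q quiets the report, -o names the output file):

```shell
# create a page with an unclosed tag, then let HTML Tidy repair it
printf '<html><body><p>unclosed paragraph</body></html>' > broken.html
# tidy exits nonzero when it merely issues warnings, hence the || true
tidy -q -o cleaned.html broken.html || true
cat cleaned.html
```

The cleaned file gets the missing close tags, a doctype, and other fixes that make browsers happier.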

You can further optimize your HTML with the help of a pair of Unix commands. But you run Windows? No sweat. You can grab native Windows command-line versions of a whole slew of Unix tools in one big Zip file here.

I’ve found that these HTML tools sometimes leave spaces between HTML elements. Whether that’s intentional or a bug in the code, who knows. But it’s easy to fix with the Unix tr command:

tr -d "\n" < index.html > indexopt.html
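A sketch of what the tr pass does, on a made-up two-line fragment: the newlines between the tags (which browsers render as spaces) get deleted outright, welding everything onto one line.

```shell
# tr -d '\n' deletes every newline from the stream
printf '<p>one</p>\n<p>two</p>\n' | tr -d '\n'
# prints: <p>one</p><p>two</p>
```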

Some people believe that Web browsers parse 255-character lines faster than any other line length. I’ve never seen this demonstrated. And in my experience, any Web browser parses straight-up HTML plenty fast no matter what, unless you’re running a seriously, seriously underpowered machine, in which case optimizing the HTML isn’t going to make a whole lot of difference. Also in my experience, every browser I’ve looked at parses CSS entirely too slowly. It takes most browsers longer to render this page than it takes for my server to send it over my pokey DSL line. I’ve tried mashing my stylesheets down, and multiple 255-character lines versus no line breaks whatsoever made little, if any, difference.

But if you want to try it yourself, pass your now-optimized HTML file(s) through the standard Unix fmt command, like so:

fmt -w 255 index.html > index255.html
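To see fmt’s wrapping on something smaller than a real page, here’s a sketch at a 40-column width (the input is manufactured; the idea is identical at 255):

```shell
# build one long line of repeated words, then wrap it at 40 columns
yes 'word' | head -20 | tr '\n' ' ' | fmt -w 40
```

fmt breaks only at word boundaries, so no tag text gets chopped in half; every output line comes out at or under the width you asked for.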

Optimizing your HTML files to the extreme will take a little time, but it’s probably something you only have to do once, and your page visitors will thank you for it.

A remote administration Unix trick

OK, here’s the situation. I had a Linux box running Squid, chugging away, saving us lots of bandwidth and speeding things up and making everything wonderful, but we wanted numbers to prove it, and we liked being able to just check up on it periodically. Minimalist that I am, though, I never installed Telnet or SSH on it. And besides, I haven’t found an SSH client for Windows I really like, and Telnet is horribly insecure.
Sure, I could just walk up to it and log in and look around. But the server was several city blocks away from my base of operations. For a while it was a good excuse to go for a walk and talk to girls, but there weren’t always girls around to talk to, and, well, sometimes I needed to check up on the server while I was in the middle of something else.

So here’s what I did. I used CGI scripts for the commands I wanted. Take this, for example:

#!/bin/sh
echo 'Content-type: text/html'
echo ''
echo '<pre>'
ps waux
echo ''
cat /proc/meminfo
echo '</pre>'

Then I dropped those files into my cgi-bin directory and chmodded them to 755. From then on, I could check on my server by typing the script’s URL into a Web browser. Boom, the server would tell me what processes were running, how much memory was in use, and, even cooler, how much memory was used by programs and how much was used for caching.
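If the chmod step is unfamiliar: 755 gives the owner read/write/execute and everyone else read/execute, which is what Apache needs to run the script. A safe sketch of the idea, using a temp directory as a stand-in for the real cgi-bin path (which varies by distribution):

```shell
#!/bin/sh
# stand-in for the real cgi-bin directory (an assumption; yours will differ)
CGI_BIN=$(mktemp -d)
# a trivial stand-in script
printf '#!/bin/sh\necho hello\n' > "$CGI_BIN/status"
chmod 755 "$CGI_BIN/status"     # rwxr-xr-x: anyone may read and execute it
"$CGI_BIN/status"
# prints: hello
```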

Here’s how it works. The Content-type line and the blank line echoed after it fake out Apache and your Web browser, giving them just enough of an HTTP header that they’ll process the output of these commands. The <pre> tag tells the browser the text is preformatted, so it won’t reflow it. That isn’t necessary for all commands, but for commands like ps that output multicolumn stuff, it’s essential. After that, you can run whatever Unix commands you want; their output will be directed to the Web browser. I echoed a blank line just so the memory usage wouldn’t butt up against the process info. The last line just closes the <pre> tag.

I wrote up scripts for all the commands I frequently used, so that way when my boss wanted to know how Squiddy was doing, I could tell him. For that matter, he could check it himself.

But if I knew there were going to be girls around, I went ahead and made an excuse to walk that direction anyway. Some things are more important than remote administration, right?