There are commercial utilities that will optimize your HTML and your images, cutting the size down so your stuff loads faster and you save bandwidth. But I like free.
I found free.
Back in the day, I told you about two programs, one for Windows and one for Unix, that will crunch down your JPEGs by eliminating metadata that’s useless to Web browsers. The Unix program will also optimize the Huffman tables and optionally resample the JPEG into a lossier image, which can net you tremendous savings but might also lower image quality unacceptably.
Yesterday I stumbled across a program on Freshmeat called htmlcrunch that strips extraneous whitespace from HTML and XML files. Optionally, it will also remove comments. The program runs under DOS (including a command prompt in Windows 9x/NT/2000/XP, where it knows how to handle long filenames) or Unix.
It’s not advertised as such, but I suspect it ought to also work on PHP and ASP files.
How much it will save you depends on your coding style, of course. If you tend to put each tag on one line with lots of pretty indentation like they teach in computer science classes, it will probably save you a ton. If you code HTML like me, it’ll save you somewhat less. If you use a WYSIWYG editor, it’ll probably save you a fair bit.
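If you want a number rather than a guess, here is a minimal sketch for comparing a file before and after crunching. The function name savings_pct is my own invention for illustration, not part of any of these tools, and it assumes you kept a copy of the original file.

```shell
#!/bin/sh
# savings_pct ORIGINAL CRUNCHED
# Prints the whole-number percentage of bytes saved (truncated, not rounded).
# Hypothetical helper for illustration only.
savings_pct() {
    # wc -c pads its output with spaces on some systems, so strip them.
    before=$(wc -c < "$1" | tr -d '[:space:]')
    after=$(wc -c < "$2" | tr -d '[:space:]')

    # An empty original has nothing to save; avoid dividing by zero.
    if [ "$before" -eq 0 ]; then
        echo 0
        return 0
    fi

    echo $(( (before - after) * 100 / before ))
}
```

Call it as savings_pct index.html indexopt.html, with whatever filenames you actually used.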
It works well in conjunction with other tools. If you use a WYSIWYG editor, I suggest you run the code through HTML Tidy first. HTML Tidy, unlike htmlcrunch, actually interprets the HTML and removes some troublesome information. In some cases HTML Tidy will add characters, but this is usually a good thing: its changes improve browser compatibility. If you feed HTML Tidy a bunch of broken HTML, it’ll fix it for you.
You can further optimize your HTML with the help of a pair of Unix commands. But you run Windows? No sweat. You can grab native Windows command-line versions of a whole slew of Unix tools in one big Zip file here.
I’ve found that these HTML tools sometimes leave spaces between HTML elements. Whether this is intentional or a bug in the code, who knows. But it’s easy to fix with the Unix tr command:
tr "> indexopt.html
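Here is one way to sketch the same cleanup, assuming the goal is to delete spaces and tabs that sit directly between one tag's closing > and the next tag's opening <. It uses sed rather than tr, because tr translates single characters and can't look at what's on either side of a space. The function name strip_between_tags is made up for this example.

```shell
#!/bin/sh
# strip_between_tags: reads HTML on stdin, writes it to stdout with any
# whitespace found directly between ">" and "<" on the same line removed.
# Hypothetical helper for illustration only.
strip_between_tags() {
    # sed works one line at a time, so whitespace that spans a line break
    # between two tags is left alone.
    sed 's/>[[:space:]]*</></g'
}
```

You would run it as strip_between_tags < index.html > indexopt.html. Be careful with it: a space between two inline elements (say, a bold word followed by a link) is a space your readers will see, and anything inside a pre block gets the same treatment, so check the result in a browser before you replace the original.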
Some people believe that Web browsers parse 255-character lines faster than any other line length. I’ve never seen this demonstrated. And in my experience, any Web browser parses straight-up HTML plenty fast no matter what, unless you’re running a seriously, seriously underpowered machine, in which case optimizing the HTML isn’t going to make a whole lot of difference. Also in my experience, every browser I’ve looked at parses CSS entirely too slowly. It takes most browsers longer to render this page than it takes for my server to send it over my pokey DSL line. I’ve tried mashing my stylesheets down, and multiple 255-character lines versus no linebreaks whatsoever made little, if any, difference.
But if you want to try it yourself, pass your now-optimized HTML file(s) through the standard Unix fmt command, like so:
fmt -w 255 index.html > index255.html
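To check what fmt actually gave you, a small sketch like this reports the longest line in a file. The function name longest_line is my own and isn't part of fmt or any other tool mentioned here.

```shell
#!/bin/sh
# longest_line [FILE...]: prints the length of the longest line in the
# given files, or in stdin if no files are named.
# Hypothetical helper for illustration only.
longest_line() {
    # "max + 0" forces a numeric 0 when the input is empty.
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$@"
}
```

Run longest_line index255.html and you should see a number at or below 255. If it's higher, the likely culprit is a single long token with no spaces in it, such as a long URL, since fmt only breaks lines at whitespace.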
Optimizing your HTML files to the extreme will take a little time, but it’s probably something you only have to do once, and your page visitors will thank you for it.