Increase the speed of your Web pages

There are commercial utilities that will optimize your HTML and your images, cutting the size down so your stuff loads faster and you save bandwidth. But I like free.
I found free.

Back in the day, I told you about two programs, one for Windows and one for Unix, that will crunch down your JPEGs by eliminating metadata that’s useless to Web browsers. The Unix program will also optimize the Huffman tables and optionally resample the JPEG into a lossier image, which can net you tremendous savings but might also lower image quality unacceptably.

Yesterday I stumbled across a program on Freshmeat called htmlcrunch that strips extraneous whitespace out of HTML and XML files. Optionally, it will also remove comments. The program runs under Unix or DOS, including a command prompt in Windows 9x/NT/2000/XP, and it knows how to handle long filenames.
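
To make the idea concrete, here’s a minimal Python sketch of that kind of whitespace crunching. It is not htmlcrunch’s actual code; the function name crunch_html and its strip_comments flag are made up for illustration. It also assumes the page has no pre or textarea blocks, inline scripts, or conditional comments, all of which a blanket regex like this would damage.

```python
import re

def crunch_html(html: str, strip_comments: bool = False) -> str:
    """Collapse whitespace runs to one space; optionally drop comments.

    Hypothetical helper for illustration, not htmlcrunch's algorithm.
    """
    if strip_comments:
        # Remove <!-- ... --> blocks, including ones that span lines.
        html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Collapse any run of spaces, tabs, and newlines into a single space.
    return re.sub(r"\s+", " ", html).strip()

page = """<html>
  <body>
    <!-- navigation goes here -->
    <p>Hello,   world.</p>
  </body>
</html>"""

small = crunch_html(page, strip_comments=True)
print(len(page), "->", len(small))
print(small)
```

The more indentation and line breaks a page has, the bigger the gap between the two lengths.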

It’s not advertised as such, but I suspect it ought to also work on PHP and ASP files.

How much it will save you depends on your coding style, of course. If you tend to put each tag on one line with lots of pretty indentation like they teach in computer science classes, it will probably save you a ton. If you code HTML like me, it’ll save you somewhat less. If you use a WYSIWYG editor, it’ll probably save you a fair bit.

It works well in conjunction with other tools. If you use a WYSIWYG editor, I suggest you run the code through HTML Tidy first. HTML Tidy, unlike htmlcrunch, actually interprets the HTML and removes some troublesome information. In some cases HTML Tidy will add characters, but this is usually a good thing, because its changes improve browser compatibility. If you feed HTML Tidy a bunch of broken HTML, it’ll fix it for you.

You can further optimize your HTML with the help of a pair of Unix commands. But you run Windows? No sweat. You can grab native Windows command-line versions of a whole slew of Unix tools in one big Zip file here.

I’ve found that these HTML tools sometimes leave spaces between HTML elements. Whether that’s intentional or a bug in the code, who knows. But it’s easy to fix with the Unix tr command:

tr "> indexopt.html
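
The same cleanup can be sketched in Python. This is a hypothetical helper (squeeze_between_tags is my name for it), not a reconstruction of the tr command, and it assumes the whitespace between tags really is disposable. Between inline elements such as two adjacent links, a space is visible in the rendered page, so check the result in a browser.

```python
import re

def squeeze_between_tags(html: str) -> str:
    # Drop whitespace that sits only between a closing '>' and the next '<'.
    # Text inside elements is left alone.
    return re.sub(r">\s+<", "><", html)

squeezed = squeeze_between_tags("<ul> <li>item one</li> <li>item two</li> </ul>")
print(squeezed)
```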

Some people believe that Web browsers parse 255-character lines faster than any other line length. I’ve never seen this demonstrated. And in my experience, any Web browser parses straight-up HTML plenty fast no matter what, unless you’re running a seriously, seriously underpowered machine, in which case optimizing the HTML isn’t going to make a whole lot of difference. Also in my experience, every browser I’ve looked at parses CSS entirely too slowly. It takes most browsers longer to render this page than it takes for my server to send it over my pokey DSL line. I’ve tried mashing my stylesheets down, and multiple 255-character lines versus no linebreaks whatsoever made little, if any, difference.

But if you want to try it yourself, pass your now-optimized HTML file(s) through the standard Unix fmt command, like so:

fmt -w 255 index.html > index255.html
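
If you don’t have fmt handy, Python’s standard textwrap module can do a similar refill. This is a rough equivalent, not a drop-in replacement: the refill function is my own name, and unlike fmt, textwrap.fill treats the whole input as one paragraph and collapses blank lines, so it assumes a file that has already been crunched and has no pre blocks.

```python
import textwrap

def refill(html: str, width: int = 255) -> str:
    # Re-wrap the text so no line exceeds `width` characters,
    # breaking only at existing whitespace.
    return textwrap.fill(
        html,
        width=width,
        break_long_words=False,
        break_on_hyphens=False,
    )

sample = "<p>" + "word " * 200 + "</p>"
lines = refill(sample).splitlines()
print(len(lines), "lines, longest is", max(len(line) for line in lines))
```

Because break_long_words is off, a single token longer than 255 characters (a long URL inside an attribute, say) stays on one line rather than being cut in half.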

Optimizing your HTML files to the extreme will take a little time, but it’s probably something you only have to do once, and your page visitors will thank you for it.

I have seen the future, and it works!

Now appearing nightly, in the nightly Mozilla builds, the Open Source community is very proud to present a very special feature: Naive Bayesian spam filtering!
And you’re probably wondering why I’m excited about something as boring-sounding as that. Don’t worry. I’m no less sane than I was yesterday and I’ll prove it.

Bayesian filtering uses Bayes’ Rule, a formula from probability theory, to recognize patterns. You tell it what is spam and what isn’t, and over time it learns to tell the two apart. Click here for an explanation of what it is and why it works.

Its main selling point is that when implemented properly and trained thoroughly, Bayesian filtering is very effective at identifying spam and produces nearly zero false positives.
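
For the curious, here’s a toy version of the idea in Python. It is a textbook naive Bayes classifier with add-one smoothing, not Mozilla’s implementation; the class name, the whitespace tokenizer, and the four training messages are all invented for illustration. A real filter trains on thousands of messages and tokenizes headers as well as body text.

```python
import math
from collections import Counter

class NaiveBayesFilter:
    """Toy spam filter: word counts per label plus Bayes' Rule."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.words[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total_msgs = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: smoothed share of training messages with this label.
            score = math.log((self.messages[label] + 1) / (total_msgs + 2))
            total_words = sum(self.words[label].values())
            for word in text.lower().split():
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.words[label][word] + 1
                score += math.log(count / (total_words + vocab))
            log_score[label] = score
        # Turn the two log scores into P(spam | text); clamp to avoid overflow.
        diff = log_score["ham"] - log_score["spam"]
        return 1 / (1 + math.exp(min(diff, 700)))

f = NaiveBayesFilter()
f.train("cheap credit card offer apply now", "spam")
f.train("earn money fast with this offer", "spam")
f.train("meeting notes for the server upgrade", "ham")
f.train("lunch on friday with the sysadmin team", "ham")

print(f.spam_probability("credit card offer"))
print(f.spam_probability("server upgrade on friday"))
```

The first message scores well above 0.5 and the second well below it. The more mail of both kinds you feed it, the sharper that split gets, which is why training matters so much.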

So I excitedly downloaded and ran the Nov. 14 Mozilla nightly build. The filtering doesn’t presently filter; it only marks messages as spam or non-spam. That’s OK, I can sort them and then zap them myself for a while. I trained it on about 1,400 non-spam messages (I only had a few dozen spams). It doesn’t identify much spam yet, but I’ve had zero false positives. It recognizes my most incessant spammer, the Smartmall Success Group (Kevin Butthead, take your Amway-meets-ecommerce scheme and stick it. I’m much more interested in joining the mafia.) and it’s starting to recognize unsolicited credit card spam.

Spam normally irritates me. Really irritates me. But now it’s a game. I look forward to spam coming in to see if Mozilla recognizes it. And it’s encouraging to watch it learn and get better. I’m going to win this battle. Within a month, I expect the time I waste deleting spam and making sure I didn’t delete anything important will be free for something else. Like answering legitimate mail from people I’ve never heard of.

Some people argue this filtering belongs on the server, but not everyone is willing to filter spam on the server. My employer never will (because many of my employer’s departments engage in questionable e-mail practices themselves) and I’d be shocked if my ISP ever did. I can set up my own mail server, but this is a lot easier. It’s probably a lot easier for you too, even if you’re one of the half-dozen or so experienced Unix sysadmins who regularly read these pages.

If you’re like me and you have 1,000+ e-mail messages squirreled away somewhere, and you don’t mind playing with alpha-level code (which you don’t if you’re running Windows, since Microsoft is in the habit of shipping alpha code and charging you hundreds of dollars for the privilege of alpha- and beta-testing it for them), go get this thing. Start training it. And watch the spam go bye-bye.

And if you’re better than me about cleaning out your inbox, get it anyway. It’ll just take you longer to train it.

New Freesco

Freesco 0.3 is out. It now has working PPPoE support, I understand. I don’t know what else it does just yet. But I intend to find out.
Freesco is easily my favorite single-floppy Linux distro, because once you get the hardware going, it’s easy to get running. Anyone familiar with computer networking can do it, without knowing a thing about Unix. And once it’s working, you can move it to the hard drive, which is good, since tiny hard drives are common as dirt and cheap and reliable (Freesco spins the drive down after it’s done booting, so a hard drive should work pretty much indefinitely, seeing as you’ll only reboot the thing when there’s a power failure), whereas floppy disks are anything but reliable.

That’s not to say that getting the hardware going isn’t a pain sometimes, but that’s not Freesco’s fault. Resolving a bunch of IRQ and I/O conflicts to get a 486 with a pile of ISA cards in it working perfectly is a pain no matter what OS you intend to run on it.

The worm that’s not a worm

I got mail at work today. The subject:
David you have an e-card from Alex.

Well, about the only person I know who calls me David is my mom. And I don’t know anybody named Alex. And why would a guy be sending me an e-card? Not wanting to explore that possibility any further, I disregarded it.

Then I remembered reading about something like that somewhere, so I went back and looked at it.

Short story: A really sleazy e-card company is sending out e-mail containing nothing but a URL at friendgreetings.com, which sends down ActiveX controls and installs some spyware that, among other things, sends bogus cards to everyone in your Outlook address book. That’s where I got that e-card message from. I was in this guy’s address book, for whatever reason. (Turns out he’s the webmaster at work. Funny how the webmaster and the hostmaster can go for long periods of time and never meet, eh?)

Officially, this isn’t a virus or a worm, because it’s a company doing this crap rather than a bored loser who lives in his parents’ basement, and because you have to click through a EULA (which most people do blindly anyway) for it to activate. I fail to see the difference, but I guess I’m weird that way.

I originally wrote that the anti-virus makers didn’t consider this a worm, but Symantec seems to have relented. You can get a removal tool at Symantec’s site.

If you want to protect yourself pre-emptively, locate your hosts file (in C:\winnt\system32\drivers\etc on NT/2000, in C:\windows\system32\drivers\etc on a fresh XP install, and in C:\Windows on Win9x; on most Unix systems it’s in /etc, not that it matters since this not-a-worm runs on Windows) and add the following entry:

127.0.0.1 www.friendgreetings.com
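
If you have a pile of machines to protect, the hosts entry above is easy to script. Here’s a minimal Python sketch, assuming you read the hosts file into a string yourself and write the result back with administrator rights. The function add_hosts_block is a hypothetical helper of mine, and it works on text only, so nothing here touches a real file.

```python
def add_hosts_block(hosts_text: str, hostname: str, address: str = "127.0.0.1") -> str:
    """Return hosts_text with `address hostname` appended, unless the
    hostname is already mapped on some line."""
    for line in hosts_text.splitlines():
        # Ignore anything after a '#' comment marker.
        fields = line.split("#", 1)[0].split()
        names = [name.lower() for name in fields[1:]]
        if hostname.lower() in names:
            return hosts_text  # already present, leave the file alone
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{address} {hostname}\n"

existing = "127.0.0.1 localhost\n"
updated = add_hosts_block(existing, "www.friendgreetings.com")
print(updated, end="")
```

Running it a second time on its own output changes nothing, so it’s safe to put in a login script.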

More cleanly, you can ask your network admins really nicely if they can block friendgreetings.com at the firewall or DNS level.

If you have inadvertently unleashed this monster, first, close Outlook immediately. Normally, I’d advise getting right with everyone else before cleaning things up, but since there’s the risk of making things worse if you do it that way, clean house, then start apologizing.

Next, download the removal tool.

If you want to be really safe, go into the control panel and remove anything that appears to have anything to do with friendgreetings.com. Next, I’d go to www.cognitronix.com and download Active Xcavator and remove anything having to do with friendgreetings.com. Next, I’d head over to LavaSoft and download Ad-Aware and let it shoot anything that moves.

Next, apologize profusely to the guy who runs your mail server (ours got clogged up for hours processing all the mail from not-our-friendgreetings.com) and to everyone in your address book. I can’t offer you any advice on the best way to do that. Except I’d use something other than Outlook to do it. Head over to TinyApps.org to find yourself a small freeware mail client. Assuming you’re not on an Exchange server, I’d suggest pulling the network plug before firing up Outlook again to get those e-mail addresses.

Meanwhile, it would do no good whatsoever if everyone who’s gotten one of these annoying e-cards (whether they opened it or not) opened a command prompt and typed ping -t www.friendgreetings.com and left it running indefinitely. No good whatsoever. It’s still a distributed denial of service attack if all of the participants participate voluntarily and independently. Right?

All of this and nothing

There’s no one thing to write about. So I’ll write about a few little things. Deal?

Update your BIND servers

A buffer overflow vulnerability exists in a large number of versions of BIND. CERT released an advisory over the weekend. I haven’t seen this on most news sites yet.

Possibly the first Apache worm

I just found this article describing a worm that attempts to infect vulnerable Apache servers running on FreeBSD.
This doesn’t have much effect on Linux or other Unix variants (other than probably crashing lots of Apache sessions, which the machine may or may not recover gracefully from) but chances are this is just a harbinger of things to come.

You should upgrade to Apache 1.3.26 or Apache 2.0.39 immediately to avoid any problems, especially if you use FreeBSD. I’ve been running version 1.3.26 on Debian here for about a week without any issues, as I’ve come to expect from Apache.
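
If you look after more than one server, a quick check of the version string against those two releases can be sketched in Python. This is a rough sketch under assumptions: the function name needs_upgrade is mine, the input is whatever the server reports in its Server header, and the only fixed releases it knows are the two named above. Distributions sometimes backport fixes without changing the version number, so treat the answer as a hint, not a verdict.

```python
import re

# Fixed releases named in the post, keyed by (major, minor) branch.
FIXED = {(1, 3): (1, 3, 26), (2, 0): (2, 0, 39)}

def needs_upgrade(server_header: str) -> bool:
    match = re.search(r"Apache/(\d+)\.(\d+)\.(\d+)", server_header)
    if not match:
        raise ValueError(f"no Apache version found in {server_header!r}")
    version = tuple(int(part) for part in match.groups())
    fixed = FIXED.get(version[:2])
    if fixed is None:
        # Branches the post doesn't mention are left for a human to judge.
        raise ValueError(f"no fixed release known for branch {version[:2]}")
    # Tuples compare element by element, so (1, 3, 24) < (1, 3, 26).
    return version < fixed

print(needs_upgrade("Apache/1.3.24 (Unix)"))
print(needs_upgrade("Apache/1.3.26 (Unix) Debian GNU/Linux"))
```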

Will ZDNet ever get a clue about Linux?

The next time ZDNet runs a story about Linux and you start feeling the urge to click on the link and read it, I’ve got a piece of advice for you.
Lie down until it goes away.

If you have a clue about Linux, the story will just make you mad. If you’re trying to learn about Linux, ZDNet will fill you up with enough misinformation to confuse you for weeks.

A DOS-style editor for Linux

I keep seeing “someday someone will write a DOS edit clone for Linux”-type longings in Linux publications. These are pointless, because someone already did, years ago.

And no, its name isn’t vi or emacs. It’s a true blue (it really is blue) DOS-like editor that uses a lot of the same keystrokes as the Microsoft QuickBasic-derived editor we all learned to tolerate, if not love, in the early ’90s. Hey, it wasn’t very powerful or fast, I know, but it was easy to learn and a whole lot better than edlin.

This one’s called SETedit, it’s from Argentina, and it’s just as easy to use but a whole lot more powerful. It’s also been ported to Win32, if you want to run it in more than just Linux.

Disguising a Linux box for the big, bad world

I had to put a Linux server out all alone in the big, bad world today. Before I turned it loose, I did a few things to give it a fighting chance out there.
The biggest thing I did was make the machine volunteer as little information as possible. Here’s how.