Data compression, 1980s-style–and why PKZIP won

My employer has me doing some very gray-hat work that I don’t want to describe in detail, because the information has a tremendous potential for misuse. But suffice it to say I’ve been trying to send data places the data shouldn’t go, and I tried to do it by going all 1987 on it, compressing the data with obsolete compression programs. Ever heard of security by obscurity? I was trying to bypass security by using obscurity. In the process, I learned why PKZIP won the compression wars.

Intel enters the budget Sandforce market

Intel announced a new low-end SSD today, the 330, based on a Sandforce 2281 controller. The popular 120 GB capacity will retail for $149. While not as cheap as OCZ’s entry-level SSDs, it’s within striking distance.

Defrag scareware

This isn’t exactly news, as word has been going around for a couple of weeks, but if you haven’t heard about it elsewhere, there are some fake defragmenters going around.

I heard mention of it today, and it reminded me that I saw one last week when I was working on my mother-in-law’s computer. This was especially obnoxious, considering that at the time, I was running Firefox and I was visiting a mainstream site.

So there are a couple of things you need to keep in mind.

Slimming down Windows XP for SSDs and nettops

I found a very long and comprehensive guide for using Nlite to reduce the size of a Windows installation.

The guide is geared towards an Asus Eee. But it should work well on pretty much anything that has an Intel CPU in it. A couple of tweaks to his settings will make it suitable for AMD-based systems. Just remove anything Intel-specific, and add back in anything specific to AMD, and there you go.

And if you have a multi-core or hyperthreaded CPU, leave multi-processor support in.

I also recommend slipstreaming SP3 and all the hotfixes you can. Then you don’t have to run Windows Update to download them, and you don’t have to clean up after it either. I haven’t investigated all of the whys and wherefores, but I’ve noticed that the more you slipstream ahead of time, the smaller your Windows directory ends up being. I have some systems at work that are constantly bursting at the seams on their system partitions. Other systems, which were built later from a copy of Windows with more stuff slipstreamed in, have a lot more breathing room.
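
If you haven’t slipstreamed before, the short version: copy your XP CD to a folder on the hard drive, then run the service pack installer against that folder with its /integrate switch. Here’s a rough sketch, assuming your CD drive is D: and using a destination folder name I made up (the SP3 file name is the one Microsoft distributes):

xcopy D:\ C:\XPCD /e /i /h
WindowsXP-KB936929-SP3-x86-ENU.exe /integrate:C:\XPCD

Individual post-SP3 hotfixes take the same /integrate:C:\XPCD switch, so you can chain as many as you want before feeding the folder to Nlite.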

Using the i64x.com instructions, you can pretty much count on getting a Windows XP installation under half a gig in size. That makes life with a small SSD much more bearable, since a typical installation tends to take a couple of gigs these days.

I’ll add some tips of my own. Inside the Windows directory, there are some subdirectories named inf, repair, and servicepackfiles. Compress those. That’ll free up some more space–at least a couple dozen megabytes in most cases.
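
Assuming Windows lives in C:\Windows, that’s three quick commands:

compact /c /s:C:\Windows\inf
compact /c /s:C:\Windows\repair
compact /c /s:C:\Windows\servicepackfiles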

If you’re really cramped, compress the whole Windows directory. Boot time actually decreased by a couple of seconds when I did this (down to 12 seconds from about 14), but software installations slowed considerably. But for everyday operation, you could almost consider NTFS compression a performance trick. It makes sense; an SSD can sometimes saturate the bus it’s connected to, so data compression lets it shove 20-50% more data through that saturated bus.
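
If you go that far, it’s the same command with a bigger scope; the /i switch keeps it running past the handful of in-use files it can’t touch:

compact /c /i /s:C:\Windows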

The downside is that when you install something that lives in the Windows directory, it has to not only copy the data into place, but also compress it. Installing the .NET Framework on a system with a compressed Windows directory takes a while.

A good compromise is to install pretty much everything you think you’ll need on the system, then start compressing.

It’s difficult to make a case for compressing the entire drive, however. Most modern data file formats are compressed–including all modern media formats and Office 2007 documents–so turning on NTFS compression on directories storing that kind of data gives no benefit, while introducing overhead.

How to use compression to help life with an SSD

Since pretty much everyone thinks my love of SSDs is insane, I’ll throw another insane idea on top of it: using data compression. It makes sense. Done selectively, it helps performance while saving space. And since SSDs cost a lot more per gig than conventional drives, that saved space is very nice to have.

Here’s why compression makes sense. Under many circumstances, an SSD can saturate your IDE bus. Then you run into the 56K modem problem. The bus is saturated, but you want more speed, so what do you do? Compress the data. Although data compression makes people nervous (shades of DoubleSpace I’m sure), modems have been doing this for two decades. Why? Because it works.

So while your drive is happily shoving 200 megs per second through your IDE bus, if you can compress that file by 20 percent, guess what? The same 200 megs on the wire now represents 250 megs of real data, so your effective throughput goes up by about 25 percent.

CPU usage is the main objection to this. But in my experience, NTFS compression uses 20-40% of a recent (P4-class or newer) CPU when compressing. That’s the hard part. When decompressing, overhead is a lot less. The objections to NTFS compression really date to the days when 200 MHz was a fast CPU.

I don’t recommend just compressing your whole disk. Selective compression is a lot better. There’s no use trying to compress data that’s already compressed, and a lot of our data is.

Use the command COMPACT to do the job for you. Here’s my sequence of commands:

CD \
COMPACT /S /C *.doc *.xls *.rtf *.txt *.1st *.log readme* *.bmp *.wav *.wmf *.bat *.cmd *.htm *.html *.xml *.css *.hlp *.chm *.inf *.pnf *.cat

If you have other compressible files, of course you can add those.

This is a one-time event, but you can schedule it to happen daily or weekly if you want. Just put the two lines in a batch file and create a scheduled task to run it. The command will skip any files that are already compressed. While the compression itself doesn’t take a lot of CPU time, scanning the drive does, so you might want to run it while you’re away if you’re going to schedule it.
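
The batch file is literally the two lines above saved somewhere like C:\scripts\compactdocs.cmd (a name I just made up). To run it every Sunday at 3 AM, something along these lines should do it, though the exact schtasks time format varies a little between Windows versions:

schtasks /create /tn "Compact docs" /tr C:\scripts\compactdocs.cmd /sc weekly /d SUN /st 03:00:00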

Don’t bother trying to compress your My Music or My Pictures directories; that data is all highly compressed already, so all you do is tax your CPU for no reason when you compress that kind of data. Of course the main reason people buy 1 TB drives is because they have hundreds of gigabytes of music and movie files. It’ll be a while before storing that kind of data on SSD is practical. In that case, buy an SSD to hold the operating system and apps, and a conventional drive to hold all that data.

Some people compress their C:\Program Files directory. This can work, but some programs are already compressed. I would be more inclined to experiment with subdirectories on a case-by-case basis. Try compressing one program directory, see if it packs down any, and if it does, great. If not, uncompress it and move on.
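
COMPACT makes that experiment easy; the directory name here is just a stand-in:

compact /c /s:"C:\Program Files\SomeApp"
compact /s:"C:\Program Files\SomeApp"
compact /u /s:"C:\Program Files\SomeApp"

The first line compresses the directory, the second (with no /c or /u) reports the compression ratio so you can see whether it packed down, and the third uncompresses it again if it didn’t.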

UPX does an outstanding job of packing down program files but it’s not completely transparent. I found enough programs didn’t run afterward that I gave up on it. NTFS compression is a lot less effective, but a lot more transparent. As long as you don’t compress your swap file or hibernation file (and Windows will warn you incessantly if you even try to do that), you won’t break anything with it.
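
For reference, basic UPX usage looks like this (the program name is just a placeholder); upx -d reverses the packing if something stops working afterward:

upx --best someprogram.exe
upx -d someprogram.exe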

If you enjoy tinkering with things, by all means feel free to experiment with UPX. There was a time when I would have probably done it, but given a choice today between playing with data compression or playing with metalworking tools, I’d rather play with my metalworking tools.

But I do really like this SSD. For the first time in a very long time, I can sit down at a computer running modern software and it still feels fast.

We’ll have to wait longer for PCI RAMdisks

In case nobody noticed, it’s August. July came and went, and there’s no Gigabyte I-RAM on the market yet.

But there are a few benchmarks out there, and Anandtech has an article that, once you get past the usual rambling and over-the-top introduction, has some useful insights.

I was going to say the first problem is the somewhat disappointing speed, but actually, there are two bigger problems:

Availability. Now they’re saying it’ll be out sometime in August. And they’re initially only going to make 1,000 of them.

Price. The original $50 MSRP is out the window; now this thing is going to cost $150.

Can anything else be wrong? Unfortunately, yes. The speed is a bit disappointing. The SATA interface is the bottleneck. The very newest hard drives can come close to saturating the SATA interface for short periods of time, so the RAMdisk doesn’t outperform it by much. If this drive were using an interface with more bandwidth, there wouldn’t be as much of a problem, but squeezing more bandwidth out of the 33 MHz PCI bus is tough. We’re at the point now where the PCI bus is a much bigger bottleneck than the ISA bus was in 1994. The theoretical limit of the PCI bus is 132 megabytes per second, which isn’t much higher than the sustained throughput of 100 megabytes per second that the I-RAM delivers.

The combination of PCI Express and a faster disk protocol has the potential to resolve this issue, but at the expense of limiting the device’s market even further.

I’m disappointed by the review in a couple of regards, though. First, they compare the I-RAM to the fastest SATA drive available at the time of the review. That’s not necessarily what every would-be purchaser would be using. I believe that an I-RAM used to replace (or in conjunction with) a drive that’s a couple of years old would be a mind-blowing upgrade.

Second, they don’t take fragmentation into account. Enthusiasts are more likely than everyone else to defragment their hard drives twice a day, so fragmentation may not be an issue for them. But my wife, mother, and mother-in-law don’t know what fragmentation is. Well, maybe my wife does, because she’s probably overheard me talk about it. The thing about the I-RAM is that it makes seek times irrelevant, so it’s never going to slow down due to fragmentation. Translation: For people who have lives, this thing could be phenomenal.

The review complained constantly about the drive’s capacity. So I’m disappointed that they didn’t test the drive with NTFS compression enabled. While data compression is still taboo, and it increases CPU usage, when you’re out of room it’s your only choice. While its effectiveness is unpredictable, it’s fairly safe to bet compression will get you another gigabyte or two of usable space on a 4-gig model. But just as importantly, under some circumstances, compression can actually increase performance. I want to know if increasing the amount of data you’re flowing over the saturated bus makes up for the increased CPU usage.

So there is a benefit to running Windows Server 2003 and XP

One of the reasons Windows Server 2003 and XP haven’t caught on in corporate network environments is that Microsoft has yet to demonstrate any real benefit to either one of them over Windows 2000.

Believe it or not, there actually is one benefit. It may or may not be worth the cost of upgrading, but if you’re buying licenses now and installing 2000, this information might convince you it’s worth it to install the current versions instead.

The benefit: NTFS compression.

Hang on there Dave, I hear you saying. NTFS compression has been around since Windows NT 3.51 back in 1995, and hard drives are bigger and cheaper now than ever before. So why do I want to mess around with risky data compression?

Well, data compression isn’t fundamentally risky–this site uses data compression, and I’ve got the server logs that prove it works just fine–it just got a bad rap in the early 90s when Microsoft released the disastrous DoubleSpace with DOS 6.0. And when your I/O bus is slow and your CPU is really fast, data compression actually speeds things up, as people who installed DR DOS on their 386DX-40s with a pokey 8 MHz ISA bus found out in 1991.

So, here’s the rub with NTFS compression when it’s used on Windows Server 2003 with XP clients: the data is transferred from the server to the clients in compressed form.

If budget cuts still have you saddled with a 100 Mb or, worse yet, a 10 Mb network, that data compression will speed things up mightily. It won’t help you move jpegs around your network any faster, but Word and Excel documents sure will zoom around a lot quicker, because those types of documents pack down mightily.

The faster the computers are on both ends, the better this works. But if the server has one or more multi-GHz CPUs, you won’t slow down disk writes a lot. And you can use this strategically. Don’t compress the shares belonging to your graphic artists and web developers, for instance. Their stuff tends not to compress, and if any of them are using Macintoshes, the server will have to decompress it to send it to the Macs anyway.

But for shares that are primarily made up of files created by MS Office, compress away and enjoy your newfound network speed.
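
On the server, turning it on for a given share’s folder is a one-liner; the path here is just an example:

compact /c /s:D:\Shares\OfficeDocs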

Easy and secure remote Linux/Unix file transfers with SCP

Sometimes you need to transfer files between Linux boxes, or between a Linux box and some other box, and setting up Samba or some other form of network file system may not be practical (maybe you only need to transfer a couple of files, or maybe it’s just a one-time thing) or possible (maybe there’s a firewall involved).

Well, you should already have SSH installed on your Linux boxes so you can remotely log in and administer them. On Debian, apt-get install ssh takes care of both the client and the server. If you’re running a distro based on Red Hat or UnitedLinux, you may have a little investigative work to do. (I’d help you, but I haven’t run anything but Debian for 2 or 3 years.)

The cool thing about SSH is that it not only does remote login, but it will also do remote file transfer. And unlike FTP, you don’t have to stumble around with a clumsy interface.

If you want to transfer files from a Windows box, just install PuTTY. I just downloaded the 240K PSCP.EXE file and copied it into my Windows directory. That way I don’t have to mess with paths, and it’s always available. Make sure you’re downloading the right version for your CPU. The Windows NT Alpha version won’t run on your Intel/AMD/VIA CPU. Incidentally, Putty.exe is a very good Telnet/SSH client and a must-have if you’re ever connecting remotely to Unix/Linux machines from Windows.

SSH includes a command called SCP. SCP works almost like the standard Unix CP command. All you have to do to access a remote file is add a username, the @ sign, and the IP address of the remote server in front of the path. SCP will then prompt you for a password.

Let’s say I want to move a file from my Linux workstation to my webserver:

scp logo.jpg root@192.168.1.2:/var/www/images

SCP will prompt me for my password. After I enter it, it’ll copy the file, displaying a nice progress bar and an ETA.

On a Windows machine with PuTTY installed, simply substitute the command pscp for scp.
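
So the upload from before looks like this from a Windows command prompt:

pscp logo.jpg root@192.168.1.2:/var/www/images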

I can copy the other way too:

scp root@192.168.1.2:/var/www/index.php .

This command will grab a file from my webserver and drop it in the current working directory.

To speed up the transfers, add the -C switch, which turns on compression.
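
For example, here’s the same download with compression turned on:

scp -C root@192.168.1.2:/var/www/index.php .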

SCP is more secure than any other means of file transfer, it’s probably easier (since you already need SSH anyway), and since it’ll do data compression, it’s probably faster too.

Reviving a laptop

The drive in my work laptop gave a S.M.A.R.T. error over the weekend. I never have had much luck with Hitachi laptop drives. Micron sent a replacement drive–an IBM, thankfully–and, doubly thankfully, the Hitachi hung on until today. So I whipped out Bart’s magic network boot disk–to which I’d added the 3c556 module necessary to get this Micron Transport LT on the network–and ran my copy of Ghost from a network drive. (It won’t fit on that disk, no way, no how. Not with all the other stuff crammed onto it.)

Ghost can cope with failing hard drives, depending on how far gone the drive is, because the -FRO switch makes it work around bad clusters to the best of its ability. So I initiated Ghost with ghost -z9 -fro (the -z9 tells it to use maximum compression, since the network is the bottleneck here) and made a copy of my disk to a network drive. An hour and a half later (ugh–do I ever miss Token Ring) I had a backup. So I swapped in the IBM drive and repeated the process in reverse. An hour and a half up, an hour and a half down. The data compression wasn’t the bottleneck.
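
If you’d rather script Ghost than click through its menus, the fully spelled-out versions of those two operations look roughly like this; the drive letter and image name are placeholders, and switch details can vary between Ghost versions:

ghost -clone,mode=dump,src=1,dst=z:\laptop.gho -z9 -fro
ghost -clone,mode=load,src=z:\laptop.gho,dst=1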

And in the end, I had a healthy laptop again. The IBM drive is quieter and seems faster. I noticed it wasn’t the nice new 5400 RPM model (it’s a 4200 rpm drive) but it’s not a slouch. And it definitely doesn’t clunk as much as the Hitachi always did. I love Hitachi’s video equipment, but their hard drives have always given me trouble. IBM’s laptop drives have always been fine for me. And I know IBM took a lot of black eyes over the GXP desktop series, but think about the things that are known to cause problems with IDE drives:

Rounded cables
PCI bus overclocked beyond 33 MHz
Heat
Cables longer than 18 inches (the length of the wire–not the cable itself)
Certain VIA chipsets in conjunction with Sound Blaster Live! sound cards

IBM 75GXP and 60GXP drives were typically bought by people seeking performance. People seeking performance often do at least one of the above, intentionally or unintentionally. During the 75GXP’s heyday, the hottest chipsets on the block were made by VIA (Intel was still embroiled in the whole Rambus fiasco), and the sound card everyone had to have was the SB Live. I suspect the GXPs were more sensitive to these factors than some other drives and they really weren’t as bad as their reputation.

While rounded cables are good for airflow, they’re bad for signal integrity. Rounded SCSI cables are common, especially in servers, and have been for years, but SCSI takes precautions with its signals–most notably, termination–that IDE doesn’t. That’s part of the reason why IDE is cheaper. So yes, though ribbon cables do look really retro, replacing them with fancy rounded cables isn’t a good idea unless you like replacing hard drives. Get Serial ATA adapters and run your drives serial if you don’t think retro is cool. I’ve been conspiring for the last couple of years to get something semi-modern into my vintage IBM AT case, so I happen to like retro.

But I digress. I hope when the merger between Hitachi’s and IBM’s storage divisions happens, we get the best aspects of both rather than the worst.