Gzip vs Zip

Zip and gzip are two similarly named utilities that perform similar functions, but they are separate programs with incompatible file formats. They also have slightly different use cases. Let’s take a look at gzip vs Zip.

Simply speaking, Zip and gzip use the same compression algorithm but wrap it in different file formats for different operating systems, so they are similar but not compatible. Zip originated on MS-DOS, while gzip originated on Unix-like operating systems and is now most often encountered on Linux.

Gzip

The GNU gzip utility borrowed the Deflate algorithm from Zip, hence the name. But the two utilities aren’t compatible, either technically or philosophically. PKZIP was shareware, meaning you were supposed to pay for it, even if many people never did.

Gzip is the newer of the two programs, initially released October 31, 1992. It serves as a replacement for the standard Unix utility compress, which uses LZW, the same compression algorithm that GIF uses. LZW was patented by Unisys, and Unisys’ controversial decision to start collecting royalties on the patent precluded its use in free software like the GNU project.

Conceptually, gzip works like compress, just using methods that don’t have patent issues. The two methods are incompatible, but for GNU, compatibility was always a secondary concern. Since gzip offered better compression, it overtook compress as the de facto Unix compression standard.

Since gzip is a Unix utility, it works two ways. It can simply compress a file directly from the command line, like an MS-DOS compression utility would do. But it can also compress a stream passed to it via a pipe, which the shell writes as the vertical bar character. This allows other Unix utilities to use it for compression, without having to implement compression themselves.
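
The streaming mode can be sketched in Python, whose standard library can read and write the gzip format through zlib. This is only an illustration of compressing a stream chunk by chunk, not how the gzip utility itself is written; the function name gzip_stream and the chunk size are assumptions made for the example.

```python
import gzip
import io
import zlib


def gzip_stream(src, dst, chunk_size=64 * 1024):
    """Compress bytes read from src into gzip format on dst, one chunk at a time."""
    # wbits=31 asks zlib to wrap the Deflate data in a gzip header and trailer.
    compressor = zlib.compressobj(wbits=31)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(compressor.compress(chunk))
    # flush() emits any buffered data plus the gzip trailer.
    dst.write(compressor.flush())


# In-memory streams stand in for the two ends of a pipe.
source = io.BytesIO(b"hello, pipe\n" * 1000)
sink = io.BytesIO()
gzip_stream(source, sink)

# Any gzip reader can decode the result.
restored = gzip.decompress(sink.getvalue())
```

Passing sys.stdin.buffer and sys.stdout.buffer instead of the in-memory streams would turn the same function into a filter that can sit in the middle of a pipeline.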

Gzip uses a compression algorithm called Deflate, which is a combination of LZ77 and Huffman encoding. Deflate was invented by a programmer named Phil Katz. While Katz also patented Deflate, it was possible to implement Deflate in such a way as to not violate his patent.

Zip

Zip is the most popular compression format on MS-DOS and Windows systems. It overtook numerous other compression standards, largely due to support from BBS sysops. It was invented by Phil Katz and his implementation, PKZIP, was first released January 1, 1989.

Much like gzip, PKZIP was born from legal issues. Katz had implemented a version of an existing compression standard called ARC. Katz’ implementation was smaller and faster, because he split the compressor and decompressor into separate programs and coded parts of them in assembly language. Software Enhancement Associates (SEA), the creators of the MS-DOS version of ARC, sued Katz for trademark and copyright infringement.

SEA won the lawsuit but lost the PR war. Katz created a new compression standard that he called Zip. It created smaller files than ARC, which was an important consideration at a time when 2400 bits per second was a fast modem. He also made a point of declaring Zip to be an open standard, which stood in contrast to SEA’s attempt to retroactively close the ARC standard.

Various implementations of Zip soon appeared. Info-Zip was a free implementation that ended up being ported to nearly every operating system not named MS-DOS. Katz didn’t like Windows and was slow to create a Windows version of PKZIP. That left an opening for Nico Mak to step in with Winzip, which became the most popular Zip utility.

Zip vs gzip

While gzip borrowed heavily from Phil Katz’ Zip standard, it wasn’t a full implementation. That wasn’t the goal. The design goal with gzip was to be a replacement for compress that used a standard that didn’t have patent issues. It didn’t need to be compatible with anything but itself. To that end, Jean-loup Gailly and Mark Adler simply implemented a compress-like utility that included just one of the Zip algorithms, Deflate.

The file format implemented by PKZIP and its compatible utilities is different from gzip’s. The files inside a Zip archive can be compressed with Deflate or one of several other algorithms, although modern implementations may omit some of the earlier, less efficient algorithms.
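
The difference between the two containers is easy to see with Python’s standard library. The sketch below compresses the same bytes both ways and looks at the wrappers; the sample data and the member name example.txt are made up for the illustration.

```python
import gzip
import io
import zipfile
import zlib

data = b"Deflate is shared; the wrapper is not.\n" * 200

# gzip: a 10-byte header, the raw Deflate stream, then an 8-byte trailer
# (CRC-32 and uncompressed size). gzip.compress() stores no file name.
gz_bytes = gzip.compress(data)

# Zip: an archive with per-file headers and a central directory at the end.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("example.txt", data)
zip_bytes = zip_buffer.getvalue()

gz_magic = gz_bytes[:2]    # the gzip signature bytes 1f 8b
zip_magic = zip_bytes[:4]  # "PK" plus 03 04; PK are Phil Katz's initials

# Strip gzip's wrapper and what remains is plain Deflate data;
# wbits=-15 tells zlib to expect a raw stream with no header.
raw_payload = gz_bytes[10:-8]
recovered = zlib.decompress(raw_payload, wbits=-15)
```

So a gzip reader and a Zip reader reject each other’s files even though the compressed payload inside uses the same algorithm.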

From a technical standpoint, the biggest difference between the two is that gzip is only a compression utility. It doesn’t combine multiple files into a single file. In the Unix world, that functionality belongs in another utility called tar. Zip utilities perform both functions in a single tool.
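
Here is a minimal sketch of that division of labor using Python’s tarfile and gzip modules, kept in memory so it touches no real files. The file names and contents are hypothetical.

```python
import gzip
import io
import tarfile

# Hypothetical file names and contents, held in memory for the example.
files = {
    "notes/a.txt": b"first file\n",
    "notes/b.txt": b"second file\n",
}

# Stage 1: tar combines the files into one uncompressed archive.
tar_buffer = io.BytesIO()
with tarfile.open(fileobj=tar_buffer, mode="w") as tar:
    for name, content in files.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))
tar_bytes = tar_buffer.getvalue()

# Stage 2: gzip compresses that single stream, knowing nothing about its contents.
tgz_bytes = gzip.compress(tar_bytes)

# Reading reverses the stages: "r:gz" decompresses, then parses the tar layout.
with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tar:
    restored = {m.name: tar.extractfile(m).read() for m in tar.getmembers()}
```

A Zip tool does both jobs at once, which is why there is no separate archiving step when you create a .zip file.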

Philosophically, the two tools have a big difference too. PKZIP was shareware. If you used it, you were supposed to pay $25 for it, or $47 if you wanted a manual. Being part of the GNU project, gzip was free. PKZIP and Winzip had a reputation for being programs that everyone used but no one registered, but that’s not exactly true. Enough people registered them that Phil Katz and Nico Mak made a living off them.

Zip vs gzip compression efficiency

There are a few people who’ve written about the efficiency of various compression algorithms and programs, and you can probably find results that will support whichever one you want to win. Since they both use the same algorithm, Zip and gzip will have similar results on the same data. But Zip compresses each file individually and then adds it to the archive, whereas gzip relies on tar to create the archive and then operates on the whole thing, so the results can vary. The tar and gzip combination will usually be smaller, sometimes dramatically so, because gzip can exploit redundancy shared between neighboring files. On a modern system, it will be faster too. But if the goal is the smallest file possible, there are several newer compression algorithms that can beat both of them.

To try out both of them, I compressed the /bin directory on my main Linux system with both Info-Zip and with the combination of tar and gzip. My /bin directory contained 192 MB of files.

To test both, I used the following commands:


time tar cvzf test.tgz /bin/*
time zip test.zip /bin/*

I ran both commands twice so the contents of /bin would be in the disk cache.

The combination of tar and gzip took 7.46 seconds and it produced a 62 MB file.

Info-Zip took 11.888 seconds and produced a 94 MB file.
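
The same effect can be reproduced in miniature with Python’s standard library. This sketch does not rerun the /bin measurement; it builds a couple hundred small, similar files in memory (the names and contents are invented) and archives them both ways. Real directories will give different ratios.

```python
import io
import tarfile
import zipfile


def make_files(count=200):
    """Build many small, similar files; names and contents are made up."""
    return {
        f"conf/service_{i:03d}.ini": (
            f"[service]\nname = service_{i:03d}\nport = {8000 + i}\n"
            "log_level = info\nretries = 3\ntimeout_seconds = 30\n"
        ).encode()
        for i in range(count)
    }


def zip_size(files):
    """Zip compresses each member on its own before storing it."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return len(buffer.getvalue())


def tgz_size(files):
    """tar joins the members first, then gzip compresses the whole stream."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return len(buffer.getvalue())


files = make_files()
sizes = {"zip": zip_size(files), "tar+gzip": tgz_size(files)}
print(sizes)
```

With many small files that resemble each other, the per-file approach pays for its headers and its fresh start on every member, while the whole-stream approach gets to reuse what it has already seen.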

Zip vs gzip on a single file

The difference is less pronounced when you have a single file. I created a large single file with the command tar cvf test.tar /bin/*. I then compressed the resulting file with both Zip and gzip. When I did that, the gzip file was only 139 bytes smaller than the zip file. Gzip finished one second faster, but they both took over 34 seconds.
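
A rough Python equivalent of the single-file test, assuming a synthetic payload rather than my actual tar file, shows why the gap nearly disappears. Both containers get the same Deflate level, so what is left is mostly header and trailer overhead. The payload and the member name are invented for the example.

```python
import gzip
import io
import random
import zipfile

# Synthetic stand-in for one large file: reproducible, moderately compressible text.
rng = random.Random(0)
words = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon", b"zeta"]
payload = b" ".join(rng.choice(words) for _ in range(50_000))

# Same Deflate level for both so the comparison is like for like.
LEVEL = 6

gz_bytes = gzip.compress(payload, compresslevel=LEVEL)

zip_buffer = io.BytesIO()
with zipfile.ZipFile(
    zip_buffer, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=LEVEL
) as archive:
    archive.writestr("payload.txt", payload)
zip_bytes = zip_buffer.getvalue()

# Whatever gap remains is mostly container overhead (headers, directory, trailer).
difference = len(zip_bytes) - len(gz_bytes)
print(len(gz_bytes), len(zip_bytes), difference)
```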

Which is better?

On a modern system, generally speaking, the Unix-like approach of separating the two operations of combining files and compressing them yields better results. It’s faster and it yields a smaller file. On an MS-DOS PC in 1989, that approach wasn’t necessarily practical. PKZIP originally ran on systems that had 640K of RAM or less and didn’t necessarily have a hard drive. Those systems didn’t necessarily have the resources for a two-stage approach like you use with tar and gzip. Zip’s approach made sense at the time. But, as people are fond of reminding me, we’re closer to 2050 than we are to 1990 now.

Today, if you have a choice, tar and gzip will probably yield a smaller file, but unless the recipient is another Linux user, they may not know what to do with it. Zip is less efficient, but every modern operating system has a Zip-compatible utility included with it today and almost everyone knows what to do with a Zip file. Momentum is a powerful thing. The Qwerty keyboard wasn’t designed for efficiency either, and yet it’s what we continue to use.

If the goal is the smallest possible file, there are better choices than gzip. The most common alternative is bzip2. It’s slower than gzip, but creates files that are nearly 10 percent smaller. LZMA is another good choice. There’s a Linux utility called lzma that implements it. On Windows, the most popular program that uses the LZMA algorithm is 7-Zip.
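
If you want to compare these on your own data without installing anything, Python ships modules for all three. The sketch below is only a starting point: the sample text is made up and far more repetitive than real files, so the numbers it prints say nothing about typical ratios. The helper name compressed_sizes is an assumption for the example.

```python
import bz2
import gzip
import lzma


def compressed_sizes(data):
    """Return the compressed size in bytes for each standard-library codec."""
    return {
        "gzip": len(gzip.compress(data)),
        "bzip2": len(bz2.compress(data)),
        # lzma.compress() uses the .xz container by default.
        "lzma": len(lzma.compress(data)),
    }


# Made-up sample text; substitute the bytes of a real file to get meaningful numbers.
sample = b"The quick brown fox jumps over the lazy dog.\n" * 2000
sizes = compressed_sizes(sample)

for name, size in sizes.items():
    print(f"{name:>6}: {size} bytes ({size / len(sample):.1%} of original)")
```

To test a real file, read it with open(path, "rb").read() and pass the result to compressed_sizes.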

A contrast in triumph and tragedy

There’s a bit more to the contrast between gzip vs Zip. The story of ARC vs Zip was often portrayed at the time as the triumph of the little guy over the man. Phil Katz successfully portrayed himself as the little guy, taking on the giant SEA and the incumbent ARC. In reality, this was just successful PR spin. Both SEA and Katz’ PKWARE were shoestring operations operating out of their founders’ homes. Katz’ scorched-earth approach drove Thom Henderson, the creator of ARC, to find another line of work.

Katz’ invention won, but Katz the man lost. Zip became hugely popular, but Katz’ own implementation of it fell by the wayside. Like many people around 1990, Katz underestimated the significance of Windows 3.0. It’s easy today to call that shortsighted, but Katz was hardly alone in that opinion. Another programmer, Nico Mak, released a Windows GUI called Winzip in 1991. Initially it required PKZIP to be present. But version 5.0 in 1993 incorporated Zip-compatible code, eliminating the need for Phil Katz’ binaries to be present and cutting him out of the loop. PKZIP for Windows wasn’t released until 1998. By then it was too late to catch up.

Katz struggled with alcoholism and social isolation as the 1990s wore on. He had a lengthy criminal record, consisting mostly of DWI and DWI-related offenses. And on April 14, 2000, Katz was found dead in a motel room, with an empty bottle of liquor in his hand and five other empty bottles in the room. He was 37.

I have no way of knowing what Katz wanted out of life, but it seems he probably didn’t get it. And it’s OK to find that sad while also finding what he did to Henderson and SEA distasteful. Human beings are complex things.

gzip and the triumph over Unisys

The gzip project is more of a little-guy-triumphing-over-The-Man story. Unisys, an old-line computer company, held a patent on a commonly used compression algorithm. Although they allowed it to be used royalty-free for a time, in the early 1990s they reversed course. That led Jean-loup Gailly and Mark Adler to create gzip, a new, unencumbered de facto standard to replace the incumbent compress.

The heavy-handed approach didn’t have the effect Unisys wanted. At its peak, Unisys was the second largest computer company in the world. LZW’s success or failure wasn’t going to make or break Unisys, but it turned Unisys into a villain. And it accelerated the push of LZW into obsolescence by encouraging people to find unencumbered alternatives.

It’s been said that the opposite of love isn’t hate, it’s indifference. And in the computing world, many more people today are indifferent toward Unisys than either love or hate it.

One thought on “Gzip vs Zip”

  • February 17, 2021 at 8:08 pm

    In theory, a file format like Zip that handles each file separately could be more efficient than one that handles all files together. It could use specialized compression methods for certain file types, such as FLAC for audio files or JPEG XL for images, that take advantage of special knowledge of those kinds of data and are far more efficient on them than any general purpose algorithm.

    The actual Zip format is not nearly that sophisticated. It just uses a few different lossless algorithms and chooses between them, with the additional option of no compression for file types like JPEG that gain little or nothing from Lempel-Ziv based compression algorithms.

    What really made Unisys a villain in the case of the LZW patent is that it was at the heart of the GIF file format, which had far more direct impact on most personal computer users and was far harder for them to replace than the compress utility for Unix was.
