What does compressing a file do? There are several ways to describe it, from eliminating redundancy to just thinking of it as digital shorthand. The goal is to make the file take up less space on disk and in transit. How it works is a bit more complicated.
How file compression works
Modern file compression is more complicated than simple abbreviation, but at its heart, abbreviation is what it’s about. One way to compress this blog post would be to take all of the most common words and abbreviate them with characters that never appear in it. The word “compression” appears a lot. It is 11 characters long, so replacing it with a single-byte symbol that doesn’t appear anywhere in the blog post would save 10 bytes per occurrence.
Of course you need some way to return the data back to its original form. That requires storing some kind of dictionary in the file.
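Here is a minimal sketch of that abbreviation idea in Python. It is a toy, not how Zip or any real format works, and the function names compress and decompress are made up for this illustration. It assumes the unused symbols can be taken from control characters, which rarely show up in ordinary prose.

```python
def compress(text, words):
    """Swap each listed word for a one-character symbol unused in the text."""
    table = {}  # symbol -> original word; this is the dictionary to store
    code = 1    # assumption: start hunting for unused symbols at control characters
    for word in words:
        # Skip characters that already appear in the text or inside a listed word.
        while chr(code) in text or any(chr(code) in w for w in words):
            code += 1
        symbol = chr(code)
        table[symbol] = word
        text = text.replace(word, symbol)
        code += 1
    return text, table


def decompress(text, table):
    """Put every abbreviated word back using the stored dictionary."""
    for symbol, word in table.items():
        text = text.replace(symbol, word)
    return text


sample = "compression saves space, and compression saves time"
packed, table = compress(sample, ["compression"])
saved = len(sample) - len(packed)  # 10 characters saved for each occurrence
restored = decompress(packed, table)
```

The dictionary has to travel with the compressed text, so the trick only pays off when a word repeats often enough to cover the cost of storing its entry.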
The most popular file compression methods, such as Zip, have existed since the late 1980s. But as computers have become more powerful, newer and more clever methods have become practical. Most file compression methods use a combination of several different techniques that the developers found worked well together.
Lossy vs lossless file compression
Most file compression methods are lossless. This means when you decompress them, the result is an identical copy of the original. But sometimes you can get by with subtle changes to the file that are hard to notice. In a picture or video file, subtle changes to the colors can make the file compress much better. This reduces the color fidelity, but the faster transfer can be worth it.
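To make the difference concrete, here is a rough sketch using Python’s standard zlib module. The “image” is invented data, a smooth gradient with a little random noise, and dropping the low four bits of each value is only a stand-in for what a real lossy image format does. Treat it as an illustration of the trade-off, not as how JPEG works.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so every run produces the same data

# Hypothetical "image": a smooth gradient with a little sensor-style noise.
pixels = bytes(min(255, i // 16 + rng.randrange(8)) for i in range(4096))

# Lossless: decompressing returns exactly the bytes we started with.
lossless = zlib.compress(pixels)
restored = zlib.decompress(lossless)

# Lossy stand-in: throw away the low 4 bits of every value, then compress.
coarse = bytes(p & 0xF0 for p in pixels)
lossy = zlib.compress(coarse)

print("original bytes:  ", len(pixels))
print("lossless bytes:  ", len(lossless))
print("lossy bytes:     ", len(lossy))
print("identical copy?  ", restored == pixels)
```

The lossless copy comes back byte for byte. The coarse version compresses smaller because the noise that was hard to abbreviate is gone, but that detail cannot be recovered afterward.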
When you stream audio or video online, it’s usually heavily compressed. Otherwise it wouldn’t be practical to stream high-definition video, let alone 4K video. If a movie looks better on Blu-ray than when you stream it, you’re probably not imagining things. There probably really is a difference, and under the right conditions with the right screen, you can notice it, even though it should be pretty subtle.
Lossy compression compounds as you work with it, so photographers and video producers will work with lossless formats until the product is finished. If you’re asking what does compressing a file do, it may be because you’ve experienced this degradation when working with photos and video. The solution is to use lossless file formats like RAW or TIFF.
Many audiophiles dislike digital audio, partly due to lossy compression. Part of this is because people tended to be too aggressive with their compression settings, choosing low bit rates to save space. How much compression you can get away with before you start to degrade the audio is a matter of religious debate, but the three-chord punk rock I listen to sometimes tends to be more tolerant of it than complex classical music.
If you run lossy compression against a file multiple times, you will create generation loss. Each subsequent lossy compression can make the file lose some more detail, and converting to a lossless format in between doesn’t bring back what is already gone. So you don’t want to convert a file from MP3 to WAV and back to MP3, or from JPEG to PNG and back to JPEG. Leave the files as-is.
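Generation loss can be sketched with a toy model. The lossy_pass function below is hypothetical. It just snaps every sample to the nearest multiple of a step size, which is far cruder than MP3 or JPEG, and the alternating step sizes stand in for re-saving with different settings. The point it illustrates is that each pass can only throw detail away, never restore it.

```python
def lossy_pass(samples, step):
    """Toy lossy codec: snap each sample to the nearest multiple of step."""
    return [((s + step // 2) // step) * step for s in samples]


original = list(range(256))  # every possible 8-bit value, once each
steps = [4, 6, 4, 6]         # assumption: re-saving with alternating settings

current = original
distinct_counts = [len(set(current))]  # distinct values left = remaining detail
for step in steps:
    current = lossy_pass(current, step)
    distinct_counts.append(len(set(current)))

# Largest distance any sample has drifted from its original value.
worst_error = max(abs(a - b) for a, b in zip(original, current))

print("distinct values after each pass:", distinct_counts)
print("worst error after all passes:", worst_error)
```

The count of distinct values never goes back up between passes. Storing an intermediate result in a lossless container such as WAV or PNG would preserve that pass exactly, but it would not undo the rounding that already happened.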
Controversies in file compression
Some old-timers always get nervous when the topic of file compression comes up. I can think of one, or perhaps two reasons for this.
In 1993, Microsoft included a disk compression product called DoubleSpace in MS-DOS version 6.0. DoubleSpace compressed your disk transparently, allowing you to store roughly twice as much data on your hard drive. Since drives smaller than 200 megabytes were still common and hard drive space cost about a dollar a megabyte at the time, this was a big selling point. It wasn’t an original idea, but people assumed something designed by Microsoft and bundled with its own operating system would work better.
Except it didn’t. DoubleSpace could sometimes lead to data loss. Microsoft soon released a fix, but the damage was done. Decades later, I still occasionally run into people who remember the DoubleSpace scandal and express doubt about data compression in general.
Windows NT also includes disk compression. It’s completely unrelated to the old MS-DOS product, but if you compress a system drive without being careful about it, you can make the system less stable. If you’ve ever had to fix a production server that someone compressed to free up some disk space, you may be skeptical of compression too. I frequently used it when I was a system administrator to get myself out of a jam, but I only ever compressed selected files. I’ve even blogged about that relatively recently. Modern Windows versions have a different way of compressing the operating system portion of the disk.
Does compressing files damage them?
Generally speaking, no, compressing files doesn’t damage them, unless you’re talking about lossy compression with audio, video, or photo files. Even then, it’s only repeated lossy compression that really damages them.
Other forms of compression won’t damage your files. This blog post was compressed before my server sent it to you, to make it load faster. I routinely use compression several times a day, every day as part of my job.
What does compressing a file do? And is it worth it?
When done properly (and it’s difficult to do improperly), compressing files is worth it. Data compression is everywhere. This web page got compressed on its way to you, and your computer decompressed it before displaying it. Most high-traffic web sites use compression routinely.
Most of us don’t have to think about what compressing a file does or how it works. Compression is all around us, every day. It’s one of the things that makes the modern Internet possible.