Message digests for forensic purposes

I found a question in my studies whose answer I didn’t like. So I’ll repeat the question and the choices, and state what I think the answer should be and why I think that way. Any experts out there who might be reading can feel free to chime in.

Which of the following is a potential problem when creating a message digest for forensic purposes?

A. It’s an extremely slow process
B. The message digest is almost as long as the data
C. The last access time of the file is changed
D. One-way hashing technology invalidates message digest processing

First, a tiny bit of background: The idea here is that if you have a hard drive for forensic/evidence purposes, you need to prove the drive hasn’t been modified. So after making a bit-level copy of the drive, you take an MD5 or SHA-1 value of both the original and the copy and confirm they match. Then you do your investigations on the copy, and use the MD5 or SHA-1 to demonstrate you didn’t plant the evidence you say you found.
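To make that concrete, here is a minimal sketch of hashing an image file and comparing it against a copy, using Python’s standard hashlib module. The function names and the chunk size are my own choices for illustration, not part of any forensic tool, and a real acquisition would read the source through a write blocker rather than straight off the disk.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so a large image never has to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def images_match(original_path, copy_path, algorithm="sha1"):
    """Return True when both files produce the same digest."""
    return file_digest(original_path, algorithm) == file_digest(copy_path, algorithm)
```

You would record the digest when you make the copy, then run the same function again after the investigation to show that nothing changed in between.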

We’ll start with answer D. That response is self-contradictory. One-way hashing is another name for message digest processing. The process doesn’t invalidate itself, at least not if you do it right. So we can eliminate D immediately. Good test-taking practice.

Working backward, answer C is much better. If you do it wrong, you certainly can change the last access time of any files you look at. But that’s why you examine a bit-level copy of the drive, not the original drive. And you can work around that issue by examining the drive with a sector editor. That’s something you want to do anyway, since some of the information you’ll want to look at will no longer be a valid part of any file. If you need to access things at the file level, you can mount the drive read-only. Using a sector editor or mounting a drive read-only are moderately advanced skills, but if you don’t know how to do those things, you don’t need to be doing forensics. Someone who does forensics every day may even know some other tricks; I’ve only done forensics twice in my career. Suffice it to say, if a professional is doing the job, answer C won’t be a problem.

Answer B is wrong by definition. Message digests are always a fixed length, whether you take a message digest of a single character or of a 4-terabyte hard drive. They’re short enough that you can inspect them visually to see if the values match. So B is another garbage answer.
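Here is a quick way to see the fixed length for yourself, again as a small sketch with hashlib. The helper name is made up for this example. MD5 produces 128 bits (32 hex characters) and SHA-1 produces 160 bits (40 hex characters), no matter how much data goes in.

```python
import hashlib


def digest_hex_length(data, algorithm):
    """Return the number of hex characters in the digest of `data`."""
    return len(hashlib.new(algorithm, data).hexdigest())


for size in (1, 1_000, 1_000_000):
    data = b"x" * size
    # The input grows by a factor of a million; the digest lengths stay at 32 and 40.
    print(size, digest_hex_length(data, "md5"), digest_hex_length(data, "sha1"))
```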

I think the answer should be A: it’s slow. I did a little digging, and a reasonably modern CPU can calculate a one-way hash at a rate of 150-200 MB per second. The hard drive you’re hashing may or may not be able to deliver the data that quickly. But if I’m doing the math right, calculating the message digest on a 1 TB hard drive at those rates should take roughly an hour and a half to two hours, and the time required should scale roughly linearly with drive size, so a 2 TB drive would take up to about 4 hours and a 3 TB drive up to about 6. That’s not a crippling length of time (it should be very close to the length of time required to do a bit-level copy), but it’s the best of the four answers. And when a modern computer can do so many things in a matter of seconds, I guess a couple of hours would qualify as extremely slow to many people.
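The arithmetic behind that estimate is simple enough to check in a few lines. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and assumes the hash rate, not the drive, is the bottleneck. The function name is just for illustration.

```python
TB = 10**12  # decimal terabyte, as drive makers count it
MB = 10**6   # decimal megabyte


def hash_hours(drive_bytes, throughput_bytes_per_sec):
    """Hours needed to hash a drive at a steady throughput."""
    return drive_bytes / throughput_bytes_per_sec / 3600


for tb in (1, 2, 3):
    fast = hash_hours(tb * TB, 200 * MB)
    slow = hash_hours(tb * TB, 150 * MB)
    print(f"{tb} TB: {fast:.1f} to {slow:.1f} hours")
```

For 1 TB that comes out to roughly 1.4 to 1.9 hours, and the figures double and triple for 2 TB and 3 TB. If the drive can’t sustain 150 MB per second, the real time will be longer.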

And finally, one last thing. A common mistake is to confuse message digests/one-way hashing with encryption. They’re conceptually similar and use similar math, but they aren’t interchangeable. They serve very different functions. Think of encryption like scrambling and descrambling data, and message digests like fingerprinting. A subtle change in a file is enough to change its checksum noticeably, just like my sons have different fingerprints. They look an awful lot like each other–and like me–but they each got different mixtures of their parents’ genes. And that’s the point of a message digest–to prove that a file hasn’t been changed. That’s why you can’t “get a file back” from a one-way hash. Getting the file back from the one-way hash isn’t the point. It’s strictly an integrity check.
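The fingerprint idea is easy to demonstrate. In this sketch I change a single letter in a short message and compare the two SHA-1 digests. The helper names are mine, and the test sentence is just a convenient example.

```python
import hashlib


def sha1_hex(data):
    """Return the SHA-1 digest of `data` as a hex string."""
    return hashlib.sha1(data).hexdigest()


def differing_hex_positions(a, b):
    """Count the positions where two equal-length hex strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


original = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy cog"  # one letter changed

d1 = sha1_hex(original)
d2 = sha1_hex(altered)
print(d1)
print(d2)
print(differing_hex_positions(d1, d2), "of", len(d1), "hex characters differ")
```

The two digests look nothing alike even though the inputs differ by one character, and there is no function here (or anywhere in hashlib) that turns a digest back into the original message.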


One thought on “Message digests for forensic purposes”

  • January 5, 2012 at 8:37 am

    Dave,

    You’re correct in that A would be the best answer. B isn’t true, C would only occur if the person collecting the image wasn’t using a write blocker or made a similar boneheaded mistake, and D is completely wrong. You can hash a hashed hash as many times as you like.
