Linguistic analysis isn’t hooey

For the second time in two months, I’ve seen a case where a linguist analyzed writing to determine whether someone was or wasn’t the author of a suspicious e-mail message. The first involved a threatening letter purportedly sent to Christopher Coleman, who was convicted last month of murdering his family; the second involves Paul Ceglia’s attempt to prove he owns a substantial share of Facebook.

The inevitable flood of comments calling such analysis “black magic” followed. But as an author, I have to say there’s validity to it.

Sometimes I run across things I wrote years ago that I don’t even remember writing. But I recognize them as mine. Can I tell you exactly what it is about the writing that tips me off? No. But I recognize my own writing. That alone tells me there are patterns that someone else would be able to pick up on.

And my writing has probably changed more over the years than most people’s. When an editor slaps me upside the head enough times for making the same mistake, eventually I stop making it. Someone who doesn’t write professionally will tend to make the same mistakes over and over. Coleman consistently misspelled the word “opportunities” as “oppurtunities.” The threats he claimed to have received contained the same unusual spelling, which suggests he wrote the threats himself, especially since the headers in the messages showed they were sent from his own computer.

And you don’t necessarily have to be a linguist or a writer to pick it up. Sometimes somebody will get banned from a forum for bad behavior, then reappear using a new alias. And people figure out it’s the same person. It’s really hard to make yourself sound like somebody else, especially over an extended period of time. It’s so difficult that Mark Twain’s ability to make all of his characters sound like somebody other than Mark Twain was the first thing I noticed about his writing.

It’s even easier when a person tends to make mistakes. And there are plenty of those out there: misspellings, grammatical errors, punctuation errors, and misused apostrophes among them. Some people are very consistent in those errors. Some people are really inconsistent. Either can be a tip-off.
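To make the idea of a consistent error concrete, here is a minimal sketch of how one might check whether a questioned message shares an unusual spelling with a known writing sample. It is only an illustration, not the method any linguist used in the cases above. The function names, the tiny word list, and the sample sentences are all made up for the example; a real check would use a full dictionary and far more text.

```python
import re

def tokens(text):
    """Split text into lowercase word tokens (apostrophes are kept)."""
    return re.findall(r"[a-z']+", text.lower())

def shared_misspellings(known_text, questioned_text, dictionary):
    """Return words missing from the dictionary that occur in BOTH texts."""
    known_odd = {w for w in tokens(known_text) if w not in dictionary}
    questioned_odd = {w for w in tokens(questioned_text) if w not in dictionary}
    return known_odd & questioned_odd

# Hypothetical word list and sample sentences, invented for illustration.
DICTIONARY = {
    "i", "have", "many", "to", "travel", "for", "work",
    "you", "had", "your", "stop", "opportunities",
}
KNOWN = "I have many oppurtunities to travel for work"
QUESTIONED = "You had your oppurtunities to stop"

print(shared_misspellings(KNOWN, QUESTIONED, DICTIONARY))
```

A shared misspelling by itself proves nothing, since plenty of people misspell the same words. It becomes interesting when the error is unusual and shows up consistently.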

And mistakes are the norm. When I was a teenager, online buddies I’d never met in person assumed I had a college education because my writing had so few mistakes in it.

Impersonating somebody else is difficult if only because it’s hard to notice what you sound like. The very first time my wife and I ever spoke on the phone, one of the first things she noticed about me was my vocabulary. And yet, if anything, I go out of my way to not use pretentious words. For example, let’s look at the last eight dictionary.com words of the day: pecksniffian, pangram, foist, decollete, intestable, catarrh, leitmotif, and avoirdupois. Of the eight, I’ve only ever used foist. I thought I used it rarely, but then I checked and saw I’ve used it twice this year. But hey, it’s a one-syllable word and its meaning is easy to pick up from context.

I think I sound like what I am: a guy who’s lived his entire life in the Midwest, most of it in Missouri, and most of that in or near its larger cities (as opposed to its rural areas), and who graduated from the University of Missouri-Columbia. Where we live, the schools we attend, our religious practices, our age, and countless other things contribute to how we think, speak, and write.

It’s like a fingerprint, and as much as you try to change it, some of it’s going to come through eventually. There’s no question in my mind that a linguist can pinpoint that. I’ve seen English and journalism professors–not even people who specialized in linguistics–pinpoint where students had lived previously, based on word choices and other subtle clues.

If it’s possible to pinpoint where you grew up, it’s even easier to examine known samples of writing and tell you whether a sample in question was likely to have been written by the same person.

And if a person can’t sort it out entirely unassisted, one can enlist computer software for help, the way scholars do when they try to figure out whether contested plays were written by Shakespeare or just someone claiming to be Shakespeare.
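For the curious, here is a rough sketch of the kind of comparison such software might make: count how often each text uses a handful of common function words, then measure how similar the two usage profiles are. This is a toy under stated assumptions, not what scholars or forensic linguists actually run. The word list, the function names, and the choice of cosine similarity are mine, and real tools use many more features, much longer samples, and proper statistics.

```python
import math
import re
from collections import Counter

# A short, arbitrary list of common function words (an assumption for this sketch).
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "but", "i"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity(text_a, text_b):
    """Compare two texts by their function-word profiles (1.0 = identical profile)."""
    return cosine_similarity(profile(text_a), profile(text_b))
```

You would call `similarity(known_sample, questioned_sample)` and compare the score against scores for samples from other candidate authors. A high score on short texts means very little, so treat the number as a hint and not a verdict.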


2 thoughts on “Linguistic analysis isn’t hooey”

  • June 7, 2011 at 1:35 pm

    Reminds me of the name for a carbonated beverage: soda vs. pop vs. coke. That last leads to interesting conversations in (Southern) restaurants:

    I’ll have a coke.
    What kind of coke?
    Pepsi.

    • June 10, 2011 at 9:39 pm

      Exactly!
