

How to quickly find the differences between two Word documents

From time to time, I have to deal with new revisions of familiar implementation guides or other system documentation, and the authors rarely include a changelog in the document. And of course the first question anyone asks about the new guide is what’s changed. That means I have to find the differences between two Word documents.

This week I found myself collaborating on a long-ish document and needing to synchronize some changes. Word’s tracked changes and comments can help somewhat, but generally I find them clumsy and annoying.

If you have five minutes and a willingness to use a command prompt, you can find the differences easily, then work from there.

Analysis: Big retailers unite to make DMCA look stupid

A quartet of retailers ganged up on FatWallet.com and made it take down some ads for next week’s big sales, citing the DMCA. The ‘Net is up in arms.
It’s stupid. But not for the reasons you think.

In case you’re wondering, it’s been common practice for years now for someone to get hold of stores’ sales flyers in advance, then go on some forum somewhere (FatWallet.com isn’t the only place they go) and post scans of the flyers, or links to scans of the flyers, so people know what’s going to be on sale where. People make the biggest deal about the holiday sales flyers, but it’s pretty easy to get the sales flyer from any old Sunday’s paper a few days in advance. If I want to know what’s on sale at Office Depot next Sunday, I can probably know by Thursday without going to too much trouble.

Retailers are starting to crack down on this.

The DMCA is the wrong law to be invoking. We’re talking scans of paper ads here. The stuff wasn’t digital media until someone other than the retailer made it into digital media. The appropriate law to be invoking is plain old copyright law. Let’s not make things more complicated than we need to. Unless we want to make the DMCA look like the stupidity that it is. They can feel free to do that if they want.

The community at large is in an uproar because they’re mad that the ads can’t be distributed in advance and big retailers who can afford lots of lawyers are picking on a Web site, probably operated on a shoestring, that definitely cannot. Yes, the legal system is a bunch of bullies. But that can go two ways also. Big companies can harass individuals or little ones, but if everyone who’s offended by these actions sued all four companies for $250 in small claims court in their home counties, that would be legal harassment as well, because it would cost these companies more than $250 to defend themselves from the nuisance suits. They would win, but the fight isn’t worth fighting.

Besides, when you scan an ad and you put it on the Internet, you are breaking the law. Copyright is just that–the right to decide who can copy something. Or can’t. And the conditions can be stupid and ludicrous. Or reasonable. And they can change over time too.

“But it’s a collection of facts!” people are whining. So is the telephone book. The telephone book is copyrighted. You can print your own telephone book (McLeod does just that, printing up its own alternative to the Southwestern Bell Yellow Pages), but if you copy someone else’s, they can (and will) sue you. There are bogus entries in every telephone book to keep people from doing that. A lot of copyrighted things are nothing more than collections of facts.

Other people are bemused that the store’s ads are being circulated, more widely than otherwise, for free, and the stores are offended that people might show up to buy stuff.

That argument I buy. But it’s not the public showing up that they’re worried about. I’m not sure that it’s the public knowing in advance what’s going to be on sale and for how much that they’re worried about either. Waiting an extra week for a sale before making a purchase is pretty standard practice anyway–it’s just that 20 years ago, you had to guess what might be on sale.

No, it’s that Target and Wal-Mart don’t want each other to know their sale prices. Best Bait-n-Switch doesn’t want Circuit… City to know its sale prices. Staples doesn’t want OfficeMax and Office Depot to know its sale prices. The longer the competition knows the prices in advance, the more time they have to adjust. It happens to be a lot easier for me to get in and out of my local Best Bait-n-Switch, so my natural inclination is to go there. Someone who lives a little north of me will find it a whole lot easier to get in and out of Circuit, so if Circuit has known its rival’s pricing for a week, its prices will all be the same anyway, so they’ll go there. Best Bait-n-Switch wants those people who live to the north to go to the extra trouble of making a left turn on Lindbergh (it’s a pain, as anyone familiar with the area can tell you) to save a few bucks.

So what can FatWallet.com do? I’m not a lawyer, so I don’t know for sure. They’re on much stronger legal ground if they don’t present scans of the ads. Just the facts, ma’am. A list of goods and prices in plain old ASCII is probably protected free speech. If it isn’t, adjust the prices. You know how everything sells for $19.95 or $19.99 or something similar, right? Round the prices up. It’s easier to type anyway. Presenting them in the same order as they are on the page might be too close of a copy. Sort the items alphabetically.

It takes more time for someone to sit down and type up a few dozen items and prices. But the person who does it probably stands on good legal ground. After all, the ad itself is copyrighted material. But the facts, as they say, can’t be.

One way to defeat spammers

Ever since Brightmail closed up their free filtering service, I’ve been thinking a lot more about spam because I’ve been getting a lot more. I know where these losers are getting my e-mail address. It’s right here on my Web page. But I need to post that so people can contact me. Fortunately, I found a trick. Look at this:

That’s just an e-mail link, right? It works just like any other, right? Well, here’s the HTML code for that:


See what I did? I obscured the @ sign with an ASCII code (64), along with the dot (46) and a couple of other characters like the colon. Most automated e-mail address harvesters don’t decode the HTML, so their search routines, which look for things like @ signs and dot-somethings will blow right past that.
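For anyone who’d rather not do the substitution by hand, here’s a minimal sketch in Python. It isn’t the code from my page; the address is a placeholder, and the function name `obscure` and the choice of which characters to encode are just assumptions for illustration.

```python
import html

def obscure(text, chars="@.:"):
    """Replace the chosen characters with HTML decimal character references."""
    return "".join(f"&#{ord(c)};" if c in chars else c for c in text)

address = "someone@example.com"  # placeholder address, not a real one

# Encode both the link target and the visible text.
href = obscure("mailto:" + address)
link = f'<a href="{href}">{obscure(address)}</a>'
print(link)

# A browser decodes the references, so the link still works for people.
# html.unescape stands in for that decoding step here.
assert html.unescape(href) == "mailto:" + address
```

The printed markup contains no literal @ sign, dot, or colon, which is the whole point: a harvester that searches the raw HTML for those characters finds nothing to match.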

So if you run a site, obscure your e-mail address. If you don’t remember your ASCII codes, hopefully you’ve still got QBasic on one of your machines. In QBasic, the command PRINT ASC("A") will give you the ASCII code for the letter A. Substitute any character you like. Or you can remember that uppercase A is 65, B is 66, and so on, and that lowercase a is 97.
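If QBasic isn’t handy, the same lookup works in Python, where the built-in `ord` plays the role of ASC. This is only a sketch; the helper name `codes` is made up for the example.

```python
def codes(text):
    """Map each character in text to its numeric code."""
    return {ch: ord(ch) for ch in text}

# The characters worth obscuring, plus the two letters to remember.
for ch, code in codes("@.:Aa").items():
    print(repr(ch), code)
```

That prints 64 for the @ sign, 46 for the dot, 58 for the colon, 65 for A, and 97 for a.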

When a Web site asks you for an e-mail address, you can see if it’ll let you obscure parts of it. Unfortunately, my forums flag illegal characters, but I may be able to modify that. Some Web sites aren’t that smart.

Obviously this trick won’t work in e-mail, unless you always send your mail in HTML format, which I (along with about half the world) really wish you wouldn’t–it’s annoying. And even if you obscure the mail you send, if I copy and paste your mail to my site, it’ll go up there unobscured. So this advice is mostly for webmasters.

Anyway… On to other things.

We’ve moved, if you haven’t noticed. These pages should be at least a little bit faster. The forums will be several times faster. And the forums are goofy. I haven’t figured out exactly why, but posts are missing and user files are acting up. If you’re having problems (Steve DeLassus just told me he can’t post because it tells him his .dat file can’t be accessed), go ahead and re-register. If you want your post count raised to its previous level, just let me know. I can change that. (Hmm, I wonder if Gatermann would notice if I set his post count to a negative number…?) I’d have preferred to move everything intact, of course.

Anyway. Go play in the forums. See what breaks. If I don’t know it’s broke, I sure can’t fix it. (I may not be able to if I do know, but hey, I can give it my best shot.)

Update: It’s 5:45 in the p.m., and you’re watching… Wait. That’s something else. The forums seem to be working properly now. Lack of uniformity between Linux distributions bites me again… It wasn’t the location of the files YaBB was objecting to, nor was it permissions. It was ownership. Under Mandrake, Apache runs as a user named “apache” and thus files created by CGI scripts like YaBB are owned by “apache.” Under TurboLinux, Apache runs as user “nobody,” and thus files created by CGIs are owned by “nobody.” And when you just tar up your Web site and move it to a new box like I did, those files remain owned by their old owners. Since Linux assumes you know what you’re doing, it happily handed those files over to a nonexistent user. So when YaBB came knocking, Unix security kicked in and said, “Hey, nobody, you don’t own these files,” hence those error 103s everyone was getting.
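If you move a site the same way, a quick check for stray ownership can save some head-scratching. Here’s a rough Python sketch, not what I actually ran: the function name is made up, and the forum directory in the usage comment is a placeholder.

```python
import os

def files_not_owned_by(root, uid):
    """Return paths under root whose owner is not the given numeric user ID."""
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat so a symlink is judged by its own owner, not its target's
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return mismatched

# Example use on a Unix box (the path is hypothetical):
#   import pwd
#   web_uid = pwd.getpwnam("nobody").pw_uid
#   for path in files_not_owned_by("/path/to/forum/data", web_uid):
#       print(path)
```

Anything it lists is a candidate for a chown to whatever user the Web server runs as on the new box.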

No, this is still the old server.

The new server works, but I got sidetracked last night. I had to take care of a weird work problem, and I ran out to a bookstore where the girls who work there seem to have this competition to see who can be the nicest, and then I came back home and had a long phone conversation with an old friend I hadn’t talked to in a couple of years. Between all that and trying to make some sense of Steve Gibson’s latest discoveries and trying to figure out what he wants and whether I agree with him, my server just kept chugging along.
I need to make my homebrew spam filter too. I’m thinking I’ll press a 486 into that duty, at least initially. I’m out of good PCs to experiment on. Once I get it working, if it’s slow, I’ll get some parts and build something better to block the onslaught of spam.

Oh, speaking of spam, for those of you who have Web pages… If you obscure certain characters in your e-mail address–sub in the raw ASCII code for the at sign and the period and one or two letters–most spam bots can’t harvest it. I need to do that for my pages. I’ve also found some cool-sounding traps for spam bots, including one that tries to dynamically figure out the spambot’s IP address, then feeds it accounts like abuse@owner.com and postmaster@owner.com. If they work, I’ll most certainly toss them your way.

Just added: More Like This

New feature: More Like This. It took me several hours to implement this one. It should have taken me less than thirty minutes. Hot tip: If you try to run a CGI script and you get Internal Server Error messages, try re-uploading the script in ASCII mode rather than the default binary. Betcha it works after that. When running under Linux or Unix, Perl hates extra carriage returns, and Windows often inserts them.
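If re-uploading isn’t convenient, you can also strip the carriage returns yourself before the script goes up. Here’s a minimal Python sketch; the function names and the sample script are made up for illustration. My understanding is that a common culprit is the first line: with a stray carriage return on the end of the #! line, the system goes looking for an interpreter whose name ends in that invisible character.

```python
def has_crlf(data: bytes) -> bool:
    """Report whether the data contains Windows-style line endings."""
    return b"\r\n" in data

def strip_carriage_returns(data: bytes) -> bytes:
    """Convert Windows line endings (CR LF) to Unix line endings (LF)."""
    return data.replace(b"\r\n", b"\n")

# A tiny stand-in for a CGI script saved on Windows.
script = b'#!/usr/bin/perl\r\nprint "Content-type: text/html\\n\\n";\r\n'

fixed = strip_carriage_returns(script)
print(has_crlf(script), has_crlf(fixed))
```

To use it on a real file, read and write in binary mode so Python doesn’t translate the line endings on its own.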
I think this is the next trend in Weblog sites. If it’s not, it should be. The idea is this: You assign some keywords to each entry. And at the end of the entry, you put a line that says More Like This and some hyperlinked keywords. So if you like it when I write about baseball or music and you want to see more, click on the baseball or music hyperlink at the end of the entry, and the search engine I stole will go fish around for other entries I gave the same keyword to.

For sites that always write about the same thing, this isn’t very useful. For sites with eclectic content, this is a boon. You can quickly find whatever writings of mine tickle your fancy and skip over the subjects that bore you. How cool is that?

My goal is to put together the best site in the Daynotes circuit. I’ll never have the best content, but if I have reasonably good content and you can quickly find a whole lot of what you’re looking for (be it entertainment, stuff that makes you think, or technical content), I stand a ghost of a chance of reaching that goal.

I actually managed to implement More Like This without modifying any Greymatter code. First, I grabbed Meta Tag Search, a simple CGI script that searches on meta tags, from http://support.cws.net/hosting/cgiscripts.html. I followed the installation instructions. Like I said before, upload it in ASCII mode. You’ll save yourself a few hours and a lot of gray hair. Now, whenever I make an entry in Greymatter, I put a keywords meta tag at the very beginning of the entry. I don’t know if other search engines will find the tags there, but Meta Tag Search will, and that’s my primary concern. Then, at the end of the entry, I add a collection of hyperlinks that call Meta Tag Search.
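The two pieces of markup are simple enough to generate rather than type. Here’s a rough Python sketch of the idea. The script path in `SEARCH_URL` and the `keywords` query parameter are guesses for illustration only, not Meta Tag Search’s documented interface, so check its instructions for the real ones.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical location of the search script; adjust to your own install.
SEARCH_URL = "/cgi-bin/search.cgi"

def keywords_meta(keywords):
    """Build the keywords meta tag that goes at the top of the entry."""
    content = escape(", ".join(keywords))
    return f'<meta name="keywords" content="{content}">'

def more_like_this(keywords):
    """Build the line of hyperlinked keywords that goes at the end of the entry."""
    links = []
    for kw in keywords:
        # "keywords" as the parameter name is an assumption, not the script's API.
        url = escape(SEARCH_URL + "?" + urlencode({"keywords": kw}))
        links.append(f'<a href="{url}">{escape(kw)}</a>')
    return "More like this: " + " ".join(links)

tags = ["Linux", "Weblogs", "HTML", "CGI"]
print(keywords_meta(tags))
print(more_like_this(tags))
```

Keeping the keyword list in one place means the meta tag and the links can’t drift out of sync.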

So now it takes a couple more minutes’ effort to make each post, but I think it’s worth it.

More like this: Linux Weblogs HTML CGI