Open sourcing code doesn’t necessarily mean people will rush to it

John C. Dvorak wrote a nice layman’s introduction to open source on PCMag.com. But he makes at least one big false assumption.

Dvorak says he’d love to see old code open sourced. But some of the examples he cites, such as CP/M, CP/M-86, and GEM, have already been open source for years. Caldera, after buying the intellectual property of the former Digital Research from Novell, released just about everything that wasn’t directly related to DR-DOS, some of it under the GPL and some under other licenses. The results have hardly been earth-shattering.

Optimizing dynamic Linux webservers

Linux + Apache + MySQL + PHP (LAMP) provides an outstanding foundation for building a web server, for essentially the cost of your time. And the advantages over static pages are fairly obvious: Just look at this web site. Users can log in and post comments without me doing anything, and content on any page can change programmatically. In my site’s case, links to my most popular pages appear on the front page, and as their popularity changes, the links change.

The downside? Remember the days when people bragged about how their 66 MHz 486 was a perfectly good web server? Kiss those goodbye. For that matter, your old Pentium-120 or even your Pentium II-450 may not be good enough either. Unless you know these secrets…

First, the simple stuff. About a year and a half ago, I talked about programs that optimize HTML by removing extraneous tags and even give you a leg up on translating to cascading style sheets (CSS). That’s a starting point.

Graphics are another problem. People want lots of them, and digital cameras tend to add some extraneous bloat to them. Edit them in Photoshop or another popular image editor–which you undoubtedly will–and you’ll likely add another layer of bloat to them. I talked about Optimizing web graphics back in May 2002.

But what can you do on the server itself?

First, regardless of what you’re using, you should be running mod_gzip to compress your web server’s output. It works with virtually all modern web browsers, and the browsers that can’t handle compressed output negotiate with the server to get it uncompressed. My 45K front page becomes 6K when compressed–better than a seven-fold reduction. Suddenly my 128-kilobit uplink delivers more than half a T1’s worth of pages.

I’ve read in several places that it takes less CPU time to compress content and send it than it does to send the same content uncompressed. On my P2-450, that definitely seems to be the case.

Unfortunately, mod_gzip is one of the most poorly documented Unix programs I’ve ever seen. I complained about this nearly three years ago, and the situation seems little improved.

A simple apt-get install libapache-mod-gzip in Debian doesn’t do the trick. You have to search /etc/apache/httpd.conf for the line that begins LoadModule gzip_module and uncomment it, then you have to add a few more lines. The lines that enabled mod_gzip on TurboLinux didn’t save me this time–for one thing, they didn’t handle PHP output. For another, they didn’t seem to do anything at all on my Debian box.
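
For reference, the line you’re uncommenting looks something like this–the module path is what my Debian box uses, so check yours:

# in /etc/apache/httpd.conf
LoadModule gzip_module /usr/lib/apache/1.3/mod_gzip.so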

Charlie Sebold to the rescue. He provided the following lines that worked for him on his Debian box, and they also worked for me:

# mod_gzip settings

mod_gzip_on Yes
mod_gzip_can_negotiate Yes
mod_gzip_add_header_count Yes
mod_gzip_minimum_file_size 400
mod_gzip_maximum_file_size 0
mod_gzip_temp_dir /tmp
mod_gzip_keep_workfiles No
mod_gzip_maximum_inmem_size 100000
mod_gzip_dechunk Yes

mod_gzip_item_include handler proxy-server
mod_gzip_item_include handler cgi-script

mod_gzip_item_include mime ^text/.*
mod_gzip_item_include mime ^application/postscript$
mod_gzip_item_include mime ^application/ms.*$
mod_gzip_item_include mime ^application/vnd.*$
mod_gzip_item_exclude mime ^application/x-javascript$
mod_gzip_item_exclude mime ^image/.*$
mod_gzip_item_include mime httpd/unix-directory
mod_gzip_item_include file .htm$
mod_gzip_item_include file .html$
mod_gzip_item_include file .php$
mod_gzip_item_include file .phtml$
mod_gzip_item_exclude file .css$

Gzipping anything below 400 bytes is pointless because of the overhead, and gzipping CSS and Javascript files breaks Netscape 4 part of the time.

Most of the examples I found online didn’t work for me. Charlie said he had to fiddle a long time to come up with those. They may or may not work for you. I hope they do. Of course, there may be room for tweaking, depending on the nature of your site, but if they work, they’re a good starting point.

Second, you can use a PHP accelerator. PHP is an interpreted language, which means that every time you run a PHP script, your server first has to compile the source code into an intermediate form and then execute it–and the compiling can take longer than running the result. PHP accelerators work like a just-in-time compiler: they compile the script once and hold a copy in memory, so the next time someone accesses the page, the pre-compiled script runs. The result can sometimes be a tenfold increase in speed.

There are lots of them out there, but I settled on Ion Cube PHP Accelerator (phpa) because installation is a matter of downloading the appropriate pre-compiled binary, dumping it somewhere (I chose /usr/local/lib but you can put it anywhere you want), and adding a line to php.ini (in /etc/php4/apache on my Debian box):

zend_extension="/usr/local/lib/php_accelerator_1.3.3r2.so"

Restart Apache, and suddenly PHP scripts execute up to 10 times faster.
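
To make sure the extension actually loaded, drop a one-line script on the server and load it in a browser; the accelerator should show up in the output. This is just a sanity check, not part of the installation:

<?php phpinfo(); ?>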

PHPA isn’t open source and it isn’t Free Software. Turck MMCache is, so if you prefer GPL, you can use it.

With mod_gzip and phpa in place and working, my web server’s CPU usage rarely goes above 25 percent. Without them, three simultaneous requests from the outside world could saturate my CPU.

With them, my site still isn’t quite as fast as it was in 2000 when it was just serving up static HTML, but it’s awfully close. And it’s doing a lot more work.

 

Using your logs to help track down spammers and trolls

It seems like lately we’ve been talking more on this site about trolls and spam and other troublemakers than about anything else. I might as well document how I went about tracking down two recent incidents to see if they were related.

WordPress and b2 store the IP address the comment came from, as well as the comment and other information. The fastest way to get the IP address, assuming you haven’t already deleted the offensive comment(s), is to go straight to your SQL database.

mysql -p
[enter the root password]
use b2database;
select * from b2comments where comment_post_id = 819;

Substitute the number of your post for 819, of course. The poster’s IP address is the sixth field.
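
If you’d rather not count fields, you can ask for the column by name. If I remember the b2 schema correctly, the column is called comment_author_IP:

select comment_author_IP from b2comments where comment_post_id = 819;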

If your blogging software records little other than the date and time of the message, you’ll have to rely on your Apache logs. On my server, the logs are at /var/log/apache, stored in files with names like access.log, access.log.1, and access.log.2.gz. They are archived weekly, with anything older than two weeks compressed using gzip.

All of b2’s comments are posted using a file called b2comments.post.php. So one command can turn up all the comments posted on my blog in the past week:

cat /var/log/apache/access.log | grep b2comments.post.php

You can narrow it down by piping it through grep a bit more. For instance, I knew the offending comment was posted on 10 November at 7:38 pm.

cat /var/log/apache/access.log | grep b2comments.post.php | grep 10/Nov/2003
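
And if you want a quick census of which IP addresses have been posting the most comments, a standard awk/sort/uniq pipeline does the counting:

grep b2comments.post.php /var/log/apache/access.log | awk '{print $1}' | sort | uniq -c | sort -rn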

Here’s one of my recent troublemakers:

24.26.166.154 – – [10/Nov/2003:19:38:28 -0600] “POST /b2comments.post.php HTTP/1.1” 302 5 “https://dfarq.homeip.net/index.php?p=819&c=1” “Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031007 Firebird/0.7”

This line reveals quite a bit: Besides his IP address, it also tells you his operating system and web browser.

Armed with his IP address, you can hunt around and see what else your troublemaker’s been up to.

cat /var/log/apache/access.log | grep 24.26.166.154
zcat /var/log/apache/access.log.2.gz | grep 24.26.166.154
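
Or search the current and archived logs in one shot with zgrep, which comes with gzip and handles compressed and uncompressed files alike:

zgrep 24.26.166.154 /var/log/apache/access.log*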

The earliest entry you can find for a particular IP address will tell you where the person came from. In one recent case, the person started off with an MSN search looking for information about an exotic airplane. In another, it was a Google search looking for the words “Microsoft Works low memory.”

You can infer a few things from where a user originally came from and the operating system and web browser the person is using. Someone running the most recent Mozilla Firebird on Linux and searching with Google is likely a more sophisticated computer user than someone running a common version of Windows and the version of IE that was supplied with it and searching with MSN.

You can find out other things about individual IP addresses, aside from the clues in your logs. Visit ARIN to find out who owns the IP address. Most ARIN records include contact information, if you need to file a complaint.

Visit Geobytes.com IP Locator to map the IP address to a geographic region. I used the IP locator to determine that the guy looking for the airplane was in Brooklyn, and the Microsoft guy was in Minneapolis.

Also according to my Apache logs, the guy in Brooklyn was running IE 6 on Windows XP. The guy in Minneapolis was running Mozilla Firebird 0.7 on Linux. (Ironic, considering he was looking for Microsoft information.) It won’t hold up in a court of law, but the geographic distance and differing usage habits give at least some indication it’s two different people.

What needs to happen for Linux to make it on the desktop

I saw an editorial at Freshmeat that argued that there’s actually too much software for Linux. And you know what? It has a point.

I’m sure some people will be taken aback by that. The number of titles that run under Windows must reach six digits, and it’s hard to walk into a computer store and buy Linux software.

But I agree with his argument, or at least most of it. Back in my Amiga days, the first thing people used to ask me was, “What, do you not like software?” Then I asked why they felt the need to have their choice of 10 different word processors, especially when they’d just pirate Microsoft Word or WordPerfect anyway. (Let’s face it: One reason large numbers of people chose PCs over superior architectures in the early 90s was that they could pirate software from work. Not everyone. Maybe not even the majority. But a lot.) I argued that one competent software title in each category I needed was all I wanted or needed. And for the most part, the Amiga had that, and the software was usually cheaper than the Mac or PC equivalent.

Linux is the new Amiga. Mozilla is a far better Web browser than IE, and OpenOffice provides most of the functionality of Microsoft Office XP–more functionality than most people use, in fact. It doesn’t always load the most complex MS Office documents correctly, but most people don’t create very complex documents anyway, and it does a much better job of opening slightly corrupt documents. But let’s face it: Its biggest problem is that it takes an eternity to load no matter how fast your computer is. If it loaded faster, people would be very happy with it.

But there is nothing that provides an equivalent to a simple database like Access or Filemaker. I know, they’re toys, and MySQL is far more powerful. But end users like dumb, brain-dead databases with clicky GUI interfaces on them that they can migrate to once they realize a spreadsheet isn’t intended to do what they’re trying to do with it. Everyone’s first spreadsheet is Excel. Then someday they realize Excel wasn’t intended to do what they’re using it for. But you don’t instantly dive into Oracle. You need something in between, and Linux doesn’t really have anything for that niche.

People are constantly asking me about a WYSIWYG HTML editor for Linux as well. I stumbled across one. Its name is GINF. Yes, another stupid recursive-acronym name. GINF stands for “GINF is not Frontpage.” How helpful. What’s wrong with a descriptive name like Webpage-edit?

More importantly, what was the first non-game application that caught your fancy? For most people I know, it was Print Shop, or one of the many knockoffs of Print Shop. People love to give and receive greeting cards, and when they can pick their own fonts and graphics and write their own messages, they love it even more. Not having to drive to the store and fork over $3.95 is just a bonus. Most IT professionals have no use for Print Shop, but Linux’s lack of alternatives in that department is hurting it.

Take a computer with a CPU on the brink of obsolescence, a so-so video chipset, 128 megs of RAM and the smallest hard drive on the market, preload Linux on it along with a fast word processor that works (AbiWord, or OpenOffice Writer, except it’s not fast), a nice e-mail client/PIM (Evolution), a nice Web browser (Mozilla), a Print Shop equivalent (bzzzt!), and a couple of card games (check Freshmeat), and you’d have a computer for the masses.

The masses do not need 385 text editors. Sysadmin types will war over vi and emacs until the end of time; one or two simple text-mode editors as alternatives will suffice, and one or two equivalents of Notepad for X will suffice.

Linux’s day will eventually arrive regardless, if only because Microsoft is learning what every monopolist eventually learns: Predatory pricing stops working once you corner the market. Then you have to find new markets or raise prices, and eventually you run out of worthwhile markets. Microsoft is running out of markets, so it’s going to have to raise prices. Then it will be vulnerable again, just like Apple and CP/M were vulnerable to Microsoft because their offerings cost more than Microsoft was willing to charge. And, as Microsoft showed Netscape, you can’t undercut free.

But that day will arrive sooner if it doesn’t take a week to figure out the name of the Linux equivalent of Notepad because there are 385 icons that vaguely resemble a notepad and most of them have meaningless names.

Optimizing a web server

Promises of better Apache performance have me lusting after lingerd, a very obscure utility that increases performance for dynamic content. It’s been used on a handful of little sites you might have heard of: Slashdot, Newsforge, and LiveJournal.

Unfortunately there’s no Debian package, which means compiling it myself, which means compiling Apache myself, which also means compiling PHP and MySQL, which means a big ol’ pain, but potentially better performance since I could go crazy on the GCC optimization flags. Hello, -O3 -march=i686!
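
The general idea for each package in the stack looks something like this–the flags are the ones I have in mind, not a tested recipe, and each package has its own configure options on top of that:

export CFLAGS="-O3 -march=i686"
export CXXFLAGS="$CFLAGS"
./configure    # plus each package's own options
make
make install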

And if I’m going to compile all that myself, I figure I might as well build the whole system that way, get the higher performance across the board, and get GCC 3.2x into the picture for even more. The easy way to do that is with lfs-install, which builds a system based on Linux From Scratch. For workstations I’d rather use something along the lines of Gentoo, but for servers, LFS is small, mature, and reasonably conservative.

Supposedly metalog offers improved performance over the more traditional syslogd or sysklogd. The good news is that those who are more sane than me and sticking with Debian for everything can take advantage of a Debian package (at least in unstable), and just apt-get away.
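
Assuming the package keeps the obvious name, that’s a one-liner:

apt-get install metalog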

If I have any sanity left, I’ll think about minit to replace SystemVInit and save me about 400K of memory in a process that’s always running, and fgetty to save me a little more. I’ve tried fgetty in the past without success; it turns out fgetty requires DJB’s checkpassword in order to work.

Keep in mind I haven’t tried any of this yet. But the plan sounds so good in my current sleep-deprived state I couldn’t help but share it.

Roll your own news aggregator in PHP

M.Kelley: I’m also wondering how hard would it be to pull a PHP/MySQL (or .Net like BH uses) tool to scrape the syndicated feeds off of websites and put together a dynamic, constantly updated website.

It’s almost trivial. So simple that I hesitate to even call it “programming.” And there’s no need for MySQL at all–it can be done with a tiny bit of PHP. Since it’s so simple, and potentially so useful, it’s a great first project in PHP.

It’s also terribly addictive–I quickly found myself assembling my favorite news sources and creating my own online newspaper. To a former newspaper editor (hey, they were student papers, but one of them was at Mizzou, and in my book, if you can be sued for libel and anyone will care, it counts), it’s great fun.

All you need is a little web space and a writable directory. If you administer your own Linux webserver, you’re golden. If you have a shell account on a Unix system somewhere, you’re golden.

First, grab ShowRDF.php by Ian Monroe, a simple GPL-licensed PHP script that does all the work of grabbing and decoding an RDF or RSS file. There are tons of tutorials online that tell you how to code your own solution to do this, but I like this one because you can pass options to it to limit the number of entries, and the length of time to cache the feed. Many RDF decoders fetch the file every time you call them, and some feeds impose a once-an-hour limit and yell at you (or just flat ban you) if you go over. Using existing code is a good way to get started; you can write your own decoder that works the way you want at some later date.

ShowRDF includes a PHP function called InsertRDF that uses the following syntax:
InsertRDF("feed URL", "name of file to cache to", TRUE, number of entries to show, number of seconds to cache feed);

Given that, here’s a simple PHP page that grabs my newsfeed:


<html><body>

<?php include("showrdf.php"); ?>

<?php

// Gimme 5 entries and update once an hour (3600 seconds)

InsertRDF("https://dfarq.homeip.net/b2rss.xml", "~/farquhar.cache", TRUE, 5, 3600);

?>

</body></html>

And that’s literally all there is to it. That’ll give you a very simple HTML page with a bulleted list of my five most recent entries. Unfortunately it gives you the entries in their entirety, but that’s b2’s fault, and my fault for not modifying it. I’ll be doing that soon.

You can see the script in action by copying it into a file on your Web server. It’s not very impressive, but it wasn’t any effort either.

You can pretty it up by making yourself a nice table, or you can grab a nice CSS layout from glish.com.

I can actually code tables without stealing even more code, so here’s an example of a fluid three-column layout using tables that’ll make a CSS advocate’s skin crawl. But this’ll get you started, even if that’s the only useful purpose it serves.


<html><body>

<?php include("showrdf.php"); ?>

<table width="99%" border="0" cellpadding="6">

<tr>

<td colspan="3" align="left">
<h1>My personal newspaper</h1>
</td>

</tr>

<tr>

<td width="25%">

<!-- This is the leftmost column's contents -->

<!-- Hey, how about a navigation bar? -->

<?php include("navigationbar.html"); ?>

</td>

<!-- Middle column -->

<td width="50%">

<h1>Dave Farquhar</h1>

<?php

// Gimme 5 entries and update once an hour (3600 seconds)

InsertRDF("https://dfarq.homeip.net/b2rss.xml", "~/farquhar.cache", TRUE, 5, 3600);

?>

</td>

<!-- Right sidebar column -->

<td width="25%">

<h2>Freshmeat</h2>

<?php

InsertRDF("http://www.freshmeat.net/backend/fm-releases-software.rdf", "~/fm.cache", TRUE, 10, 3600);

?>

<h2>Slashdot</h2>

<?php

InsertRDF("http://slashdot.org/developers.rdf", "~/slash.cache", TRUE, 10, 3600);

?>

</td>

</tr>

</table>

</body></html>

Pretty it up to suit your tastes by adding color elements to the <td> tags and using font tags. Better yet, use the knowledge you just gained to sprinkle PHP statements into a pleasing CSS layout you find somewhere.

Finding newsfeeds is easy. You can find everything you ever wanted and then some at Newsisfree.com.

Using something like this, you can create multiple pages, just like a newspaper, and put links to each of your files in a file called navigationbar.html. Every time you create a new page containing a set of feeds, link to it in navigationbar.html, and all of your other pages will reflect the change. This shows off another of PHP’s niceties: managing things like navigation bars is one of the worst chores of maintaining static HTML pages, and PHP makes it very convenient.
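
The navigationbar.html file itself needs nothing fancy. Something like this would do–the filenames are just placeholders for whatever pages you create:

<!-- navigationbar.html: one link per page of feeds -->
<p><a href="index.php">Front page</a><br>
<a href="tech.php">Tech headlines</a><br>
<a href="linux.php">Linux news</a></p>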

A b2 user looks longingly at Movable Type

This web site is in crisis mode.

I’ve been talking the past few days with a lot of people about blogging systems. I’ve worked with a lot of them. Since 1999, I’ve gone from static pages to Manilla to Greymatter to b2, and now, I’m thinking about another move, this time to Movable Type.

At the time I made each move, each of the solutions I chose made sense.

I really liked Manilla’s calendar and I really liked having something take care of the content management for me. I moved to Greymatter from Manilla after editthispage.com had one too many service outages. (I didn’t like its slow speed either. But for what I was paying for it, I couldn’t exactly complain.) Greymatter did everything Manilla would do for me, and it usually did it faster and better.

Greymatter was abandoned right around the time I started using it. But at the time it was the market leader, as far as blogs you ran on your own servers went. I kept on using it for a good while because it was certainly good enough for what I wanted to do, and because it was super-easy to set up. I was too chicken to try anything that would require PHP and MySQL, because back then, setting up Apache, PHP and MySQL wasn’t exactly child’s play. (It’s still not quite child’s play, but it’s a whole lot easier now than it used to be.)

Greymatter remained good enough until one of my posts here got a hundred or so responses. Posting comments to that post became unbearably slow.

So I switched to b2. Fundamentally, b2 was pretty good. Since it wasn’t serving up static pages it wasn’t as fast as Greymatter, but when it came to handling comments, it processed the 219th comment just as quickly as it processed the first. And having a database backend opened up all sorts of new possibilities, like the Top 10 lists on the sidebar (courtesy of Steve DeLassus). And b2 had all the basics right (and still does).

When I switched to b2, a handful of people were using a new package called Movable Type. But b2 had the ability to import a Greymatter site. And Movable Type was written in Perl, like Greymatter, and didn’t appear to use a database backend, so it didn’t look like a solution to my problem.

Today, Movable Type does use a MySQL backend. And it can do all sorts of cool stuff, like pingbacks and referrer autolinks. If someone writes about something I write and links to it, then as soon as someone follows the link, the link appears at the bottom of my entry. Sure, comments accomplish much the same thing, but this builds community and it gives prolific blogs lots of Googlejuice.

And there’s a six-part series that tells how to use Movable Type to implement absolutely every good idea I’ve ever had about a Weblog but usually couldn’t figure out how to do. There are also some ideas there I never conceived of.

In some cases, b2 just doesn’t have the functionality. In some cases (like the linkbacks), it’s so easy to add to b2 even I can do it. In other cases, like assigning multiple categories to a post, it’s difficult. I don’t doubt b2 will eventually get most of this functionality. But when someone else has the momentum, what to do? Do I want to forever be playing catch-up?

And that’s my struggle. Changing tools is always at least a little bit painful, because links and bookmarks go dead. So I do it only when it’s overwhelmingly worthwhile.

Movable Type will allow you to put links to related entries automatically. Movable Type will help you build meaningful metatags so search engines know what to do with you (MSN had no idea what to do with me for the longest time–I re-coded my page design a couple of weeks ago just to accommodate them). MT will allow you to tell it how much to put into your RSS feed (which I’m sure will draw cheers from the poor folks who are currently pulling down the entire story all the time).

MT doesn’t have karma voting, like Greymatter did (and I had Steve add to b2). I like it but I can live without it. I can probably get the same functionality from page reads. Or I can just code up a “best of” page by hand, using page reads, feedback, and gut feeling as my criteria.

The skinny: I’m torn on whether I should migrate. I stand to gain an awful lot. The main reason I have to stay with what I have is Steve’s custom code, which he worked awfully hard to produce, and some of it gives functionality that MT doesn’t currently have. Then again, for all I know it might not be all that hard to adapt his code to work with MT.

I know Charlie thought long and hard about switching. He’s one of the people I’ve been talking with. And I suspected he would be the first to switch. The biggest surprise to me when he did was that it took him until past 3 p.m. today to do it.

And I can tell you this. If I were starting from scratch, I’d use Movable Type. I doubt I’d even look at anything else.

apt-get install aclue

My boss called a meeting mid-week last week, and if all goes well, there’ll be some changes at work. That’s a very good thing.

I deliberately don’t write about work very often, and only in vague terms when I do, because some things I wrote about work in the past came back to bite me.

I’ve thought blogs were a very useful tool for a long time. When I started my career in 1997, I found myself gravitating towards some embryonic blog-like sites that offered technical information. Eventually enough people egged me into starting one myself. I found myself posting the solutions to my technical problems there, since searching there was much easier than with any tools we had at work. It’s a good way to work in the public eye and solicit ideas and feedback.

Well, my boss took notice. I blog, and so does one of my coworkers (I hesitate to mention him by name, as it might give away my employer, which I’d still rather not do). He visits from time to time, though the only time he’s tried to post a comment, my DSL connection went down (he naturally asked what I was doing to sabotage IE).

At the meeting, where we were talking about new ways to do things, he asked me point-blank to “Set up a weblog like you and [the guy in the cube next to me] have.”

So this morning I asked my mentor in the cube next to me for a MySQL account on one of our Linux servers. Then I installed Movable Type, mostly because both of us have heard great things about it but neither of us (so far) has been willing to risk everything by switching to it. (I know it’s not free for commercial use; call this “evaluation.” For all I know we’ll end up using b2, which is under the GPL, because for internal, intranet purposes, I don’t know that MT offers anything that b2 doesn’t. But if the boss decides he wants us to go live with MT, we’ll fork over the $150.)

The idea is, we can all log onto the blog at the end of the day and write down any significant things we did. Along the way, hopefully we’ll all learn something. And, as far as I can tell, we won’t block our clients from seeing the blog either. That way they can catch a glimpse into what we do. They won’t understand it all (I know I won’t understand all the VMS stuff on there, and the VMS guys may not understand all the NT stuff) but they’ll see something.

We talked about the cluetrain philosophy a little bit. Essentially, both of us understand it as the idea of being completely open, or at least as open as possible, with the customer. Let them see the internal operations. Let them make suggestions. Let them participate in the design of the product or service.

And I think that’s good up to a point.

Robert Lutz, one of the executives who turned Chrysler around before Daimler-Benz bought the automaker and ran it into the ground, wrote a marketing book called Guts: The Seven Laws of Business That Made Chrysler the World’s Hottest Car Company. I’ve got a copy of it on my shelf at work. One of the chapters of the book is titled, “The Customer Isn’t Always Right.” He argued that customers will follow trends and not necessarily tell the truth. Put out a survey asking people if they’d like a heated cupholder in their car, and most of them will say yes, they’d love one. Everybody knows a heated cupholder is a useless gadget: no one will use it, it won’t work right, and it’ll increase the cost of the car without adding any value. But nobody wants to look cheap.

Lutz argued that experts should make decisions. Since cars are the love of Lutz’s life, Lutz knows how to make killer cars. Lutz observed that the redesigned Dodge Ram pickup elicited extreme reactions: People either loved it or hated it. 70% of respondents loved it; 30% said they’d never go near the thing. Their then-current design had roughly 30% marketshare, so even if only half the people who said they loved it actually bought one, that would be 35%–a gain of five points. So they brought it to market, and gained marketshare.

I suspect the biggest reason why the cluetrain philosophy works is that it helps to make you experts. See enough opinions, and you’ll learn how to recognize the good ones. When you’re clueless, the cluetrain people are right and you look like geniuses. Eventually, you stop being clueless, and at that point, Lutz is right.

The main reason I’m excited about having a blog in place at work isn’t because blogs in IT are trendy and popular and glitzy. (I’d still be using an Amiga if I could get a 68060 accelerator and a Zorro II Ethernet board without spending a grand.) I’m excited about blogs because I think it’ll get us a clue.

My boss typed apt-get install aclue at work today. I don’t think that’ll get us anything. But if that blog doesn’t get us a clue, I don’t think anything will.

An easy way to get Debian 3.0 before you can buy it

Debian 3.0 hasn’t officially been released yet, but that hasn’t stopped people from making unofficial installation floppies and CDs.

I just built a Debian 3.0 system that will be hosting this site and another (I’m not going to talk yet about the other site, but it won’t be hosted by R. Collins Farquhar IV–do I hear cheers?–and it won’t be fiction). I used this 185 MB CD image to do the install. The system used up a whopping 88 megs when I finished initial installation. After I installed Apache, MySQL and PHP4 to make a usable web server, disk usage rocketed to 118 megs. Not shabby at all in this era of multi-gigabyte installs.

Linkfest Friday…

Let’s start things off with some links. Web development’s been on my mind the last few days. There’s a whole other world I’ve been wanting to explore for a couple of years, and I’ve finally collected the information that’ll let me do it.

Redirecting virus attacks — Your neighbor’s got Nimda? Here’s how to get his IIS server to quit harassing your Apache server. (Suggests redirecting to a bogus address; I’m inclined to redirect either to 127.0.0.1 or www.microsoft.com, personally.)

DJG’s help setting up MySQL. Apache, MySQL and PHP are a fabulous combination, but bootstrapping it can be a painful process. People talk about writing a sendmail.cf file as their loss of innocence, but I’ve written one of those and I’ve tried to set up the LAMP quartet. The sendmail.cf file was easier because there’s a whole lot more written about it.

Short version: Use Debian. Forget all the other distributions; they’ll install the pieces, but they rarely put the conduits in place for the three pieces to talk, so with them it’s easier to just download and compile the source yourself. If that doesn’t sound like fun to you, use Debian and save some heartache. If you’re stuck with the distro you have, download ApacheToolbox and use it. You’ll probably have to configure your C/C++ compiler and development libraries. That’s not as bad as it sounds, but I’m biased: I’ve compiled entire distributions by hand–to the point that I’ve taken Linux From Scratch, decided I didn’t like some of the components they used because they were too bloated for me, and replaced them with slimmer alternatives. (The result mostly worked. Mostly.) You’ve gotta be a bit of a gearhead to take that approach.

Debian’s easier. Let’s follow that. Use this command sequence:

apt-get install apache
apt-get install php4-mysql
apt-get install mysql-server

Next, edit /etc/apache/httpd.conf. There’s a commented-out line in there that loads the php4 module; uncomment it. Just search for php–it’ll be the third or fourth instance. Also, search for index.html and add index.php to that line. If you make index.php the first argument, access to PHP pages will be slightly faster. Pull out any filetypes you’re not using–if you’ll never name an index page anything but index.html or index.php, pull the others and Apache will perform better.
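
When you’re done, the two lines in question should look something like this. The module path is what my Debian box uses, so double-check yours:

LoadModule php4_module /usr/lib/apache/1.3/libphp4.so
DirectoryIndex index.php index.html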

Got that? Apache’s configured. Yes, the php installation could make those changes for you. It doesn’t. I’m not sure why. But trust me, this is a whole lot less painful than it is under Red Hat.

But you’re not ready to go just yet. If you try to go now, MySQL will just deny everything. Read this to get you the rest of the way.
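
The gist of what remains is creating a database and a user for your PHP scripts to connect as. A rough sketch, with placeholder names–the writeup linked above has the details:

mysql -u root -p
create database mydb;
grant all privileges on mydb.* to webuser@localhost identified by 'secret';
flush privileges;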

Once you’ve got that in place, there are literally thousands of PHP and PHP/MySQL apps and applets out there. If you can imagine it, you can build it. If HTML is a 2D world, PHP and MySQL are the third and fourth dimension.

Am I going to be playing in that world? You’d better believe it. How soon? It depends on how quickly I can get my content whipped into shape for importing.

This is the holy grail. My first editing job was doing markup for the Digital Missourian, which the faculty at the University of Missouri School of Journalism believe was the first electronic newspaper (it came into being in 1986 or so). By the time I was working there in the late summer of 1995, it had been on the ‘Net for several years. About eight of us sat in a room that was originally a big storage closet, hunched in front of 486s, pulling stories off the copydesk, adding HTML markup, and FTPing them to a big Unix cluster on the MU campus. We ran a programmable word processor called DeScribe, and we worked out some macros to help speed along the markup.

No big operation works that way anymore. There aren’t enough college students in the world. You feed your content to a database, be it Oracle or IBM DB2 or Microsoft SQL Server or MySQL or PostgreSQL. Rather than coding in straight HTML, you use a scripting language–be it PHP or ASP–that queries the database, pulls the content, applies a template, and generates the HTML on the fly. The story goes from the copy editor’s desk to the Web with no human intervention.
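
In PHP, the heart of that approach is just a query and a loop. A minimal sketch, with made-up table and column names:

<?php
// connect, grab the ten newest stories, and wrap each one in HTML
mysql_connect("localhost", "webuser", "secret");
mysql_select_db("news");
$result = mysql_query("select headline, body from stories order by posted desc limit 10");
while ($row = mysql_fetch_array($result)) {
    echo "<h2>" . $row["headline"] . "</h2>\n";
    echo "<p>" . $row["body"] . "</p>\n";
}
?>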

There are distinct advantages to this approach even for a small-time operation like me. Putting the content in a database gives you much more versatility. Some people want overdesigned Web sites. Some want something middle-ground, like this one. Others want black text on a gray background like we had in 1994. You can offer selectable formats to them. You can offer printer-friendly pages. You can even generate PDFs on the fly if you want–something some sites are doing now in an effort to gain revenue. If you have content from various sources, you can slice and dice and combine it in any imaginable way.

I can’t wait.