Roll your own news aggregator in PHP

M.Kelley: I’m also wondering how hard would it be to pull a PHP/MySQL (or .Net like BH uses) tool to scrape the syndicated feeds off of websites and put together a dynamic, constantly updated website.
It’s almost trivial. So simple that I hesitate to even call it “programming.” And there’s no need for MySQL at all–it can be done with a tiny bit of PHP. Since it’s so simple, and potentially so useful, it’s a great first project in PHP.

It’s also terribly addictive–I quickly found myself assembling my favorite news sources and creating my own online newspaper. To a former newspaper editor (hey, they were student papers, but one of them was at Mizzou, and in my book, if you can be sued for libel and anyone will care, it counts), it’s great fun.

All you need is a little web space and a writable directory. If you administer your own Linux webserver, you’re golden. If you have a shell account on a Unix system somewhere, you’re golden.

First, grab ShowRDF.php by Ian Monroe, a simple GPL-licensed PHP script that does all the work of grabbing and decoding an RDF or RSS file. There are tons of tutorials online that tell you how to code your own solution to do this, but I like this one because you can pass it options that limit the number of entries and set how long to cache the feed. That matters: many RDF decoders fetch the file every time you call them, and some feeds impose a once-an-hour limit and yell at you (or just flat ban you) if you go over. Using existing code is a good way to get started; you can write your own decoder that works the way you want at some later date.
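To make the caching idea concrete, here is a minimal sketch of time-based caching in plain PHP. This is not ShowRDF's code, and I haven't checked how ShowRDF does it internally; the function name cached_fetch and its parameters are made up for illustration. It assumes allow_url_fopen is enabled so that file_get_contents can read a URL.

```php
<?php
// Hypothetical helper, NOT part of ShowRDF: return the feed text,
// refetching only when the cached copy is older than $maxAge seconds.
// $fetch lets you swap in another fetcher; it defaults to file_get_contents.
function cached_fetch(string $url, string $cacheFile, int $maxAge, ?callable $fetch = null): string
{
    $fetch = $fetch ?? 'file_get_contents';

    // PHP remembers stat results, so forget them before checking the file.
    clearstatcache(true, $cacheFile);

    // Serve the cached copy while it is still fresh.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
        return file_get_contents($cacheFile);
    }

    $body = $fetch($url);
    if ($body === false) {
        // Fetch failed: fall back to a stale copy if there is one.
        return is_file($cacheFile) ? file_get_contents($cacheFile) : '';
    }

    file_put_contents($cacheFile, $body);
    return $body;
}
```

With a $maxAge of 3600, a page that gets hit a hundred times an hour still bothers the feed's server only once.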

ShowRDF includes a PHP function called InsertRDF that uses the following syntax:
InsertRDF("feed URL", "name of file to cache to", TRUE, number of entries to show, number of seconds to cache feed);

Given that, here’s a simple PHP page that grabs my newsfeed:


<html><body>

<?php include("showrdf.php"); ?>

<?php

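// Gimme 5 entries and update once an hour (3600 seconds)

// PHP does not expand "~" the way a shell does; if the cache file never appears, use the full path to your writable directory instead

InsertRDF("https://dfarq.homeip.net/b2rss.xml", "~/farquhar.cache", TRUE, 5, 3600);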

?>

</body></html>

And that’s literally all there is to it. That’ll give you a very simple HTML page with a bulleted list of my five most recent entries. Unfortunately it gives you the entries in their entirety, but that’s b2’s fault, and my fault for not modifying it. I’ll be doing that soon.
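If full entries are too much, one way around it is to trim each entry down to a short plain-text excerpt before printing it. The sketch below is a hypothetical helper, not something ShowRDF or b2 provides, and wiring it into ShowRDF's output is left as an exercise. It counts bytes rather than characters, so it is only safe as written for single-byte text.

```php
<?php
// Hypothetical helper (not part of ShowRDF or b2): reduce an entry's HTML
// to a plain-text excerpt. The text before the "..." is at most $limit
// bytes and is cut at a word boundary where possible.
function make_excerpt(string $html, int $limit = 200): string
{
    // Drop the markup, then collapse runs of whitespace into single spaces.
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($html)));

    // Short entries pass through untouched.
    if (strlen($text) <= $limit) {
        return $text;
    }

    // Cut at the limit, then back up to the last space so no word is split.
    $cut = substr($text, 0, $limit);
    $space = strrpos($cut, ' ');
    if ($space !== false) {
        $cut = substr($cut, 0, $space);
    }

    return $cut . '...';
}
```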

You can see the script in action by copying and pasting it into a file on your Web server. It’s not very impressive, but it didn’t take any effort either.

You can pretty it up by making yourself a nice table, or you can grab a nice CSS layout from glish.com.

I can actually code tables without stealing even more code, so here’s an example of a fluid three-column layout using tables that’ll make a CSS advocate’s skin crawl. But this’ll get you started, even if that’s the only useful purpose it serves.


<html><body>

<?php include("showrdf.php"); ?>

<table width="99%" border="0" cellpadding="6">

<tr>

<td colspan="3" align="left">
<h1>My personal newspaper</h1>
</td>

</tr>

<tr>

<td width="25%">

<!--- This is the leftmost column's contents -->

<!--- Hey, how about a navigation bar? -->

<?php include("navigationbar.html"); ?>

</td>

<!--- Middle column -->

<td width="50%">

<h1>Dave Farquhar</h1>

<?php

// Gimme 5 entries and update once an hour (3600 seconds)

InsertRDF("https://dfarq.homeip.net/b2rss.xml", "~/farquhar.cache", TRUE, 5, 3600);

?>

</td>

<!--- Right sidebar column -->

<td width="25%">

<h2>Freshmeat</h2>

<?php

InsertRDF("http://www.freshmeat.net/backend/fm-releases-software.rdf", "~/fm.cache", TRUE, 10, 3600);

?>

<h2>Slashdot</h2>

<?php

InsertRDF("http://slashdot.org/developers.rdf", "~/slash.cache", TRUE, 10, 3600);

?>

</td>

</tr>

</table>

</body></html>

Pretty it up to suit your tastes by adding color attributes to the <td> tags and using font tags. Better yet, use the knowledge you just gained to sprinkle PHP statements into a pleasing CSS layout you find somewhere.

Finding newsfeeds is easy. You can find everything you ever wanted and then some at Newsisfree.com.
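Once you've collected more than a handful of feeds, the repeated InsertRDF calls get tedious to maintain. One possible approach is to keep the feeds in a single array and loop over it. In the sketch below, show_feeds is a name I made up, the cache paths are placeholders, and the function that actually prints a feed is passed in as a parameter so the loop doesn't depend on anything beyond the InsertRDF syntax shown earlier.

```php
<?php
// Each entry: heading, feed URL, cache file, number of entries to show.
// The cache paths are placeholders; point them at your writable directory.
$feeds = [
    ['Freshmeat', 'http://www.freshmeat.net/backend/fm-releases-software.rdf', '/path/to/cache/fm.cache', 10],
    ['Slashdot', 'http://slashdot.org/developers.rdf', '/path/to/cache/slash.cache', 10],
];

// Hypothetical helper: print a heading for each feed, then hand the feed
// to $insert using the same argument order as InsertRDF.
function show_feeds(array $feeds, callable $insert, int $ttl = 3600): void
{
    foreach ($feeds as [$heading, $url, $cache, $count]) {
        echo '<h2>' . htmlspecialchars($heading) . "</h2>\n";
        $insert($url, $cache, true, $count, $ttl);
    }
}

// On a real page, after include("showrdf.php"):
// show_feeds($feeds, 'InsertRDF');
```

Adding a feed then means adding one line to the array instead of copying another block of PHP.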

Using something like this, you can create multiple pages, just like a newspaper, and put links to each of them in a file called navigationbar.html. Every time you create a new page containing a set of feeds, link to it in navigationbar.html, and all of your other pages will reflect the change. This is another nice use of PHP: keeping things like navigation bars in sync is one of the worst things about static HTML pages, and a single include makes it very convenient.
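If you'd rather not hand-edit the HTML at all, the navigation bar itself can come from PHP. Here is a minimal sketch, assuming you keep the page list in an array; render_nav and the page names are hypothetical, not anything from ShowRDF.

```php
<?php
// Hypothetical alternative to a hand-edited navigationbar.html:
// build the list of links from one array of page file => label.
function render_nav(array $pages): string
{
    $html = "<ul>\n";
    foreach ($pages as $file => $label) {
        // Escape both values so odd characters can't break the markup.
        $html .= '<li><a href="' . htmlspecialchars($file) . '">'
               . htmlspecialchars($label) . "</a></li>\n";
    }
    return $html . "</ul>\n";
}

// Example page names are placeholders.
$pages = ['index.php' => 'Front page', 'tech.php' => 'Tech news'];
echo render_nav($pages);
```

Save that as, say, navigationbar.php and include it in place of navigationbar.html; adding a page is then a one-line change to the array.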

4 thoughts on “Roll your own news aggregator in PHP”

  • December 19, 2002 at 9:43 pm

    Wow, I’m really amazed at how simple that looks. I’m already thinking about how to utilize that inside of a Movable Type template. Thanks

  • December 20, 2002 at 1:26 pm

    That was fun. Though it took me a moment to figure out what was wrong with the “~/file” parameter for my server. Duh. You should really do the headings extraction though.

  • December 20, 2002 at 1:38 pm

    What I’m trying to figure out now is how to have the script generate the site’s name and main link that’s included in most RSS files. I’m doing some experimenting with it. I’ve emailed the author of ShowRDF and he gave me a few suggestions that I’ll pass along if they work.

    Yeah, Bo, I had that same problem, and I finally realized what it was about 10 minutes into it. Works great now.

  • December 20, 2002 at 1:50 pm

    Ian sent this about my question:

    Could you give a link to the RSS feed you’re talking about? However, I think I know what you’re talking about, seems like I remember seeing RSS feeds with titles. Have you tried using ParseIt(“”, “”, $rdf)?
    Assuming ParseIt looks for the first instance of , that should work (I suppose I should know, I wrote the function but it uses some built-in PHP functions and I’m not sure how they work). Ditto for . Assign the results of both to variables and then you can format it and write it to the file using fputs, you can see how to do that from the source.

    If you get it working right, send it to me and I’ll put it in the source on my website.

    Ian Monroe
    http://www.monroe.nu

