The Silicon Underground
  Welcome to Dave Farquhar's Silicon Underground Tuesday, November 24 2009 @ 06:19 PM CST  

Roll your own news aggregator in PHP   
Thursday, December 19 2002 @ 06:15 PM CST
By David L. Farquhar

M.Kelley: I'm also wondering how hard would it be to pull a PHP/MySQL (or .Net like BH uses) tool to scrape the syndicated feeds off of websites and put together a dynamic, constantly updated website.

It's almost trivial. So simple that I hesitate to even call it "programming." And there's no need for MySQL at all--it can be done with a tiny bit of PHP. Since it's so simple, and potentially so useful, it's a great first project in PHP.

It's also terribly addictive--I quickly found myself assembling my favorite news sources and creating my own online newspaper. To a former newspaper editor (hey, they were student papers, but one of them was at Mizzou, and in my book, if you can be sued for libel and anyone will care, it counts), it's great fun.

All you need is a little web space and a writable directory. If you administer your own Linux webserver, you're golden. If you have a shell account on a Unix system somewhere, you're golden.
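Before going further, it can help to confirm that the web server can actually write to the directory you plan to cache feeds in. The following is a minimal sketch, not part of ShowRDF: the function name `cache_dir_ready` and the `cache` subdirectory are assumptions made up for illustration.

```php
<?php
// Hypothetical helper: true if $dir exists (or can be created) and is writable.
function cache_dir_ready($dir)
{
    // PHP does not expand "~" the way a shell does, so reject it up front.
    if ($dir !== '' && $dir[0] === '~') {
        return false;
    }
    // Create the directory on first use.
    if (!is_dir($dir) && !@mkdir($dir, 0755, true)) {
        return false;
    }
    return is_writable($dir);
}

// Assumed location: a "cache" directory next to this script.
$dir = dirname(__FILE__) . '/cache';
echo cache_dir_ready($dir)
    ? "OK: $dir is writable\n"
    : "Problem: cannot write to $dir\n";
```

If the check fails, the usual fix is to adjust the directory's ownership or permissions so the account the web server runs as can write there.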

First, grab ShowRDF.php by Ian Monroe, a simple GPL-licensed PHP script that does all the work of grabbing and decoding an RDF or RSS file. There are tons of tutorials online that tell you how to code your own solution to do this, but I like this one because you can pass options to it to limit the number of entries and to set how long the feed is cached. Many RDF decoders fetch the file every time you call them, and some feeds impose a once-an-hour limit and yell at you (or just flat ban you) if you go over. Using existing code is a good way to get started; you can write your own decoder that works the way you want at some later date.
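To make the caching idea concrete, here is a rough sketch of time-based caching. It is not ShowRDF's code, only an illustration of the general technique; the function name `fetch_cached` and its parameters are hypothetical.

```php
<?php
// Hypothetical helper, not part of ShowRDF:
// fetch $url at most once every $maxAge seconds, keeping a copy in $cacheFile.
function fetch_cached($url, $cacheFile, $maxAge)
{
    clearstatcache();
    // Serve the cached copy while it is younger than $maxAge seconds.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
        return file_get_contents($cacheFile);
    }
    // Otherwise fetch a fresh copy (http:// URLs need allow_url_fopen enabled).
    $data = @file_get_contents($url);
    if ($data === false) {
        // Fetch failed: fall back to a stale copy if there is one.
        return is_file($cacheFile) ? file_get_contents($cacheFile) : false;
    }
    file_put_contents($cacheFile, $data);
    return $data;
}
```

Called with a feed URL, a cache file path, and 3600, this would hit the remote site no more than once an hour, no matter how often the page is loaded.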

ShowRDF includes a PHP function called InsertRDF that uses the following syntax:
InsertRDF("feed URL", "name of file to cache to", TRUE, number of entries to show, number of seconds to cache feed);

Given that, here's a simple PHP page that grabs my newsfeed:


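<html><body>

<?php include("showrdf.php"); ?>

<?php
// Gimme 5 entries and update once an hour (3600 seconds).
// PHP does not expand "~", so build the cache path from this script's
// directory, which must be writable by the web server.
InsertRDF("http://dfarq.homeip.net/b2rss.xml", dirname(__FILE__) . "/farquhar.cache", TRUE, 5, 3600);
?>

</body></html>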



And that's literally all there is to it. That'll give you a very simple HTML page with a bulleted list of my five most recent entries. Unfortunately it gives you the entries in their entirety, but that's b2's fault, and my fault for not modifying it. I'll be doing that soon.
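If you would rather show headlines only, one option is to parse the feed yourself. The sketch below uses PHP's SimpleXML extension, which is not what ShowRDF uses, and it assumes an RSS 0.9x/2.0-style feed where the items sit inside the channel element; RDF (RSS 1.0) feeds use namespaces and would need extra handling. The function names `feed_headlines` and `render_headlines` are hypothetical.

```php
<?php
// Hypothetical helper: pull the channel title/link and up to $limit item
// headlines out of RSS 0.9x/2.0 XML. Returns false if the XML is unusable.
function feed_headlines($xml, $limit)
{
    $feed = @simplexml_load_string($xml);
    if ($feed === false || !isset($feed->channel)) {
        return false;
    }
    $out = array(
        'title' => (string) $feed->channel->title,
        'link'  => (string) $feed->channel->link,
        'items' => array(),
    );
    foreach ($feed->channel->item as $item) {
        if (count($out['items']) >= $limit) {
            break;
        }
        $out['items'][] = array(
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
        );
    }
    return $out;
}

// Render the result as a linked heading plus a bulleted list of headlines,
// escaping all feed text for HTML.
function render_headlines($feed)
{
    $html = '<h2><a href="' . htmlspecialchars($feed['link']) . '">'
          . htmlspecialchars($feed['title']) . "</a></h2>\n<ul>\n";
    foreach ($feed['items'] as $item) {
        $html .= '<li><a href="' . htmlspecialchars($item['link']) . '">'
               . htmlspecialchars($item['title']) . "</a></li>\n";
    }
    return $html . "</ul>\n";
}
```

Feed the function the raw XML of a feed and echo the rendered result where you want the list to appear. This also gives you the site's name and main link, which most RSS files include.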

You can see the script in action by saving it as a .php file on your Web server, alongside showrdf.php, and loading it in a browser. It's not very impressive, but it didn't take any effort either.

You can pretty it up by making yourself a nice table, or you can grab a nice CSS layout from glish.com.

I can actually code tables without stealing even more code, so here's an example of a fluid three-column layout using tables that'll make a CSS advocate's skin crawl. But this'll get you started, even if that's the only useful purpose it serves.


<html><body>

<?php
include("showrdf.php");
// PHP does not expand "~", so keep the cache files in this script's
// directory, which must be writable by the web server.
$cachedir = dirname(__FILE__);
?>

<table width="99%" border="0" cellpadding="6">

<tr>
<td colspan="3" align="left">
<h1>My personal newspaper</h1>
</td>
</tr>

<tr>

<td width="25%">
<!-- This is the leftmost column's contents -->
<!-- Hey, how about a navigation bar? -->
<?php include("navigationbar.html"); ?>
</td>

<!-- Middle column -->
<td width="50%">
<h1>Dave Farquhar</h1>
<?php
// Gimme 5 entries and update once an hour (3600 seconds)
InsertRDF("http://dfarq.homeip.net/b2rss.xml", $cachedir . "/farquhar.cache", TRUE, 5, 3600);
?>
</td>

<!-- Right sidebar column -->
<td width="25%">
<h2>Freshmeat</h2>
<?php
InsertRDF("http://www.freshmeat.net/backend/fm-releases-software.rdf", $cachedir . "/fm.cache", TRUE, 10, 3600);
?>

<h2>Slashdot</h2>
<?php
InsertRDF("http://slashdot.org/developers.rdf", $cachedir . "/slash.cache", TRUE, 10, 3600);
?>
</td>

</tr>

</table>

</body></html>



Pretty it up to suit your tastes by adding color attributes such as bgcolor to the <td> tags and using font tags. Better yet, use the knowledge you just gained to sprinkle PHP statements into a pleasing CSS layout you find somewhere.

Finding newsfeeds is easy. You can find everything you ever wanted and then some at Newsisfree.com.

Using something like this, you can create multiple pages, just like a newspaper, and put links to each of your files in a file called navigationbar.html. Every time you create a new page containing a set of feeds, link to it in navigationbar.html, and all of your other pages will reflect the change. This shows another nice use of PHP: managing things like navigation bars is one of the worst things about static HTML pages, and PHP's include makes it very convenient.
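If you would rather not hand-edit navigationbar.html, the same idea can be taken one step further by generating the bar from a list of pages. This is only a sketch of one possible approach; the function name `nav_bar` and the page file names and labels are made up for illustration.

```php
<?php
// Hypothetical page list: file name => label. Add a line for each new page.
$pages = array(
    'index.php'  => 'Front page',
    'tech.php'   => 'Technology',
    'sports.php' => 'Sports',
);

// Build a bulleted navigation bar, leaving the current page unlinked.
function nav_bar($pages, $current)
{
    $html = "<ul>\n";
    foreach ($pages as $file => $label) {
        $text = htmlspecialchars($label);
        if ($file === $current) {
            $html .= "<li><b>$text</b></li>\n";
        } else {
            $html .= '<li><a href="' . htmlspecialchars($file) . "\">$text</a></li>\n";
        }
    }
    return $html . "</ul>\n";
}

// Example: render the bar as it would appear on the (assumed) tech.php page.
echo nav_bar($pages, 'tech.php');
```

Saved in its own file and pulled in with include, this would work the same way as the static navigation bar, except that in a real page you would pass something like basename($_SERVER['PHP_SELF']) as the current page.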

Roll your own news aggregator in PHP | 4 comments
The following comments are owned by whomever posted them. This site is not responsible for what they say.
M.Kelley
Authored by: ImportedComment on Thursday, December 19 2002 @ 09:43 PM CST
Wow, I'm really amazed at how simple that looks. I'm already thinking about how to utilize that inside of a Movable Type template. Thanks

Bo
Authored by: ImportedComment on Friday, December 20 2002 @ 01:26 PM CST
That was fun. Though it took me a moment to figure out what was wrong with the "~/file" parameter for my server. Duh. You should really do the headings extraction though.

M.Kelley
Authored by: ImportedComment on Friday, December 20 2002 @ 01:38 PM CST
What I'm trying to figure out now is how to have the script generate the site's name and main link that's included in most RSS files. I'm doing some experimenting with it. I've emailed the author of ShowRDF and he gave me a few suggestions that I'll pass along if they work.

Yeah, Bo, I had that same problem, and I finally realized what it was about 10 minutes into it. Works great now.

M.Kelley
Authored by: ImportedComment on Friday, December 20 2002 @ 01:50 PM CST
Ian sent this about my question:

Could you give a link to the RSS feed you're talking about? However, I think I know what you're talking about; it seems like I remember seeing RSS feeds with titles. Have you tried using ParseIt("", "", $rdf)?
Assuming ParseIt looks for the first instance of , that should work (I suppose I should know, I wrote the function, but it uses some built-in PHP functions and I'm not sure how they work). Ditto for . Assign the results of both to variables and then you can format it and write it to the file using fputs; you can see how to do that from the source.

If you get it working right, send it to me and I'll put it in the source on my website.

Ian Monroe
http://www.monroe.nu


     Copyright © 2009 Dave Farquhar's Silicon Underground