Entry 110 - Turning Del.icio.us Feeds into Content: Part 1



Social Bookmarking Kicks Ass.

Anyone who has discovered the benefits of del.icio.us knows how easy it makes gathering up the good stuff you find on the web. It is one of those brilliant ideas along the lines of the paperclip. The simplicity of the system is its greatest asset, and the potential for even greater flexibility and ingenuity has been realized with doodads like the del.icio.us extension for Firefox, which makes taking advantage of the benefits of social bookmarking almost effortless.

Really?

Yep. The real beauty, though, lies in the RSS feeds created by del.icio.us. These feeds act like a personal doodlepad for power surfers, and with a little elbow grease, some organization, and some PHP, you can seamlessly integrate these links into your website—allowing you to manage and freshen your content from anywhere you might find a connection to the int'rweb.

So what are you on about here?

Well, I know this is not the newest idea, and many types of blogware include this sort of functionality, but I don't use WordPress or Movable Type, so I had to figure it out on my own. I have decided to write down how I do it, putting the code in fancy colors so others who might be interested in a nifty source of content for their website might do the same.

Okay then, Giddyup.

Am I Ready?

To Begin!

This method has four distinct parts:

  1. A script that grabs remote RSS files from del.icio.us and caches them on the local web server
  2. A script to read these files and output them as proper HTML
  3. A method for easy and flexible inclusion into your existing pages
  4. A cron job to automate the RSS retrieval process.

I'm going to explain these steps individually, trying my best to keep them as coherent as possible.

First, In

1. As I mentioned, the first step in the process is setting up a script to retrieve your feeds over the web. There are practically thousands of ways to do this, but I am just gonna dump my version on you.

<?php


/* feed_puller.php(s) ************************ *** ** *

Simple RSS grabber, run with wget and a CRON job

CLI version : 
http://www.hinkybox.com/devpages/delish-ex3.phps

All code by JBE, 2004 hinkybox.com (GPL)
http://www.hinkybox.com  (sysadmin@hinkybox.com)

*/

// This is the name of the feed you
// want to capture and cache. In this
// example, it is called from a URL,
// like this:
//
// feed_puller.php?feed=http://otherserver.com/feed.rss
//

// make sure this script is only run locally, 
// or specify an IP that is trusted
//

if ($_SERVER['REMOTE_ADDR'] != $_SERVER['SERVER_ADDR']) {

// Lie to the scoundrels!
//

die("Illegal operation. This has been logged. "
    . $_SERVER['REMOTE_ADDR']);

}

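$source = isset($_GET['feed']) ? $_GET['feed'] : '';

// only accept http(s) URLs, so a request can never
// point file_get_contents() at a local file
//
if (!preg_match('#^https?://#i', $source)) {

die("No valid feed URL given.");

}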

// path to cache directory
// make sure this directory is writeable
//

$path = "/path/to/feeds/cache";



// Your user name on the server. 
// This is used to automatically change the file 
// ownership of your cached feeds.
// You can probably leave this blank if you are 
// in a shared hosting environment, as your only 
// access will be within your user account.

$usr = "apache";

// Permissions.  I use 755, which works fine. 
// change this if you need to. 
//
$perm = 0755;


/* Establishes naming convention 
   for RSS cache file 
******************************************************/

// Choose a default file extension. 
// They are all XML after all, so this should do.
//
$ext = "xml";

// Feed prefix, can be anything 
//
$prefix = "feed_";

// rip out protocol prefix (http:// or https://)
//
$resource = preg_replace('#^https?://#i', '', $source);

// rip out forward slashes
//
$resource = strtolower(str_replace("/", "_", $resource));

// if the URL has no file extension, usually something mod_rewritten,
// append one for good recordkeeping
//
if(substr($resource, -(strlen($ext) + 1)) != ".".$ext ) {

$rssfile = $prefix.$resource.".".$ext;

} else {

$rssfile = $prefix.$resource;

}
// RFC 2822 Date format
//
$timestamp =  date("r");                        


// @ suppresses PHP warnings, so we can use our own error messages.
// 
$feed_url = @file_get_contents($source);

/* file handling 
******************************************************/

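// if the URL is accessible, open that puppy
if ($feed_url) {

// utf8_encode() assumes ISO-8859-1 input and would mangle a feed
// that is already UTF-8, so only convert when it is not
// (these two functions need the mbstring extension)
//
if (mb_check_encoding($feed_url, 'UTF-8')) {
  $feed = $feed_url;
} else {
  $feed = mb_convert_encoding($feed_url, 'UTF-8', 'ISO-8859-1');
}

echo "Feed grab from ".$source." successful.\n\n";

} else {
// Output error, for debugging
 echo "Feed grab from ".$source." has failed.\n\n";

// non-zero exit status signals the failure
 exit(1);

}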

if (!file_exists($path.'/'.$rssfile)) {

    $rsswrite = fopen($path.'/'.$rssfile, "x+");
      fwrite($rsswrite, $feed);
  fclose($rsswrite);

// automagically change file permissions to $perm
//
chmod($path.'/'.$rssfile, $perm);


// only change ownership if necessary
//
if(!empty($usr)) {

// assign ownership to apache
//
chown($path.'/'.$rssfile, $usr);
// assign group rights to apache
//
chgrp($path.'/'.$rssfile, $usr);

}

echo "New cache file is\n\n "
     .$path.'/'.$rssfile."\n\n"
     ."Created ".date('r')."\n\n";

} else {

    $rsswrite = fopen($path.'/'.$rssfile, "w+");
      fwrite($rsswrite, $feed);
  fclose($rsswrite);

echo "Putting newest version of feed:\n\n "
      .$path.'/'.$rssfile.".\n\n" 
      ."Created ".$timestamp."\n\n";
}



?>

This script is pretty much a skeletal implementation of PHP's filesystem manipulation capabilities. Here is a generalized idea of how the script works:

The script is called as if it were an HTML page, usually with a Linux program called wget. By default, this script is prohibited from use by a user agent not located on the same server, but it is possible to specify a remote server IP that is allowed to process the feeds. I used this method for about a year, running a cron job from a protected server within my LAN. For safety's sake, you should not allow blanket access to this script, instead restricting it to a trusted server.
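
Allowing one trusted remote machine could look roughly like the sketch below. It is only an illustration under assumptions, not a drop-in replacement for the check in feed_puller.php: the function name is_trusted_client() is made up, and 192.168.0.10 is a placeholder for whatever box runs your cron job.

```php
<?php

// Hypothetical helper, not part of feed_puller.php as shown above.
// Returns true when the request comes from the web server itself
// or from one of the explicitly trusted addresses.
//
function is_trusted_client($remote, $server, array $trusted = array())
{
    // no client address at all is never trusted
    if ($remote === '') {
        return false;
    }

    if ($remote === $server) {
        return true;
    }

    // strict comparison, so only exact string matches pass
    return in_array($remote, $trusted, true);
}

// 192.168.0.10 is a placeholder for the machine running cron
//
$trusted = array('192.168.0.10');

// fall back to empty strings when run outside a web server
//
$remote = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';
$server = isset($_SERVER['SERVER_ADDR']) ? $_SERVER['SERVER_ADDR'] : '';

// the check only makes sense for web requests
//
if (PHP_SAPI !== 'cli' && !is_trusted_client($remote, $server, $trusted)) {
    die("Illegal operation.");
}
```

Keeping the trusted addresses in an array means adding a second machine later is a one-line change.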

Using the URL passed as a $_GET['feed'] variable, it retrieves the file and writes it to the specified cache location. The RSS url is transmogrified with a little PHP string two-stepping; for example, a feed located at

http://www.foobar.baz/blog/foobar_rss.xml

would be cached as

feed_www.foobar.baz_blog_foobar_rss.xml
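
If you want to try the naming step on its own, here is a small sketch that wraps the same kind of string steps in a function. The function name cache_filename() is made up for this example, and the second URL is just an assumed feed address used for illustration.

```php
<?php

// Hypothetical wrapper around the string steps used in
// feed_puller.php; the function name is made up for this example.
//
function cache_filename($source, $prefix = "feed_", $ext = "xml")
{
    // rip out the protocol prefix
    $resource = str_replace("http://", "", $source);

    // swap forward slashes for underscores, lowercase the lot
    $resource = strtolower(str_replace("/", "_", $resource));

    // append the extension only if it is not already there
    if (substr($resource, -(strlen($ext) + 1)) != "." . $ext) {
        $resource .= "." . $ext;
    }

    return $prefix . $resource;
}

echo cache_filename("http://www.foobar.baz/blog/foobar_rss.xml"), "\n";
// feed_www.foobar.baz_blog_foobar_rss.xml

// an assumed feed URL with no file extension
echo cache_filename("http://del.icio.us/rss/hink"), "\n";
// feed_del.icio.us_rss_hink.xml
```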

There is some text output associated with this file; its error states are useful for debugging. When using wget, a command-line switch redirects the standard hypertext output (what would normally show up in your browser window) to /dev/null—geekspeak for the void.

An alternative and much preferred method uses the PHP command-line interface (CLI), and bypasses the need for an HTTP client such as wget altogether. Not a lot of folks are comfortable with the command line, though, and many don't have shell access in the traditional sense. I've provided a source file for this alternative script, and will explain the differences in greater detail in the section about cron jobs.
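
To give a rough idea of what changes on the command line, here is a minimal sketch of reading the feed URL from the script's arguments instead of $_GET. It is not the CLI source file mentioned above; the function name feed_from_args(), the file name feed_puller_cli.php, and the choice to pass the URL as the first argument are all assumptions made for illustration.

```php
<?php

// Hypothetical helper: pull the feed URL from the command line
// instead of $_GET. $args is expected to look like $argv, i.e.
// element 0 is the script name, element 1 the first argument.
//
function feed_from_args(array $args)
{
    if (count($args) < 2) {
        return null;
    }

    $source = $args[1];

    // accept only http(s) URLs
    if (!preg_match('#^https?://#i', $source)) {
        return null;
    }

    return $source;
}

// Only act when run from the shell with an argument, e.g.
//   php feed_puller_cli.php http://otherserver.com/feed.rss
// (feed_puller_cli.php is a placeholder file name)
//
if (PHP_SAPI === 'cli' && isset($argv) && count($argv) > 1) {
    $source = feed_from_args($argv);

    if ($source === null) {
        echo "Not a usable feed URL.\n";
    } else {
        echo "Would fetch ".$source."\n";
    }
}
```

Because nothing is served over HTTP in this setup, the REMOTE_ADDR check from the web version has nothing to guard against.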

Note: Depending on your hosting setup, you might need to know your username and PHP include path for this to work correctly. The script is designed to automagically change file permissions, owner and group upon initial creation of the cached file. This can be bypassed and configured manually, but that is a headache in most cases.

Then, Out

2. The real workhorse of this method is the XML interpreter that reads, parses and formats the feeds for inclusion into your pages. Two such parsers, PEAR XML_RSS and MagpieRSS, are ideal for this purpose, though in my experience MagpieRSS is a nice, lightweight way to get the results needed without a lot of fancy extras that only serve to slow script execution. XML_RSS is a great tool to have for more complex projects, but for now, I'll just assume we are working with Magpie.

Get MagpieRSS and install it in a directory of your choosing–somewhere safe, but accessible by your scripts. There are multiple files in the package, but most of them are used as includes when needed by the main script, rss_parse.inc.php.

As soon as you have the parser where you want it, you can start sticking together the scripty bits. You'll need to have an RSS file handy for testing; if you don't already have a cached version of a del.icio.us feed, you can use this example page. Any well-formed RSS feed will do, but for the sake of continuity I supplied one from my del.icio.us account.

Now that the foundations are in place, you can begin to build the script to parse the XML in the feed. Here is a simple example script you can use as a framework or to get your paths and such in line:


<!-- 
We know this will be a definition list,
so we open the element here.
-->
<dl>
<?php


/* delish.php ************************ *** ** *

RSS parse-and-include script.
All code by JBE, 2004 hinkybox.com (GPL)
http://www.hinkybox.com  (sysadmin@hinkybox.com)

*/


// usually this is a server path,
// e.g., mine is /var/www/hb/public_html/feeds/scripts/
//
require_once('/path/to/rss_parse.inc.php');

// For now, the location of the test file. **
//
$rss_file = "/path/to/test/rss_file";

// PHP needs to "handle" the file and see what's in it
//
$rss_string = file_get_contents($rss_file);

// Magpie creates an XML Parser
//
$rss = new MagpieRSS( $rss_string );

// Proceed if the parser was created successfully 
// and there are no error states
//
if ( $rss and !$rss->ERROR) {

// The parser turns the feed into an array
//
$items = $rss->items;

// Loop through the array, pick out the 
// bits we want, and format them accordingly
//
foreach ($items as $item) {

// the link string from the feed
// that goes in the href=""
//
$href = $item['link'];

//  The title of the bookmark, used in the link
//
$title = $item['title'];

// the string representing the description 
// you've given the bookmark
//
$description = $item['description'];

// The markup we want for the links
// This can be as complex as you need, 
// but you should probably keep it flexible,
// so you can reuse it for multiple content sections    
//

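// htmlspecialchars() keeps a stray & or < in the feed
// from breaking the page
//
echo '<dt><a href="'.htmlspecialchars($href).'">'.htmlspecialchars($title).'</a></dt>'."\n".
     '<dd>'.htmlspecialchars($description).'</dd>'."\n";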
    
// closes the foreach
//
}


// if the feed could not be parsed,
// issue Magpie's error message instead
//

} else {

echo $rss->ERROR;

}

?>

<!-- 
"Close the definition list.
Aaaaaaaaaaaand SCENE!" 
-->
</dl> 

Now it's time to test this script out and see if it works. The business end of the script is the HTML, and you can modify this to suit your needs; some people choose to use a list, like this:

<ul>
<li><a href="LINK">TITLE</a> – DESCRIPTION</li>
<li><a href="LINK">TITLE</a> – DESCRIPTION</li>
<li><a href="LINK">TITLE</a> – DESCRIPTION</li>
<li><a href="LINK">TITLE</a> – DESCRIPTION</li>
</ul>

which is perfectly fine. This is semantically correct, of course, but I choose to use a different markup method, the definition list, like so:

<dl>
<dt><a href="LINK">TITLE</a></dt>
<dd>DESCRIPTION</dd>
<dt><a href="LINK">TITLE</a></dt>
<dd>DESCRIPTION</dd>
<dt><a href="LINK">TITLE</a></dt>
<dd>DESCRIPTION</dd>
<dt><a href="LINK">TITLE</a></dt>
<dd>DESCRIPTION</dd>
</dl>

In practice, both are acceptable; it just comes down to a personal preference. Here is an example of this script in action; both methods are shown. To see the innards of this example, check out the source.
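
If you would rather switch between the two markup styles without editing the echo line, something like the following sketch could do it. The function name render_links() and the sample item are made up for illustration; it assumes each item is an array with 'link', 'title' and 'description' keys, like the ones in the loop above, and it escapes the feed text before printing it.

```php
<?php

// Hypothetical helper: turn an array of items into either an
// unordered list ('ul') or a definition list (anything else).
//
function render_links(array $items, $style = 'dl')
{
    $tag = ($style === 'ul') ? 'ul' : 'dl';
    $out = '<'.$tag.'>'."\n";

    foreach ($items as $item) {
        // escape everything that came from the feed
        $href  = htmlspecialchars($item['link'], ENT_QUOTES, 'UTF-8');
        $title = htmlspecialchars($item['title'], ENT_QUOTES, 'UTF-8');
        $desc  = htmlspecialchars($item['description'], ENT_QUOTES, 'UTF-8');

        if ($tag === 'ul') {
            $out .= '<li><a href="'.$href.'">'.$title.'</a> &ndash; '
                  .$desc.'</li>'."\n";
        } else {
            $out .= '<dt><a href="'.$href.'">'.$title.'</a></dt>'."\n"
                  .'<dd>'.$desc.'</dd>'."\n";
        }
    }

    return $out.'</'.$tag.'>'."\n";
}

// a made-up item, just to show the shape of the data
//
$items = array(
    array(
        'link'        => 'http://www.example.com/',
        'title'       => 'Tom & Jerry',
        'description' => 'A test bookmark',
    ),
);

echo render_links($items, 'ul');
echo render_links($items, 'dl');
```

Returning a string instead of echoing inside the loop makes it easy to reuse the same markup in more than one spot on a page.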

If you have troubles with the PHP part of the script, check the paths to both rss_parse.inc.php and the test RSS file. If you have modified the markup, check to make sure all your quotes and parentheses are in the right places. If you get nothing at all when you run this script, chances are your host has error display turned off. Some web hosts restrict PHP's error reporting level to keep the size of log files in check, or hide errors to keep things tidy. To get around this, you can temporarily add

error_reporting(E_ALL);
ini_set('display_errors', '1');

at the very top of this script. This overrides those directives on a per-script basis, and is a great trick for debugging on stingy servers. (A syntax error in this same file will still show nothing, because the script never gets as far as running those lines.)

Now, Show Me

3. The basic usage of this script is fairly straightforward; it is deliberately vague in its variable definitions, which adds flexibility when it is called as a PHP include. For example, let's say you have a page that contains a sidebar, like hinkybox. Once you have an idea where you want these links to appear, you can just stick this code in the page:


<?php 

$rss_file = "/path/to/cached/file.rss"; 

include ("includes/delish.php"); 

?>

Take a look at the source of this page; the sidebar is contained in <div id="content2">. It is commented fairly well, and you will see where I've used this method.

Note: Make sure to remove the $rss_file variable I defined inside the script for testing purposes, otherwise you will get exactly the same links every time you use the script.

Nifty, right? Because PHP's variable scope is so flexible, you can use this bit of code as many times as you want; the $rss_file variable is simply overwritten each time you define it. There is another customization you can make that will limit the number of links displayed; I will get to that tweak in the next installment. I'll also explain how I use cron jobs to keep my feeds up to date, as well as a few customizations you can use to further personalize your content. ◊

Note on the code: I am a self-taught programmer; PHP is my favorite scripting environment, but I know better than anyone that I have some idiosyncrasies and some dark spots in my vocabulary that might make my code appear to the trained eye, well — shitty. I am not proud, and if you are reading this and see something amateurish or, god forbid, dangerous, please feel free to voice it or any other WTF?!? moments in the comment section.


4 Missives So Far


01 Mad said on Thu Jun 16 3:36:25 EDT

Hold on! Hold on. What's this del.ico.us thing again? I didn't get that bit, I've heard it mentioned all over t' intawebb but I haven't checked it out as yet. Don't tell me it's something I must have?


02 josh said on Thu Jun 16 3:47:49 EDT

Well, not a must, per se, but it is a cool way to keep up your bookmarks, and as I am getting at with this whole spiel, a good way to add current and relevant content to your site. it goes like this:

You see cool page —>

You bookmark said page with the Del.icio.us plugin —>

Bookmark becomes part of a Del.icio.us feed —>

Script running under a cron job pulls down RSS feed containing bookmark of said page —>

Bookmark served as an include on your page.

Check it out, here's my stash : http://del.icio.us/hink

I dunno how much you surf thesedays, and I know the general concept is not new, but it is certainly the niftiest idea I've toyed with lately.


03 Noah said on Thu Jun 16 13:14:14 EDT

Hink - Dope write up. I wish you had done this like, 3 months ago. I found, and use a plugin for Movable Type called GetXML 1.1 that handles the work for me... not on the Futon, but on other sites I've done.


04 josh said on Thu Jun 16 13:31:49 EDT

Alas—always the bridesmaid, never the bride.

I had a feeling when I started this that it was far too clunky, but I think it explains the actual process pretty well. I am seriously thinking about going into the plugin business.

Thanks for visiting, Noah.
