Generating OPML From del.icio.us (And Getting All Your Links)

UPDATE: Please note that del.icio.us has updated their API in the time since I wrote this. I have updated the URLs in my examples accordingly. The authentication is much safer since they are now using SSL.

The other day, I mentioned that I generate my blogroll from an OPML file. Until recently, this OPML file was generated manually by me whenever I updated my blogroll in NewsFire.

Now, there are some problems with NewsFire, one of which is that you cannot generate OPML for a specific group or selection of blogs (NetNewsWire excels in this area, however). Instead, the OPML file generated is a dump of all your feeds. This was not ideal for me, since I then had to go through and clean up the OPML before uploading it to my site. Needless to say, I updated my OPML as little as possible.

After my last blog post, I posted a comment to Chris’s blog saying that he should do something similar. That is, I suggested he generate his blogroll from an OPML file instead of del.icio.us, which has the unfortunate limitation of only 30 items per RSS feed. He replied that he would then have to update two places (his feed reader and del.icio.us) instead of only one. In short, it would create more work. After assessing my own practices, I could not agree more.

So, I investigated a way to get around the 30-item limitation in del.icio.us, and I found that their API allows you to do just this, albeit with a few restrictions of its own, which I’ll explain in a few moments. Using the del.icio.us API instead of their RSS feeds, I was able to use the following code to first check whether my del.icio.us account has been updated since I last cached their data and, if so, to grab all of the links for a particular tag and cache the data to a file for later use:

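<?php
$username = 'your_username';
$password = 'your_password';
$cache_file = '/tmp/blogroll.xml';
// credentials are passed in the URL, so encode any special characters
$auth = rawurlencode($username) . ':' . rawurlencode($password);

// check for updates to del.icio.us account
$update = simplexml_load_file("https://{$auth}@api.del.icio.us/v1/posts/update");

// recache if there is no cache yet, or if del.icio.us has been updated since the last cache
if ($update !== false
    && (!file_exists($cache_file) || strtotime((string) $update['time']) > filemtime($cache_file)))
{
    $data = file_get_contents("https://{$auth}@api.del.icio.us/v1/posts/all?tag=blogroll");
    if ($data !== false)
    {
        file_put_contents($cache_file, $data);
    }
}

// read links from cached del.icio.us data
$blogroll = file_exists($cache_file) ? simplexml_load_file($cache_file) : false;
if ($blogroll !== false)
{
    foreach ($blogroll->post as $blog)
    {
        // escape attribute values before writing them into HTML
        echo '<a href="' . htmlspecialchars((string) $blog['href'], ENT_QUOTES, 'UTF-8') . '">';
        echo htmlspecialchars((string) $blog['description'], ENT_QUOTES, 'UTF-8');
        echo "</a><br />\n";
    }
}
?>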

This code sample uses SimpleXML to traverse the nodes of the XML documents retrieved from del.icio.us. It first polls del.icio.us for the timestamp of the last update. Checking this before requesting all posts saves bandwidth for del.icio.us and tells us whether our cached file needs updating. This is also the practice del.icio.us recommends.
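The update check described above can be pulled out into a small function, which makes the decision easy to test without touching the network. This is only a sketch: the name cache_is_stale() is made up for illustration, and it assumes the /posts/update response is a single element carrying a time attribute, as the code above does.

```php
<?php
// Sketch: decide whether the cache file is out of date, given the raw XML
// body of a /posts/update response. cache_is_stale() is a hypothetical
// helper; it assumes the response element has a "time" attribute.
function cache_is_stale(string $updateXml, string $cacheFile): bool
{
    // A missing cache file always needs to be built.
    if (!is_file($cacheFile)) {
        return true;
    }

    $update = simplexml_load_string($updateXml);
    if ($update === false) {
        // Unparseable response: keep the existing cache rather than clobber it.
        return false;
    }

    $updated = strtotime((string) $update['time']);
    if ($updated === false) {
        // Unrecognized timestamp: also keep the existing cache.
        return false;
    }

    // Stale when del.icio.us changed after the cache was last written.
    return $updated > filemtime($cacheFile);
}
```

Keeping the existing cache when the response cannot be parsed is a design choice; you might prefer to log the failure or retry instead.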

This approach can help you grab all of your links (or all links for a specific tag) from del.icio.us. I’ve noted several issues, however:

  • You can specify a tag to the /posts/all command with ?tag=tagname, but you cannot specify a combination of tags to retrieve
  • You cannot specify a tag to the /posts/update command, so the update timestamp received is for your last overall update to del.icio.us; when passing a tag to /posts/all, it would be beneficial to be able to pass the same tag to this command, saving del.icio.us even more bandwidth
  • This method requires HTTP authentication (notice how I passed a username and password through the URL to retrieve the data), so you can retrieve only your own links; using RSS, you may retrieve anyone’s links, but you are limited to their 30 most recent links
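One possible way around the first limitation is to fetch posts for a single tag (or all posts) and then filter the cached XML locally. The sketch below is an assumption-laden illustration, not part of the del.icio.us API: posts_with_all_tags() is a made-up name, and it assumes each post element carries a space-separated tag attribute alongside href and description.

```php
<?php
// Sketch: return the posts in a cached /posts/all response that carry ALL of
// the wanted tags. posts_with_all_tags() is a hypothetical helper; it assumes
// each <post> has a space-separated "tag" attribute.
function posts_with_all_tags(string $postsXml, array $wanted): array
{
    $doc = simplexml_load_string($postsXml);
    if ($doc === false) {
        return [];
    }

    $matches = [];
    foreach ($doc->post as $post) {
        $tags = preg_split('/\s+/', trim((string) $post['tag']), -1, PREG_SPLIT_NO_EMPTY);
        // array_diff() is empty when every wanted tag is present on the post.
        if (array_diff($wanted, $tags) === []) {
            $matches[] = [
                'href'        => (string) $post['href'],
                'description' => (string) $post['description'],
            ];
        }
    }
    return $matches;
}
```

For example, posts_with_all_tags(file_get_contents('/tmp/blogroll.xml'), ['blogroll', 'php']) would keep only the cached blogroll links that are also tagged php.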

Finally, I have implemented this approach using my blogroll tag on del.icio.us. My OPML file is now generated nightly (if the script detects updates to del.icio.us). This will end up saving me quite a bit of time, since I can use the del.icio.us plugin for Firefox to add any site I visit to my blogroll, and then I can import the generated OPML into my feed reader at my leisure, with little to no headache.

You can see my script, which is executed by a nightly cron job, here. You may notice that it’s using a getRSSLocation() function to auto-discover the RSS feed from each site for inclusion in the OPML file. I grabbed this function from Keith Devens’s site, and you can see my implementation of it here.
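For readers who just want the general shape of the OPML step, here is a minimal sketch. It is not the linked script: posts_to_opml() is a made-up name, the $findFeed callable is a stand-in for feed auto-discovery (for instance a wrapper around getRSSLocation(), whose exact signature is not shown here), and the OPML version and outline attributes are my own assumptions about what a feed reader expects.

```php
<?php
// Sketch: convert a cached /posts/all response into an OPML subscription list.
// posts_to_opml() is hypothetical. $findFeed takes a site URL and returns a
// feed URL, or null when no feed can be discovered.
function posts_to_opml(string $postsXml, callable $findFeed, string $title = 'Blogroll'): string
{
    $posts = simplexml_load_string($postsXml);
    if ($posts === false) {
        throw new InvalidArgumentException('Could not parse posts XML');
    }

    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $opml = $doc->createElement('opml');
    $opml->setAttribute('version', '2.0'); // assumed; adjust for your reader
    $doc->appendChild($opml);

    $head = $doc->createElement('head');
    $titleEl = $doc->createElement('title');
    $titleEl->appendChild($doc->createTextNode($title));
    $head->appendChild($titleEl);
    $opml->appendChild($head);

    $body = $doc->createElement('body');
    $opml->appendChild($body);

    foreach ($posts->post as $post) {
        $site = (string) $post['href'];
        $feed = $findFeed($site);
        if ($feed === null) {
            continue; // skip sites with no discoverable feed
        }
        // setAttribute() escapes special characters in the values for us.
        $outline = $doc->createElement('outline');
        $outline->setAttribute('text', (string) $post['description']);
        $outline->setAttribute('type', 'rss');
        $outline->setAttribute('htmlUrl', $site);
        $outline->setAttribute('xmlUrl', $feed);
        $body->appendChild($outline);
    }

    return $doc->saveXML();
}
```

A nightly job could then call something like file_put_contents('blogroll.opml', posts_to_opml(file_get_contents('/tmp/blogroll.xml'), $findFeed)), where both paths and $findFeed are placeholders.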