Generating OPML From del.icio.us (And Getting All Your Links)

UPDATE: Please note that del.icio.us has updated their API in the time since I wrote this. I have updated the URLs in my examples accordingly. Authentication is also much safer now that they are using SSL.

The other day, I mentioned that I generate my blogroll from an OPML file. Until recently, I generated this OPML file manually whenever I updated my blogroll in NewsFire.

Now, there are some problems with NewsFire, one of which is that you cannot generate OPML for a specific group or selection of blogs (an area where NetNewsWire excels). Instead, the generated OPML file is a dump of all your feeds. This was not ideal for me, since I then had to clean up the OPML by hand before uploading it to my site. Needless to say, I updated my OPML as little as possible.

After my last blog post, I posted a comment to Chris’s blog suggesting that he do something similar. That is, I suggested he generate his blogroll from an OPML file instead of from del.icio.us, which has the unfortunate limitation of only 30 items per RSS feed. He replied, saying that he would then have to update two places (his feed reader and del.icio.us) instead of only one. In short, it would create more work. After assessing my own practices, I could not agree more.

So, I investigated a way to get around the 30-item limitation in del.icio.us, and I found that their API allows you to do just this, albeit with a few restrictions of its own, which I’ll explain in a moment. Using the API instead of their RSS feeds, I was able to write the following code, which first checks whether my del.icio.us account has been updated since I last cached its data and, if so, grabs all of the links for a particular tag and caches them to a file for later use:

$username = 'your_username';
$password = 'your_password';
$cache_file = '/tmp/blogroll.xml';

// check for updates to the del.icio.us account
$update = simplexml_load_file("https://{$username}:{$password}@api.del.icio.us/v1/posts/update");

if (!file_exists($cache_file) || strtotime($update['time']) > filemtime($cache_file)) {
    // del.icio.us has been updated since the last cache; recache
    $data = file_get_contents("https://{$username}:{$password}@api.del.icio.us/v1/posts/all?tag=blogroll");
    file_put_contents($cache_file, $data);
}

// read links from the cached data
$blogroll = simplexml_load_file($cache_file);
foreach ($blogroll->post as $blog) {
    echo '<a href="' . htmlentities($blog['href']) . '">';
    echo htmlentities($blog['description']);
    echo "</a><br />\n";
}

This code sample uses SimpleXML to traverse the nodes of the XML documents retrieved from del.icio.us. It first polls del.icio.us with a request for the last update timestamp. Checking this before executing a query for all posts saves bandwidth for del.icio.us and lets us determine whether our cached file needs updating. This is also the practice recommended by the del.icio.us API documentation.
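For context, the two API responses look roughly like the following. The attribute values here are illustrative, and del.icio.us may include additional attributes (such as hash) that this code does not use:

```xml
<!-- response to /posts/update: just a timestamp -->
<update time="2006-11-08T02:17:55Z" />

<!-- response to /posts/all: one <post> element per bookmark -->
<posts user="your_username" tag="">
  <post href="http://example.org/" description="An example blog"
        tag="blogroll" time="2006-11-07T21:30:00Z" />
</posts>
```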

This approach can help you grab all of your links (or all links for a specific tag) from del.icio.us. I’ve noted several issues, however:

  • You can specify a tag to the /posts/all command with ?tag=tagname, but you cannot specify a combination of tags to retrieve
  • You cannot specify a tag to the /posts/update command, so the update timestamp you receive reflects your last overall update to del.icio.us. If you are specifying a tag to /posts/all, it would be beneficial to be able to specify the same tag here, saving even more bandwidth
  • This method requires HTTP authentication; you’ll notice how I passed a username and password through the URL to retrieve the data. As a result, you can retrieve only your own links. Using RSS, you may retrieve anyone’s links, but you are limited to their 30 most recent
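As a hypothetical workaround for the first limitation, you could fetch all of your links once and then filter by multiple tags yourself. The `posts_with_tags()` helper below is my own sketch, not part of the del.icio.us API:

```php
// Hypothetical workaround (not part of the del.icio.us API): fetch all
// links once, then keep only posts carrying every one of the wanted tags.
function posts_with_tags(SimpleXMLElement $posts, array $wanted)
{
    $matches = array();
    foreach ($posts->post as $post) {
        // the tag attribute holds a space-separated list of tags
        $tags = explode(' ', (string) $post['tag']);
        if (count(array_intersect($wanted, $tags)) === count($wanted)) {
            $matches[] = $post;
        }
    }
    return $matches;
}

// usage against a cached /posts/all document
$xml = simplexml_load_string(
    '<posts><post href="http://example.org/" description="Example" tag="blogroll php" />' .
    '<post href="http://example.net/" description="Other" tag="blogroll" /></posts>'
);
$both = posts_with_tags($xml, array('blogroll', 'php')); // matches only the first post
```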

Finally, I have implemented this approach using my blogroll tag on del.icio.us. My OPML file is now generated nightly (if updates to del.icio.us are detected). This will end up saving me quite a bit of time, since I can use the del.icio.us plugin for Firefox to add any site I visit to my blogroll, and then import the generated OPML into my feed reader at my leisure, with little to no headache.
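The OPML step itself can be sketched roughly like this. Note that `opml_from_posts()` is a hypothetical helper, not my actual script; it uses each post’s href as a stand-in for the feed URL, which really has to be auto-discovered separately:

```php
// Hypothetical sketch (not my actual script): wrap each cached
// del.icio.us post in an OPML <outline> element. The site's href is
// used for xmlUrl as a placeholder for the auto-discovered feed URL.
function opml_from_posts(SimpleXMLElement $posts, $title = 'Blogroll')
{
    $out  = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    $out .= "<opml version=\"1.1\">\n";
    $out .= '<head><title>' . htmlspecialchars($title) . "</title></head>\n<body>\n";
    foreach ($posts->post as $post) {
        $out .= sprintf("<outline text=\"%s\" htmlUrl=\"%s\" xmlUrl=\"%s\" />\n",
            htmlspecialchars((string) $post['description']),
            htmlspecialchars((string) $post['href']),
            htmlspecialchars((string) $post['href']));
    }
    return $out . "</body>\n</opml>\n";
}
```

Writing the return value to a file with file_put_contents() from a nightly cron job is then all that remains.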

You can see my script, which is executed by a nightly cron job, here. You may notice that it uses a getRSSLocation() function to auto-discover the RSS feed for each site for inclusion in the OPML file. I grabbed this function from Keith Devens’s site, and you can see my implementation of it here.
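For a rough idea of what feed auto-discovery involves (this is not Keith Devens’s actual code, just a minimal sketch of the same idea), you scan a page’s HTML for a link element that advertises an RSS or Atom feed:

```php
// Minimal auto-discovery sketch (NOT Keith Devens's getRSSLocation()):
// scan a page's <link> tags for an advertised RSS/Atom feed and return
// its href, or null if none is found.
function discover_feed_url($html)
{
    if (!preg_match_all('/<link\b[^>]*>/i', $html, $links)) {
        return null;
    }
    foreach ($links[0] as $link) {
        if (preg_match('/rel=["\']alternate["\']/i', $link)
            && preg_match('#type=["\']application/(rss|atom)\+xml["\']#i', $link)
            && preg_match('/href=["\']([^"\']+)["\']/i', $link, $href)) {
            return $href[1];
        }
    }
    return null;
}
```

A real implementation should also resolve relative hrefs against the page URL; regular expressions are good enough here because we only need the first advertised feed.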