Simple Caching of RSS Content (Edge Article Follow Up)
Posted on Apr 27, 2007
As some of you may recall, I wrote a beginner's article for the Adobe Edge that appeared about a month ago regarding how to use sites like Blogger, Flickr and del.icio.us to manage content for your site that you retrieve via RSS. As I state in the article, the caveat is that this can be slow, since the RSS is retrieved via standard HTTP calls on every page request. I have received questions about how you could cache this content to improve performance, and I wanted to cover that quickly here (my advanced readers will probably find this discussion overly simplistic, but the idea of the article was to keep things simple and introduce people to the power of ColdFusion).
The three methods of caching this data locally would be 1) in a persistent scope such as the application scope; 2) on disk (i.e. written into text files); 3) in a local database. Since it is probably the easiest to implement, I am going to show a simple caching system using the application scope here. First of all, you will need to make sure you have an Application.cfm file in your web site root folder. I am going to put two lines in there:
<cfapplication name="edgearticle">
<cfparam name="application.content" default="#structNew()#" />
First, replace "edgearticle" with your application's name, which is the unique name you give your site within ColdFusion. The second line simply creates a structure where we are going to put our cached content.
In my sample code for the article, I only had a home page (i.e. index.cfm). You will clearly have more than that on your site, so within the application.content structure I am going to create a substructure keyed on my page name (in this case I am calling index.cfm "home"). Within that substructure I can have variables containing the returned data for my page content (the Blogger call), the links (del.icio.us) and the images (Flickr). The HTTP calls for the external RSS feeds will only be made if 1) the key for that content item does not exist or 2) I specify that I would like the content reloaded for a specific key. It is important to remember that if you add a new Blogger post that feeds your home page, for example, it will not show up on your site until you tell the page to refresh that part of the cache. OK, let's see the code for how that works:
<cfparam name="application.content.home" default="#structNew()#" />
<cfparam name="url.reload" default="" />

<!--- get main content --->
<cfif not structKeyExists(application.content.home,"content") or url.reload eq "content">
  <cfinvoke component="rss" method="getEntries" returnvariable="application.content.home.content" xmlData="http://www.remotesynthesis.com/blog/rss.cfm?mode=full" />
</cfif>

<!--- get the links for the home page --->
<cfif not structKeyExists(application.content.home,"links") or url.reload eq "links">
  <cfinvoke component="rss" method="getEntries" returnvariable="application.content.home.links" xmlData="http://del.icio.us/rss/remotesynth/css" />
</cfif>

<!--- get pics --->
<cfif not structKeyExists(application.content.home,"pics") or url.reload eq "pics">
  <cfinvoke component="rss" method="getEntries" returnvariable="application.content.home.pics" xmlData="http://api.flickr.com/services/feeds/photos_public.gne?id=74682459@N00&format=atom" />
</cfif>

<cfset randPic = randRange(1,application.content.home.pics.recordCount) />
<cfsavecontent variable="regex"><img.*/></cfsavecontent>
<cfset findImage = REFindNoCase(regex,application.content.home.pics.description[randPic],1,true) />
The first line simply creates the cache structure for this page. The second defaults the reload URL variable; this allows us, for example, to refresh the links pulled from del.icio.us on this page by appending ?reload=links to the URL. The call for the content RSS is wrapped in an if statement that says: if the content key does not exist in the cache structure for this page *or* I specify that I want the content reloaded, then call the RSS feed. This same logic is repeated for the links and images, though each is put in its appropriate key. (Also note that the image code is now in line with the updated code I posted due to changes at Flickr.)
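If you would rather not remember to append ?reload=content every time you publish a post, one possible variation is to also store the time the item was cached and treat it as stale after a set number of minutes. This is only a rough sketch: the maxAgeMinutes variable and the contentLoaded key are names I made up for illustration and are not part of the article's code.

```cfml
<!--- Sketch only: maxAgeMinutes and the contentLoaded key are assumed names --->
<cfset maxAgeMinutes = 30 />
<cfparam name="application.content.home" default="#structNew()#" />
<cfparam name="url.reload" default="" />

<!--- refetch when the item is missing, a reload is requested, or the cached copy is too old --->
<cfif not structKeyExists(application.content.home,"content")
      or not structKeyExists(application.content.home,"contentLoaded")
      or url.reload eq "content"
      or dateDiff("n",application.content.home.contentLoaded,now()) gte maxAgeMinutes>
  <cfinvoke component="rss" method="getEntries" returnvariable="application.content.home.content" xmlData="http://www.remotesynthesis.com/blog/rss.cfm?mode=full" />
  <!--- remember when this item was last fetched --->
  <cfset application.content.home.contentLoaded = now() />
</cfif>
```

The manual reload still works exactly as before; the timestamp check just adds an upper bound on how stale the cached content can get.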
The next step is to replace the calls within the code to use the application scoped cache. For example, my example output of the del.icio.us links would be changed to this:
<cfoutput query="application.content.home.links" maxrows="2">
  <div class="boxtop"></div>
  <div class="box">
    <p><img src="images/image.gif" alt="Image" title="Image" class="image" /><b>#title#</b><br />#description#<br /></p>
    <div class="buttons"><p><a href="#link#" class="bluebtn">&nbsp;Visit&nbsp;</a></p></div>
  </div>
</cfoutput>
The only thing that changed in this case is that my query attribute on the output is "application.content.home.links" instead of simply "links".
Obviously this process would need to be repeated on each page where you want to implement caching, but you will notice that once the cache has been filled on the first request, the lag caused by the external HTTP calls is gone on subsequent page loads.
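Since the same check-then-fetch pattern repeats for every feed on every page, one way to cut down on the repetition might be to wrap it in a small function. The following is just a sketch: getCachedFeed and its argument names are hypothetical, and it assumes that application.content has already been created in Application.cfm and that the rss component can be found from wherever the function is defined.

```cfml
<!--- Sketch only: getCachedFeed and its argument names are hypothetical --->
<cffunction name="getCachedFeed" returntype="any" output="false">
  <cfargument name="pageKey" type="string" required="true" />
  <cfargument name="itemKey" type="string" required="true" />
  <cfargument name="feedUrl" type="string" required="true" />
  <cfargument name="reload" type="string" required="false" default="" />
  <cfset var entries = "" />

  <!--- make sure the substructure for this page exists --->
  <cfif not structKeyExists(application.content,arguments.pageKey)>
    <cfset application.content[arguments.pageKey] = structNew() />
  </cfif>

  <!--- fetch only when the item is missing or a reload of this item was requested --->
  <cfif not structKeyExists(application.content[arguments.pageKey],arguments.itemKey) or arguments.reload eq arguments.itemKey>
    <cfinvoke component="rss" method="getEntries" returnvariable="entries" xmlData="#arguments.feedUrl#" />
    <cfset application.content[arguments.pageKey][arguments.itemKey] = entries />
  </cfif>

  <cfreturn application.content[arguments.pageKey][arguments.itemKey] />
</cffunction>

<!--- example use on the home page --->
<cfparam name="url.reload" default="" />
<cfset links = getCachedFeed("home","links","http://del.icio.us/rss/remotesynth/css",url.reload) />
```

With something like this in an included file, each page would need one line per feed rather than a separate cfif block.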
Comments
Couple your caching with header checking so that the feed is only downloaded when it's actually changed and you have a pretty sophisticated feed reader. I made some updates to BlogCFC not too long ago that do just that. The key is caching the ETag and Last-Modified response headers and passing those values in the If-None-Match and If-Modified-Since request headers. The HTTP request is made every time, but if the request and the response respect those headers then no bandwidth is wasted downloading the feed if it hasn't changed.
Using this method, the cache updates itself as needed based on the feed content rather than through manual intervention. I wrote a blog post about this stuff at http://musetracks.instantspot.com/blog/index.cfm/2007/4/19/BlogCFC-Enhancement if anyone's interested. There are also a few other RSS-related posts in there.
Posted By Rob Wilkerson / Posted on 04/28/2007 at 12:16 PM
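For anyone curious what the header checking described in the comment above might look like, here is a rough sketch using cfhttp. It is not the BlogCFC code; the feedMeta structure and its etag, lastModified and body keys are names assumed for illustration. It also stores the raw feed text rather than calling the rss component, since I am not assuming here that getEntries accepts raw XML.

```cfml
<!--- Sketch only: feedMeta and its etag, lastModified and body keys are assumed names --->
<cfparam name="application.content.home" default="#structNew()#" />
<cfif not structKeyExists(application.content.home,"feedMeta")>
  <cfset application.content.home.feedMeta = structNew() />
  <cfset application.content.home.feedMeta.etag = "" />
  <cfset application.content.home.feedMeta.lastModified = "" />
  <cfset application.content.home.feedMeta.body = "" />
</cfif>
<!--- structures are passed by reference, so this points at the cached copy --->
<cfset feedMeta = application.content.home.feedMeta />

<!--- send the validators from the last successful download, if we have any --->
<cfhttp url="http://www.remotesynthesis.com/blog/rss.cfm?mode=full" method="get" result="feedResponse">
  <cfif len(feedMeta.etag)>
    <cfhttpparam type="header" name="If-None-Match" value="#feedMeta.etag#" />
  </cfif>
  <cfif len(feedMeta.lastModified)>
    <cfhttpparam type="header" name="If-Modified-Since" value="#feedMeta.lastModified#" />
  </cfif>
</cfhttp>

<!--- a 304 (or a failed request) leaves the cached copy alone; a 200 replaces it --->
<cfif left(feedResponse.statusCode,3) eq "200">
  <cfset feedMeta.body = feedResponse.fileContent />
  <cfset feedMeta.etag = "" />
  <cfset feedMeta.lastModified = "" />
  <cfif structKeyExists(feedResponse.responseHeader,"ETag")>
    <cfset feedMeta.etag = feedResponse.responseHeader["ETag"] />
  </cfif>
  <cfif structKeyExists(feedResponse.responseHeader,"Last-Modified")>
    <cfset feedMeta.lastModified = feedResponse.responseHeader["Last-Modified"] />
  </cfif>
</cfif>
```

The request still goes out on every page load, but when the server answers 304 Not Modified there is no feed body to download or re-parse.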