Friday, April 4, 2008

RSSToolkit: Environment for creating, aggregating & crawling content for RSS

RSS = "Rich Site Summary" or "Really Simple Syndication" - it's an XML (Extensible Markup Language) based short summary of any type of information, usually hosted for people to access via the internet. Most mobile phones can read RSS and there are many desktop widgets that use RSS to feed information to them. News, weather and website updates are common types of information that an RSS feed can contain (GeoRSS is a standard that adds location to an RSS feed). You can even subscribe to this blog via RSS!
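To make that a bit more concrete, here is a minimal sketch of what an RSS 2.0 feed looks like when built by hand, using nothing but Python's standard library. The function name `build_feed`, the example.com URLs and the sample items are all made up for illustration; this is not code from our toolkit.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items):
    """Return a minimal RSS 2.0 document as a string (illustrative helper)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires every channel to carry a title, link and description.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        for tag in ("title", "link", "description"):
            ET.SubElement(node, tag).text = item[tag]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical sample data, invented for this example.
sample_items = [
    {
        "title": "Pump B pressure high",
        "link": "http://example.com/assets/pump-b",
        "description": "Pressure is above the normal operating range.",
    },
    {
        "title": "Weather warning",
        "link": "http://example.com/weather",
        "description": "High winds expected this afternoon.",
    },
]

feed_xml = build_feed(
    "Field updates",
    "http://example.com/feeds/field",
    "Short summaries for field engineers",
    sample_items,
)
print(feed_xml)
```

Any feed reader, phone or desktop widget that understands RSS only needs the URL where a document like this is hosted.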

Anyway - while our team was building a vision of what our next-generation information systems should look like, we realised that RSS was going to be an important element. With desktop widgets and smart phones becoming ever more popular, we felt we could disseminate information to an even wider client base if we summarised the info we were collecting in the company and targeted it very specifically at different user groups via RSS. In this way our users would get ONLY the information that was of interest to them in a very succinct manner and wouldn't have to pore through complex, often difficult-to-use web applications... even better, they could get it on the device of their choice!

Field engineers, for example, could subscribe to feeds that delivered information on the status and operating envelopes of their assets and facilities directly to their mobile phones. No need to go to the office, log on, start up an app, select an asset, select the parameters, interpret results... it's all there, summarised for them and ready to go. All they need to concentrate on is doing their job.

You can see that the possibilities are almost endless, and we got quite excited about this idea. We realised, however, that we would eventually be running hundreds of feeds and supporting the services and logic behind them. What we needed was some sort of toolkit that would allow us to easily interrogate databases, websites, other feeds, etc. and aggregate the results into RSS feeds. What we came up with was the "RSS Toolkit". It consisted of:
  1. A COM library of objects and functions that handled database query, web crawling, searching and feed aggregation.
  2. An IDE (Integrated Development Environment) that allowed us to build small scripts on top of the COM library (to add business intelligence).
  3. A daemon that scheduled and ran the scripts that we built.
The COM library was at the heart of the toolkit and the IDE used the Microsoft scripting runtime, so the main scripting languages supported were VBScript and JScript. You could however also write in any .NET language, and also in Python, although these were not supported by the IDE or the daemon.
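To give a feel for the kind of small script the toolkit was meant for, here is a rough sketch of the "query a database, turn the results into a feed" idea. It deliberately does not use the COM library (its object model is not reproduced here); instead it assumes Python's standard `sqlite3` and `xml.etree.ElementTree` modules, and the table `asset_status`, its columns, the readings and the function `rows_to_rss` are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_rss(channel_title, channel_link, rows):
    """Turn (asset, pressure, limit) rows into an RSS 2.0 string (illustrative)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = (
        "Assets currently outside their operating envelope"
    )
    for asset, pressure, limit in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{asset}: pressure above limit"
        ET.SubElement(item, "description").text = (
            f"Measured {pressure}, limit {limit}"
        )
    return ET.tostring(rss, encoding="unicode")


# An in-memory database stands in for a real company data source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE asset_status (asset TEXT, pressure REAL, max_pressure REAL)"
)
conn.executemany(
    "INSERT INTO asset_status VALUES (?, ?, ?)",
    [("Pump A", 4.2, 5.0), ("Pump B", 6.1, 5.0), ("Compressor C", 7.5, 7.0)],
)

# The "business intelligence" step: keep only assets that exceed their limit.
alert_rows = conn.execute(
    "SELECT asset, pressure, max_pressure FROM asset_status "
    "WHERE pressure > max_pressure ORDER BY asset"
).fetchall()
conn.close()

alert_feed = rows_to_rss(
    "Asset alerts", "http://example.com/feeds/asset-alerts", alert_rows
)
print(alert_feed)
```

In the real toolkit a script like this would have been scheduled and run by the daemon, so the feed stayed up to date without anyone having to start an application.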

Screen capture of the IDE showing help documentation and code snippets to the left, VBScript code in the center, project explorer and properties to the right, and debug output at the bottom. The floating window is the command-line daemon.

Here is a copy of the COM Library Object Model