 
RSS: A Big Success In Danger of Failure     By Bill Burnham



    There’s a lot of hoopla these days about RSS. RSS is an XML-based standard for summarizing and ultimately syndicating web-site content. Adoption and usage of RSS have taken off in the past few years, leading some to suggest that it will play a central role in transforming the web from a random collection of websites into a paradise of personalized data streams. However, the seeds of RSS’s impending failure are being sown by its very success, and only some serious improvements in the standard will save it from a premature death.

    Push 2.0

    The roots of RSS go all the way back to the infamous “push” revolution of late 1996 and early 1997. At that point in time, PointCast captured the technology world’s imagination with a vision of the web in which relevant, personalized content would be “pushed” to end users, freeing them from the drudgery of actually having to visit individual websites. The revolution reached its apex in February of 1997, when Wired Magazine published a “Push” cover story in which it dramatically declared the web dead and “push” the heir apparent. Soon technology heavyweights such as Microsoft were pushing their own “push” platforms, and for a brief moment in time the “push revolution” actually looked like it might happen. Then, almost as quickly as it took off, the push revolution imploded. There doesn’t appear to have been one single cause of the implosion (outside of Wired’s endorsement); some say it was the inability to agree on standards, while others finger clumsy and proprietary “push” software, but whatever the reasons, “push” turned out to be a big yawn for most consumers. Like any other fad, they toyed with it for a few months and then moved on to the next big thing. Push was dead.

    Or was it? For while push as conceived by PointCast, Marimba, and Microsoft had died an ugly and (most would say) richly deserved public death, the early seeds of a much different kind of push, one embodied by RSS, had been planted in the minds of its eventual creators. From the outset, RSS was far different from the original “push” platforms. Instead of a complicated proprietary software platform designed to capture revenue from content providers, RSS was just a simple text-based standard. In fact, from a technical perspective RSS is actually much more “pull” than “push” (RSS clients must poll sites to get the latest content updates), but from the end user’s perspective the effect is basically the same. As an unfunded, collective effort, RSS lacked huge marketing and development budgets, and so, outside of a few passionate advocates, it remained relatively unknown for many years after its initial creation.
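    To make the “simple text-based standard” point concrete, here is a minimal Python sketch of what an RSS reader does on each poll: fetch the feed and extract its items. A hard-coded string stands in for the HTTP response, and the feed contents and URLs are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 2.0 feed: just structured text,
# no proprietary platform required.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>A short summary of the post.</description>
    </item>
  </channel>
</rss>"""

def parse_items(feed_xml):
    """Return (title, link, summary) tuples for each item in the feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"),
             item.findtext("link"),
             item.findtext("description"))
            for item in root.iter("item")]

print(parse_items(FEED))
```

    A real reader simply re-fetches the feed URL on a schedule, which is why the mechanism is technically “pull” dressed up to feel like “push”.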

    Recently, though, RSS has emerged from its relative obscurity, thanks in large part to the growing popularity of RSS “readers” such as FeedDemon, NewsGator, and SharpReader. These readers allow users to subscribe to several RSS “feeds” at once, thereby consolidating information from around the web into one highly efficient, highly personalized, and easy-to-use interface. With its newfound popularity, proponents of RSS have begun hailing it as the foundation for a much more personalized and relevant web experience, one that will ultimately transform the web from an impenetrable clutter of passive websites into a constant, personalized stream of highly relevant data that can reach users no matter where they are or what device they are using.

    Back to the Future?

    Such rhetoric is reminiscent of the “push” craze, but this time it may have a bit more substance. The creators of RSS clearly learned a lot from push’s failures, and they have incorporated a number of features which suggest that RSS will not suffer the same fate. Unlike “push”, RSS is web-friendly. It uses many of the same protocols and standards that power the web today, and it uses them in the classic REST-based “request/response” architecture that underpins the web. RSS is also an open standard that anyone is free to use in whatever way they see fit. This openness is directly responsible for the large crop of diverse RSS readers and the growing base of RSS-friendly web sites and applications. Thus, by embracing the web instead of attempting to replace it, RSS has been able to leverage the web to help spur its own adoption.

    One measure of RSS’s success is the number of RSS-compliant feeds, or channels, available on the web. At Syndic8.com, a large aggregator of RSS feeds, the total number of feeds listed has grown over 2,000% in just two and a half years, from about 2,500 in the middle of 2001 to almost 53,000 in February of 2004. The growth rate also appears to be accelerating: a record 7,326 feeds were added in January of 2004, twice the previous monthly record.

    A Victim of Its Own Success

    The irony of RSS’s success, though, is that this same success may ultimately contribute to its failure. To understand why this might be the case, it helps to imagine the RSS community as a giant cable TV operator. From this perspective, RSS now has tens of thousands of channels and will probably have hundreds of thousands of channels by the end of the year. While some of the channels are branded, most are little-known blogs and websites. Now imagine that you want to tune into channels about, let’s say, Cricket. Sure, there will probably be a few channels with 100% of their content dedicated to Cricket, but most of the Cricket information will inevitably be spread out in bits and pieces across the hundreds of thousands of channels. Thus, in order to get all of the Cricket information, you will have to tune into hundreds, if not thousands, of channels and then try to filter out all the “noise”, or irrelevant programs, that have nothing to do with Cricket. That’s a lot of channel surfing!

    The problem is only going to get worse. Each day as the number of RSS channels grows, the “noise” created by these different channels (especially by individual blogs which often have lots of small posts on widely disparate topics) also grows, making it more and more difficult for users to actually realize the “personalized” promise of RSS. After all, what’s the point of sifting through thousands of articles with your reader just to find the ten that interest you? You might as well just go back to visiting individual web sites.

    Searching In Vain

    What RSS desperately needs are enhancements that will allow users to take advantage of the breadth of RSS feeds without being buried in irrelevant information. One potential solution is to apply search technologies, such as keyword filters, to incoming articles (as pubsub.com is doing). This approach has two main problems: 1) The majority of RSS feeds include just short summaries, not the entire article, which means that 95% of the content can’t even be indexed. 2) While keyword filters can reduce the number of irrelevant articles, they will still become overwhelmed given a sufficiently large number of feeds. This “information overload” problem is not unique to RSS; it is one of the primary problems of the search industry, where the dirty secret is that the quality of search results generally declines as the number of documents searched grows.
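    A keyword filter of the kind described can be sketched in a few lines. Note that it can only match against whatever text the feed actually carries, which, as noted above, is typically just a short summary. The items below are invented for illustration.

```python
def keyword_filter(items, keywords):
    """Keep items whose summary mentions any keyword (case-insensitive)."""
    keywords = [k.lower() for k in keywords]
    return [item for item in items
            if any(k in item["summary"].lower() for k in keywords)]

# Hypothetical incoming articles; only the summary text is available.
items = [
    {"title": "Ashes preview",
     "summary": "England name squad for the cricket tour."},
    {"title": "Quarterly results",
     "summary": "Revenue up 12% on strong ad sales."},
]

print(keyword_filter(items, ["cricket"]))
```

    A cricket story whose summary never uses the word "cricket" would slip straight through such a filter, which is the weakness the article is pointing at.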

    Classification and Taxonomies to the Rescue

    While search technology may not solve the “information overload” problem, its closely related cousins, classification and taxonomies, may have just what it takes. Classification technology uses advanced statistical models to automatically assign categories to content; these categories can be stored as metadata with the article. Taxonomy technology creates detailed tree structures that establish the hierarchical relationships between different categories. A venerable example of these two technologies working together is Yahoo!’s website directory. Here Yahoo has created a taxonomy, or hierarchical list of categories, of Internet sites. Yahoo has then used classification technology to assign each website one or more categories within the taxonomy. With the help of these two technologies, a user can sort through millions of Internet sites to find just those websites that deal with, say, Cricket in just a couple of clicks.
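    As a rough illustration of the two technologies working together, the sketch below stands in a trivial trigger-word classifier for the “advanced statistical models” and encodes a toy taxonomy as a child-to-parent map; the categories and trigger words are hypothetical.

```python
# Hypothetical taxonomy: each category points to its parent (None = root).
TAXONOMY = {
    "Cricket": "Sports",
    "Football": "Sports",
    "Sports": None,
}

# Naive stand-in for a statistical classifier: a category applies
# when one of its trigger words appears in the text.
TRIGGERS = {
    "Cricket": ["wicket", "batsman", "cricket"],
    "Football": ["touchdown", "goal", "football"],
}

def classify(text):
    """Return the categories whose trigger words appear in the text."""
    text = text.lower()
    return [cat for cat, words in TRIGGERS.items()
            if any(w in text for w in words)]

def path(category):
    """Walk up the taxonomy from a category to the root."""
    chain = []
    while category is not None:
        chain.append(category)
        category = TAXONOMY[category]
    return list(reversed(chain))

print(classify("A century by the opening batsman"))  # ['Cricket']
print(path("Cricket"))                               # ['Sports', 'Cricket']
```

    The classifier supplies the category metadata; the taxonomy turns a flat label into a browsable hierarchy, which is what lets a directory user reach Cricket in a couple of clicks.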

    It’s easy to see how RSS could benefit from the same technology. Assigning articles to categories and associating them with taxonomies would allow users to subscribe to “meta-feeds” that are based on categories of interest, not specific sites. With such a system in place, users would be able to have their cake and eat it too: they would effectively be subscribing to all RSS channels at once, but thanks to the use of categories they would see only those pieces of information that are personally relevant. Bye-bye noise!

    In fact, the authors of RSS anticipated the importance of categories and taxonomies early on, and the standard actually supports including both category and taxonomy information within an RSS message. The good news, then, is that RSS is already “category and taxonomy ready”.
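    Concretely, an RSS 2.0 item may carry one or more category elements, each with an optional domain attribute identifying which taxonomy the value belongs to. A minimal Python sketch of reading that metadata (the item text and taxonomy URL are invented):

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 item carrying category metadata.
ITEM = """<item>
  <title>Test report: day three at Lord's</title>
  <category domain="http://example.com/taxonomy">Sports/Cricket</category>
</item>"""

item = ET.fromstring(ITEM)
for cat in item.findall("category"):
    # The optional domain attribute names the taxonomy; the element
    # text is the category value within that taxonomy.
    print(cat.get("domain"), cat.text)
```

    So the plumbing for categories is already in the format; what is missing, as the next section argues, is agreement on which categories and taxonomies to use.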

    What Do You Really Mean?

    But there’s a catch. Even though RSS supports the inclusion of categories and taxonomies, there’s no standard for how to determine what category an article should be in or which taxonomy to use. Thus there’s no guarantee that two sites with very similar articles will categorize them the same way or use the same taxonomy. This raises the very real prospect that, for example, the “Football” category will contain a jumbled group of articles covering both the New England Patriots and Manchester United. Such a situation leads us back to an environment filled with “noise”, leaving us no better off than when we started.

    The theoretical solution to this problem is to get everyone in a room and agree on a common way to establish categories and on a universal taxonomy. Unfortunately, despite the best efforts of academics around the world, this has so far proven impossible. Another idea might be to figure out a way to map the relationships between different concepts and taxonomies and then provide some kind of secret decoder ring that enables computers to infer how everything is interrelated. This is basically what the Semantic Web movement is trying to do. It sounds great, but it will likely be a long time before the Semantic Web is perfected, and users will lose patience with RSS well before then. (There is actually a big debate within the RSS community over how Semantic Web-centric RSS should be.)

    Meta-Directories And Meta-Feeds

    The practical solution will likely be to create a series of meta-directories that collect RSS feeds and then apply their own classification tools and taxonomies to those feeds. These intermediaries would then either publish new “meta-feeds” based on particular categories, or return the category and taxonomy metadata to the original publishers, who would incorporate it into their own feeds.
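    The first of those two options can be sketched directly: pool the items from several source feeds, keep only one category, and republish the result as a new feed. The feed structures and contents below are invented for illustration.

```python
def build_meta_feed(feeds, category):
    """Aggregate many feeds into one category-based meta-feed."""
    pooled = [item for feed in feeds for item in feed["items"]]
    matching = [i for i in pooled if category in i["categories"]]
    return {"title": f"Meta-feed: {category}", "items": matching}

# Hypothetical source feeds, already carrying category metadata
# assigned by the meta-directory's own classifier.
feeds = [
    {"title": "Blog A", "items": [
        {"title": "Cover drive clinic", "categories": ["Cricket"]},
        {"title": "My cat", "categories": ["Pets"]}]},
    {"title": "Blog B", "items": [
        {"title": "County scores", "categories": ["Cricket"]}]},
]

meta = build_meta_feed(feeds, "Cricket")
print([i["title"] for i in meta["items"]])
```

    A subscriber to the Cricket meta-feed sees every matching item from every source feed, without ever subscribing to the sources individually.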

    There actually is strong precedent for such intermediaries. In the publishing world, major information services like Reuters and Thomson have divisions that aggregate information from disparate sources, classify the information, and then resell those classified news feeds. There are also traditional syndicators, such as United Media, who collect content and then redistribute it to other publications. In addition to these established intermediaries, some RSS-focused start-ups, such as Syndic8 and pubsub.com, also look poised to fill these roles should they choose to do so.

    Even if these meta-directories are created, it’s not clear that the RSS community will embrace them, as they introduce a centralized intermediary into an otherwise highly decentralized and simple system. However, it is clear that without the use of meta-directories and their standardized classifications and taxonomies, the RSS community is in danger of collapsing under the weight of its own success and becoming the “push” of 2004. Let’s hope they learned from the mistakes of their forefathers.

    Comments

    The Planet aggregators (http://planetplanet.org/, http://planet.debian.net/, http://planet.gnome.org/ (the original) and others) are a good example of simple meta-directories - they collate all the feeds of interest to various free software communities (Debian, Gnome, Perl, etc.). They also publish RSS and OPML of the resulting collations. The only problem is that there's a fair amount of overlap between some of them, which isn't solved by either viewing on the web or in any client that I know of. The central server model also solves the problem of the low-level DDoS that many RSS clients can have on a feed.

    Posted by: James at February 22, 2004 01:43 PM

    I'm sorry, but this is just plain wrong on so many levels (other than the history of 'push' technology).

    Of course the signal:noise ratio goes down if there are more RSS feeds available. There are a lot of websites available too (more, actually), and people still look at those. People will find good feeds just like they find good websites (or they will read the feeds FROM the good websites, so they know when to look at them). It's nothing like Cable TV, it's a pyramid scheme.. if there are too many feeds, there will always be people that pick out good stuff from the people that pick out good stuff.. and well, turtles all the way down.

    You can search RSS feeds just like we search the web with all the same advantages and disadvantages (well, more advantages and fewer disadvantages). Search engines and directories already exist, they just aren't called google and yahoo. Next topic!

    Of course it's hard to categorize things, and to get people to categorize things accurately. RSS is better at this than the web, because it has *some* means of doing it. I don't understand how this has anything to do with RSS dying, unless the web is already dead.

    Meta-directories and such already exist. Bigger and better ones will spring up when the users show up to read them.

    The only thing that could kill RSS is if a workalike with a different acronym comes around and beats it out de facto style, like ATOM might. Either way, it's the same deal.

    Posted by: Bob Ippolito at February 22, 2004 10:50 PM

    Because there are a lot of RSS feeds to choose from, the future of RSS/Syndication/Blogging/Whatyoumacallit is uncertain? That is like saying the web will come to an end because there are zillions of web pages and search engines are not capable of indexing 'em all! People will gravitate towards the feeds they eventually start liking. How will they find these feeds? Look at the above comment.

    In any town, there are dozens of newspapers published. Over the years we have got used to reading one newspaper and skimming the rest. Likewise for RSS feeds - whether by word of mouth or AI tools - we will get used to a few of them and skim the others.

    Posted by: Manoj Sati at February 22, 2004 11:40 PM

    Trying to come up with standard categories or taxonomies (a word I now hate) and enforcing them across the millions of existing RSS feeds is impossible. Bloglines' search engine is probably the easiest and best way to find RSS feeds on a given topic: index the items that are listed in all of the RSS files and return those RSS feeds when someone queries them. It's too bad Google isn't more supportive of this technology. It should be easy to wire into its existing index.

    Posted by: Mike at February 23, 2004 10:18 AM

    Excellent points in a very timely article.

    A few hours ago I completed an essay that covers much of the same ground, with some thought-provoking ideas on this and on one new emergent professional role: the NewsMaster.

    http://tinyurl.com/ypflu

    Posted by: Robin Good at February 23, 2004 05:44 PM

    RSS: A Big Success In Danger of Failure.... RSS is a baby running the risk of growing up too fast, but that's OK; RSS is serving its purpose... it is merely allowing information to flow slightly better than it has in the past. RSS will not save lives, but it does give us an idea of what the next evolution of internet publishing will be. RSS and blogging allow us all to post an opinion; this text we write is a testament.

    Posted by: Louis Moynihan at February 23, 2004 08:57 PM

    I have to agree with some prior comments, which took issue with your analogy of RSS to cable TV. With that analogy you are equating RSS with traditional media delivery mechanisms, which it obviously is not. My perspective on RSS is a bit less grandiose than yours.

    The issue of categorization is not new and can be solved the way it currently is, through 'brands'. If the companies you mentioned can aggregate and categorize the content, their brands will rise, just as Google's has. But if they do not succeed in categorizing content to folks' satisfaction, then that role is handled by 'hand' (e.g. Instapundit, etc.).

    Of course, the difference between your view and the above is that above, categorization is being done for the consumer, whereas in your article categorization is being done for the content aggregators (local and national newspapers/wires) who then deliver news to the consumer.

    The web creates opportunities for very direct contact, so I don't see adding another layer as bringing anything to the table. Some of the appeal/success of RSS derives from the direct connections blogs allow. 'Official' news at arm's length is easily available at many newspapers' websites; how much more benefit would RSS derive if taxonomy were perfected? And, more importantly, how would it improve life and consumption for people? My opinion: not by much.

    I see RSS as a supporting actor to the primary theatre of weblogs, mainly because many of the folks who run weblogs do so for personal reasons and at their own expense. In return for their efforts they build an audience and participate in a community. If their content were aggregated anonymously, they would not be motivated to produce the same content. According to this view, RSS works as long as it does not supplant the primary source of distribution, the website/weblog. Other forms of compensation are possible, probably resulting in 'new leaner-meaner newspapers'.

    My two-cents. You get what you pay for.

    Posted by: grant at February 24, 2004 11:37 AM

    I'm not so sure I agree that RSS is as deeply rooted in push technology as this article suggests, but I am in overall agreement with the main thesis. In particular, I believe that neither search nor classification will address the complexities introduced by large numbers of feeds. But alternative approaches will address these shortfalls.

    What we need to keep in mind is twofold: first, that most of the content of most feeds will be irrelevant to any given reader; and second, as is suggested, that metadirectories organizing and categorizing feed content provide enough filtering for most people.

    As evidence, I cite my Edu_RSS service, and in particular Edu_RSS Topics (see http://www.downes.ca/cgi-bin/xml/edu_rss.cgi). The output from this service is a set of highly focused, yet comprehensive, RSS feeds of interest to educational technologists.

    Edu_RSS approaches the problem from two directions. First, it harvests from only a small subset of feeds - 300 or so out of the hundreds of thousands available. These feeds are representative - that is, since most of them are blogs, a collective gathering and filtering effort has already taken place. The actual list of sources numbers in the thousands, arguably the entire set of sources in the field.

    After aggregating these feeds, Edu_RSS combines the content and organizes it into a set of categories (or 'topics'). The topics are defined using Perl (and Unix) regular expressions, a flexible filtering mechanism that allows the selection of numerous expressions within a single phrase. The use of regular expressions allows the service to identify string combinations characteristic of a given topic, and thus results in a well-selected set of resources.
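    The regular-expression topic mechanism described above can be sketched in Python, whose re module uses Perl-style patterns; the topic definitions below are invented for illustration, not the actual Edu_RSS filters.

```python
import re

# Hypothetical topic definitions: each topic is a regular expression,
# and an item joins every topic whose pattern matches its text.
TOPICS = {
    "learning objects": re.compile(r"\blearning objects?\b", re.I),
    "rss": re.compile(r"\b(rss|syndicat\w*)\b", re.I),
}

def topics_for(text):
    """Return the names of all topics whose pattern matches the text."""
    return [name for name, pattern in TOPICS.items() if pattern.search(text)]

print(topics_for("Reusable Learning Objects and RSS syndication"))
```

    Each topic pattern acts as one coarse filter; an item can land in several topic feeds at once, or in none.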

    According to my website statistics, Edu_RSS is consistently one of the most popular URLs on my website, following only the two files that generate my referrer system (which is another story). The filtering system is very effective: if something significant is published on, say, learning objects, it will appear as one of the fewer than half a dozen daily items in the 'learning objects' feed.

    The mistake made by the early advocates of push - and by a commentator just above - lies in the idea that 'brand' will replace intelligent filtering. Brand fails because in order for something to be a brand, it must appeal to a large mass of people. But if it appeals to a large mass of people, it will invariably disappoint people looking for something more specific. The early advocates of push tried to promote existing brands, and readers found in push nothing they couldn't find in mass media.

    I have argued elsewhere that the only way to approach content location on the internet is to treat it as a self-organizing network. What this means is that inherent in the structure of the internet there are distinct layers of filtering mechanisms, each consisting of a 'gather, filter, forward' mechanism. In some cases, the mechanism is fulfilled by a human agent, as in the case of blogs. In others, it is fulfilled by automatic mechanisms, such as Edu_RSS. And it is likely that Robin Good's newsmasters will, in their own way, also play the same role.

    What's important here is that each node of each layer need not worry about the rest, and need not be focused on the goal of the system. The agent seeks what is available, the way a retinal cell gathers light, and passes on what is relevant, the way a neuron passes on a signal. The filtering occurs not in the individual node, but through the independent actions of the aggregation of nodes.

    The reason why this system works, while other approaches do not, is that there is no reasonable mechanism which can apply the vast requirements of filtering to a single resource. If we use metadata, the indexing soon outweighs the content. If we use search engines, each resource must be subject to extensive analysis to determine context (or we do without context, which results in a search for 'calf' linking to sites on agriculture and anatomy).

    The layered mechanism works because at no point is the entire weight of the filtering process concentrated in a single individual or a single resource. Decisions about selection and classification are made on a case by case basis using very coarse, and unregulated, mechanisms. It means that individual agents can work without the need for central control, with the only requirement for a functional system being an open set of connections between the agents.
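    A minimal sketch of the 'gather, filter, forward' layering described above: each node pulls from its sources, applies only its own coarse, local filter, and exposes the result to the next layer, with no node seeing the whole system. The node structure and item strings are invented for illustration.

```python
class Node:
    """One agent in a layered gather-filter-forward network."""

    def __init__(self, sources, keep):
        self.sources = sources   # upstream nodes or raw item lists
        self.keep = keep         # this node's local filter predicate

    def output(self):
        # Gather from every source, then forward only what passes
        # this node's own filter.
        gathered = []
        for src in self.sources:
            gathered.extend(src.output() if isinstance(src, Node) else src)
        return [item for item in gathered if self.keep(item)]

# Two raw streams feed two first-layer nodes; a top node filters again.
raw = [["cricket scores", "stock tips"], ["cricket gear", "gardening"]]
layer1 = [Node([items], lambda i: "cricket" in i) for items in raw]
top = Node(layer1, lambda i: "scores" in i or "gear" in i)
print(top.output())
```

    No single node carries the full filtering burden; selection emerges from the independent, coarse decisions of many connected agents.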

    RSS is, today, the transport mechanism of choice. There is nothing magical about RSS, except for the fact that it just is an autonomous agent system providing a high degree of connectivity. As the system matures, additional encoding systems, such as FOAF, say, or ODRL, will play their own important roles, offering different kinds of connections within the same network. The decisions made will become richer, without a corresponding increase in the complexity of the system.

    So, RSS could succeed. It will probably succeed. But it is important to keep our focus on what it does well: it allows an individual to scan, filter, and pass forward. That's all it ever has to do. The network will do the rest.

    Posted by: Stephen Downes at February 24, 2004 10:14 PM

    For more comments see Bill's original post on this topic

    http://billburnham.blogs.com/burnhamsbeat/2004/02/rss_a_big_succe.html

    Posted by: Babak Nivi at February 25, 2004 03:00 PM

    Another part of the basic premise that is flawed is its ignorance of the potential of human editors creating their own aggregated feeds. 'The practical solution will likely be to create a series of meta-directories that collect RSS feeds and then apply their own classification tools and taxonomies to those feeds.' Does it have to be so big and expensive? No. For a topic that lots of people obsess about, like cricket, there will be hardcore fans who spend lots of time scanning feeds, picking out their favorite items, and assembling them into their own feed. More casual cricket fans will subscribe to the feeds of their one or two favorite 'aggregators', just like many people read weblogs to find pointers to interesting things on the web instead of scanning hundreds of sites looking for those interesting things.

    It's yet another example of where this article's author would have been better off comparing RSS scaling issues with web scaling issues. Comparing it with cable TV? Huh?


