Tagging and the Semantic Web
Tagging
Tagging, that is, on-the-fly, user-generated keyword categorization, looks like it is becoming the standard way to categorize weblog content, replacing fixed, preset categories. In other words, items are categorized at the point of posting, at the level of individual posts, rather than according to a pre-existing taxonomy.
Linkblogging and bookmarking
In addition, there appears to be an intersection between bookmarking and weblogging. ‘Link blogging’, the practice of posting a link to a site with a short comment rather than a full-length blog post, is possibly better handled through a dedicated bookmarking system such as Del.icio.us or Wists, which posts to a blog via standard APIs while adding the link to a shared directory in parallel.
RSS and metadata
RSS 1.0 and 2.0 extend the earliest versions of RSS, which were a means of syndicating simple headlines and links, with extensible modules, allowing any metadata to be syndicated through RSS.
Despite the fact that RSS has been around for nearly 10 years and that extending RSS via modules has been possible for 5, there is not a single RSS aggregator that can read a new RSS module or extension on the fly, allow results to be filtered by this metadata, and display it correctly. Although this sounds complicated, it need not be. An example would be the addition of a tag called price, allowing a user to show items from product RSS feeds within a specific price range. This is clearly the future of RSS.
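The price example can be sketched in a few lines of Python. This is only an illustration: the `shop` namespace URI, the `shop:price` element and the sample feed are invented for the sketch and do not correspond to any published RSS module.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for an illustrative "price" extension; not a real spec.
SHOP_NS = "http://example.org/rss/shop/"

# Made-up RSS 2.0 feed in which some items carry the extension element.
FEED = """<rss version="2.0" xmlns:shop="http://example.org/rss/shop/">
  <channel>
    <title>Example product feed</title>
    <item><title>Kettle</title><link>http://example.org/kettle</link><shop:price>25.00</shop:price></item>
    <item><title>Toaster</title><link>http://example.org/toaster</link><shop:price>40.00</shop:price></item>
    <item><title>Blender</title><link>http://example.org/blender</link><shop:price>95.50</shop:price></item>
    <item><title>News item with no price</title><link>http://example.org/news</link></item>
  </channel>
</rss>"""


def items_in_price_range(feed_xml, low, high):
    """Return (title, price) pairs for items whose shop:price lies in [low, high]."""
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        price_text = item.findtext(f"{{{SHOP_NS}}}price")
        if price_text is None:
            continue  # this item does not carry the extension element
        price = float(price_text)
        if low <= price <= high:
            matches.append((item.findtext("title"), price))
    return matches


print(items_in_price_range(FEED, 20, 50))
```

An aggregator that did this generically would discover the extension elements present in a feed and offer filters for them, rather than hard-coding one namespace as this sketch does.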
On the publisher side, one of the reasons why modules are not cropping up everywhere is that there are no simple tools for people to create RSS modules. At the moment, people have to sit down, agree on a module and draft a spec. For example, here is one that I worked on, which is used for biographical information: http://vocab.org/bio/0.1/
Top down vs bottom up
RSS itself is a grassroots phenomenon: a standard that reached widespread adoption through the involvement of the developer community rather than through a standards organization. Tagging is also a grassroots phenomenon, but even more so. It is available to end users rather than only to developers, and so is truly a bottom-up system of community classification. In light of this, perhaps something can be done to make adding metadata to RSS a genuinely community-driven, bottom-up activity, by placing the right tools in the hands of end users.
The Semantic Web.
Tim Berners-Lee, the inventor of the web, has long been a champion of the Semantic Web, which he saw as the next major development in the use of the web. The idea of the Semantic Web is that by defining how meaning is encoded within documents and applications on the web, computers could extract that underlying meaning automatically and link pieces of meaning together, allowing for all sorts of new uses of the web. At its core this involves a data model that defines meaning and links it to things, which on the web are URLs, or more generally URIs. The model for how this information is laid out mirrors everyday language, where sentences consist of a subject, a predicate and an object: a triple. The web can be described as a graph, a spider's web of interconnected points (nodes) joined by relationships (edges). The Semantic Web defines a data model, RDF, in which the web becomes an edge-labeled graph of meaning. This data model is very simple, but beyond the data model, syntaxes had to be defined for expressing it. The most widely known syntax for RDF is in XML, and it is the one used by the original version of RSS at Netscape and also by RSS 1.0. Some elements of RDF's modularity are also present in RSS 2.0. The problem is that the XML syntax for RDF is tricky and difficult for end users to understand, which is an impediment to people creating Semantic Web style metadata extensions to RSS.
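To make the triple model concrete, here is a minimal sketch in Python that stores a handful of subject, predicate, object statements and follows the labeled edges between them. It uses plain tuples rather than a real RDF library, and every URI in it is a made-up example, not a term from any published vocabulary.

```python
from collections import defaultdict

# Hypothetical URIs used only for illustration.
POST = "http://example.org/post/1"
ALICE = "http://example.org/people/alice"
AUTHOR = "http://example.org/terms/author"
TITLE = "http://example.org/terms/title"
NAME = "http://example.org/terms/name"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (POST, AUTHOR, ALICE),
    (POST, TITLE, "Tagging and the Semantic Web"),
    (ALICE, NAME, "Alice"),
]


def build_graph(statements):
    """Index triples by subject: each node maps to its labeled outgoing edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return graph


def objects(graph, subject, predicate):
    """Return every object reachable from subject along edges labeled predicate."""
    return [o for p, o in graph.get(subject, []) if p == predicate]


graph = build_graph(triples)

# Follow two edges: post -> author -> name.
author = objects(graph, POST, AUTHOR)[0]
print(objects(graph, author, NAME))
```

The predicate is what labels each edge, which is why the result is described as an edge-labeled graph.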
The semantic web
Recently people have been talking about the ‘lower case’ semantic web, where some of the ideas and aims of the semantic web are achieved without dependence on some of the standards that were part of the wider Semantic Web initiative.
Tagging and semantics
When you tag an item with a keyword such as “turkey”, what you are implicitly saying is “category=turkey”. The problem is that “category” is sometimes not enough context for a tag. Meaning always requires context. Wists allows you to label the context of a tag with anything (where the default is implicitly “category”). In the above example you could label something as “food=turkey” or “country=turkey”. These context-labeled tags, “metatags”, allow you to indicate the context of a tag, giving tags greater meaning and less ambiguity. Popular metatags that people have already created include location= for places and fav= for people’s favorite movies, books and so on.
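The context=value convention is easy to sketch in code. The function below is a hypothetical illustration of the idea, not how Wists actually implements it: a bare tag falls back to the implicit “category” context, while a metatag keeps the context its author gave it.

```python
def parse_tag(raw, default_context="category"):
    """Split 'context=value' into (context, value).

    A bare tag with no '=' is given the default context.
    """
    context, sep, value = raw.strip().partition("=")
    if not sep:
        # No '=' found: the whole string is the value.
        return (default_context, context)
    return (context, value)


for raw in ["turkey", "food=turkey", "country=turkey", "location=Istanbul"]:
    print(parse_tag(raw))
```

With the context made explicit, “food=turkey” and “country=turkey” are no longer the same tag, even though the keyword is identical.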
By allowing people to create metatags and attaching these metatags to their own ‘namespace’, you allow for the possibility of formally defining groups of metatags as an RSS module for a specific industry. In theory one can create a marketplace for RSS modules in which the people creating the modules need not know or care about the technicalities of what this means. In other words, if people involved in apartment rentals start to tag things in the following manner: rooms=3 square_feet=2000 monthly_rent=2000 etc., one has the beginnings of something that could be formalized as a standard module for apartment rentals, with elements defined in a standard namespace.
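As a rough sketch of how that formalization might look, the following Python turns a string of rental metatags into namespaced elements on an RSS item. The `rent` prefix, the namespace URI, the helper name and the sample listing are all assumptions made for illustration; no such module is defined anywhere.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for an apartment rentals module; not a real spec.
RENTAL_NS = "http://example.org/rss/rentals/"
ET.register_namespace("rent", RENTAL_NS)


def item_from_metatags(title, link, metatag_string):
    """Build an RSS <item> whose metatags become elements in the rentals namespace."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    for token in metatag_string.split():
        name, sep, value = token.partition("=")
        if not sep:
            continue  # plain tags carry no context, so they are skipped here
        ET.SubElement(item, f"{{{RENTAL_NS}}}{name}").text = value
    return item


item = item_from_metatags(
    "Sunny flat",
    "http://example.org/flat/1",
    "rooms=3 square_feet=2000 monthly_rent=2000",
)
print(ET.tostring(item, encoding="unicode"))
```

The person tagging only ever types rooms=3; the mapping to a namespace and element names happens behind the scenes, which is the point of keeping the technicalities away from end users.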
It is possible that these early steps in grassroots classification via tagging could evolve into something closer to what the original Semantic Web promised.