When looking at a listing of links on del.icio.us, have you noticed that some people use next to no tags while others use 10 or even 20? I used to get annoyed at people who (in my mind) insufficiently tagged their posts, but I’ve been reconsidering my position on these core taggers, the people who stick to a handful of the most obvious terms. I think they may paradoxically improve the relevance of search results.
Warning: Poorly-collected thoughts ahead. Caveat lector.
When you post a link, you are given the option of entering a space-delimited list of tags to describe that link. Even though all the tags have different degrees of actual relevance to the link, the system treats them as equally relevant. This is a type of boolean indexing: each tag is either fully present or fully absent.
All the tags a person uses for a post hold the same weight. If I tag this entry as “blog article post tagging del.icio.us analysis longtail emergent graph”, then ‘del.icio.us’, ‘analysis’, and ‘graph’ receive equal weight. Naturally, more people will use ‘del.icio.us’ than ‘graph’ — that’s how the head-tail distribution emerges. But for a given post, the system will treat ‘graph’ and ‘del.icio.us’ as equally good descriptors of the link.
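To make the equal-weight idea concrete, here is a minimal sketch in Python. It is not how del.icio.us is actually implemented; the three posts and the function names `tag_set` and `aggregate` are made up for illustration. Within one post a tag is simply present or absent, and any difference in weight only shows up once several people's posts are counted together.

```python
from collections import Counter

# Hypothetical posts for a single link. Each string is a space-delimited
# tag list, the way the posting form accepts it.
posts = [
    "del.icio.us tagging",
    "del.icio.us tagging analysis",
    "blog article post tagging del.icio.us analysis longtail emergent graph",
]


def tag_set(post):
    """Boolean indexing: a tag is either in the set or not, with no weight."""
    return set(post.split())


def aggregate(posts):
    """Count how many taggers used each tag for the link."""
    counts = Counter()
    for post in posts:
        counts.update(tag_set(post))
    return counts


counts = aggregate(posts)
for tag, n in counts.most_common():
    print(f"{tag:12} {'#' * n}")
```

In the third post, ‘graph’ and ‘del.icio.us’ are indistinguishable. Only the aggregate count separates them: three taggers used ‘del.icio.us’ and one used ‘graph’.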
If only one person tags a link, each term will have the same value. There will be only a tail: terms that only a few people have used (a.k.a. fringe tags). Now imagine that everyone is a fringe tagger. The graph for a given link will be quite long and quite flat, possessing no distinct head (Fig. 1). No overall theme will emerge for each link, leaving the search results filled with junk matches. There will still be a head of sorts (composed of the intersection of people’s tag choices), but it will be much broader and will have no internal definition, since each tagger will likely use all of the terms composing the head.
So, we can see that core taggers are extremely important in a tagsonomy. They provide definition and body to the tagscape, isolating the few most important terms. Disagreements between core taggers make for a slightly more diverse head, but they will likely agree on the main terms. Unfortunately, with only core taggers, the tail disappears and tagging becomes nothing but cross-categorization (Fig. 2). All the tangentially related terms fall by the wayside in favor of the most obvious ones, and niche links go unfound, rendering the system useless to the tail-searchers.
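The two extremes above, plus a mixed population, can be played out in a toy model. This is only a sketch under loud assumptions: the vocabulary lists, the two tagger functions, and the group sizes are all invented, and real taggers are far messier. It just counts how many taggers in each population used each term for one link.

```python
from collections import Counter

# Made-up vocabulary for one link (assumption, not real del.icio.us data).
HEAD = ["del.icio.us", "tagging"]                      # the obvious terms
BROAD = HEAD + ["analysis", "blog", "article"]         # terms most fringe taggers share
NICHE = ["longtail", "emergent", "graph", "folksonomy", "powerlaw", "metadata"]


def core_tagger(i):
    """A core tagger uses only the most obvious terms."""
    return list(HEAD)


def fringe_tagger(i):
    """A fringe tagger uses every broad term plus two niche terms of their own."""
    return BROAD + [NICHE[i % len(NICHE)], NICHE[(i + 1) % len(NICHE)]]


def profile(tag_lists):
    """Map each tag to the number of taggers who used it."""
    counts = Counter()
    for tags in tag_lists:
        counts.update(set(tags))
    return dict(counts)


all_fringe = profile(fringe_tagger(i) for i in range(6))
all_core = profile(core_tagger(i) for i in range(6))
mixed = profile([core_tagger(i) for i in range(3)] +
                [fringe_tagger(i) for i in range(3)])

for name, counts in [("fringe", all_fringe), ("core", all_core), ("mixed", mixed)]:
    print(name, sorted(counts.values(), reverse=True))
```

In this toy setup the all-fringe population has five terms tied at the top and a flat run of niche terms, the all-core population has two terms and no tail at all, and only the mixed population shows a graded drop from head to tail.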
I’ve really come to respect the way diverse tagging styles are necessary for the nice distributions we see on del.icio.us (Fig. 3). Have you noticed anything about how diversity affects del.icio.us and similar systems?