Anatomy of a del.icio.us post

What goes into a del.icio.us post? How do people choose the tags to apply? How does the interface affect tag choice? How does this influence flocking behavior? I’d like to take a stab here at these questions and more.

Throughout this post I’ll be using the example of 23 Photo Sharing. I found it through “nontraditional” means (not a web search, a physical advertisement, or word of mouth), and while the front page of the site is rich, it does not show exactly how the site works. The combination of information and ambiguity present on the tagged page is good for this discussion.

I’ll go ahead and tag this entry — hold on a second.

Okay, I’m back. I’ve posted it to my general bookmarks account, del.icio.us/phyzome.

Basic models

Needless to say, everyone has their own unique tagging style, so one model isn’t going to cover everything. But I have noticed a few variables that seem to be fairly independent of each other and which can be used to describe a large portion of the del.icio.us population. There are a few tagging style continua to consider. Where are you on each continuum?

fringe vs. core (tail vs. head)
When tagging, do you tend to use the obvious, central ideas as tags, or do you also include the more tangentially related items? I’m the latter, a fringe tagger. I include any and every descriptive aspect of an object in its tags. Others, core taggers, tend to stay with the main concepts. Maybe they don’t like the clutter.
implicit vs. explicit
Do you use redundant tags? Explicit taggers might tag 23hq.com as ‘webservice’, ‘service’, and ‘web’, whereas implicit taggers might choose only ‘service’ and ‘web’. In fact, implicit taggers would likely drop the ‘web’ as well, since it is implied and not the topic of the site. Would you be more likely to tag a post both ‘linux’ and ‘computer’, or just ‘computer’? I vary in my position on this spectrum.
innovated vs. suggested
When I tag a post, I try to avoid looking at the suggestions until I’ve exhausted all of my own associations. Then I check out the recommended tags. This way, I can avoid getting distracted by similar terms, which would wipe out the better-fitting ones I come up with unaided. That makes me a highly innovated tagger — and I suspect we are rare.

Sample Data

Tagging on the edge

Have a look at the URL page for a recent news item. There are several things to notice here.

First, the tag ‘absurd’ appears twice — an interesting tag to use, and apparently a shared sentiment. This shows a move away from strict subject-based tagging and towards reaction-subject-context tagging. More on this later.

Second, the tags ‘RIAA’ and ‘MPAA’ both show up, as well as ‘DRM’. They aren’t actually mentioned in the article, which does mention the MPA, a completely different organization. DRM is also not a subject of the article. While the ‘MPAA’ tag is likely an honest mistake, the ‘DRM’ and ‘RIAA’ tags are indicative of “associative tagging”. The article describes a situation very similar to the current DRM, MPAA, and RIAA controversies, and taggers are either assuming there is a relationship or consciously tagging the similarity itself.

Tag suggestion

Del.icio.us gives you a list of recommended tags, but I prefer to first write down all of the tags that spontaneously pop into my head. I’ve put together a table comparing several tag lists for 23hq.com: the popular tags, the tags I spontaneously used, the tags that del.icio.us recommends, and the recommendations I accepted.

Comparison of tag lists for a link post
List source    Description                                                                Tags
Popular        Most commonly-used tags for a link, often misleading
Spontaneous    I thought of these as I looked at the site
Recommended    Del.icio.us recommended these, I know not the algorithms
Accepted       These are tags I didn’t think of until I looked at the Recommended list

You may wish to see the recommendations for yourself (account required), or just the del.icio.us page for the link.

Observations & Ruminations

Homologous tagging

The first thing that really caught my attention was the presence of the tag ‘flickr’. I assume people are using “flickr” to indicate that 23 is a flickr-like site (simultaneously genericizing flickr and promoting it). I wonder how the folks at flickr feel about this? In fact, I wonder how I feel about this. Perhaps it is a sign of things to come: tagging sites with the names of similar sites. Maybe I’ll start doing that as well. It can’t hurt; the tag ‘flickr’ can only otherwise be legitimately applied to flickr itself, a few galleries, and perhaps an article on flickr — that’s a pretty tight niche. Tagging “flickr-homologous” sites with ‘flickr’ shouldn’t interfere greatly with actual flickr-related sites — those should have ‘flickr’ in their core tagset, not their tail.

Contextual tagging

Remember how several people had tagged an article with ‘absurd’? I am extremely interested in this. People are beginning to use tags that go beyond content to the realm of reaction and context. This “contextual tagging”, as I call it, gives an intriguing window into taggers’ minds. Just as subject tagging on del.icio.us shows how interest waxes and wanes over time in certain subject areas, contextual tagging shows attitudes over time as well.

I use a wonderful program called KimDaBa to organize my photos, and I started using contextual tagging there long before I used it for link tagging. I’m not exactly sure where the difference lies. Perhaps, because photos are designed to evoke feelings, reaction tags are more important. Perhaps also context is sometimes lacking in the photo itself, and needs to be added more explicitly.

I confess that I am merely fascinated by contextual tagging, and don’t have much more analysis to present on the topic. I do feel that it will be important in some way, though.

Conclusions

Innovated vs. Suggested

Suggested taggers create a swarm effect, scanning the information less critically and letting more crap through. They can skew the core of the link’s tags by over-emphasizing an “incorrect” set of tags. If a poor tag gets established early on, it can rocket up to the top (though this isn’t always bad — see the ‘flickr’ example). Innovated taggers tend to balance the link’s tags toward a more appropriate description of the link.

Fringe vs. Core

Fringe taggers have a tendency to add cruft to the search results, especially in an environment like del.icio.us. The stochastic effect of thousands of taggers generally smooths this out, but in smaller tagsonomies, this can overpower a user browsing the tags. That’s why tag clouds are so handy — they visually filter out the fringe tags. Fringe taggers also tend to reduce the height of the tag graph — the core is not as distinct from the fringe. I call this flattening.
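To put a rough number on flattening, here is a minimal sketch in Python. It assumes one possible measure that is my own stand-in, not anything del.icio.us reports: the share of all tag uses that land on the few most common tags for a link. The function name `core_share` and the sample tag lists are made up for illustration.

```python
from collections import Counter

def core_share(tag_lists, core_size=2):
    """Fraction of all tag uses that fall on the core_size most common tags."""
    counts = Counter(tag for tags in tag_lists for tag in tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = counts.most_common(core_size)
    return sum(n for _, n in top) / total

# Hypothetical data: four core taggers posting the same link.
core_only = [["photo", "sharing"]] * 4

# The same link when two of the four posters are fringe taggers.
with_fringe = core_only[:2] + [
    ["photo", "sharing", "web", "service", "community"],
    ["photo", "sharing", "gallery", "blog", "social"],
]

print(core_share(core_only))
print(core_share(with_fringe))
```

With only core taggers, every tag use lands on the core, so the share is 1.0. Once the fringe taggers join in, the same two core tags account for 8 of 14 uses, and the core stands out less from the tail.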

Core taggers strengthen the best result for a particular search (where a clearly best result exists at all). This does not work for searches where many unique but overlapping good results exist — and the core tags lack the power to distinguish between variants on the desired results.

I would like to add that fringe taggers aid fringe searchers. The effectiveness of a tagging style depends directly on how people search. I tend to be a fringe searcher: I use less common tags to home in on the perfect result.

Implicit vs. Explicit

Implicit and explicit tagging styles have an extremely powerful effect on search results. On a service like del.icio.us, which makes no attempt at a term hierarchy, a search for ‘technology’ will turn up none of the results tagged only with ‘computer’, though they are obviously relevant. At the same time, explicit tags can cause major clutter. Some tagsonomies support semi-hierarchies, where tags can have parents or even ancestries. Potentially, the tag bundle feature on del.icio.us could provide rough tag-similarity guessing to improve implicit search results.
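A semi-hierarchy of this kind might look like the following Python sketch. Everything in it is an assumption for illustration: the `PARENTS` table, the function names, and the sample posts are hypothetical, and del.icio.us offers nothing like this. The idea is only that a search expands each post's tags with their ancestors before matching.

```python
# Hypothetical parent links; del.icio.us itself has no such hierarchy.
PARENTS = {
    "linux": "computer",
    "computer": "technology",
}

def ancestors(tag, parents=PARENTS):
    """Return every ancestor of a tag by following parent links upward."""
    found = []
    seen = {tag}  # guards against accidental cycles in the parent table
    while tag in parents and parents[tag] not in seen:
        tag = parents[tag]
        seen.add(tag)
        found.append(tag)
    return found

def expand(tags, parents=PARENTS):
    """Return the given tags plus all of their implied ancestors."""
    expanded = set(tags)
    for tag in tags:
        expanded.update(ancestors(tag, parents))
    return expanded

def search(posts, query, parents=PARENTS):
    """Return the URLs whose expanded tag set contains the query tag."""
    return [url for url, tags in posts.items() if query in expand(tags, parents)]

# Made-up posts, tagged the way an implicit tagger might tag them.
posts = {
    "example.org/kernel-howto": {"linux"},
    "example.org/laptop-review": {"computer"},
    "example.org/sourdough": {"baking"},
}

print(search(posts, "technology"))        # matches via implied ancestors
print(search(posts, "technology", {}))    # no hierarchy: nothing matches
```

The implicit tagger's posts stay uncluttered, and the hierarchy does the work that explicit tags would otherwise do by hand.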

Further Thought

Post types

There are several types of del.icio.us posts that I have noticed, and perhaps I should have handled these separately to some extent:

Spread-the-word
I have an account dedicated solely to news alerts and important events. Each post is outdated within the week, so I like to keep them separate. The other type of spread-the-word post is the maybe-someone-else-will-need-this post.
Reference
These make up the largest chunk of del.icio.us posts, I believe. Perhaps the trend is shifting slightly towards the previous category, but this still seems to be the reason for most posts.
Read-this-later
Does anybody actually go back and look at their to-read list? It’s rare that I do.
Categorize
What if you don’t have a tagging system for your website? Just create a del.icio.us account for your site, and tag all your pages. Then use the del.icio.us engine to retrieve and search. (I lied a bit. This is somewhat theoretical: I haven’t actually seen any of these, but I was tempted to make an account to do this before I found a tagging plugin for WordPress.) I’ve also tagged a few obvious resources like Google even though no one would ever need the link. It’s just an obsessive-compulsive thing, I guess.

Commas

Raise your hand if you’d prefer comma-delimited tags instead of space-delimited tags. Uh-huh, that’s what I thought. No more CamelCase and dropped punctuation, more coherent tagging!
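The difference is easy to see in a small Python sketch. These two functions are hypothetical stand-ins, not del.icio.us's actual parser; they only show what each delimiter does to a multi-word idea.

```python
def parse_space_delimited(field):
    """Split on whitespace, so a multi-word idea becomes several tags."""
    return field.split()

def parse_comma_delimited(field):
    """Split on commas, trim each tag, and drop empty entries."""
    return [tag.strip() for tag in field.split(",") if tag.strip()]

print(parse_space_delimited("photo sharing web service"))
# ['photo', 'sharing', 'web', 'service']

print(parse_comma_delimited("photo sharing, web service"))
# ['photo sharing', 'web service']
```

With spaces as the delimiter, the only way to keep "photo sharing" together is to mangle it into something like PhotoSharing, which is exactly the habit I would like to see go away.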

Tagsonomy management

Sometimes I want to smack someone for the way they’ve used a tag. But that impulse runs counter to the nature of these emergent taxonomies. I can only manage my own tagging, and what happens to the rest of the tagosphere is out of my hands. (That’s probably a good thing.)

You can give people the technology, and at most suggest what they might use it for, but that’s it. You can’t control what end-users will do with a product or service. Spam is a classic example, as is Google Maps. One has a bad ending, the other a good one.

The tag is an elemental structure, just like a collection or a look-up table. Whatever bells and whistles you put around it, the essence doesn’t change — shared brief descriptors in a many-to-many relationship with discrete data nodes.
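Stripped of the bells and whistles, that structure could be sketched in Python roughly as follows. The `TagIndex` class and its method names are my own illustration, not any real service's API; the sample tags are the ones discussed earlier for 23hq.com.

```python
from collections import defaultdict

class TagIndex:
    """A bare-bones many-to-many index between tags and items."""

    def __init__(self):
        self.items_by_tag = defaultdict(set)
        self.tags_by_item = defaultdict(set)

    def tag(self, item, *tags):
        """Attach one or more tags to an item, recording both directions."""
        for t in tags:
            self.items_by_tag[t].add(item)
            self.tags_by_item[item].add(t)

    def items(self, tag):
        """Return a copy of the set of items carrying this tag."""
        return set(self.items_by_tag.get(tag, ()))

    def tags(self, item):
        """Return a copy of the set of tags attached to this item."""
        return set(self.tags_by_item.get(item, ()))

# Illustrative data only.
index = TagIndex()
index.tag("23hq.com", "service", "web", "flickr")
index.tag("flickr.com", "flickr")

print(index.items("flickr"))
print(index.tags("23hq.com"))
```

One tag points at many items and one item carries many tags. Everything else (clouds, bundles, recommendations) is built on top of those two lookups.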

The power and meaning that people choose to give a tag evolve beyond anyone’s control.


Responses: 8 so far

  1. Jacqui says:

    This is a great analysis. I'm sticking a "libraries" tag on it when I del.icio.us it, because that is my tag for anything having to do with library and information science. :) (Did you know you were an information scientist too?)

  2. Jacqui says:

    Sorry for posting twice, but I just thought of a question. How common do you think it is for people to go back and re-tag something? I tend to do this a lot, as I collect a huge amount of links in one del.icio.us account. I do things like tell del to add the tag "technology" to all things also tagged "computer" or "software." I also sometimes do things like tell it to add "science" as a tag to all things already tagged "medicine," realizing that not quite all "medicine" tags will also be about science, exactly, and go back and weed out. One time I noticed that people were using tags like "web" and "Internet" and realized I didn't have a tag for anything directly related to the usage, history or function of the Internet. Now "web" is one of my biggest tags.

  3. Xaprb says:

    Nicely written. You bring up a lot of stuff here. I'm also interested in how the interface itself affects the way people tag. Del.icio.us has several different interfaces -- the bookmarklet, the direct-post, the Firefox extension. I find they really influence the way I post things. I hate the "other people's tags" and "suggestions" feature because I feel it co-opts my own thought process. The only thing it's really convenient for (for me) is making sure I don't misspell a tag.

    Some other lines you could squint along: plural vs. singular, descriptive vs. prescriptive.

    I use WP categories as tags. I just don't nest categories, and I put posts into as many as I want. This entry is showing up as Uncategorized at the moment. I'm curious how you're using the tagging plugin.

  4. Tim McCormack says:

    @Jacqui: Yes, information science has always been one of my main interests -- the structure of knowledge, the essential nature of informational structures, etc.

    @Jacqui: That's a good question about retagging. I usually edit posts that I've marked '@toread' after finally reading them, when I have a better idea of the actual content. I have a few ideas for programs I'd like to run on the del.icio.us database, and post editing is certainly grist for the mill.

  5. Tim McCormack says:

    @Xaprb: I'm playing with the tagging plugin in a throwaway blog I created. It isn't installed here yet. (Until I get back to Wooster I'll be on a dialup connection, so web dev is painfully slow.)

    I've definitely come up against the plural/singular decision a number of times in tagging my photos. I'd like to know how people resolve that for themselves.

    What do you mean by prescriptive vs. descriptive? Is that related to the contextual tagging?

  6. Sally Carson says:

    Great post! Very thought-provoking, I'll be chewing on this one for a while. And yes, I am raising my hand, you can't see it but I am -- I want comma-delimited tags!

  7. Tim McCormack says:

    I just came across this (in the process of rooting out posts containing ImageShack-hosted graphics) and I will note that nearly 4 years later, del.icio.us still doesn't allow spaces in tags.

  8. booty.licio.us – del.icio.us tail | Brain on Fire says:

    […] my mind) insufficiently tagged their posts, but I’ve been reconsidering my position on these core-taggers. I think they may paradoxically improve the relevance of search […]
