Advertising vs. semantic design

I'm seeing a very interesting conflict of interest on the web recently, involving semantic content and advertising. If you're involved with either of those, you may already know what I'm talking about. It's an issue that affects anybody who wants to either a) find/present information or b) avoid/push advertisements. In other words, everybody.


I'll first define a few terms people may be unfamiliar with: semantic web design and ad-blocking software.

Semantic web design

The idea of the semantic web is that the structure in which information is presented should be informative as well. Humans rely heavily on context to interpret the content they see on the web, but for today's computers, this is a non-trivial task. Computers work best when they are told exactly what is what. Website designers can increase the amount of computer-readable information on their websites by labeling content with the appropriate tags and including extra data only visible in the code and to disabled readers.

Ad-blocking software

To avoid those damned blinking punch-the-monkey-and-win ads that are sprinkled across so much of the web, many users install software to filter out data from advertisers. This is predicated on the idea that most ads are loaded from a different source than the rest of the media on the site: often a dedicated third-party advertisement server, sometimes a dedicated subdomain, or less often a certain real or virtual directory on the site itself. The easiest ads to block are those served from a dedicated advertising domain or from a dedicated subdomain. Slightly harder to catch are ads served through pages where the ad URL contains a string such as /adview.php?. Often a page can be cleared of ads by blocking a JavaScript file that inserts the ads into the page.

Software that blocks ads can range from domain-blocking programs that sit between the browser and the internet connection, to programs or plugins that block any URLs matching a regular expression or substring, to browser plugins or extensions that remove parts of a web page known to contain advertisements. I use AdBlock, an extension for Firefox, which uses the regular expression technique. (For other users of AdBlock for Firefox, I have made available my filter library on my file download page.)
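To make the URL-matching idea concrete, here is a minimal sketch in Python. It is not how AdBlock is implemented. Apart from /adview.php?, every hostname, prefix, and pattern in it is a made-up example, and the function name is_blocked is my own.

```python
import re
from urllib.parse import urlsplit

# All rules below are hypothetical examples, except the /adview.php? substring.
BLOCKED_DOMAINS = {"adserver.example"}            # dedicated third-party ad server
BLOCKED_HOST_PREFIXES = ("ads.",)                 # dedicated ad subdomain
BLOCKED_SUBSTRINGS = ("/adview.php?",)            # ad-serving script
BLOCKED_PATTERNS = (re.compile(r"/banners?/\d+x\d+"),)  # regular-expression rule


def is_blocked(url):
    """Return True if the URL matches any of the blocking rules."""
    host = (urlsplit(url).hostname or "").lower()

    # Domain blocking: the host is a blocked domain or one of its subdomains.
    if any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS):
        return True

    # Subdomain blocking: the host starts with a known ad prefix.
    if host.startswith(BLOCKED_HOST_PREFIXES):
        return True

    # Substring blocking: a fixed string appears anywhere in the URL.
    if any(s in url for s in BLOCKED_SUBSTRINGS):
        return True

    # Pattern blocking: a regular expression matches somewhere in the URL.
    return any(p.search(url) for p in BLOCKED_PATTERNS)


if __name__ == "__main__":
    for candidate in (
        "http://adserver.example/banner.gif",
        "http://ads.news.example/img/1.gif",
        "http://news.example/adview.php?id=7",
        "http://news.example/images/get?id=48213",
    ):
        print(candidate, is_blocked(candidate))
```

The last URL in the demo is not blocked. Nothing in it distinguishes an ad from any other image on the site, which is exactly the weakness of filtering on URLs alone.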


Users of ad-blocking programs can avoid seeing most of the ads on the internet. I hardly ever see one of those annoying flashing banner ads, manage to avoid a number of interstitial ads, and generally don't see even the lowly text ads. (The ethics of this are left to a separate conversation.) As site owners find that ads are being blocked, they move to stealthier strategies.

Domain blocking leads to subdomain ad servers. Subdomain blocking leads to ads served by dedicated server-side scripts. Blocking of URLs by pattern-matching leads to ads being served by the same external mechanism as the rest of the site's graphics, distinguishable only to the server by some numerical parameter.

Enter Platypus. Platypus is a Firefox extension that allows the user to modify a web page, then have that modification be automatically applied every time a page on that site is visited. One use of Platypus is the removal of ads based on the id attribute of a surrounding block. For example, an ad on some web page may be located in a div with an id of top_banner_ad. Platypus, trained by the user, can find and remove this block of code automatically every time a page on the site is loaded. Some sites now use obfuscated ids for their ads, but it isn't enough. Others have begun to strip away identifying information from the ad container, but programs like Platypus can often still locate the ad by its position in the page, relative to other identifiable elements. The latest technique is the placement of text ads immediately above or below content that the user wants to see, with no identifying boundary between the content and the ad. Even Platypus cannot remove these ads, so the user is required to make the distinction based on the verbal content of the text, which is even more annoying and distracting.
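The two removal strategies described above, by id and by position relative to an identifiable element, can be sketched in a few lines of Python. This is an illustration only and not Platypus's code. It assumes the page is well-formed XHTML so that the standard library's XML parser can read it. The function names, the sample page, and the id article are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: one ad with a telltale id, one anonymous ad, then content.
PAGE = """<html><body>
<div id="top_banner_ad"><img src="banner.gif"/></div>
<div><a href="#">Buy now</a></div>
<div id="article"><p>Actual content.</p></div>
</body></html>"""


def remove_by_id(root, element_id):
    """Remove every element whose id equals element_id; return how many."""
    removed = 0
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get("id") == element_id:
                # Note: ElementTree also drops any text directly following
                # the removed element (its "tail").
                parent.remove(child)
                removed += 1
    return removed


def remove_sibling_before(root, anchor_id):
    """Remove the element immediately preceding the one with id anchor_id."""
    for parent in root.iter():
        children = list(parent)
        for index, child in enumerate(children):
            if child.get("id") == anchor_id and index > 0:
                parent.remove(children[index - 1])
                return True
    return False


root = ET.fromstring(PAGE)
remove_by_id(root, "top_banner_ad")     # the ad with an identifying id
remove_sibling_before(root, "article")  # the anonymous ad, found by position
cleaned = ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(cleaned)
```

Neither function helps once the ad text sits inside the same element as the content, since there is then no element boundary to remove.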

Conflict of interest

Essentially, sites are now trying to hide the distinction between content and ads on the code level, obfuscating content boundaries and identifiers. The web designer is forced to make a compromise between semantic design and marketing. Any semantic information is a possible hook for an adblocker.

This raises a deep conflict of interest for conscientious designers. On the one hand they are trying to provide as much machine-readable metadata as possible; on the other, they are attempting to avoid the attention of one particular part of the machine. As the semantic web grows, ads may be the only part of the page that does not contain metadata, and thus may still be identifiable.

The future

Another ad insertion approach is to make the ad containers integral to the structure of the page, so that removal of a block causes a structural collapse or overlap of text. This would be regarded by the user as another annoyance and possibly be misinterpreted as poor site design or browser incompatibility. However, this may require building a page around an ad, which is not at all a sustainable method of page design. Additionally, scripts could easily be adapted to replace ads with objects of identical dimensions.
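Swapping an ad for a blank object of the same size might look something like the following Python sketch. It assumes well-formed XHTML and an ad container that declares its size through width and height attributes in pixels. The function name, the sample page, and the id sidebar_ad are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout that depends on the ad occupying 160x600 pixels.
LAYOUT = """<html><body>
<iframe id="sidebar_ad" width="160" height="600" src="ad.html"></iframe>
<div id="article"><p>Actual content.</p></div>
</body></html>"""


def replace_with_placeholder(root, element_id):
    """Swap the element with the given id for an empty div of the same size."""
    for parent in root.iter():
        for index, child in enumerate(list(parent)):
            if child.get("id") == element_id:
                # Assumption: the size is given as width/height attributes.
                width = child.get("width", "0")
                height = child.get("height", "0")
                placeholder = ET.Element("div")
                placeholder.set("style", f"width:{width}px;height:{height}px")
                placeholder.tail = child.tail  # keep text following the ad
                parent.remove(child)
                parent.insert(index, placeholder)
                return True
    return False


layout_root = ET.fromstring(LAYOUT)
replaced = replace_with_placeholder(layout_root, "sidebar_ad")
patched = ET.tostring(layout_root, encoding="unicode")

if __name__ == "__main__":
    print(patched)
```

Because the placeholder occupies the same space as the ad, the rest of the page should keep its layout.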

Of course, each stage of escalation is inhabited by fewer users and web designers than the previous stage. Most users don't have ad-blocking software installed, and for most web designers the time and energy involved in escalation will only give diminishing returns. Still, the question remains: Can advertising and semantic design peaceably coexist?
