Sunday, July 10, 2011


Some have argued that the web's ever-breaking links are the major reason for the web's success. (See, for example, this discussion of the earlier Xanadu system and how such problems as link rot are, in fact, symptoms of why HTTP and HTML have been world-transforming successes, whereas Xanadu was not.)

One consequence of the web's broken links is the need for 404 pages. (It is said that 404 pages are named for a room in the CERN laboratory.) The original 404 pages solved a simple problem - something went wrong and a link broke, perhaps due to a web template programming error. Gradually, though, site designers realized that 404 pages were a way to have a bit more fun. Here are a few of the more creative 404 pages that I've come across recently. What are your favourites?
unfortunate placement of yahoo! ad by jkenning
Lost in the Clouds

The Most Awkward 404 Not Found Page on the Internet

Bad Web Address Not Handled

A Thing of Beauty is a Joy Forever
Of course, such is the beauty / tragedy of the web that eventually the above links will themselves rot away and these nice 404 pages will disappear, to be replaced by not-so-nice ones.

Oh, well.

Tuesday, July 5, 2011

On the Difficulty of Defining Concepts

There are certain concepts whose meanings seem intuitively obvious - until you need to write them down.

I often need to craft a definition for a concept in my work as Deputy Director of Schema Standards for the AP and in the process of defining standards for news at the IPTC.
Unidentified Structure by alanenlgish
Define - Structure - Define
Routinely, we will decide that we need to represent something. So, we start by giving it a name and will generally sketch out a definition, and probably cite some examples. We will figure out the structure and properties of the concept, testing the structure against all the examples we can think of, to ensure we cover any edge cases. That process can be quite difficult in itself. However, once we think we are done, the original name and tentative definition are often called into question. That's because the process of thinking through the structure of a concept and how to represent it will often uncover variants that don't neatly fit within the original definition. Sometimes, this can lead to arguments over whether what we are modeling is a single core concept after all or is better described as a set of similar but distinct concepts.
Avoid by phunk
Avoiding Tag Abuse
It is all too tempting to provide a definition that tautologically uses the name of the concept itself. Or - only slightly better - uses a synonym. Names and definitions are important when crafting a standard. They help others to convey information - avoiding the perils of tag abuse - and can highlight when a standard is genuinely missing an important concept.
Source #7 by nostri-imago
Lose the Source
Right now, the IPTC is going through that frustrating process once again over the concept of "Source". News standards such as hNews and rNews have the concept of Source, as did NewsML 1, the precursor to the G2 Family of news standards (NewsML-G2, EventsML-G2 and SportsML-G2). And yet, we're debating whether and how to represent the Source concept within G2. The main holdup appears to be the lack of a clear definition: how do we define "Source" by saying what it is that all sources hold in common and what makes Source different from other concepts (such as Copyright Holder or Provider or Author)?
Language games by smackbox
Language Games
This problem of defining a slippery concept is not new - nor is it restricted to the practice of information modeling. Ludwig Wittgenstein dealt with this question in his Philosophical Investigations. In aphorisms 65-69, Wittgenstein points out that the concept "game" is a useful one and that people can intuitively claim to know which things are games. And yet games do not have "one thing in common" but "are related to one another in many different ways". I suggest that there are many useful concepts (like the concept of a "Source" in news) which cannot be reduced to a single essential set of characteristics but are instead a set of different things that resemble each other.
Ampersands by lwr
If all this talk of "concepts" is too abstract, you might prefer a more visual illustration. Consider any letter or alphabetical symbol. Such is the inventiveness of typographers - and the adaptability of human pattern recognition - that it is difficult to identify the core set of rules that identify a particular symbol. What makes an ampersand an ampersand?

"A rose by any other name would smell as sweet" by johnkay

What's in a Name?
Wittgenstein was not the first to point out that dividing up the world into named and well-defined concepts is hard. The Sorites Paradox deals with the problem - and importance - of vagueness in defining concepts. Sorites derives from the Greek word soros (meaning ‘heap’):
Would you describe a single grain of wheat as a heap? No. Would you describe two grains of wheat as a heap? No. … You must admit the presence of a heap sooner or later, so where do you draw the line?

And, in the Tao Te Ching, Lao Tsu says:
Once the whole is divided, the parts need names.
 There are already enough names.
 One must know when to stop.
 Knowing when to stop averts trouble.