Sunday, October 30, 2011

selenium + python = web automation ftw!

This past snowy weekend, I decided I wanted to try out automating one of the routine tasks I have to perform each week via a web browser. After a little googling, I decided that Selenium was my best option. And, with a little bit of work, I was able to use it to create a script to automate this chore - even though the web application I'm automating turned out to be quite unwebfriendly.

Before I forget, I thought I should quickly jot down what I learnt in the process, with the hope that this will be useful to others (or, perhaps, my future self).
Selenium Roses by bcostin
http://www.flickr.com/photos/bcostin/155246852/
Selenium
The Selenium website succinctly describes the software: "Selenium automates browsers". Although Selenium is primarily known as a web site testing tool, the seleniumhq site points out that "[b]oring web-based administration tasks can (and should!) also be automated as well".

It turns out that Selenium is, in fact, a suite of tools. There's Selenium IDE, a Firefox plugin that is ideal for quickly prototyping an automation script. And there are various server side components with the core being the WebDriver API. (The WebDriver is also known as "Selenium 2", as it is a merger of the original Selenium with a project started by a tester at Google. See the Selenium project history for more). The WebDriver lets you control various browsers - Internet Explorer, Chrome, Opera, HtmlUnit. It even has ways to simulate iPhone and Android devices. But the browser best-supported by Selenium is Firefox. I had to download a fresh copy, since I've almost entirely switched to Chrome these days. But it was worth it.
Firefox magnet (wallpaper) by flod
http://www.flickr.com/photos/flod/2568092124/

How I Did It
I wound up basing my script very much on the example in the Getting Started section. Along the way, I developed a few patterns that I'd like to pass along, in the hopes that they will help.

As I mentioned earlier, the web application I was trying to automate turned out to be quite unwebfriendly. What I mean is that it used all kinds of complicated tricks to ensure that, for example, the URL doesn't change as you navigate to different pages within the application. Similarly, rather than using anchor tags (<a href>) for links, it used Javascript "onclick" scripts to dynamically construct them. The good news is that Python-connected-to-Selenium-powering-Firefox is perfect to automate this kind of application, as it fully supports Javascript, frames, pop up windows and the like.
Waiting by amchu
http://www.flickr.com/photos/amchu/5261511319/
Waiting
One thing I quickly figured out was that my script needed to wait for the complicated, multi-part pages to finish loading before going on to the next step. The documentation explains how to use either explicit or implicit waiting in Selenium. At first, I followed the lead of the example and looked for a particular web page title. Or I thought I could detect changes using the URL. Except that my web application doesn't change title or URL as the pages change!

However, I hit upon a solution: in each step, my script searches for a particular element by name or id. So, I realized that waiting for that element to appear was the most effective strategy. This led me to code like this:

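from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait

try:
    # wait up to 10 seconds for the "login" element to appear
    WebDriverWait(driver, 10).until(lambda d: d.find_element_by_name("login"))
except TimeoutException:
    # the page did not load in time, so shut the browser down
    driver.quit()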

(You might also notice that I altered my code to use "except:" rather than "finally:", since I don't want my script to always exit, whether or not the page load completes in time).
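The waiting pattern above boils down to polling: keep asking for the element until it turns up or the clock runs out. Here is a minimal, Selenium-free sketch of that idea. The names `wait_until`, `WaitTimeout` and `FakeDriver` are invented for illustration, and this is not a claim about how WebDriverWait is implemented internally, just the general shape of the technique.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns something truthy.

    Exceptions raised by condition() are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            pass  # e.g. the element is not in the page yet
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %s seconds" % timeout)
        time.sleep(poll)


class FakeDriver:
    """Stand-in for a browser whose page 'loads' after a few lookups."""

    def __init__(self, ready_after):
        self.lookups = 0
        self.ready_after = ready_after

    def find_element_by_name(self, name):
        self.lookups += 1
        if self.lookups < self.ready_after:
            raise LookupError("no element named %r yet" % name)
        return "<element %s>" % name


driver = FakeDriver(ready_after=3)
element = wait_until(lambda: driver.find_element_by_name("login"),
                     timeout=2.0, poll=0.01)
```

With a real browser you would pass the real driver's lookup as the condition; the fake one here only exists so the sketch runs without Firefox.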

Use the Source
I also figured out a variation on the try/wait pattern to help me figure out the next step. Often, I wrote a WebDriverWait call that would time out without succeeding. Eventually, I realized that simply quitting the driver wasn't the most useful way to debug things. Instead, I would use the page_source property to help me see what was really loaded into the browser at that point:

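from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait

try:
    WebDriverWait(driver, 10).until(lambda d: d.find_element_by_name("mainfs"))
except TimeoutException:
    # show what the browser actually loaded, to help work out the next step
    print(driver.page_source)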
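Printing works well for small pages. For big, frame-heavy ones it can be handier to save the source to a file and open it in an editor. A minimal sketch, assuming only that the driver object has a `page_source` property; the helper name `dump_page_source`, the file label and the `StubDriver` class are made up for illustration.

```python
import os
import tempfile


def dump_page_source(driver, directory, label):
    """Write driver.page_source to <directory>/<label>.html and return the path."""
    path = os.path.join(directory, label + ".html")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(driver.page_source)
    return path


class StubDriver:
    """Stand-in exposing only the page_source property used above."""
    page_source = "<html><frameset><frame name='mainfs'></frameset></html>"


out_dir = tempfile.mkdtemp()
saved = dump_page_source(StubDriver(), out_dir, "after_login_timeout")
```

Calling the helper from the except branch, with a different label at each step, leaves a trail of snapshots to compare.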

IDE Clues
I also found that it was useful to try to figure out how to navigate the web application using the Selenium-IDE plugin. I would record my actions in the IDE, then review the script it generated, to get clues as to how to write my WebDriver script.

I say "clues", because the IDE generates Selenium 1.0 commands, not the 2.0 syntax I needed to use. Generally, it gave me enough information that, with some searching of the API documentation, I could figure out the right equivalents. However, in certain cases, there are capabilities in Selenium 1.0 that have no direct equivalent in 2.0.
Grandpa by conowithonen
http://www.flickr.com/photos/cmogle/2907198746/
A Problem of Focus
One example: in 1.0, there's a way to do relative navigation between frames. It turned out I needed this, since at one point in the web application I was trying to automate, it leaves the focus inside a subframe. Experimenting with the IDE, I saw that it wriggled out of this problem by executing:

selectFrame(relative=up)

The problem is that there is no equivalent relative move between frames in Selenium 2. Eventually, I figured out that I needed to switch the focus to the entire window (by making use of the "current_window_handle" property), which then let me select the particular subframe I was looking for:

driver.switch_to_window(driver.current_window_handle)
driver.switch_to_frame("body")
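The same "go back to the top, then descend" move can be wrapped in a small helper. This sketch assumes a later release of the Python bindings in which the driver exposes a `switch_to` object with `default_content()` and `frame()` methods; the helper name `focus_frame` and the recording stubs are invented here so the example runs without a browser.

```python
def focus_frame(driver, *frame_path):
    """Return focus to the top of the page, then descend frame by frame."""
    driver.switch_to.default_content()
    for frame in frame_path:
        driver.switch_to.frame(frame)


class RecordingSwitchTo:
    """Stub that records calls instead of driving a browser."""

    def __init__(self):
        self.calls = []

    def default_content(self):
        self.calls.append("default_content")

    def frame(self, reference):
        self.calls.append("frame:%s" % reference)


class StubDriver:
    """Stand-in for a driver; only the switch_to attribute is provided."""

    def __init__(self):
        self.switch_to = RecordingSwitchTo()


stub = StubDriver()
focus_frame(stub, "body")
```

Because the path always starts from the top of the page, the helper does not care which subframe the application left the focus in.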

Don't Give Up!
Once I figured out these patterns of working by experimenting with the IDE, and printing the source whenever I got stuck, I found that automating the web application with Selenium was fairly straightforward. It would be nice if the documentation was a bit fuller. And it would be wonderful if the IDE generated Selenium 2 commands. But I think that my small investment in figuring it all out was worthwhile. So, now I'm looking for more things to try with Selenium. For example, could I use it to try out RESTful testing of APIs or Linked Data? Are there other web-based chores that I could (even partially) automate?

Monday, October 17, 2011

Why I Love Twitter


Recently, someone who doesn't use Twitter asked me to explain why I love that service which describes itself as a "real time information network".
learning by SReinwald
http://www.flickr.com/photos/d1sasta/2321821367/
As with most of these kinds of things, it is a little hard for me to explain why I love Twitter so, particularly to someone who has never used it. But maybe I will learn something from trying?
Tweet of Dweams by born1945
http://www.flickr.com/photos/12567713@N00/2369056718/
Perhaps Twitter is like interesting people from around the world texting and sometimes I text back? But it is more than that. I get most of my news through Twitter and it is like instant messaging with various friends and colleagues and heroes and comedians and ... I used to describe Twitter as being like just the status postings from Facebook. But Facebook has changed so much since then that I'm not sure that description still applies. And the networks of people you interact with on Facebook and Twitter are quite different (certainly, the way I use those services).

But, really, Twitter isn't like texting or RSS feeds or instant messaging or Facebooking. Or, actually, it is kind of like all of those things and more and less, all at once. 

Thursday, October 6, 2011

REST APIs and Errors

What's the "right way" to indicate errors in a RESTful API?

Rest Area? by joeshlabotnik
http://www.flickr.com/photos/joeshlabotnik/419914250/

HTTP Response Codes
As with many aspects of RESTful APIs, it turns out that this has already been worked out and standardized as a fundamental part of the web architecture.

If you're building a RESTful API, you should use the HTTP response code to indicate whether or not a call to the webservice succeeded (http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html). If everything is OK, then return “200 OK”. If something goes wrong, return an appropriate response code in the 3xx, 4xx, or 5xx range. Codes in the 3xx range indicate different types of redirect, e.g. 307 Temporary Redirect. Codes in the 4xx range are conditions that a client can (generally) address, e.g. 400 Bad Request. Codes in the 5xx range indicate types of server error, e.g. 500 Internal Server Error.
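To make the ranges concrete, here is a small sketch using only the Python standard library. It maps a status code to its broad class and standard reason phrase; the function names `status_class` and `describe` are made up for this example.

```python
from http import HTTPStatus


def status_class(code):
    """Map an HTTP status code to its broad class, based on the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


def describe(code):
    """Return a string such as '404 Not Found (client error)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unrecognized"  # code is not in the standard library's table
    return "%d %s (%s)" % (code, phrase, status_class(code))


for code in (200, 307, 400, 500):
    print(describe(code))
```

A client can branch on the class first and only then look at the specific code, which keeps it working when a server starts returning a code the client has never seen.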

The advantages of using this standard mechanism and these standard error codes include:
1. Good documentation of how they work, what they mean, what to do in each case (e.g. http://en.wikipedia.org/wiki/List_of_HTTP_status_codes)
2. There are quite a few existing implementations of this scheme on both the client side (e.g. web browsers) and on the server side (e.g. web servers and web frameworks)

Error Documents
If a REST API returns a response code in one of the non-success ranges (3xx, 4xx, 5xx), it should be a signal to the client that any response metadata and payload should not be interpreted as a response to the request, but as an explanation of the error.

I haven't been able to identify any standard for the details of error responses, probably because the details of errors for different APIs can vary so widely. However, I like Amazon’s S3 REST webservice error document schema:


An example is:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The resource you requested does not exist</Message>
  <Resource>/mybucket/myfoto.jpg</Resource>
  <RequestId>4442587FB7D0A2F9</RequestId>
</Error>

The schema is that each <Error> document contains a <Code> (a unique id for the error) and a <Message> (an English-language description of the error). Optionally, <Error> documents can contain additional XML structures that are specific to each kind of error.
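A client can pull such a document apart with very little code. The following is a sketch using the standard library's ElementTree, run against the example above; the function name `parse_error` is invented here, and treating every element other than <Code> and <Message> as an "extra" is an assumption of this sketch rather than something the S3 documentation prescribes.

```python
import xml.etree.ElementTree as ET

ERROR_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The resource you requested does not exist</Message>
  <Resource>/mybucket/myfoto.jpg</Resource>
  <RequestId>4442587FB7D0A2F9</RequestId>
</Error>"""


def parse_error(payload):
    """Split an <Error> document into (code, message, extras)."""
    root = ET.fromstring(payload)
    if root.tag != "Error":
        raise ValueError("not an <Error> document: <%s>" % root.tag)
    # Collect each direct child as tag -> text.
    fields = {child.tag: (child.text or "").strip() for child in root}
    code = fields.pop("Code")
    message = fields.pop("Message")
    return code, message, fields  # whatever is left is error-specific


code, message, extras = parse_error(ERROR_XML)
```

Code can then switch on `code`, show `message` to a human, and log `extras` for debugging.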

Sunday, September 11, 2011

Deciding to Remember


In many countries, certainly in Europe, shared national memory has a reality in the naming of public squares and boulevards by dates. The equivalent for us would be if the Fourth of July were one term in a vocabulary of dates with civic emotional meaning, rather than unique. (There are not likely to be boulevards named for December 7 or November 22.)

In October 1999, American Poet Laureate Robert Pinsky reflected on

what makes the American people "a people" -- and what our poetry can teach us about the "fragile, heroic enterprise of remembering"


Wednesday, September 7, 2011

Six Ways You Can Help Us Get to rNews 1.0

The IPTC has been working on rNews since late 2010. We're now getting close to unveiling rNews 1.0 - and you can help get us there!

What is rNews?
rNews is a way to embed news-specific metadata into HTML pages. With a standard for representing that news-specific metadata, providers could consistently apply it to web pages in a way that makes it easier to use for tool makers (such as search engines). One key benefit of making it a standard is that it lets smaller players - both publishers and tool makers - participate on an equal footing with larger players. As I discussed in Seven Ideas for rNews you can think of rNews as being like a news-specific API for webpages to encourage open innovation (but with less technical and managerial overhead than typical APIs).

How has the IPTC been Crafting rNews?
Amongst other things, the IPTC's Semantic Web Working Group has been working steadily on the underlying model for rNews. We did lots of outreach to publishers and people interested in consuming news metadata on the web, including SemWeb meetups in New York, Berlin and London. Yesterday, the IPTC SemWeb group agreed to rNews 0.7, which you can see via this rNews spreadsheet.

What are the Specifics of rNews 0.7?
The spreadsheet includes the rNews classes, properties and definitions. It indicates the differences from the previous version of rNews. It also includes a potential alignment with schema.org, which has a similar model but is aimed at microdata specifically, whereas we see the rNews model as working with multiple different syntaxes, including both microdata and RDFa. The rNews site hasn't yet been updated to reflect all of the changes in version 0.7. Partly, that's because we only just agreed the details yesterday. But in part, that's also because we're about to move to rNews 1.0 - the first full Production Release of rNews!

How Can I Help Get to rNews 1.0?
The IPTC plans to vote on rNews 1.0 in the first week of October, at our next face-to-face meeting. You can help ensure that the rNews 1.0 launch is a success by doing any or all of the following:

1. Examine the rNews 0.7 spreadsheet and point out problems, omissions or contradictions via the rNews forum
2. A great way to get to grips with the details is to try marking up a news webpage using rNews 0.7 (you can see some examples linked from the rNews main page)
3. Try extracting rNews properties from an example page using an RDFa or microdata distiller
4. If you like your rNews example marked-up webpage, consider contributing it to the IPTC, so we can share it with others
5. Think about an innovative service or tool you can build using news markup applied consistently and at scale across the web
6. If you work for a publisher, start to talk to your colleagues about the potential to unlock innovation for news on the web via improved news metadata markup
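If you want a feel for what a distiller does before trying a real one, here is a deliberately tiny sketch that collects `property` attributes from a page using Python's standard HTML parser. It is nowhere near a full RDFa processor (it ignores vocabularies, prefixes and properties nested inside other properties), and the markup, vocabulary URL and property names in `SAMPLE` are placeholders invented for illustration, not actual rNews terms.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from RDFa-style `property` attributes."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = None  # [property, tag, depth, text chunks] while capturing

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._open is not None:
            # Already capturing text: just track same-named nested tags.
            if tag == self._open[1]:
                self._open[2] += 1
            return
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            self.pairs.append((prop, attrs["content"]))
        else:
            self._open = [prop, tag, 1, []]

    def handle_endtag(self, tag):
        if self._open is None or tag != self._open[1]:
            return
        self._open[2] -= 1
        if self._open[2] == 0:
            prop, _, _, chunks = self._open
            self.pairs.append((prop, "".join(chunks).strip()))
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._open[3].append(data)


SAMPLE = """
<div vocab="http://example.org/news#" typeof="NewsItem">
  <h1 property="headline">Snow Falls in <em>October</em></h1>
  <meta property="datePublished" content="2011-10-30">
  <span property="creator">A. Reporter</span>
</div>
"""

collector = PropertyCollector()
collector.feed(SAMPLE)
collector.close()
```

After running this, `collector.pairs` holds the property names alongside either the `content` attribute or the element's text, which is roughly the raw material a search engine or other tool would start from.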

Since the beginning, the rNews work has been a group effort, greeted with interest and enthusiasm from a variety of people. Now - with your help - we are within sight of reaching a very significant milestone: the release of rNews 1.0.

Sunday, July 10, 2011

404

Some have argued that the web's ever-breaking-links are the major reason for the web's success. (See, for example, this discussion of the earlier Xanadu system and how such problems as link rot are, in fact, symptoms of why HTTP and HTML have been world-transforming successes, whereas Xanadu was not).

One consequence of the web's broken links is the need for 404 pages. (It is said that 404 pages are named for a room in the CERN laboratory). Original 404 pages solved a simple problem - something went wrong and a link broke, perhaps due to a web template programming error. Gradually, site designers realized, though, that 404 pages were a way to have a bit more fun. Here are a few of those more creative 404 pages that I've come across recently. What are your favourites?
ufortunate placement of yahoo! ad by jkenning
http://www.flickr.com/photos/jkenning/464845773/
Lost in the Clouds
http://shwn.me/ftyffyt

The Most Awkward 404 Not Found Page on the Internet
http://visitsteve.com/404.html

Bad Web Address Not Handled
http://huml.org/404.shtml

A Thing of Beauty is a Joy Forever
Of course, such is the beauty / tragedy of the web that eventually the above links will, themselves, rot away and these nice 404 pages will disappear, to be replaced by not so nice ones.

Oh, well.

Tuesday, July 5, 2011

On the Difficulty of Defining Concepts

There are certain concepts whose meanings seem intuitively obvious - until you need to write them down.

I run into the need to craft a definition for a concept quite often in my work as Deputy Director of Schema Standards for the AP and in the process of defining standards for news at the IPTC.
Unidentified Structure by alanenglish
http://www.flickr.com/photos/alanenglish/5728890490/
Define - Structure - Define
Routinely, we will decide that we need to represent something. So, we start by giving it a name and will generally sketch out a definition, and probably cite some examples. We will figure out the structure and properties of the concept, testing the structure against all the examples we can think of, to ensure we cover any edge cases. That process can be quite difficult in itself. However, once we think we are done, the original name and tentative definition are often called into question. That's because the process of thinking through the structure of a concept and how to represent it will often uncover variants that don't neatly fit within the original definition. Sometimes, this can lead to arguments over whether what we are modeling is a single core concept after all or is better described as a set of similar but distinct concepts.
Avoid by phunk
http://www.flickr.com/photos/phunk/4849730748/
Avoiding Tag Abuse
It is all too tempting to provide a definition that tautologically uses the name of the concept itself. Or - only slightly better - uses a synonym. Names and definitions are important when crafting a standard. They help others to convey information - avoiding the perils of tag abuse - and can highlight when a standard is genuinely missing an important concept.
Source #7 by nostri-imago
http://www.flickr.com/photos/nostri-imago/2872060566/
Lose the Source
Right now, the IPTC is going through that frustrating process once again over the concept of "Source". News standards such as hNews and rNews have the concept of Source, as did NewsML 1, the precursor to the G2 Family of news standards (NewsML-G2, EventsML-G2 and SportsML-G2). And yet, we're debating whether and how to represent the Source concept within G2. The main holdup appears to be the lack of a clear definition: how to clearly define "Source" by saying what it is that all sources hold in common and what makes Source different from other concepts (such as Copyright Holder or Provider or Author)?
Language games by smackbox
http://www.flickr.com/photos/smackbox/70171055/
Language Games
This problem of defining a slippery concept is not new - or restricted simply to the practice of information modeling. Ludwig Wittgenstein dealt with this question in his Philosophical Investigations. In his aphorisms 65-69, Wittgenstein points out that the concept "game" is a useful one and people can intuitively claim they know what things are games. And yet games do not have "one thing in common" but "are related to one another in many different ways". I suggest that there are many useful concepts (like the concept of a "Source" in news) which cannot be reduced to a single essential set of characteristics but are instead a set of different things that resemble each other.
Ampersands by lwr
http://www.flickr.com/photos/lwr/3940977128/
&?
If all this talk of "concepts" is too abstract, you might prefer a more visual illustration. Consider any letter or alphabetical symbol. Such is the inventiveness of typographers - and the adaptability of human pattern recognition - that it is difficult to identify the core set of rules that identify a particular symbol. What makes an ampersand an ampersand?

"A rose by any other name would smell as sweet" by johnkay
http://www.flickr.com/photos/johnkay/3517370850/

What's in a Name?
Wittgenstein was not the first to point out that dividing up the world into named and well-defined concepts is hard. The Sorites Paradox deals with the problem - and importance - of vagueness in defining concepts. Sorites derives from the Greek word soros (meaning ‘heap’):
Would you describe a single grain of wheat as a heap? No. Would you describe two grains of wheat as a heap? No. … You must admit the presence of a heap sooner or later, so where do you draw the line?

And, in the Tao Te Ching, Lao Tsu says
Once the whole is divided, the parts need names.
 There are already enough names.
 One must know when to stop.
 Knowing when to stop averts trouble.

Wednesday, June 22, 2011

Stuart Myles Has Twitter Handle @smyles

I like that - when I searched Google for images of me just now - this was one of the top results:


Saturday, April 16, 2011

rNews - News Metadata in HTML

I chair the IPTC's SemWeb group. This is the third in a series of short posts about IPTC's work on using Semantic Web technologies for news. (The first post discussed where we stand with Linked Data for News and the second post looked at News Ontologies).

The big news at our March 2011 meeting was that the IPTC voted to approve Draft 0.1 of rNews. This kicks off an experimental phase, in which we ask people to learn about rNews and give us feedback via the rNews Forum. We plan to incorporate all the feedback (and fix a couple of little errors that we've found) so that Draft 0.2 of rNews will be ready for the Berlin IPTC meeting in June.
The Path by cuppini
http://www.flickr.com/photos/cuppini/539692848/
rNews and hNews
rNews is a set of specifications and best practices for using RDFa to embed news-specific metadata into HTML documents. It serves a similar purpose to hNews, although it uses a different technical approach. (hNews is a microformat - a set of conventions that make use of standard HTML elements and attributes to convey metadata). Although it is tempting to see hNews and rNews as rivals, I actually see them as supporting one another. And my suspicion is that tools that support one will pretty easily be able to support the other.

We're already starting to get some feedback on rNews, which is great. We've gotten some excellent questions about the rNews design philosophy from those with in-depth knowledge of the Semantic Web but little direct news publishing experience. And publishers who are old hands at the news business are looking at rNews as a way to learn more about semantic markup for their web content. Whatever your background, I encourage you to look at rNews and let us know what you think.

Thank You by theredproject
http://www.flickr.com/photos/theredproject/3302110152/
Getting There
The road to rNews started in the summer of 2010. Like any worthwhile technical standard, it required both articulating a vision of what we wanted to achieve and a lot of long conference calls, poring over the details in spreadsheets. I'd like to take this opportunity to particularly thank Dave Compton, John Evans, Andreas Gebhard, Jayson Lorenzen, Evan Sandhaus and Michael Steidl for their dedication in creating and refining rNews Draft 0.1.

rNews Live!
I am excited about rNews and hNews and their potential to stimulate an ecosystem of tools for news on the web. I have some ideas for how this might happen, some of which I talked about in the recently-published interview about rNews with semanticweb.com.
NYC New York Times Building by wallyg
http://www.flickr.com/photos/wallyg/2259318046/
I plan to share some more ideas about rNews at the upcoming New York Semantic Meetup "Meet the IPTC and learn about rNews", hosted by the New York Times. You should come by and find out more.

Monday, April 4, 2011

News Ontology - Large Pieces, Loosely Joined

I chair the IPTC's SemWeb group. This is the second in a short series of short posts about IPTC's work on using Semantic Web technologies for news. (The first post discussed where we stand with Linked Data for News).

News Ontology - or, actually, ontologies
I am more-or-less aware of several projects that are underway to build ontologies relating to news. These various efforts seem more-or-less aware of each other and are, in some cases, working with each other directly. Specifically, I believe that PA, AFP, BBC, EBU, W3C and IPTC are each crafting ontologies - and there are probably more besides.

Wait - Onto What?
The word "ontology" often seems to scare people. But it just means a formal representation of knowledge in a particular domain, expressed using Semantic Web technologies (specifically OWL - the Web Ontology Language).

An OWL or a WOL? by dullhunk
http://www.flickr.com/photos/dullhunk/422076496/

You may also find this discussion of how ontologies relate to controlled vocabularies and taxonomies helpful. (Although, capriciously, I have linked to an ontology definition that is outside of the W3C SemWeb Technology orthodoxy. Or maybe that was deliberate?)

So, What Good is an Ontology?
Mike Atherton, User Experience Designer at RedUXD, published a well-received presentation that nicely illustrates one powerful benefit of an ontology. However, his 100 slide Beyond the Polar Bear deck doesn't mention any ontologies until the 99th slide. Instead, it discusses the importance of domain modeling and how that helps him build compelling user experiences for substantial websites, chock full of different types of media and content types - he discusses various BBC websites and microsites. (I highly recommend you check it out - don't let the apparent length put you off).
Won't you be my friend? by ucumari
http://www.flickr.com/photos/ucumari/2440393713/
Ontologies in themselves don't directly deliver a benefit. Instead, they are infrastructure: they are a helpful way to structure information that - in combination with other technologies and practices - can deliver significant advantages over shallower ways of working in a particular domain, such as news. And not just for user experience; there are ways to exploit ontologies in other areas, including mining news (such as for sentiment analysis) or for data journalism.

Examples, Please
Not all of the various ontologies are public at this time, but a few are.

For example, the BBC has ontologies for programmes, wildlife and sport. The W3C recently published version 1.0 of their Ontology for Media Resources.

The EBU have experimented with an ontology based on IPTC's NewsML-G2, as have others. And at the recent IPTC face-to-face meeting, Paul Kelly of XML Team discussed the potential for Sports and Semantic Technologies.

Can't We Just Have One?
Rather than having all these different, overlapping ontologies, is it possible to just have one super, unified ontology?

In fact, the decisions about what to include, omit, emphasize or downplay in your ontology depend on what you're trying to do. I believe that each ontology therefore reflects a particular point of view, a specific editorial voice, in deciding what is and isn't important within a domain. That is not to say there would be no benefit from a coordinated, standardized news ontology. The work to create (never mind understand and use) an ontology is significant; even when you agree on the key things to model, there are different choices about the best way to express that model (sometimes driven by the limitations in the tools you have to work with). And having a standard model promotes greater interoperability amongst providers and more choice for clients.

One of the key benefits of Semantic Web technologies is the ability to mix and match different ontologies. And modern methods for developing ontologies (such as NeOn) emphasize reuse and composition. (See also the interesting Master's thesis "Analyzing and Ranking Multimedia Ontologies for their Reuse" by Ghislain Auguste Atemezin.) So, I see the independent but somewhat coordinated efforts to create news-related ontologies as being a strength.
Mochuelo de hoyo by barloventomagico
http://www.flickr.com/photos/barloventomagico/2435316564/

Get Involved
If you would like to find out more about the work that the IPTC is doing to help standardize the use of Semantic Web technologies for news, then get in touch.

Tuesday, March 29, 2011

Linked Data for News - An Update on IPTC and Semantic Web Technologies

At the IPTC's most recent face-to-face meeting in Dubai, much of the discussion revolved around Semantic Web technologies. The big news was that the IPTC voted to approve Draft 0.1 of rNews, but this wasn't the only matter discussed.

As I blogged about before, the news standards body has been looking at three areas and how they relate to news:

I chair the IPTC's SemWeb group. Here is the first in a short series of posts on where each of these areas stands and how you can get involved.

The Semantic Web
Semantic Web technologies extend today's web with machine readable information and links between data and services. The "Semantic Web" has also been termed "The Giant Global Graph" and "Web 3.0", amongst other names; there are several different theories about exactly what it is and exactly how to get there.

IPTC News Codes using Linked Data
The IPTC has explored the technical aspects of representing news codes using the technologies and conventions of Linked Data.
Link by manel
http://www.flickr.com/photos/manel/315901872/
Linked Data is a set of best practices for publishing data using a subset of Semantic Web technologies. The IPTC News Codes are a set of metadata taxonomies designed for use by the news industry. The codes are already expressed in machine-readable XML, using IPTC's G2 KnowledgeItem mechanism; it seemed a natural fit to explore expressing the news codes using the Linked Data principles.


IPTC and MINDS
Inspired by this IPTC work, MINDS (an association of European and US news agencies) and the IPTC have been mulling a joint project based on Linked Data for news. At the Dubai meeting, I reviewed the presentation I gave in February 2011 to MINDS about IPTC's Semantic Web and Linked Data work. There's a MINDS meeting in London the week of March 14th, so I expect we will learn more about any joint work.

The Chain by intherough
http://www.flickr.com/photos/intherough/3244476512/
If you'd like to access the IPTC news codes in SKOS (not to mention XHTML and G2) then visit http://www.iptc.org/site/NewsCodes/NewsCodes_Retrieval_in_Different_Formats and find out more about their full content-negotiation glory.
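Content negotiation just means the client names the format it wants in the HTTP Accept header and the server picks the matching representation. A minimal sketch with the standard library follows; it builds the request without sending it. The media types paired with each format and the example.org URL are assumptions for illustration, so check the IPTC page above for the real code URIs and supported types.

```python
from urllib.request import Request

# Media types a client might ask for, keyed by the formats named above.
# These pairings are assumptions for illustration, not taken from IPTC docs.
ACCEPT = {
    "skos": "application/rdf+xml",
    "xhtml": "application/xhtml+xml",
}


def negotiated_request(url, fmt):
    """Build (but do not send) a GET request asking for one representation."""
    return Request(url, headers={"Accept": ACCEPT[fmt]})


# Hypothetical code URI; substitute a real one from the IPTC page above.
request = negotiated_request("http://example.org/newscodes/subjectcode/", "skos")
```

Passing `request` to `urllib.request.urlopen` would perform the fetch; the same URL with a different `fmt` should then come back in a different format.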

Get Involved
You can get involved with the news codes work by contacting the IPTC; one easy way to do that is to join the news codes Yahoo! email list.

Thursday, March 24, 2011

BBC TV and Video Futures

The boffins at the BBC have been doing a lot of interesting work with an eye to the future of TV and video. And they have been kind enough to share a lot of their work via their blog.
Boffin by ajc1
http://www.flickr.com/photos/ajc1/2368836604/
A Connected Future for TV
Many TVs are able to connect to the Internet, either directly or via a plethora of IP-connected boxes. However, for most viewers, the Net is just a different way to receive programming - an alternative to cable, satellite or over-the-air broadcast channels. In March 2011, Roly Keating, BBC Director of Archive Content, used his keynote speech at the Digital Television Group's Annual Summit to describe his vision for Connected TV.
televsion by waltjabsco
http://www.flickr.com/photos/waltjabsco/684747788/
For Keating, Connected TV means more than just video on demand or using your TV to browse the Internet. He would like to enrich the experience of watching TV by giving viewers the ability to explore topics in greater depth, by providing just-in-time context, by creating links between TV and Internet programming and viewers. To me, this means bringing the power of hypermedia to visual content, just as the web has brought hypermedia to text in recent decades.

Frame Accurate Video in HTML5
Dirk-Willem van Gulik, BBC Chief Technical Architect, reviews how he and his team have worked with the open source community to create frame accurate video editing capabilities using web technologies, specifically HTML5. This means that using off-the-shelf browsers, it will be possible to work on video at professional levels of precision, on virtually any kind of web-connected device.

At this point, the capabilities are only in the bleeding-edge versions of (certain) browsers. And the frame accurate browser facilities are just foundational; the actual tools to perform full-fledged video editing can be built on top of this platform, but don't exist quite yet, it seems. But I think it is great that the BBC have really invested in building out missing capabilities in open source tools that will ultimately benefit not only themselves, but many other publishers, large and small.

BBC's Technology Vision
Spencer Piggott, Head of Technology Direction for BBC Technology, reveals the BBC Technology Strategy. In a comprehensive set of bullet points in the linked PowerPoint, he covers the plans for such topics as high definition, content acquisition, development platforms, transcoding, internet distribution, rights management and search, amongst many others.
Vision sign by hamptonroadspartnership
http://www.flickr.com/photos/hamptonroadspartnership/5351621035/
It is great to get this kind of insight into the BBC Technology challenges and plans. It would seem that the BBC boffins are working on lots of interesting things, which is reassuring.
Reassuring by arenamontanus
http://www.flickr.com/photos/arenamontanus/2849737658/

Sunday, February 20, 2011

Recommendations and Randomness: How I Pick Which Books to Read

There was a time when I went to a library or a bookshop and was immediately faced with a problem: how to pick which book to read?  Over time, I have changed how I solve this problem, which has changed what I read.

Maybe I should change again?

How Did I Use to Do It?
One technique would be to find the section that corresponded to an interesting-to-me topic and randomly browse through those books, trying to find one that looked worth reading.  I would look at the table of contents, skim the first page or two.  This was generally pretty successful  - and explains why I would read so many non-fiction books.

The other thing I would do is to see if I could find the latest book by an author I had previously read and liked.  Which would explain why I read so many mystery or science fiction books (genres which thrive on series and name-brand authors) and virtually no literary fiction.  Also, I like mysteries and science fiction.

Obviously, I was not alone in these techniques.  Bookshops and libraries continue to be organized by topic and type (although bookshops in America seem to be evolving into coffee shops and ebook device vendors, with paper books as something of an afterthought).  And the book publishing industry loves when a big name author releases the latest in a series.

However, over time, I decided to change things up a little bit.

Why Did I Feel I Needed to Change My Ways?
The revolution in self-publishing means that a lot more good reading is easily accessible online.  (Together with a lot more bad reading, too, of course - let's not forget Sturgeon's Law).  And it is not only accessibility which has changed.  The good stuff is a lot more discoverable, too.  For me, the advent of blogs (and later Twitter) has meant that I get directed to the best non fiction reading all the time.  More, in fact, than I can read.  (It is possible that I don't actually find out about the best non fiction online via blogs and tweets, but ignorance is bliss).

At the same time, the quality of non fiction books appeared to me to decline.  Too often, it seemed that the essential idea was captured in the title.  The rest of the book was then simply a rehearsal of that idea over and over, typically lacking nuance and, depressingly often, offering an over-simplified look at the world through a single lens.

My fiction technique held obvious flaws - how to learn about new authors or read something other than genre fiction?  In the immortal words of the Stranglers - Something Better Change.

I've Got a Little List
Really, the new-to-me technique I adopted is quite old and very obvious.  But here it is.  On the one hand, I extended the techniques for discovering the good stuff online to the world of books.  Once in a while, the blog posts and tweets would mention an interesting-sounding book.  So, I would add the author and title to an online list. (I use backpack).  But the real source of recommendations is The New York Times' Sunday Book Review (I read the dead trees version, but you may prefer it online).  Yes, I find that critics are the best source of information about books.  And, often for non fiction, you get enough of the idea from the review, so that you don't need to read the actual book.  (I try to supplement my source of critical information, including the Guardian, but the NYT is my main source).

Randomness and Routine
So, I now have a set of great recommendations for books and authors in an online list.  (At the time of writing, I have somewhere between one hundred and two hundred entries). But how do I actually select what to read?

Well, my normal routine is: go to the library, consult my list of books to read and randomly trawl through the list, consulting the library's catalogue until I find some that are available. I will also try to balance things out a little bit by trying to read some non-fiction, along with fiction. Plus I like to get at least one smallish, paperback book that I can read on the train. But random factors like who else happens to have checked out a book on my list, plus whether the library happens to have decided to stock it in the first place, heavily influence what I read in practice (rather than the theory of my recommendations list).


How is it Working Out?
I've been pretty happy with this set of techniques. I read way more fiction - and different types of fiction - than I used to. I tend not to read all that much non-fiction anymore. Although, having noticed this, I've resolved to do better with that. It means that I almost exclusively read books from the library (although I will read a book if you give or lend me it, too).

Areas where it doesn't work include books that are too obscure or controversial for my local library to stock. (Typically, this limits the non-fiction work I might read). And it also means that the idea of e-books (the Nook, the Kindle, etc.) doesn't make any sense to me. Half of my technique relies on the constraints of what is available at the library. I imagine that e-books have some constraints (maybe not everything is available as an e-book?) but it doesn't seem the same, somehow. Similarly, it means that I don't really visit first-hand bookshops anymore. Unless there's a specific book that I need to buy (generally for someone else) or it is some unique emergency situation (such as in an airport).

How do you pick your books?

Wednesday, February 2, 2011

Do OWL Classes Inherit Properties?

I'm slowly learning about using Semantic Web technologies. Sadly, I'm trying to do this in a rather ad hoc, as needed way, so my understanding is far from complete. I just ran across this interesting question: does an ontology subclass inherit the properties of its superclass?

Searching the web for opinions on this topic leads to a couple of contradictory views:

http://www.semanticoverflow.com/questions/619/rdfs-owl-inheritance-with-josekipellet
The answer to this question says: "Instances of subClasses do not inherit properties from instances of parent classes".

http://eclectic-tech.blogspot.com/2010/05/semantic-web-introduction-part-3-rdf.html
This states "In the example below, Penicillin is declared to be a sub class of both Antibiotic and USRegulatedMedication. It will therefore inherit the properties of those classes."

So, I turned to the W3C RDF Schema Recommendation, to see whether the definition would shed any light.

http://www.w3.org/TR/rdf-schema/#ch_subclassof
"The property rdfs:subClassOf is an instance of rdf:Property that is used to state that all the instances of one class are instances of another."

I read this as saying that when C1 rdfs:subClassOf C2, anything that is a C1 is also a C2. So, that seems to directly support the notion that C1 inherits all the properties of C2.
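To check my reading of that definition, here is a minimal sketch in plain Python of the one rule it describes: if something is a C1 and C1 is a subclass of C2, then it is also a C2. This is not a real reasoner and it doesn't use any RDF library; the names ex:C1, ex:C2 and ex:thing are made up for illustration, and the function infer_types is my own invention.

```python
# Minimal sketch of the subclass rule using plain tuples as triples.
# The class and instance names (ex:C1, ex:C2, ex:thing) are hypothetical.

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples):
    """Repeatedly apply: (x type C1), (C1 subClassOf C2) => (x type C2)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for (s, p, o) in inferred if p == SUBCLASS_OF]
        for (x, p, c) in list(inferred):
            if p != RDF_TYPE:
                continue
            for (sub, sup) in subclass_pairs:
                new_triple = (x, RDF_TYPE, sup)
                if c == sub and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


triples = {
    ("ex:C1", SUBCLASS_OF, "ex:C2"),
    ("ex:thing", RDF_TYPE, "ex:C1"),
}

closure = infer_types(triples)
print(("ex:thing", RDF_TYPE, "ex:C2") in closure)  # True
```

Note that the sketch only derives class membership. Whether that amounts to "inheriting properties" depends on what else is said about C2 (for example, which properties name it as their domain), and that part is not modelled here.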

Certainly, if rdfs:subClassOf doesn't mean that a subClass inherits the properties of the parent, then what does it mean? That it is unclear is a bit worrying, though.

Tuesday, January 4, 2011

The Books I Read in 2010

I decided to keep track of all the books I read in 2010.   I read thirty: twenty fiction and ten non-fiction.  It seems that I am old school - I don't read ebooks or listen to audiobooks.  And I didn't count any books that I didn't finish (except one).
http://www.flickr.com/photos/austinevan/1225274637/
Thanks to austinevan for this stack of books photo

I posted the entire lists of what I read, together with my impressions of each one (fiction, non-fiction).  But, just in case you don't want to read about all thirty, I think that these were my favourites:

Fiction
Always Outnumbered, Always Outgunned
Kavalier & Clay
Valley Of Bones
Super Sad True Love Story

Non Fiction
Consider The Lobster
The Pleasures and Sorrows of Work
How to Become a Scandal
The Age of Paradox

Of course, I feel that I should have read more.  Maybe I'll follow the advice in How to Read More Books and aim for more than 100 books in 2011?  Probably not, though.

It is interesting that I read twice as many fiction books as non-fiction.  I used to never read novels; but then I made an effort to change that.  Maybe I've now over-shot?  The non-fiction books I read are not (directly) to do with my work (whatever that is).  Though maybe that's because there are not many books about that?  Or maybe because I already read plenty about that stuff online?  I found it interesting that so many of the books I read came to me by chance (given as gifts or in a library-sale-bag).  I notice that I like novels that are set in places I know (London, Edinburgh, Princeton) or am visiting (Rome, Edinburgh).  And I like to read lots of mysteries and sci-fi, but none of my fiction favourites are exactly any of those genres (although three of the four sort of are).

But assembling these lists and writing my little reviews was a lot of fun - almost like reading a year's worth of books all over again.

Sunday, January 2, 2011

The Fiction Books I Read in 2010

I decided to keep track of all the books I read in 2010. I wondered what I would learn from doing this? To make it easier for me to publish and (hopefully) for you to digest, I've broken it into fiction and non. So, here are the 20 fiction books, in the order I read them in 2010. (You can also read my 2010 non fiction list).

1. The Road by Cormac McCarthy.
Set after an unspecified apocalypse, this book follows an unnamed father and son trudging along a road through a grey wasteland of death and destruction. If this sounds grim, well, it is.

http://www.flickr.com/photos/osiatynska/3287986172/
Thanks to osiatynska for this image of piles of books
Although I like horror films, I don't read many (any?) horror books; this one was a gift. I often regretted reading it right before going to sleep. And, after reading the book, I really didn't want to watch the film version. However, I thoroughly enjoyed the book - grim, yet gripping.

2. A Star Called Henry (The Last Roundup) by Roddy Doyle.
Henry, the son of a one-legged hitman, grows up on the streets of Dublin in the early 20th century. He gets mixed up in the Easter Rising of 1916 and becomes a hitman for the IRA.

Roddy Doyle's convincing portrait of Irish street life is by turns comic and tragic. It cleverly makes you see why someone would want to become a terrorist; but also how Henry is inevitably betrayed by his own cause.

3. Loving Frank: A Novel by Nancy Horan
Just like A Star Called Henry, Loving Frank is a novel based on true events. This one follows Mamah Borthwick Cheney, who had an affair with Frank Lloyd Wright.

There was nothing wrong with this book but it just didn't grip me. I never did finish it... but I'm going to count it anyway, partly because I made it most of the way through. And partly because my total number of books read would be even worse otherwise.

4. The Diamond Age: Or, a Young Lady's Illustrated Primer by Neal Stephenson
This science fiction novel portrays a world that is being reshaped by nanotechnology and being carved up amongst competing tribes, such as the neoconfucians and neovictorians. Several interlocking plot threads follow nicely-drawn characters and weave the details of the fully-imagined future cultures.

I read - and loved - Stephenson's SF classic Snow Crash and his hugely ambitious Baroque Cycle historical fiction books. Diamond Age is quite different again from these in style and subject matter. But, just as with his other books, Stephenson crafted a novel that is both entertaining and thought-provoking.

5. A Most Wanted Man by John le Carre
Le Carre's post-9/11 spy thriller is set in Hamburg and centres on Issa, who may be a Muslim medical student, or a Chechen terrorist, or the son of a Russian spy, or all of the above.

Another work of fiction that makes me feel smarter and better-informed about the real world and its moral complexities.

6. Always Outnumbered, Always Outgunned by Walter Mosley
Each of the perfectly crafted short stories about Socrates Fortlow, an ex-convict now scraping out an existence in a Watts shack, stands on its own. And yet the whole collection hangs together like a novel.

Brilliant stuff.

7. The Tourist by Olen Steinhauer
Portraying the alienation and action in the life of a post Cold War spy is now a well-trodden path. But Steinhauer's tale of a CIA "Tourist" (a spy of no-fixed-abode) is well-executed, with convincing characters and a multi-layered plot, played out against backdrops that combine glamour and squalor in equal measure.

8. Making History by Stephen Fry
This is polymath and over-achiever Stephen Fry's third novel. It is an alternative history (what if Hitler won WWII?) although it is cleverly framed as an alternative reality (what if we could go back and change history to prevent Hitler's rise to power?). As you might guess, there's a be-careful-what-you-wish-for twist and Fry uses the what-ifs to illustrate how things are better than they might have been.

Not the best novel I've ever read, but entertaining enough and I particularly liked the details of the London and Princeton settings.

9. American Gods: A Novel by Neil Gaiman
I rarely re-read a book, but American Gods was selected for One Book One Twitter, an experimental online bookclub and I decided to join in the fun (a little bit). AG follows the journey of Shadow as he learns about the gods that followed people to America and wound up as shadows of their former godselves, as people forgot their old gods in the New World.

I thoroughly enjoyed re-reading this book, partly because of the added Twittery dimension, but also because I noticed a lot more the second time through this dense and cleverly written imaginative work.

10. Neverwhere: A Novel by Neil Gaiman
Next, I decided to read a new-to-me Gaiman novel. Neverwhere takes you beneath London into a fantasy informed by punning on the Tube stop names (you meet black friars, an Earl holds court and there's a central character who is an angel called Islington).

Lighter-weight than American Gods, but fun nonetheless.

11. Fleshmarket Alley: An Inspector Rebus Novel by Ian Rankin
As our big holiday this summer revolved around Edinburgh, I followed my tradition and read a book set in that city. I was happy that Fleshmarket Alley is a real location off Cockburn Street - and only slightly disappointed that it is really called Fleshmarket Close (as is the book outside of the USA). We saw Ian Rankin in the flesh (and Alisa spoke to him), as the Edinburgh Book Festival was in full swing when we visited.

Fleshmarket Alley / Close is a solid entry in the seemingly neverending series of well-written Rebus murder mysteries.

12. Best of Philip K Dick by Philip K Dick
I was inspired to read this collection of sci-fi short stories by the World Cup. Remember Paul? Well, Philip K Dick wrote Martian Time-Slip which I hear involves an octopus that can predict the future (or maybe is controlling the future). Sadly, that book wasn't available in the library. But this collection was - and it contained several of his short stories that were later reworked into films (Minority Report, Total Recall and so on).

13. The Amazing Adventures of Kavalier & Clay by Michael Chabon
I was flying somewhere when I ran out of books to read. I think I had a 45 minute layover in Chicago? So, I ran into a bookshop and bought this book. It turned out to be a great choice.

The mid-twentieth century Cambrian explosion of comic books is the backdrop for and inspires the style of this novel about two Jewish cousins and their families in New York City. The action roams far and wide, though, taking in Prague (with several magical appearances by the Golem) and frozen mad battles in Antarctica.

14. Valley of Bones: A Novel by Michael Gruber
Cuban-American detective Jimmy Paz investigates a mysterious, mystic nun implicated in the murder of an Arab arms dealer. This turned out to be a very original entry in the mystery/thriller genre, as it roams back and forth in time and between continents.

15. Murder on the Leviathan: A Novel by Boris Akunin
Translated from the Russian, this is a period murder mystery set onboard a cruise ship. I particularly liked how each chapter was told from the view point of a different character.

16. The Echelon Vendetta by David Stone
At the local library fundraiser, I bought a bag of books for a dollar. This was one of them. But this CIA spy thriller wasn't half bad.

17. Severance Package by Duane Swierczynski
There were several things I liked about this thriller by former Philadelphia City Paper editor Swierczynski. It is simultaneously set in Philadelphia and Edinburgh, both cities I know quite well. The action and plotting are a bit comic book but there are enough comedic touches to make it an entertaining ride.

18. Cold Service by Robert B Parker
19. Hush Money by Robert B Parker
Both of these Parker books were in the for-a-dollar bag. They are each competent entries in the Spenser detective series. Quick and painless reads.

20. Super Sad True Love Story: A Novel by Gary Shteyngart
Picked by several people as their book of the year, this novel is set in Rome and New York City in the near future. It traces Lenny (obviously modeled on Gary) and Eunice (his Korean American gf) as they navigate their way through the fashion-obsessed, technology-drenched, book-disdaining world. Everyone is glued to their apparats (descendants of today's smartphones) which broadcast constant streams of data about you and bombard you with information about the people and things around you. I particularly liked the portrayals of the Bipartisan party and the ubiquitous National Guard checkpoints.

In a lot of ways, SSTLS reminded me of William Gibson's recent novels, although Shteyngart is more skeptical of the frantic, technology-driven future. It reminded me of a thought I had whilst in Rome in November - that there is hope for a happy future after an empire collapses, which is heartening given what the USA is going through currently. There are many clever references ("The Rupture") but there is also a lot of depth. Lenny and Eunice are convincing characters and quite different from one another. The last novel I read in 2010 and one of my favourites.