Tuesday, May 8, 2012

Liking Links

I like the link.

Links by Clips
by RambergMediaImages
This may seem an uncontroversial position to take. But I'm specifically talking about the HTML link element. It is a nice model for expressing relationships between the current document and other resources. It has been imported into various XML standards (such as ATOM and NewsML-G2). But it isn't the only way to express links. Even in HTML, it is much more common to use

 <a href="http://example.com/">

than

<link href="http://example.com" rel="related">.
Chain Links
by MyTudut
The Linky Landscape
In fact, there is a range of types of links. If you think about the img element within HTML, it includes a reference to a remote object to include in a web page (via the img/@src attribute). Even though the img element uses an HTTP reference, it isn't very linky - it is more about composing a document from multiple resources of different types.

Anchor
 by Leo Reynolds
Anchors Aweigh!
The HTML a (anchor) element is a big step up in linkyness. It is generally used to specify an outbound navigational link from the current HTML page to some other resource (such as another HTML page, but it could also be a video clip, an image or anything else that can be addressed via a URL). The HTML a element also lets you specify how the remote resource relates to the current page: the a/@rel attribute expresses the relationship from the current document to the destination resource. Similarly, the a/@rev attribute lets you express the reverse relationship - how the remote resource relates to this document. (Remember, not all relationships are symmetrical.) So, the anchor tag has some nice (though, in practice, rarely used) attributes for specifying in a machine-readable, controlled way the relationships between web resources. One limitation of the HTML a element is that it can only be used for "inline" links - such as marking up a bit of text within a paragraph.
link
by Profound Whatever
Link
The link element has all the same attributes as the anchor element but it is designed to be purely semantic - it simply defines a relationship between the current document and the resource referred to in the link/@href attribute. The link element cannot contain content and so it is purely about referencing another addressable resource. As with the HTML a element, the relationship between the document containing the link and the resource being referenced can be specified using the @rel and @rev attributes.
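To make the comparison concrete, here is a minimal sketch in Python that pulls the href, rel and rev values out of both a and link elements using the standard library's html.parser module. The LinkCollector class and the sample page are made up for illustration; they are not part of any standard.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (tag, href, rel, rev) for every <a> and <link> element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Both element types carry the same linking attributes.
        if tag in ("a", "link"):
            attributes = dict(attrs)
            if "href" in attributes:
                self.links.append(
                    (tag, attributes["href"], attributes.get("rel"), attributes.get("rev"))
                )


# Hypothetical sample page, for illustration only.
SAMPLE = """
<html><head>
<link href="http://example.com" rel="related">
</head><body>
<p>See <a href="http://example.com/next" rel="next">the next page</a>.</p>
</body></html>
"""

collector = LinkCollector()
collector.feed(SAMPLE)
for tag, href, rel, rev in collector.links:
    print(tag, href, rel, rev)
```

Neither sample element sets @rev, so that value comes back as None for both.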
Links
by rubygold
Web Link Types
In an effort to provide some shared semantics about types of links, IANA put together a small, but growing, list of link types - values that you can use in your @rel or @rev attributes. Whenever you are adding a link to your document, check out the existing list of link relations, to see if you can provide some hints as to the relationship between your document and the linked resource:
http://www.iana.org/assignments/link-relations/link-relations.xml

Chain Links
by Eric M Martin
Spreading the Link
In fact, the HTML link element has proven so useful that it has been adopted into other markup standards, notably IETF's ATOM and IPTC's NewsML-G2. The IETF issued RFC 5988 specifically about linking, not simply in the context of HTML documents.
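One of the things RFC 5988 defines is a Link HTTP header field, which carries the same target-plus-relation information outside of any document. As a rough sketch, this Python function builds the value of such a header. The function name format_link_header is my own invention, and it assumes the parameter values contain no double quotes (it does no escaping).

```python
def format_link_header(href, rel, **params):
    """Build the value of an HTTP Link header for a single target.

    Assumes parameter values need no escaping (no double quotes inside them).
    """
    parts = ["<%s>" % href, 'rel="%s"' % rel]
    # Extra parameters, such as title, are appended in alphabetical order.
    for name, value in sorted(params.items()):
        parts.append('%s="%s"' % (name, value))
    return "; ".join(parts)


header_value = format_link_header("http://example.com/", "related")
print("Link: " + header_value)
```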

So, if you need to establish relationships between resources, then consider using the link element, with @rel or @rev values. And if you happen to be designing a markup language, my advice is to make sure you include the full link element. I predict you'll find it useful over and over again.

Friday, January 6, 2012

One Good Thing About Not Being Popular

I suppose one good thing about my blog not being popular: I don't have to deal with comment spam!

Sunday, October 30, 2011

selenium + python = web automation ftw!

This past snowy weekend, I decided I wanted to try out automating one of the routine tasks I have to perform each week via a web browser. After a little googling, I decided that Selenium was my best option. And, with a little bit of work, I was able to use it to create a script to automate this chore - even though the web application I'm automating turned out to be quite unwebfriendly.

Before I forget, I thought I should quickly jot down what I learnt in the process, with the hope that this will be useful to others (or, perhaps, my future self).
Selenium Roses by bcostin
http://www.flickr.com/photos/bcostin/155246852/
Selenium
The Selenium website succinctly describes the software: "Selenium automates browsers". Although Selenium is primarily known as a web site testing tool, the seleniumhq site points out that "[b]oring web-based administration tasks can (and should!) also be automated as well".

It turns out that Selenium is, in fact, a suite of tools. There's Selenium IDE, a Firefox plugin that is ideal for quickly prototyping an automation script. And there are various server side components with the core being the WebDriver API. (The WebDriver is also known as "Selenium 2", as it is a merger of the original Selenium with a project started by a tester at Google. See the Selenium project history for more). The WebDriver lets you control various browsers - Internet Explorer, Chrome, Opera, HtmlUnit. It even has ways to simulate iPhone and Android devices. But the browser best-supported by Selenium is Firefox. I had to download a fresh copy, since I've almost entirely switched to Chrome these days. But it was worth it.
Firefox magnet (wallpaper) by flod
http://www.flickr.com/photos/flod/2568092124/

How I Did It
I wound up basing my script very much on the example in the Getting Started section. Along the way, I developed a few patterns that I'd like to pass along, in the hopes that they will help.

As I mentioned earlier, the web application I was trying to automate turned out to be quite unwebfriendly. What I mean is that it used all kinds of complicated tricks to ensure that, for example, the URL doesn't change as you navigate to different pages within the application. Similarly, rather than using anchor tags (<a href>) for links, it used Javascript "onclick" scripts to dynamically construct them. The good news is that Python-connected-to-Selenium-powering-Firefox is perfect to automate this kind of application, as it fully supports Javascript, frames, pop up windows and the like.
Waiting by amchu
http://www.flickr.com/photos/amchu/5261511319/
Waiting
One thing I quickly figured out was that my script needed to wait for the complicated, multi-part pages to finish loading before going on to the next step. The documentation explains how to use either explicit or implicit waiting in Selenium. At first, I followed the lead of the example and looked for a particular web page title. Or I thought I could detect changes using the URL. Except that my web application doesn't change title or URL as the pages change!

However, I hit upon a solution: in each step, my script searches for a particular element by name or id. So, I realized that waiting for that element to appear was the most effective strategy. This led me to code like this:

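from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait

try:
    # we have to wait for the page to load
    WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_name("login"))
except TimeoutException:
    # the element never appeared in time, so give up and close the browser
    driver.quit()

(You might also notice that I altered my code to use "except" rather than "finally:", since I don't want my script to always exit, whether or not the page load completes in time.)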
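If it helps to see what an explicit wait boils down to, here is a small pure-Python sketch of the polling idea. This is not Selenium's own implementation: wait_until and fake_find_login are names I made up, and LookupError is just a stand-in for "element not found yet".

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises RuntimeError once timeout seconds pass.
    """
    deadline = time.time() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # Stand-in for "element not found yet": keep polling.
            pass
        if time.time() >= deadline:
            raise RuntimeError("condition not met within %s seconds" % timeout)
        time.sleep(poll)


# Hypothetical page that only "finishes loading" on the third look.
attempts = []


def fake_find_login():
    attempts.append(1)
    if len(attempts) < 3:
        raise LookupError("login field not loaded yet")
    return "login-field"


found = wait_until(fake_find_login, timeout=5.0, poll=0.01)
print(found, len(attempts))
```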

Use the Source
I also figured out a variation on the try/wait pattern to help me figure out the next step. Often, I wrote a WebDriverWait function that would time out, without succeeding. Eventually, I realized that simply quitting the driver wasn't the most useful way to debug things. Instead, I would use the page_source property to help me see what was really loaded into the browser at that point:

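from selenium.common.exceptions import TimeoutException

try:
    WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_name("mainfs"))
except TimeoutException:
    # the wait timed out, so show what the browser actually loaded
    print(driver.page_source)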

IDE Clues
I also found that it was useful to try to figure out how to navigate the web application using the Selenium-IDE plugin. I would record my actions in the IDE, then review the script it generated, to get clues as to how to write my WebDriver script.

I say "clues", because the IDE generates Selenium 1.0 commands, not the 2.0 syntax I needed to use. Generally, it gave me enough information that, with some searching of the API documentation, I could figure out the right equivalents. However, in certain cases, there are capabilities in Selenium 1.0 that have no direct equivalent in 2.0.
Grandpa by conowithonen
http://www.flickr.com/photos/cmogle/2907198746/
A Problem of Focus
One example: in 1.0, there's a way to do relative navigation between frames. It turned out I needed this, since at one point in the web application I was trying to automate, it leaves the focus inside a subframe. Experimenting with the IDE, I saw that it wriggled out of this problem by executing

selectFrame(relative=up)

The problem is that there is no equivalent relative move between frames in Selenium 2. Eventually, I figured out that I needed to switch the focus to the entire window (by making use of the "current_window_handle" property), which then let me select the particular subframe I was looking for:

driver.switch_to_window(driver.current_window_handle)
driver.switch_to_frame("body")

Don't Give Up!
Once I figured out these patterns of working by experimenting with the IDE, and printing the source whenever I got stuck, I found that automating the web application with Selenium was fairly straightforward. It would be nice if the documentation was a bit fuller. And it would be wonderful if the IDE generated Selenium 2 commands. But I think that my small investment in figuring it all out was worthwhile. So, now I'm looking for more things to try with Selenium. For example, could I use it to try out RESTful testing of APIs or Linked Data? Are there other web-based chores that I could (even partially) automate?

Monday, October 17, 2011

Why I Love Twitter


Recently, someone who doesn't use Twitter asked me to explain why I love that service which describes itself as a "real time information network".
learning by SReinwald
http://www.flickr.com/photos/d1sasta/2321821367/
As with most of these kinds of things, it is a little hard for me to explain why I love Twitter so, particularly to someone who has never used it. But maybe I will learn something from trying?
Tweet of Dweams by born1945
http://www.flickr.com/photos/12567713@N00/2369056718/
Perhaps Twitter is like interesting people from around the world texting and sometimes I text back? But it is more than that. I get most of my news through Twitter and it is like instant messaging with various friends and colleagues and heroes and comedians and ... I used to describe Twitter as being like just the status postings from Facebook. But Facebook has changed so much since then that I'm not sure that description still applies. And the networks of people you interact with on Facebook and Twitter are quite different (certainly, in the way I use those services).

But, really, Twitter isn't like texting or RSS feeds or instant messaging or Facebooking. Or, actually, it is kind of like all of those things and more and less, all at once. 

Thursday, October 6, 2011

REST APIs and Errors

What's the "right way" to indicate errors in a RESTful API?

Rest Area? by joeshlabotnik
http://www.flickr.com/photos/joeshlabotnik/419914250/

HTTP Response Codes
As with many aspects of RESTful APIs, it turns out that this has already been worked out and standardized as a fundamental part of the web architecture.

If you're building a RESTful API, you should use the HTTP response code to indicate whether or not a call to the webservice succeeded (http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html). If everything is OK, then return “200 OK”. If something goes wrong, return an appropriate response code in the 3xx, 4xx, or 5xx range. Codes in the 3xx range indicate different types of redirect, e.g. 307 Temporary Redirect. Codes in the 4xx range are conditions that a client can (generally) address, e.g. 400 Bad Request. Codes in the 5xx range indicate types of server error, e.g. 500 Internal Server Error.
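A client can branch on these ranges without knowing every individual code. Here is a minimal sketch in Python; the function name classify_status and the labels it returns are my own, chosen only for illustration.

```python
def classify_status(code):
    """Map an HTTP response code to the broad range it falls in."""
    if 100 <= code < 200:
        return "informational"
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "unknown"


for code in (200, 307, 400, 500):
    print(code, classify_status(code))
```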

The advantages of using this standard mechanism and these standard error codes include:
1. Good documentation of how they work, what they mean, what to do in each case (e.g. http://en.wikipedia.org/wiki/List_of_HTTP_status_codes)
2. There are quite a few existing implementations of this scheme on both the client side (e.g. web browsers) and on the server side (e.g. web servers and web frameworks)

Error Documents
If a REST API returns a response code in one of the error ranges (3xx, 4xx, 5xx), it should be a signal to the client that any response metadata and payload should not be interpreted as a response to the request, but as an explanation of the error. 

I haven't been able to identify any standard for the details of error responses, probably because the details of errors for different APIs can vary so widely. However, I like Amazon’s S3 REST webservice error document schema:


An example is:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The resource you requested does not exist</Message>
  <Resource>/mybucket/myfoto.jpg</Resource>
  <RequestId>4442587FB7D0A2F9</RequestId>
</Error>

The schema is that each <Error> document contains a <Code> (a unique id for the error) and a <Message> (an English-language description of the error). Optionally, <Error> documents can contain additional XML structures that are specific to each kind of error.
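On the client side, a document like this is easy to take apart. The following is a sketch using Python's standard xml.etree.ElementTree module; parse_error is a name I made up, and the sample is simply the example document shown earlier.

```python
import xml.etree.ElementTree as ET

# The example error document from above, as bytes.
ERROR_DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The resource you requested does not exist</Message>
  <Resource>/mybucket/myfoto.jpg</Resource>
  <RequestId>4442587FB7D0A2F9</RequestId>
</Error>"""


def parse_error(document):
    """Return (code, message, extras) from an <Error> document."""
    root = ET.fromstring(document)
    if root.tag != "Error":
        raise ValueError("not an Error document: %s" % root.tag)
    code = root.findtext("Code")
    message = root.findtext("Message")
    # Everything else is specific to the kind of error.
    extras = {
        child.tag: child.text
        for child in root
        if child.tag not in ("Code", "Message")
    }
    return code, message, extras


code, message, extras = parse_error(ERROR_DOCUMENT)
print(code, "-", message)
print(extras)
```

Keeping the error-specific elements in a separate dictionary means a client can always rely on the code and message, and look at the rest only when it knows what to expect.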

Sunday, September 11, 2011

Deciding to Remember


In many countries, certainly in Europe, shared national memory has a reality in the naming of public squares and boulevards by dates. The equivalent for us would be if the Fourth of July were one term in a vocabulary of dates with civic emotional meaning, rather than unique. (There are not likely to be boulevards named for December 7 or November 22.)

In October 1999, American Poet Laureate Robert Pinsky reflected on

what makes the American people "a people" -- and what our poetry can teach us about the "fragile, heroic enterprise of remembering"


Wednesday, September 7, 2011

Six Ways You Can Help Us Get to rNews 1.0

The IPTC has been working on rNews since late 2010. We're now getting close to unveiling rNews 1.0 - and you can help get us there!

What is rNews?
rNews is a way to embed news-specific metadata into HTML pages. With a standard for representing that news-specific metadata, providers could consistently apply it to web pages in a way that makes it easier to use for tool makers (such as search engines). One key benefit of making it a standard is that it lets smaller players - both publishers and tool makers - participate on an equal footing with larger players. As I discussed in Seven Ideas for rNews you can think of rNews as being like a news-specific API for webpages to encourage open innovation (but with less technical and managerial overhead than typical APIs).

How has the IPTC been Crafting rNews?
Amongst other things, the IPTC's Semantic Web Working Group has been working steadily on the underlying model for rNews. We did lots of outreach to publishers and people interested in consuming news metadata on the web, including SemWeb meetups in New York, Berlin and London. Yesterday, the IPTC SemWeb group agreed to rNews 0.7, which you can see via this rNews spreadsheet.

What are the Specifics of rNews 0.7?
The spreadsheet includes the rNews classes, properties and definitions. It indicates the differences from the previous version of rNews. It also includes a potential alignment with schema.org, which has a similar model but is aimed at microdata specifically, whereas we see the rNews model as working with multiple different syntaxes, including both microdata and RDFa. The rNews site hasn't yet been updated to reflect all of the changes in version 0.7. Partly, that's because we only just agreed the details yesterday. But in part, that's also because we're about to move to rNews 1.0 - the first full Production Release of rNews!

How Can I Help Get to rNews 1.0?
The IPTC plans to vote on rNews 1.0 in the first week of October, at our next face-to-face meeting. You can help ensure that the rNews 1.0 launch is a success by doing any or all of the following:

1. Examine the rNews 0.7 spreadsheet and point out problems, omissions or contradictions via the rNews forum
2. A great way to get to grips with the details is to try marking up a news webpage using rNews 0.7 (you can see some examples linked from the rNews main page)
3. Try extracting rNews properties from an example page using an RDFa or microdata distiller
4. If you like your rNews example marked-up webpage, consider contributing it to the IPTC, so we can share it with others
5. Think about an innovative service or tool you can build using news markup applied consistently and at scale across the web
6. If you work for a publisher, start to talk to your colleagues about the potential to unlock innovation for news on the web via improved news metadata markup

Since the beginning, the rNews work has been a group effort, greeted with interest and enthusiasm from a variety of people. Now - with your help - we are within sight of reaching a very significant milestone: the release of rNews 1.0.