Sunday, October 30, 2011

selenium + python = web automation ftw!

This past snowy weekend, I decided I wanted to try out automating one of the routine tasks I have to perform each week via a web browser. After a little googling, I decided that Selenium was my best option. And, with a little bit of work, I was able to use it to create a script to automate this chore - even though the web application I'm automating turned out to be quite unwebfriendly.

Before I forget, I thought I should quickly jot down what I learnt in the process, with the hope that this will be useful to others (or, perhaps, my future self).
The Selenium website succinctly describes the software: "Selenium automates browsers". Although Selenium is primarily known as a web site testing tool, the seleniumhq site points out that "[b]oring web-based administration tasks can (and should!) also be automated as well".

It turns out that Selenium is, in fact, a suite of tools. There's Selenium IDE, a Firefox plugin that is ideal for quickly prototyping an automation script. And there are various server-side components, with the core being the WebDriver API. (The WebDriver is also known as "Selenium 2", as it is a merger of the original Selenium with a project started by a tester at Google. See the Selenium project history for more.) The WebDriver lets you control various browsers - Internet Explorer, Chrome, Opera, HtmlUnit. It even has ways to simulate iPhone and Android devices. But the browser best supported by Selenium is Firefox. I had to download a fresh copy, since I've almost entirely switched to Chrome these days. But it was worth it.

How I Did It
I wound up basing my script very much on the example in the Getting Started section. Along the way, I developed a few patterns that I'd like to pass along, in the hopes that they will help.

As I mentioned earlier, the web application I was trying to automate turned out to be quite unwebfriendly. What I mean is that it used all kinds of complicated tricks to ensure that, for example, the URL doesn't change as you navigate to different pages within the application. Similarly, rather than using anchor tags (<a href>) for links, it used JavaScript "onclick" scripts to dynamically construct them. The good news is that Python-connected-to-Selenium-powering-Firefox is perfect for automating this kind of application, as it fully supports JavaScript, frames, pop-up windows and the like.
One thing I quickly figured out was that my script needed to wait for the complicated, multi-part pages to finish loading before going on to the next step. The documentation explains how to use either explicit or implicit waiting in Selenium. At first, I followed the lead of the example and looked for a particular web page title. Then I thought I could detect changes using the URL. Except that my web application doesn't change title or URL as the pages change!

However, I hit upon a solution: in each step, my script searches for a particular element by name or id. So, I realized that waiting for that element to appear was the most effective strategy. This led me to code like this:

    try:  # we have to wait for the page to load
        WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_name("login"))
    except:
        driver.quit()

(You might also notice that I altered my code to use "except:" rather than "finally:", since I don't want my script to always exit, whether or not the page load completes in time).
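For anyone curious what an explicit wait boils down to, here is a minimal sketch in plain Python. It is a stand-in for the idea, not Selenium's implementation: `wait_until`, `WaitTimedOut`, `ElementNotFound` and `FakeDriver` are hypothetical names made up for illustration, and the fake driver simply pretends that the "login" element appears on the third lookup.

```python
import time


class ElementNotFound(Exception):
    """Hypothetical stand-in for a driver's 'no such element' error."""


class WaitTimedOut(Exception):
    """Raised when the condition never succeeds within the timeout."""


def wait_until(driver, condition, timeout=10.0, poll=0.5):
    """Call condition(driver) repeatedly until it returns something truthy."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition(driver)
            if result:
                return result
        except ElementNotFound:
            pass  # element is not there yet; keep polling
        if time.monotonic() >= deadline:
            raise WaitTimedOut("condition not met after %.1f seconds" % timeout)
        time.sleep(poll)


class FakeDriver:
    """Hypothetical driver whose element only 'loads' on the third lookup."""

    def __init__(self, ready_after=3):
        self.lookups = 0
        self.ready_after = ready_after

    def find_element_by_name(self, name):
        self.lookups += 1
        if self.lookups < self.ready_after:
            raise ElementNotFound(name)
        return "<element name=%r>" % name


driver = FakeDriver()
element = wait_until(driver, lambda d: d.find_element_by_name("login"),
                     timeout=2.0, poll=0.01)
print(element)
```

With a real browser, WebDriverWait plays the role of `wait_until` here and raises its own timeout exception when the element never shows up.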

Use the Source
I also figured out a variation on the try/wait pattern to help me figure out the next step. Often, I wrote a WebDriverWait call that would time out without succeeding. Eventually, I realized that simply quitting the driver wasn't the most useful way to debug things. Instead, I would use the page_source property to help me see what was really loaded into the browser at that point:

    try:
        WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_name("mainfs"))
    except:
        print(driver.page_source)

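If several steps need the same treatment, the dump-on-failure idea can be wrapped in a small helper. This is only a sketch under stated assumptions: `run_step`, `FakeDriver` and `wait_for_mainfs` are hypothetical names, the fake driver exposes nothing but a `page_source` attribute, and the built-in `TimeoutError` stands in for whatever exception the real wait would raise.

```python
import io


def run_step(step, driver, log):
    """Run one wait step; on any failure, dump the page source for debugging."""
    try:
        return step(driver)
    except Exception:
        log.write(driver.page_source)
        raise  # let the caller decide whether to quit or carry on


class FakeDriver:
    """Hypothetical stand-in exposing only a page_source attribute."""
    page_source = "<html><frameset><frame name='menufs'></frameset></html>"


def wait_for_mainfs(driver):
    # Stand-in for a WebDriverWait(...).until(...) call that times out.
    raise TimeoutError("no element named 'mainfs'")


log = io.StringIO()
try:
    run_step(wait_for_mainfs, FakeDriver(), log)
except TimeoutError as exc:
    print("step failed:", exc)
    print(log.getvalue())
```

Writing to a file-like object rather than printing directly makes it easy to send the dump to a file when the page source is long.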
IDE Clues
I also found that it was useful to try to figure out how to navigate the web application using the Selenium-IDE plugin. I would record my actions in the IDE, then review the script it generated, to get clues as to how to write my WebDriver script.

I say "clues", because the IDE generates Selenium 1.0 commands, not the 2.0 syntax I needed to use. Generally, it gave me enough information that, with some searching of the API documentation, I could figure out the right equivalents. However, in certain cases, there are capabilities in Selenium 1.0 that have no direct equivalent in 2.0.
A Problem of Focus
One example: in 1.0, there's a way to do relative navigation between frames. It turned out I needed this, since at one point the web application I was trying to automate leaves the focus inside a subframe. Experimenting with the IDE, I saw that it wriggled out of this problem by executing a relative frame-selection command.


The problem is that there is no equivalent relative move between frames in Selenium 2. Eventually, I figured out that I needed to switch the focus to the entire window (by making use of the "current_window_handle" property), which then let me select the particular subframe I was looking for.
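A rough sketch of that window-then-frame sequence follows. It is written against the `driver.switch_to.window(...)` and `driver.switch_to.frame(...)` spelling used by current Selenium Python bindings (the 2.0-era bindings spelled these `switch_to_window` and `switch_to_frame`). The helper `focus_frame_from_top`, the recording `FakeDriver`, the handle "win-1" and the reuse of the frame name "mainfs" are all assumptions for illustration, so that the snippet runs without a browser.

```python
class RecordingSwitchTo:
    """Hypothetical stand-in for driver.switch_to that just records calls."""

    def __init__(self, calls):
        self.calls = calls

    def window(self, handle):
        self.calls.append(("window", handle))

    def frame(self, reference):
        self.calls.append(("frame", reference))


class FakeDriver:
    """Hypothetical driver with one window and a call log."""
    current_window_handle = "win-1"

    def __init__(self):
        self.calls = []
        self.switch_to = RecordingSwitchTo(self.calls)


def focus_frame_from_top(driver, frame_name):
    """Reset focus to the whole window, then select a frame by name."""
    # Re-selecting the current window moves focus out of any subframe.
    driver.switch_to.window(driver.current_window_handle)
    # From the top of the window, the target frame can be selected by name.
    driver.switch_to.frame(frame_name)


driver = FakeDriver()
focus_frame_from_top(driver, "mainfs")
print(driver.calls)
```

Newer Selenium releases also provide `switch_to.default_content()` and `switch_to.parent_frame()`, which may cover the same need more directly.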


Don't Give Up!
Once I figured out these patterns of working by experimenting with the IDE, and printing the source whenever I got stuck, I found that automating the web application with Selenium was fairly straightforward. It would be nice if the documentation was a bit fuller. And it would be wonderful if the IDE generated Selenium 2 commands. But I think that my small investment in figuring it all out was worthwhile. So, now I'm looking for more things to try with Selenium. For example, could I use it to try out RESTful testing of APIs or Linked Data? Are there other web-based chores that I could (even partially) automate?

Monday, October 17, 2011

Why I Love Twitter

Recently, someone who doesn't use Twitter asked me to explain why I love that service which describes itself as a "real time information network".
As with most of these kinds of things, it is a little hard for me to explain why I love Twitter so, particularly to someone who has never used it. But maybe I will learn something from trying?
Perhaps Twitter is like interesting people from around the world texting and sometimes I text back? But it is more than that. I get most of my news through Twitter and it is like instant messaging with various friends and colleagues and heroes and comedians and ... I used to describe Twitter as being like just the status postings from Facebook. But Facebook has changed so much since then that I'm not sure that description still applies. And the networks of people you interact with on Facebook and Twitter are quite different (certainly, in the way I use those services).

But, really, Twitter isn't like texting or RSS feeds or instant messaging or Facebooking. Or, actually, it is kind of like all of those things and more and less, all at once. 

Thursday, October 6, 2011

REST APIs and Errors

What's the "right way" to indicate errors in a RESTful API?


HTTP Response Codes
As with many aspects of RESTful APIs, it turns out that this has already been worked out and standardized as a fundamental part of the web architecture.

If you're building a RESTful API, you should use the HTTP response code to indicate whether or not a call to the webservice succeeded. If everything is OK, then return “200 OK”. If something goes wrong, return an appropriate response code in the 3xx, 4xx, or 5xx range. Codes in the 3xx range indicate different types of redirect, e.g. 307 Temporary Redirect. Codes in the 4xx range are conditions that a client can (generally) address, e.g. 400 Bad Request. Codes in the 5xx range indicate types of server error, e.g. 500 Internal Server Error.
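To make the ranges concrete, here is a small sketch using Python's standard `http.HTTPStatus` enum. The helper names `status_class` and `describe` and the class labels are my own, chosen for illustration; only the numeric ranges and reason phrases come from the HTTP standard.

```python
from http import HTTPStatus


def status_class(code):
    """Map an HTTP status code to its broad class by numeric range."""
    if 100 <= code < 200:
        return "informational"
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    raise ValueError("not an HTTP status code: %r" % code)


def describe(code):
    """Return e.g. '400 Bad Request (client error)' for a standard code."""
    return "%d %s (%s)" % (code, HTTPStatus(code).phrase, status_class(code))


for code in (200, 307, 400, 500):
    print(describe(code))
```

A client can branch on the class first and only then look at the specific code, which keeps it working when a server starts returning a code the client has not seen before.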

The advantages of using this standard mechanism and these standard error codes include:
1. Good documentation of how they work, what they mean, and what to do in each case.
2. Quite a few existing implementations of this scheme on both the client side (e.g. web browsers) and the server side (e.g. web servers and web frameworks).

Error Documents
If a REST API returns a response code in one of the error ranges (3xx, 4xx, 5xx), it should be a signal to the client that any response metadata and payload should not be interpreted as a response to the request, but as an explanation of the error. 

I haven't been able to identify any standard for the details of error responses, probably because the details of errors for different APIs can vary so widely. However, I like Amazon’s S3 REST webservice error document schema:

An example is:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The resource you requested does not exist</Message>
</Error>

The schema is that each <Error> document contains a <Code> (a unique id for the error) and a <Message> (an English-language description of the error). Optionally, <Error> documents can contain additional XML structures that are specific to each kind of error.
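On the client side, a document following this <Error> schema is easy to pick apart with Python's standard `xml.etree.ElementTree`. The sketch below is illustrative only: `parse_error` is a hypothetical helper, and the sample body, including the <Resource> element and its path, is made up to show how optional extra elements could be collected.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body in the <Error> style described above.
ERROR_BODY = b"""<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The resource you requested does not exist</Message>
  <Resource>/example-bucket/example.jpg</Resource>
</Error>"""


def parse_error(body):
    """Extract the code, the message and any extra elements from an <Error> document."""
    root = ET.fromstring(body)
    if root.tag != "Error":
        raise ValueError("expected an <Error> document, got <%s>" % root.tag)
    # Anything other than <Code> and <Message> is error-specific detail.
    details = {child.tag: child.text for child in root
               if child.tag not in ("Code", "Message")}
    return {"code": root.findtext("Code"),
            "message": root.findtext("Message"),
            "details": details}


error = parse_error(ERROR_BODY)
print(error["code"], "-", error["message"])
print(error["details"])
```

Keeping the machine-readable code separate from the human-readable message lets a client branch on the former and log or display the latter.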