Tuesday, January 12, 2010

CQL - A Web Friendly Query Language with Metadata Support?

I've been thinking about query languages recently.

As part of my experimentation with MarkLogic, I've been playing with XQuery, the query language for XML. So far it has done everything I need it to do. I haven't invested enough time and effort to fully grok it (in the way that I did with XSLT last year). However, the fact that XQuery and XSLT share a common addressing model (XPath) is a huge win for me. At least I can start *somewhere* with XQuery. But I can't find an XQuery equivalent to dpawson's excellent XSL Frequently Asked Questions. I suppose I need to go Old Skool and actually read one of the two books on XQuery that I bought years ago.

Another query language that I have looked at, but have done little more than think about, is SPARQL, the query language for RDF. Just like XQuery (and clearly modeled on the query language category killer SQL), you essentially compose a query document that you post to a service and get back a results document in return. XQuery lets you construct results in XML; SPARQL lets you construct results as RDF or as variable bindings, as I understand it. There are increasing opportunities to play with SPARQL endpoints, so I suspect that I will have the opportunity to actually work with SPARQL at some point.

However, it is difficult to discuss search without mentioning Google. One of the nice things about Google is that you can compose your searches as URIs, and you can supply additional fields to modify your search parameters using standard URI conventions. I was looking for something that might work in this appealingly simple way but give you access to the full power of searching metadata fields. I didn't want to have to invent my own query syntax.

I looked first at A9's OpenSearch. This is an attempt to allow search engines to publish a profile of the search syntax and formats that they support. Initially very focused on keyword search, it now has a set of draft extensions that are mainly aimed at extending the *results* of a query with additional namespaced information, although there is some limited support for allowing a search engine to indicate which parameters it supports. So, the OpenSearch approach seems quite nice, but it doesn't quite hit the sweet spot I was looking for: being able to mix free text search with fielded search, all wrapped up in a RESTful interface.
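To make the OpenSearch idea a bit more concrete: a search engine publishes a URL template, and the client fills in the placeholders. Here is a rough sketch in Python of how that substitution might work. The `{searchTerms}` and `{startPage?}` placeholders (with a trailing `?` marking an optional parameter) come from the OpenSearch 1.1 spec; the `example.org` template and the `expand_template` helper are my own inventions for illustration.

```python
import re
from urllib.parse import quote


def expand_template(template, values):
    """Fill in an OpenSearch-style URL template (hypothetical helper)."""

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Percent-encode the value so it is safe inside a URL.
            return quote(str(values[name]), safe="")
        if optional:
            # Optional parameters with no value become empty strings.
            return ""
        raise KeyError(f"missing required template parameter: {name}")

    return re.sub(r"\{([A-Za-z0-9_:]+)(\??)\}", substitute, template)


# Made-up template of the kind a search engine might publish.
template = "http://example.org/search?q={searchTerms}&page={startPage?}"
url = expand_template(template, {"searchTerms": "query languages"})
print(url)
```

Note that everything typed by the user ends up in the single `searchTerms` slot, which is the keyword-only limitation I was bumping into.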

Then, I came across SRU (Search/Retrieval via URL) and specifically CQL (now the "Contextual Query Language", but formerly known as the "Common Query Language"). It seems that SRU and CQL grew out of efforts to create a fully "of the web" successor to the pre-web Z39.50 library search and retrieve protocol. The SRU part makes it all RESTful (there's an SRW protocol for those of you who prefer to be SOAPy). And the CQL syntax lets you specify fielded search using a nice, extensible mechanism. For the cherry on top, you can return different types of XML (my favourite meta language).
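As a rough illustration of what this looks like in practice, here is a minimal Python sketch that wraps a CQL query in an SRU `searchRetrieve` URL. The parameter names (`operation`, `version`, `query`, `maximumRecords`) are the ones I see in the SRU 1.1 documentation; the endpoint `http://example.org/sru`, the `build_sru_url` helper, and the assumption that the server exposes a `dc.title` index are all made up for the example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sru_url(base_url, cql_query, max_records=10):
    """Build an SRU searchRetrieve URL for a CQL query (hypothetical helper)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,  # the CQL string, URL-encoded by urlencode below
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


# Fielded search (dc.title) mixed with a bare free-text term (xml).
# Which indexes actually exist depends on the server; dc.title is an assumption.
cql = 'dc.title = "query languages" and xml'
url = build_sru_url("http://example.org/sru", cql)
print(url)

# Decoding the URL again recovers the original CQL string.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0])
```

The whole search lives in the URL, Google style, but the `query` parameter carries a proper little language rather than a bag of keywords.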

The only thing I am struggling to find is much evidence of widespread adoption, or even of open source implementations. The Library of Congress are hosting the SRU/CQL pages, so I assume they have adopted it, at least in part. Some other library-type organizations (such as COPAC) have SRU/SRW services alongside their Z39.50 interfaces. And there is some evidence of attempts to somehow bring OpenSearch and SRU/CQL together, apparently by Nature.

Any suggestions as to where else I should look?
