I've been thinking about query languages recently.
As part of my experimentation with MarkLogic, I've been playing with XQUERY, the query language for XML. I find it does everything I need it to do, though I haven't invested enough time and effort to fully grok it (in the way that I did with XSLT last year). Still, the fact that XQUERY and XSLT share a common addressing model (XPATH) is a huge win for me: at least I can start *somewhere* with XQUERY. But I can't find an XQUERY equivalent to dpawson's excellent XSL Frequently Asked Questions. I suppose I need to go Old Skool and actually read one of the two books on XQUERY that I bought years ago.
Another query language that I have looked at, but have done little more than think about, is SPARQL, the query language for RDF. Just like XQUERY (and clearly modeled on SQL, the category killer among query languages), you essentially compose a query document, post it to a service and get back a results document. As I understand it, XQUERY lets you construct results as XML, while SPARQL lets you construct results as RDF or as variable bindings. There are increasing opportunities to play with SPARQL endpoints, so I suspect I will get to work with SPARQL properly at some point.
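To make the "post a query document, get a results document" pattern concrete, here is a minimal Python sketch. It assumes the SPARQL Protocol convention of passing the query text in a `query` parameter of an HTTP GET, and a response in the SPARQL JSON results format (one of several formats an endpoint may offer). The endpoint URL and the sample response are made up for illustration, and no network call is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: substitute any SPARQL endpoint you have access to.
ENDPOINT = "http://example.org/sparql"

def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET request; the query text travels in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})

def bindings(results_json: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into one {variable: value} dict per solution."""
    doc = json.loads(results_json)
    return [
        {var: term["value"] for var, term in solution.items()}
        for solution in doc["results"]["bindings"]
    ]

query = """PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title WHERE { ?doc dc:title ?title } LIMIT 10"""
print(sparql_get_url(ENDPOINT, query))

# A canned response in the SPARQL JSON results format (made-up data, not from a real endpoint).
sample = """{"head": {"vars": ["title"]},
  "results": {"bindings": [
    {"title": {"type": "literal", "value": "Example title"}}
  ]}}"""
print(bindings(sample))
```

The request side is just a URL, while the response side is a document of variable bindings rather than RDF; a CONSTRUCT query would return RDF instead.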
However, it is difficult to discuss search without mentioning Google. One of the nice things about Google is that you can compose your searches as URIs, and you can supply additional fields to modify your search parameters using standard URI conventions. I was looking for something that works in the same appealingly simple way but gives you access to the full power of searching metadata fields. I didn't want to have to invent my own query syntax.
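A small Python sketch of the search-as-URI idea. `q` carries the search terms; treating `start` as a result-offset field is an assumption about Google's query-string parameters that may not hold today. The `site:` example shows where fielded search ends up in this scheme: inside the free-text value rather than in a field of its own.

```python
from urllib.parse import urlencode

def google_search_uri(terms: str, **params: str) -> str:
    """Compose a search as a plain URI; extra keyword args become additional query-string fields."""
    return "http://www.google.com/search?" + urlencode({"q": terms, **params})

# Keyword search, with an (assumed) `start` field asking for results from offset 10 onwards.
print(google_search_uri("sru cql", start="10"))

# Google's own operators, such as `site:`, are the nearest thing to fielded search,
# but they are packed into the free-text `q` value rather than given separate fields.
print(google_search_uri("cql site:loc.gov"))
```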
I looked first at A9's OpenSearch, an attempt to let search engines publish a profile of the search syntax and formats that they support. It was initially very focused on keyword search, and there is now a set of draft extensions mainly aimed at extending the *results* of a query with additional namespaced information, although there is some limited support for a search to indicate which parameters it supports. So the OpenSearch approach seems quite nice, but it doesn't quite hit the sweet spot I was looking for: mixing free-text search with fielded search, all wrapped up in a RESTful interface.
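An OpenSearch description document advertises a URL template with parameters in braces, such as `{searchTerms}`, where a trailing `?` marks a parameter as optional; that template is how a search indicates which parameters it supports. The sketch below is a hypothetical Python helper, `fill_template`, that substitutes values into such a template. The template URL and its `q`, `page` and `n` field names are invented for illustration.

```python
import re
from urllib.parse import quote

# Matches {name} or {name?}; a trailing "?" marks an optional parameter.
# The ":" in the character class allows namespaced parameters such as {geo:box}.
_PARAM = re.compile(r"\{([A-Za-z0-9_:]+)(\?)?\}")

def fill_template(template: str, values: dict[str, str]) -> str:
    """Substitute values into an OpenSearch-style URL template (hypothetical helper).

    Missing optional parameters become empty strings; a missing required one raises KeyError.
    """
    def substitute(match: re.Match) -> str:
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(values[name], safe="")  # percent-encode everything, including "&" and "/"
        if optional:
            return ""
        raise KeyError(f"missing required OpenSearch parameter: {name}")

    return _PARAM.sub(substitute, template)

# Invented template, as it might appear in a search engine's description document.
TEMPLATE = "http://example.org/search?q={searchTerms}&page={startPage?}&n={count?}"
print(fill_template(TEMPLATE, {"searchTerms": "marklogic xquery", "count": "20"}))
```

Note that the only slot for the query itself is `{searchTerms}`, a single free-text string, which is why fielded search sits awkwardly in this model.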
Then I came across SRU (Search/Retrieval via URL) and specifically CQL (now the "Contextual Query Language", but formerly known as the "Common Query Language"). It seems that SRU and CQL grew out of efforts to create a fully "of the web" successor to the pre-web Z39.50 library search and retrieve protocol. SRU makes it all RESTful (there's an SRW protocol for those of you who prefer to be SOAPy), and the CQL syntax lets you specify fielded search using a nice, extensible mechanism: indexes are grouped into "context sets", so a query can say something like dc.title = xml. For the cherry on top, you can ask for results in different XML formats (XML being my favourite meta language).
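To show how SRU and CQL fit together, here is a Python sketch that builds a CQL query mixing a free-text clause with fielded clauses, then wraps it in an SRU searchRetrieve URL. The helper names and base URL are hypothetical. `operation`, `version`, `query`, `maximumRecords` and `recordSchema` are SRU 1.1 request parameters, but the `dc.*` indexes and the `dc` record schema name only work if the target server supports the Dublin Core context set and schema. A quoted term with no index is left to the server's default index.

```python
from urllib.parse import urlencode

def cql_term(value: str) -> str:
    """Quote a CQL search term, escaping backslashes and embedded double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def cql_and(free_text: str, fields: dict[str, str]) -> str:
    """Join a free-text clause and fielded `index = term` clauses with CQL's boolean `and`."""
    clauses = [cql_term(free_text)] if free_text else []
    clauses += [f"{index} = {cql_term(term)}" for index, term in fields.items()]
    return " and ".join(clauses)

def sru_search_url(base: str, cql: str, max_records: int = 10, schema: str = "dc") -> str:
    """Build an SRU searchRetrieve request; every parameter is an ordinary query-string field."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        "maximumRecords": str(max_records),
        "recordSchema": schema,  # assumed short name; servers list the schemas they support
    }
    return base + "?" + urlencode(params)

# Free text mixed with fielded search on Dublin Core indexes (hypothetical base URL).
query = cql_and("query languages", {"dc.creator": "smith", "dc.date": "2007"})
print(query)
print(sru_search_url("http://example.org/sru", query))
```

Because the whole request is an ordinary URL, it keeps the Google-like simplicity while the CQL inside it carries the fielded search.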
What I'm struggling to find is much evidence of widespread adoption, or even of an open source implementation. The Library of Congress are hosting the SRU/CQL pages, so I assume they have adopted it, at least in part. Some other library-type organizations (such as COPAC) offer SRU/SRW services alongside their Z39.50 interfaces. And there is some evidence, apparently from Nature, of attempts to bring OpenSearch and SRU/CQL together.
Any suggestions as to where else I should look?