Friday, May 9, 2014

JSON Design Principles and Lessons Learnt: Handy JSON Tools and Standards (Part Two of Three)

Lessons Learnt from JSON Designs I've Worked On

Over the last couple of years, I've worked on a few JSON schemas. For example, IPTC's NINJS (for representing news) and the W3C ODRL Community Group's ODRL in JSON (for representing permissions and restrictions). I've also done some work on JSON internal to AP, for various APIs and search systems.

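Along the way, I've learnt some lessons about better or worse ways to design the JSON - both about the way to do it and some JSON "style" tips. I've broken this into three posts: an approach to designing JSON (part one), handy JSON tools and standards (part two, this post) and JSON style (part three).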

A Couple of Handy JSON Tools

For basic syntax checking of your JSON documents, JSONLint is invaluable. Alternatives include JSON Formatter and Validator (online) and demjson (Python). Most of the XML tools (such as XML Spy or oXygen) also support JSON.
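As a rough illustration of what a basic syntax check does, here is a minimal Python sketch that uses only the standard library's json module. The function name check_json_syntax and the sample strings are invented for illustration; this is not how JSONLint or demjson are implemented.

```python
import json


def check_json_syntax(text):
    """Return (True, None) if text parses as JSON, else (False, error message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.lineno and err.colno locate the first problem the parser hit
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, None


# Two hypothetical documents: one valid, one with a trailing comma
good = '{"headline": "Example", "tags": ["json", "tools"]}'
bad = '{"headline": "Example", "tags": ["json", "tools",]}'

print(check_json_syntax(good))
print(check_json_syntax(bad))
```

A check like this only tells you that the text is well-formed JSON. It says nothing about whether the properties are the ones you intended.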

Even though JSON is touted as being a lightweight alternative to XML, equivalents of many of the features of XML are gradually being added to JSON. One that I make extensive use of is JSON Schema. This is an IETF effort. Even though - at the time of writing - JSON Schema is still a draft, it already has decent software support - including online validation and support in many languages. Having a JSON schema for your format is a great way to document how you intend the format to be used and it can help you spot certain kinds of errors. (My blog post Ban Unknown Properties! discusses some of the finer points of JSON validation).
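To give a feel for what a schema buys you, here is a small sketch. The schema and its property names (headline, wordcount) are hypothetical, and the validate function is a hand-rolled checker that understands only four keywords (type, properties, required, additionalProperties). It is a stand-in for a real JSON Schema validator, not a replacement for one.

```python
# A hypothetical schema for a tiny news item (property names invented for illustration)
SCHEMA = {
    "type": "object",
    "properties": {
        "headline": {"type": "string"},
        "wordcount": {"type": "integer"},
    },
    "required": ["headline"],
    "additionalProperties": False,
}

# Maps the JSON Schema type names used above to Python types
PY_TYPES = {"string": str, "integer": int, "object": dict}


def is_type(value, type_name):
    if type_name == "integer":
        # bool is a subclass of int in Python, but true/false are not JSON integers
        return isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, PY_TYPES[type_name])


def validate(instance, schema):
    """Check a small subset of JSON Schema keywords; return a list of error strings."""
    if not is_type(instance, schema["type"]):
        return [f"expected {schema['type']}"]
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"missing required property: {name}")
    for name, value in instance.items():
        if name in props:
            if not is_type(value, props[name]["type"]):
                errors.append(f"{name}: expected {props[name]['type']}")
        elif schema.get("additionalProperties") is False:
            errors.append(f"unknown property: {name}")
    return errors


print(validate({"headline": "Example", "wordcount": 120}, SCHEMA))
print(validate({"headlin": "Example"}, SCHEMA))  # misspelt property name
```

The second call shows the kind of error a schema can catch: a misspelt property is reported both as a missing required property and as an unknown one.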

Selecting and Querying JSON

One of the fundamentals of XML (and related standards including XSLT and XQuery) is XPath. So, imagine my excitement when I discovered JSONPath, which has the tag line "XPath for JSON". It holds out the promise of a language-independent way to specify properties within a given JSON document. Very handy - and there are a couple of language bindings already. Unfortunately, it only seems to work for fairly simple expressions - it certainly doesn't have the full power of XPath. And it isn't backed by a standards body or a consortium of companies, so the future path (sic) of JSONPath isn't clear to me.
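To make the idea of a path expression concrete, here is a minimal sketch of a resolver for a very small, JSONPath-like subset: a leading $ followed by .name and [index] steps. The function name select and the sample document are invented for illustration. It is not one of the JSONPath language bindings and it deliberately rejects anything fancier, such as wildcards or filters.

```python
import re

# One step of the path: either .name or [index]
TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def select(document, path):
    """Resolve a path made only of .name and [index] steps, starting at $."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    current = document
    pos = 1
    while pos < len(path):
        match = TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at position {pos}")
        name, index = match.groups()
        # Step into an object by property name, or into an array by position
        current = current[name] if name is not None else current[int(index)]
        pos = match.end()
    return current


# Hypothetical sample document
doc = {"item": {"authors": [{"name": "Ada"}, {"name": "Grace"}]}}

print(select(doc, "$.item.authors[1].name"))
```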

Perhaps more promising is JSONiq, a fully fledged query language for JSON, which claims to be "The SQL of NoSQL". In fact, JSONiq is based very much on XQuery. Again, this is not backed by an independent standards body. It has been implemented on top of some XQuery engines (28.io, zorba.io, IBM's Websphere and Pascal). However, notably, the major JSON-native engines are not directly supporting it, which means you need to use their proprietary query languages.

And it seems that there is a bit of a Cambrian Explosion going on in this area. Tim Bray recently published his blog post Fat JSON. In part, he illustrates why you need a tool to pick out properties from within a JSON document (basically, some JSON objects contain far more properties than you need for a particular purpose). He discusses one approach - supporting Partial Responses in your API. That works if you're the author of the API, but more likely you're the client of an API or are dealing with a complete JSON document from MongoDB or Elasticsearch or the like.

He points out several attempts to recreate XPath for JSON, which are similar to JSONPath (none of which I have tried yet, but which are all imaginatively called "[jJ][Pp]ath"):


Not to be outdone, Mr. Bray has knocked together JWalk - some Java source code to very simply pick out properties based on their names alone (i.e. not based on parent names or child property values as you would want from a more full-fat XPath style library). I suspect that this won't be the last attempt to solve this problem.
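The "pick out properties by name alone" idea can be sketched in a few lines of Python. This is not a port of the JWalk Java code; find_by_name and the sample document are invented here purely to illustrate the technique.

```python
def find_by_name(node, wanted):
    """Yield every value stored under a property called `wanted`, at any depth."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == wanted:
                yield value
            # Keep descending: the same name may appear again further down
            yield from find_by_name(value, wanted)
    elif isinstance(node, list):
        for item in node:
            yield from find_by_name(item, wanted)


# Hypothetical sample document
doc = {
    "name": "top",
    "place": [{"name": "London", "geo": {"name": "inner"}}],
}

print(list(find_by_name(doc, "name")))
```

Because the match is on the name alone, all three "name" values come back, regardless of which parent they sit under. That is exactly the limitation compared with a full XPath-style selector.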


JSON Standards

As is probably obvious by now, I'm a big fan of standards. Not just because I've helped to create a few (e.g. MDDL, NewsML-G2, hNews, rNews, RightsML, NINJS) but also because - whenever I'm faced with solving a problem - I think "surely someone has done this before me?". I've found that looking at how someone else has attempted to tackle some domain is very instructive. In the best case, you can simply adopt someone else's hard work, along with documentation, working code and a thriving community who will help to quickly bring you up to speed. Of course, not all prior work is great - the compromises required to create a consensus standard are notorious for producing unwieldy solutions. But, even then, it can be instructive to help you understand what you don't want to do.

Whilst developing IPTC's News in JSON (NINJS), for example, we looked at previous efforts - both public and proprietary - to render articles, blog posts, photos and video using JSON properties. We also researched particular areas that are not directly tied to news. For example, when we were figuring out how to represent place metadata, we found it really helpful to examine the different approaches taken by GeoJSON and Geonames, amongst others. (In the end, rather than pick a winner, we decided to add a "pattern property" into NINJS so that providers could select the JSON geometry representation that best fits their needs).
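Here is a rough sketch of how a pattern property works from the consumer's side. The regular expression, the property names (geometry_geojson, geometry_custom) and the place object are assumptions made for illustration and are not necessarily the exact NINJS definition. The point is only that any property whose name matches the pattern is treated as a geometry.

```python
import re

# Assumed pattern, for illustration only: any property whose name starts with "geometry_"
GEOMETRY_PATTERN = re.compile(r"^geometry_[a-zA-Z0-9_]+$")

# Hypothetical place object carrying two different geometry representations
place = {
    "name": "Example City",
    "geometry_geojson": {"type": "Point", "coordinates": [-0.1276, 51.5072]},
    "geometry_custom": "51.5072,-0.1276",
}


def geometries(place_object):
    """Return only the properties whose names match the geometry pattern."""
    return {
        name: value
        for name, value in place_object.items()
        if GEOMETRY_PATTERN.match(name)
    }


print(sorted(geometries(place)))
```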

JSON-LD is a somewhat different type of JSON-related standard. JSON Linked Data is a way to serialize the RDF data model to and from the JSON format. This W3C Recommendation is an increasingly popular way to structure JSON and is equivalent to the XML and Turtle serializations of RDF. So, if you are fundamentally working with RDF, then you should consider it (however, there are at least some JSON-LD dissenters). If you are not working with the RDF data model, then I would consider whether the additional features / complexity of JSON-LD are going to be a barrier to adoption.
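To show what the extra features look like in practice, here is a small sketch of a JSON-LD document and the same data with the JSON-LD keywords (the properties starting with @) stripped out. The choice of the schema.org vocabulary, the property values and the plain_view helper are assumptions for illustration only.

```python
import json

# A hypothetical article expressed as JSON-LD, using schema.org terms
article = {
    "@context": "http://schema.org",
    "@type": "NewsArticle",
    "headline": "Example headline",
    "datePublished": "2014-05-09",
}


def plain_view(node):
    """Drop properties whose names start with @ (JSON-LD keywords), recursively."""
    if isinstance(node, dict):
        return {k: plain_view(v) for k, v in node.items() if not k.startswith("@")}
    if isinstance(node, list):
        return [plain_view(item) for item in node]
    return node


print(json.dumps(article, indent=2))
print(json.dumps(plain_view(article), indent=2))
```

The second output is what a developer who does not care about RDF would probably have designed. The difference between the two is a rough measure of what JSON-LD asks of them.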

As I will discuss in the third and final post in this series, one goal I prize when designing a JSON schema is that simple examples make sense "intuitively". I want them to look sufficiently appealing to, say, a Ruby developer that she decides to use that schema rather than make one up herself.

JSON Design: A Series

Part one discussed an approach to designing JSON schema. Part three will discuss JSON style.

Monday, May 5, 2014

JSON Design Principles and Lessons Learnt: An Approach to Designing JSON (Part One of Three)

Lessons Learnt from JSON Designs I've Worked On

Over the last couple of years, I've worked on a few JSON schemas. For example, IPTC's NINJS (for representing news) and the W3C ODRL Community Group's ODRL in JSON (for representing permissions and restrictions). I've also done some work on JSON internal to AP, for various APIs and search systems.

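Along the way, I've learnt some lessons about better or worse ways to design the JSON - both about the way to do it and some JSON "style" tips. I've broken this into three posts: an approach to designing JSON (part one, this post), handy JSON tools and standards (part two) and JSON style (part three).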

Automagic JSON?

One way to create a JSON schema is to automatically generate one from an XML Schema. For any given domain, there's probably a decent XML Schema available, so why not take advantage of that and use one of the many tools that are available to automatically generate the JSON for you?

In fact, there are quite a few different ways you can translate between XML and JSON, depending on what you're trying to achieve. Therefore, each tool can potentially generate quite different JSON for a given XML document. For a good overview of the different approaches and techniques involved, I recommend this survey of ways to map between XML and JSON. (That PDF is IBM's submission to the W3C Workshop on Data and Services Integration).
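To illustrate how much the result depends on the mapping chosen, here is a sketch of two conventions applied to the same small XML fragment. Both functions, the @ and #text naming and the sample XML are invented for illustration. Real conversion tools differ in their details.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML fragment
XML = '<item lang="en"><title>Example</title><tag>json</tag><tag>xml</tag></item>'


def verbose(element):
    """Convention A: keep everything; attributes get an @ prefix, text goes under #text."""
    node = {f"@{name}": value for name, value in element.attrib.items()}
    if element.text and element.text.strip():
        node["#text"] = element.text.strip()
    for child in element:
        # Every child element becomes an array, even if it occurs only once
        node.setdefault(child.tag, []).append(verbose(child))
    return node


def compact(element):
    """Convention B: drop attributes; leaf elements become strings, repeats become arrays."""
    if len(element) == 0:
        return element.text or ""
    node = {}
    for child in element:
        value = compact(child)
        if child.tag in node:
            # Second occurrence of a tag: promote the existing value to an array
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node


root = ET.fromstring(XML)
print(json.dumps({root.tag: verbose(root)}, indent=2))
print(json.dumps({root.tag: compact(root)}, indent=2))
```

Convention A is lossless but awkward to consume. Convention B reads naturally but silently drops the lang attribute, and a property switches between string and array depending on how many times the element occurs.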

If you have a large amount of XML you want to convert into JSON, you may well need to implement your own tool to do the conversion. Not only does this let you control the choices made, it can also give you the opportunity to fix the niggling issues that inevitably arise in your XML as you extend your design in unexpected ways.

However, I recommend that you hand craft the design of your JSON representation, to make it as natural as possible.

A JSON Design Process

I've found that a good way to design a JSON schema is to follow this simple process:

  • Identify a list of candidate properties - perhaps by reviewing relevant XML schema for inspiration
  • Think of one or two ways to represent each set of related properties in JSON - and research whether anyone else has designed something like it already
  • Construct sample JSON documents for each of the alternatives
  • Prototype some code to see how they work for your intended use
  • Select the best alternative and add it to your schema
  • Write down the examples and your rationale for picking that representation (otherwise you will forget)
  • Repeat

After a while, you'll see some repeating patterns and you'll need to write fewer prototypes to try things out. But I still recommend writing down your rationale...

Trying out the JSON in code is particularly important if you haven't done a lot of JSON work before. It really gives you a feel for the best, most natural way to work with JSON and can help get you out of your XML Mindset (if that's where you're starting from).
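As a sketch of what such a prototype might look like, here are two hypothetical alternatives for representing a set of people, with the same two tasks written against each. The property names, sample data and function names are all invented for illustration.

```python
import json

# Alternative A: an array of objects, each carrying its own id
ALT_A = json.loads(
    '{"people": [{"id": "p1", "name": "Ada"}, {"id": "p2", "name": "Grace"}]}'
)

# Alternative B: an object keyed by id
ALT_B = json.loads('{"people": {"p1": {"name": "Ada"}, "p2": {"name": "Grace"}}}')


def names_a(doc):
    """List all names: simple with the array form."""
    return [person["name"] for person in doc["people"]]


def name_by_id_a(doc, person_id):
    """Look up one person: needs a scan with the array form."""
    return next(p["name"] for p in doc["people"] if p["id"] == person_id)


def names_b(doc):
    """List all names: iterate over the values of the keyed form."""
    return [person["name"] for person in doc["people"].values()]


def name_by_id_b(doc, person_id):
    """Look up one person: a direct access with the keyed form."""
    return doc["people"][person_id]["name"]


print(names_a(ALT_A), name_by_id_a(ALT_A, "p2"))
print(names_b(ALT_B), name_by_id_b(ALT_B, "p2"))
```

Writing even this much makes the trade-off visible: the array form is easier to iterate over and to describe in a schema, while the keyed form makes lookups trivial but turns data (the ids) into property names.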

JSON Design: A Series

Part two will discuss JSON tools and standards.