smyles blog: 2016

Tuesday, November 22, 2016

The View From Berlin - IPTC AGM 2016

I chair the Board of Directors of the IPTC, a consortium of news agencies, publishers and system vendors, which develops and maintains technical standards for news, including NewsML-G2, rNews and News-in-JSON. I work with the Board to broaden adoption of IPTC standards, to maximize information sharing between members and to organize successful face-to-face meetings.

We hold face-to-face meetings in several locations throughout the year, although most of the detailed work of the IPTC is now conducted via teleconferences and email discussions. Our Annual General Meeting for 2016 was held in Berlin in October. As well as being the time for formal votes and elections, the AGM is a chance for the IPTC to look back over the last year and to look ahead at what is in store. What follows are my prepared remarks at the Berlin AGM.

Good morning from #IPTC Chairman @smyles, at the #IPTC Autumn Meeting 2016, #dpa in Berlin! pic.twitter.com/8u1KvBrfEu
— IPTC (@IPTC) October 24, 2016

The Only Constant

It is clear that the news industry is experiencing a great degree of change. The business side of news continues to be under pressure. And, in no small part, this is because the technology involved in the creation and distribution of news continues to rapidly evolve.

However, in many ways, this is a golden age of journalism. The demand for news and information has never been higher. The immediate and widespread distribution of news has never been easier.

The IPTC has been around for 51 years. I've been a delegate to the IPTC since 2000 and Chairman of the Board since June 2014. I'd like to give my perspective on the changes going on within the news industry and how the IPTC has responded and will respond.

We're On a Mission

IPTC is rooted in - and foundational to - the news industry. Our open source standards for news technology enable the operations of hundreds of news and media organizations, large and small. IPTC standards are instrumental in the software used to create, edit, archive and distribute news and information around the world.

We are starting to evolve the scope of our work beyond standards - such as via the EXTRA project to build an open source rules-based classification engine. Much of what we do is relevant not only to news agencies and publishers, but also to photographers, videographers, academics and archivists. By bringing together these diverse groups, we can not only create powerful, efficient standards and technologies, but also learn from each other about what works and what does not.

Ch-ch-changes

We've introduced quite a bit of change within the IPTC since I became Chairman, and that has continued over the last year.

What's Going On?

We're working to improve our existing family of standards by:

continuing to improve documentation - to make it easier to get going with a standard and simpler to grasp the nuances when you want to expand your implementation

making our standards more coherent and consistent - as many organizations need to use a combination of them

Great discussion about #IPTC alliance, collaboration with International Image Interoperability Framework @IIIFramework re: #Photo #Metadata pic.twitter.com/EFWdOf7aC6
— IPTC (@IPTC) October 24, 2016

We're extending the reach of the IPTC, both by working with other organizations (including PRISM, IIIF, WAN-IFRA and W3C) and by engaging in new types of work, such as EXTRA and the Video Metadata Hub, which are not traditional standards but open source projects for the benefit of the community we serve.

Since I became Chair, we've renewed our efforts to communicate the great work that we do. You can see a big uptick in our engagement via Twitter and LinkedIn, as well as in the refreshed design of the IPTC website. Plus, we're doing a lot more work "out in the open" on GitHub.

We're continuing to streamline the operations of the IPTC. We've simplified our processes to better reflect the ways we actually operate these days. For example, we have dramatically reduced the number of formal votes we take. But we still have sufficient process in place to ensure that the interests of all members are protected. For 2017, we have decided to have two-plus-one face-to-face meetings, rather than our usual three-plus-one. We will hold two full face-to-face meetings (one in London, the other in Barcelona), plus our one-day Photo Metadata Conference in association with the CEPIC Conference in Berlin. This will allow us to put more effort into each meeting, with more ambitious and compelling topics and speakers.

PM session: standards makers PB Core, Media Institute; product vendors inVid, Extensis, Canto, Dextro, Mainstream Data on metadata workflows pic.twitter.com/9EJJQmjDwp
— IPTC (@IPTC) October 25, 2016

Do Better

As I said, we've been changing our processes, particularly for the face-to-face meetings. But what else could we do to simplify our processes whilst at the same time ensuring that there is a balance between the interests of all members? Are there ways for the IPTC to deliver more value to the membership? How do we continue to balance our policy of consensus-driven decision-making with the need to be more flexible and nimble?

IPTC is a membership-driven organization. Membership fees represent the vast majority of the revenue for our organization. As the news industry as a whole continues to feel pressure - including downsizing, mergers and, unfortunately, some members going out of business - the IPTC is experiencing downward pressure on its own revenue. So, we are working on ways to reach new members, whilst at the same time ensuring that existing members continue to derive value. We're also open to exploring new ways of generating revenue which fit with our mission - let us know your ideas!

What new areas should the IPTC focus on? Many journalists are experimenting with an array of technologies - Augmented Reality, Virtual Reality, 360-degree photos, drones and bots, to name but a few. And let's not forget about the "Cambrian Explosion" of technologies related to news and metadata on the Web, including AMP, Apple News, Instant Articles, rNews, Schema.org and OpenGraph. How can the IPTC help - negotiating standards? Developing best practices? Navigating the ethics of these technologies?

Happy

If you're happy with the IPTC, then please tell others.

If you're not happy, then please tell me!

I Want to Thank You

Without you, the members of the IPTC, literally none of this would be possible. So, I'd like to take a moment to thank everyone involved in the organization, particularly everyone involved in all of the detailed work of the IPTC. And I'd like to acknowledge and thank Andreas Gebhard, who is stepping down from the Board, and Johan Lindgren, who has been voted onto it.

Finally, I'd like to extend a special thanks to Michael Steidl, Managing Director of the IPTC, who is personally involved in almost every aspect of what we do.

2017

No doubt, next year will bring us many new and often unexpected challenges. I look forward to tackling them with all of you in the IPTC.

Thursday, October 6, 2016

Developers Needed For IPTC's EXTRA Rules-based Classification Engine

Over the last several months, I've been working within the IPTC - along with a number of other news organizations - on "EXTRA" (shorthand for EXTraction Rules Apparatus), an open source rules-based classification engine for news content. I'm thrilled because this week we reached a significant milestone: we started the formal process of looking for developers to implement the EXTRA engine.

“Extra” by Jeremy Brooks https://flic.kr/p/4aKH3c

The IPTC was awarded a grant of €50,000 from Google's Digital News Initiative Innovation Fund to build and freely distribute the initial version of EXTRA. As part of the IPTC, we are working with several news providers to supply sets of news documents, and with linguists to write rules to classify the documents. We've been working on defining the technical requirements and now we’re looking for software developers to design, develop, document and test EXTRA.

Below is the formal announcement. If you know anyone who might be interested, let them know. And if you are interested, please let us know!

Developers Needed For IPTC's EXTRA Rules-based Classification Engine

IPTC https://iptc.org/ is looking for software developers to design, develop, document and test EXTRA https://iptc.github.io/extra/, an open source rules-based classification engine for news. First preference will be given to applications received by 21st October 2016, and review will continue until the positions are filled. Apply here.

"Classification" means assigning one or more categories to the text of a news document. Rules-based classifiers use a set of Boolean rules, rather than machine-learning or statistical techniques, to determine which categories to apply.

EXTRA is the EXTraction Rules Apparatus, a multilingual open-source platform for rules-based classification of news content. IPTC was awarded a grant of €50,000 from the first round of Google’s Digital News Initiative Innovation Fund https://www.digitalnewsinitiative.com/ to build and freely distribute the initial version of EXTRA. The grant covers the entire project.
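To make the Boolean-rules idea concrete, here is a minimal Python sketch of rules-based classification. The helper names (has, all_of, any_of, none_of), the categories and the keywords are hypothetical illustrations; they are not EXTRA's rule syntax.

```python
import re

def has(word):
    """Predicate: True if `word` occurs as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return lambda text: bool(pattern.search(text))

def all_of(*preds):
    """Boolean AND of predicates."""
    return lambda text: all(p(text) for p in preds)

def any_of(*preds):
    """Boolean OR of predicates."""
    return lambda text: any(p(text) for p in preds)

def none_of(*preds):
    """Boolean NOR: True only if no predicate matches."""
    return lambda text: not any(p(text) for p in preds)

# Hypothetical rules: one Boolean expression per category.
RULES = {
    "sport": any_of(has("football"), has("tennis"), has("olympics")),
    "economy": all_of(
        any_of(has("inflation"), has("interest rates")),
        none_of(has("football")),
    ),
}

def classify(text, rules=RULES):
    """Return, sorted, every category whose rule evaluates to True."""
    return sorted(category for category, rule in rules.items() if rule(text))

print(classify("The central bank raised interest rates to curb inflation."))
# ['economy']
```

Each rule is an ordinary Boolean combination of keyword tests, so the categories assigned to any document can be traced back to the rules that fired.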

We are working with news providers to supply sets of news documents and with linguists to write rules to classify the documents. IPTC is looking for qualified developers to create the rules engine to accurately and efficiently categorize the documents using the rules. See the mandatory and preferred requirements.

Please consult this page for more information and to let us know if you’re interested in being considered.

Monday, September 26, 2016

An ast "Hello World": Getting Started with Python's Abstract Syntax Trees

I've been working on a Python library which - for a number of reasons - needs to dynamically alter itself. Essentially, I want it to parse a document and to generate some code based on that parsed file.

ast

It turns out that Python's ast module lets me do exactly what I need. I came across some quite useful supplementary documentation on ast. But, to get started, I needed something simpler than those advanced examples. I therefore wrote a "Hello world!" program using ast. Here it is, in case you were looking for that, too.

Hello world!

Since I became test-infected, I wanted to structure my "Hello world!" ast program using unit tests.

So, to start, I tracked down a suitable "Hello world!" unit test in Python.
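The original test isn't reproduced here, so what follows is a hypothetical stand-in: a plain Greeter class with a single sayHello() method, plus a unittest test case checking that it returns "Hello world!". The class layout and method names are assumptions for illustration.

```python
import unittest

class Greeter:
    """Hypothetical greeter whose message is stored in self.m."""

    def __init__(self):
        self.m = "Hello world!"

    def sayHello(self):
        return self.m

class GreeterTest(unittest.TestCase):
    def test_say_hello(self):
        self.assertEqual(Greeter().sayHello(), "Hello world!")

if __name__ == "__main__":
    # exit=False returns control to the interpreter instead of calling sys.exit().
    unittest.main(exit=False)
```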

Hello world! in ast

Then I rewrote the Greeter.py class to use ast. My version constructs an abstract syntax tree for an assignment. Specifically, it assigns the string "Hello world!" to the variable "m". The code then fixes the missing locations, compiles the tree and executes it dynamically.
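A minimal sketch of that approach, assuming Python 3.8 or later (a 2016-era version would have used older node types such as ast.Str): the constructor builds the tree for m = "Hello world!", fills in line numbers with ast.fix_missing_locations(), compiles it and runs it with exec(). The Greeter and sayHello() names are hypothetical.

```python
import ast

class Greeter:
    """Hypothetical greeter that builds m = "Hello world!" as an AST."""

    def __init__(self):
        # Tree for the single statement:  m = "Hello world!"
        assign = ast.Assign(
            targets=[ast.Name(id="m", ctx=ast.Store())],
            value=ast.Constant(value="Hello world!"),
        )
        module = ast.Module(body=[assign], type_ignores=[])
        # compile() needs line/column numbers on every node.
        ast.fix_missing_locations(module)
        code = compile(module, filename="<ast>", mode="exec")
        namespace = {}
        exec(code, namespace)  # executes m = "Hello world!" inside namespace
        self.m = namespace["m"]

    def sayHello(self):
        return self.m

print(Greeter().sayHello())  # Hello world!
```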

Obviously, building an abstract syntax tree is a lot more work than simply assigning the string value to the variable directly. But it meant I now had the world's simplest ast program.

Nothing

Armed with this most basic of unit tests, I was then in a position to work out how to support various other types of code in ast.

For example, I used ast to generate an abstract syntax tree which assigns an empty tuple to a variable named "nothing", i.e. the equivalent of nothing = ()
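A sketch of that tree, again assuming Python 3.8+ node names: the right-hand side is an ast.Tuple with no elements, in a Load context because it is read rather than assigned to.

```python
import ast

# Tree for:  nothing = ()
tree = ast.Module(
    body=[
        ast.Assign(
            targets=[ast.Name(id="nothing", ctx=ast.Store())],
            # An empty tuple literal: no elements, read (Load) not assigned.
            value=ast.Tuple(elts=[], ctx=ast.Load()),
        )
    ],
    type_ignores=[],
)
ast.fix_missing_locations(tree)

namespace = {}
exec(compile(tree, "<ast>", "exec"), namespace)
print(namespace["nothing"])  # ()
```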

Invoking Methods

One of the hardest things for me to figure out was how to invoke a method of a class.

First, I worked out how to call a function - one not attached to an instance of a class. But to call a method of a class, I needed to understand a bit more about how Python itself is implemented.

Calling Functions

With ast, I could build a tree which calls a function foo() and assigns the returned value to a variable called "result", i.e. the equivalent of result = foo(). A variant passes in a value, i.e. the equivalent of result = bar("some value").
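Both variants can be sketched like this, assuming Python 3.8+; foo, bar and the call_and_assign() helper are hypothetical stand-ins. The function being called is an ast.Name in a Load context, and each argument is an ast.Constant.

```python
import ast

def foo():
    return "from foo"

def bar(value):
    return "bar got " + value

def call_and_assign(func_name, *args):
    """Build the tree for  result = func_name(*args)  with constant arguments."""
    call = ast.Call(
        func=ast.Name(id=func_name, ctx=ast.Load()),
        args=[ast.Constant(value=a) for a in args],
        keywords=[],
    )
    assign = ast.Assign(targets=[ast.Name(id="result", ctx=ast.Store())], value=call)
    tree = ast.Module(body=[assign], type_ignores=[])
    return ast.fix_missing_locations(tree)

# The generated code looks foo and bar up in this namespace.
namespace = {"foo": foo, "bar": bar}
exec(compile(call_and_assign("foo"), "<ast>", "exec"), namespace)
print(namespace["result"])  # from foo
exec(compile(call_and_assign("bar", "some value"), "<ast>", "exec"), namespace)
print(namespace["result"])  # bar got some value
```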

In Python, Methods are Attributes of Classes

Having figured out how to call functions and pass parameters to them, I reckoned that calling a method on a class would be similar. And it sort of was - I still needed to use ast.Call to invoke the method. But it took me quite a while to figure out how to tell it which method to call. For example, if I wanted to call

result = self._baz(theResult)

should I pass in a function name of "self._baz"? (I tried that - it didn't work.) Eventually, I worked out that self._baz is an attribute of the instance object referred to as "self". In Python, instance objects have two kinds of valid attribute names: data attributes and methods. This meant that, to call one method of an instance from another method, the function passed to ast.Call has to be an ast.Attribute node (the attribute "_baz" of the name "self") rather than a plain ast.Name.
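Here is a sketch of that tree for result = self._baz(theResult), assuming Python 3.8+. The Example class is hypothetical; it exists only so there is a self with a _baz() method to call.

```python
import ast

def method_call_assign():
    """Build the tree for:  result = self._baz(theResult)"""
    # self._baz is an attribute lookup on the name "self"...
    method = ast.Attribute(
        value=ast.Name(id="self", ctx=ast.Load()),
        attr="_baz",
        ctx=ast.Load(),
    )
    # ...and that attribute is what ast.Call invokes.
    call = ast.Call(
        func=method,
        args=[ast.Name(id="theResult", ctx=ast.Load())],
        keywords=[],
    )
    assign = ast.Assign(targets=[ast.Name(id="result", ctx=ast.Store())], value=call)
    return ast.fix_missing_locations(ast.Module(body=[assign], type_ignores=[]))

class Example:
    """Hypothetical class used only to run the generated code."""

    def _baz(self, value):
        return value * 2

    def run(self, theResult):
        namespace = {"self": self, "theResult": theResult}
        exec(compile(method_call_assign(), "<ast>", "exec"), namespace)
        return namespace["result"]

print(Example().run(21))  # 42
```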
I had never thought that deeply about how Python is really implemented behind the scenes, although many of Python's design decisions are actually quite well documented.

An ast Short Cut

In the process of working out how to invoke instance object methods, I came up with a general-purpose shortcut. It turns out that, since Python 2.6, ast has a very handy helper function called ast.parse(). In combination with ast.dump(), this lets you very quickly figure out the correct ast pattern to use for a given bit of Python code. For example, parsing result = self._baz(theResult) and dumping the resulting tree shows exactly which nodes you need to invoke an instance object method.
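In code, the shortcut is just two calls. The exact text printed by ast.dump() varies between Python versions, so it isn't reproduced here; what matters is that it shows an Assign whose value is a Call whose func is an Attribute (attr='_baz') on the Name 'self'.

```python
import ast

# Let Python build the tree, then print it to see which nodes to use.
tree = ast.parse("result = self._baz(theResult)")
statement = tree.body[0]
print(ast.dump(statement))
```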

005-Syntax by vicdunk
https://flic.kr/p/dUdQkm

Hopefully, that will be enough to get you going on your own Python ast adventures!

Monday, September 19, 2016

Falcor and GraphQL: Querying JSON APIs (Part Five of Three)

What's the best way to query a REST API which returns JSON? I look at two popular libraries - Netflix's Falcor and Facebook's GraphQL - which aim to overcome problems with API performance and "chattiness".

Querying JSON

XML (and related standards such as XSLT and XQuery) benefits from the power of XPath for selecting and querying XML. However, JSON has no direct equivalent to XPath (although there are a lot of projects which have named themselves [jJ][Pp]ath!).

I still like the approach taken by JSONiq - it is essentially XQuery for JSON. However, in this post, I want to talk about two libraries - Falcor and GraphQL - which address the problem in a somewhat different way, by focusing on how to get just the JSON you want from an API.

2009APR101606 by Peter Renshaw
https://flic.kr/p/6dYAsw

Trade Offs: Speed and Complexity

When you write a client for a typical REST API, you have to confront two basic problems: performance and complexity. On the one hand, if a REST API returns more data than you need, then you're paying a penalty for every unnecessary byte (being transferred over the network and parsed by your code). On the other hand, if a given API response doesn't have everything you need, then you will need to make follow-up calls, which adds complexity and, of course, more latency as you choreograph the back-and-forth.

The designer of the REST API should try to anticipate likely uses, so that they can provide just the right information, in the right ways. And, as I've previously recommended, it is a good idea to build in support for full or partial API responses. However, part of what is exciting about APIs is that they unlock innovation. So, if your API is a success, you will - by definition - have hard-to-anticipate uses of your design.
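As a minimal sketch of partial responses, assuming a hypothetical fields parameter (for example ?fields=id,headline) rather than any particular API's convention, the server can filter each resource down to the requested top-level fields:

```python
def partial_response(resource, fields=None):
    """Return only the requested top-level fields of a resource.

    `fields` is a list of field names, e.g. parsed from a hypothetical
    ?fields=id,headline query parameter; None means the full response.
    Unknown field names are silently ignored.
    """
    if fields is None:
        return dict(resource)
    return {name: resource[name] for name in fields if name in resource}

article = {
    "id": "123",
    "headline": "Example headline",
    "body": "A very long body...",
    "images": ["a.jpg", "b.jpg"],
}
print(partial_response(article, ["id", "headline"]))
# {'id': '123', 'headline': 'Example headline'}
```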

Sit! by Craig Sunter
https://flic.kr/p/rZ2tyS

Sitting in the Middle

Rather than rely on the REST API perfectly fitting your needs (or supporting a powerful query language) why not have an adapter which sits in-between your client and the REST API? Both Netflix's Falcor and Facebook's GraphQL take this approach: they are each implemented as servers which you configure to turn the REST API you have to work with into one that you want to work with. They differ somewhat in their philosophy and power, however.

Falcor

Falcor - All of the Data in One Giant Model

Netflix has open-sourced their Falcor library, which they use to power their UIs. At the time of writing, it is still in "Developer Preview"; however, many people outside of Netflix are using Falcor. You can try out the demo Falcor application or read the Falcor documentation for more details.

Falcor adds some capabilities to the standard JSON model - such as "virtual JSON models" and a "JSON Graph" - to make it easier to cache data on the client side. Using Falcor, you can

eliminate the need for multiple HTTP requests to get all the data you need
cache the data locally for better performance
deal with data using graphs, which are more flexible than the standard tree-model used in JSON
adapt JSON or non-JSON APIs into a JSON model customized for your application

Falcor is a JavaScript library; on the server side, it runs within a Node.js server. You construct a Falcor data model and define how each component maps to the actual APIs you need to use via "paths". Your application then interacts with the Falcor data model you've defined, while the Falcor server takes care of interacting with the APIs to get you the data you need, including handling caching for greater performance - particularly when you have multiple instances of your application querying a single data model.
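To illustrate why a graph helps here, the Python sketch below (not Falcor's JavaScript API) stores a video once under videosById and lets a list point at it with a reference. The {"$type": "ref", "value": [...]} shape mirrors Falcor's JSON Graph references; the cache contents and the deref()/get_path() helpers are hypothetical simplifications of following a path through the model.

```python
# Hypothetical JSON Graph-style cache: the video is stored once, and the
# genre list refers to it instead of holding a copy.
cache = {
    "videosById": {
        "101": {"title": "Example Video", "rating": 4},
    },
    "genreLists": {
        "0": {"0": {"$type": "ref", "value": ["videosById", "101"]}},
    },
}

def deref(graph, node):
    """If node is a reference, follow it (repeatedly) to its target."""
    while isinstance(node, dict) and node.get("$type") == "ref":
        node = get_path(graph, node["value"])
    return node

def get_path(graph, path):
    """Walk a path such as ["genreLists", "0", "0", "title"], following refs."""
    node = graph
    for key in path:
        node = deref(graph, node)[key]
    return deref(graph, node)

print(get_path(cache, ["genreLists", "0", "0", "title"]))  # Example Video
```

Because the list holds a reference rather than a copy, updating the video under videosById is immediately visible through every path that refers to it.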

A nice overview of working with Falcor is provided by Auth0. And you can find a lot more documentation on the Falcor website.

GraphQL

GraphQL - a Schema and Resolve Functions

Facebook has open-sourced their GraphQL library, which they developed to power their mobile and web apps. At the time of writing, Facebook has released a working draft of the GraphQL spec and a reference implementation in JavaScript. They have also created an implementation you can actually download and use. Various people have started to build GraphQL tools and implementations, including GraphQL support in Python (one of my favourite languages). Check out the GraphQL documentation for more details.

As a GraphQL client, you send the server a query, which defines what data you want back. For example:

{
  user(id: "1") {
    name
  }
}

Which says "give me back the name of the user who has an id=1".

On the GraphQL server side, you need to configure the schema and the resolve functions. The schema defines the data model which may be fetched from the server. The resolve functions map the fields in the schema into the backend services. A GraphQL resolve function therefore contains whatever code is necessary to fetch and transform data from a backend service - such as a REST API, a MongoDB database or an SQL RDBMS - into the form promised by the schema.
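The schema/resolver split can be sketched in plain Python. This is not the API of graphql-core or any other GraphQL library: the query is hand-translated into a nested dict instead of being parsed, and USERS_BACKEND, SCHEMA, FIELD_TYPES and execute() are hypothetical names. It does show the key idea: the schema decides which fields exist, and each field's resolve function maps it onto whatever the backend actually stores.

```python
# Hypothetical backend record; it stands in for a REST API, MongoDB or SQL database.
USERS_BACKEND = {
    "1": {"user_id": "1", "full_name": "Example User", "email": "user@example.com"},
}

# Hypothetical schema: the fields each type exposes, each with a resolve
# function of the form resolve(parent, args) -> value.
SCHEMA = {
    "Query": {
        "user": lambda parent, args: USERS_BACKEND.get(args["id"]),
    },
    "User": {
        "id": lambda parent, args: parent["user_id"],
        "name": lambda parent, args: parent["full_name"],
    },
}
# Which type each object-valued field returns.
FIELD_TYPES = {("Query", "user"): "User"}

def execute(type_name, parent, selection):
    """Resolve a selection of the form {field: (args, sub_selection)}.

    A field missing from SCHEMA raises KeyError, standing in for a
    GraphQL validation error.
    """
    result = {}
    for field, (args, sub_selection) in selection.items():
        value = SCHEMA[type_name][field](parent, args)
        if sub_selection and value is not None:
            value = execute(FIELD_TYPES[(type_name, field)], value, sub_selection)
        result[field] = value
    return result

# Hand translation of:  { user(id: "1") { name } }
query = {"user": ({"id": "1"}, {"name": ({}, None)})}
print({"data": execute("Query", None, query)})
# {'data': {'user': {'name': 'Example User'}}}
```

Note that email is stored in the backend but isn't exposed by the schema, so a client cannot ask for it.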

There's a nice overview of working with GraphQL on RisingStack. And you can find Facebook's full documentation on GraphQL.

"Choice" by Jeremy Brooks
https://flic.kr/p/nyPkd2

Which One Should You Choose?

Falcor is somewhat simpler to learn than GraphQL. In part, this is because GraphQL is more powerful - in particular, it has a much more sophisticated query capability. Both libraries have been implemented in JavaScript, but only GraphQL is designed to be implemented in other languages, too.

Finally, it is worth considering whether you want to adopt either one at all: the REST architecture (when implemented correctly) has tremendous support for caching and scalability. So, rather than abandon a REST API altogether, consider whether you have the option of instead tuning it to perform better (tip: look at the granularity of the resources you've defined).

Designing JSON

This is part of my occasional series on designing and working with JSON:

Part 1: An approach to designing JSON schema
Part 2: JSON tools and standards
Part 3: Lessons learnt from the JSON schema I've worked on
Part 4: Heroku's HTTP API design guide

This post - the fifth in the trilogy - picks up on a topic I discussed in Part 3 (Lessons Learnt): how to select and query the JSON you get back from an API.

Tuesday, July 26, 2016

Making Progress on Rights - W3C Permissions & Obligations Expression First Public Working Drafts

I've been working within W3C's Permissions & Obligations Expression (POE) Working Group as an Invited Expert. We have just issued our First Public Working Drafts:

The ODRL Information Model
The ODRL Vocabulary & Expression

"one" by Andre Chinn
https://flic.kr/p/5pGcyx

The W3C POE WG aims to create recommendations for permissions, obligations and licensing statements for digital content. The WG is using the W3C ODRL Community Group specifications as the starting point for its work. These are the same specifications which form the foundation of IPTC's RightsML work.

"poe" by 为民王
https://flic.kr/p/gp2Bc

If you're interested in rights for digital content, then I recommend looking at - and commenting on - the W3C POE drafts. The ODRL Information Model describes the foundational concepts, entities and relationships of ODRL. The ODRL Vocabulary & Expression describes how to encode the ODRL model in XML, JSON and RDF.
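For a flavour of the JSON encoding, here is a minimal policy built in Python. It follows the JSON-LD shape used in the later W3C ODRL 2.2 Recommendation, so details may differ from the 2016 drafts; the uid and target URIs are placeholders.

```python
import json

# Minimal ODRL policy: a Set granting permission to "use" one asset.
# Shape follows ODRL 2.2 JSON-LD; the URIs are placeholders.
policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    "uid": "http://example.com/policy:1010",
    "permission": [
        {"target": "http://example.com/asset:9898.movie", "action": "use"}
    ],
}

print(json.dumps(policy, indent=2))
```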

"Use in case of emergency" by Katia Sosnowiez
https://flic.kr/p/5MMhFz

The POE WG has also published the Use Case and Requirements Note. I have contributed one of the Use Cases: News Permissions and Restrictions. Again, the Working Group is looking for feedback on - and contributions to - the Use Cases, so that it can derive a detailed set of requirements for the POE work.
