Wednesday, November 17, 2010

IPTC and Rights Expression Languages

IPTC has been looking at how to express rights for news content. At the Rome IPTC meeting, I presented the work to date (slides are on slideshare). And Daniel Pähler presented ODRL, the Open Digital Rights Language.

The News Industry Need for Machine Readable Rights
Various news publishers have identified the ability to express rights in a machine-readable way as a priority. In part, this reflects the fundamental changes that have been transforming the news industry. Once, an agency such as the Associated Press distributed content to editors at newspapers and broadcasters, who would select which items they would use. In the process of this selection, they would be able to read any editors' notes, which could include any restrictions that needed to be observed. However, increasingly, news outlets are fully automated, with very little - if any - editorial oversight of what is published. Amongst other things, this drives the need for the expression of rights and restrictions in a way that can be evaluated automatically.

This automation would also allow the editorial process to be more efficient. In general, an editor still needs to exercise their judgement as to whether a particular restriction applies in a particular context. But automatic evaluation of rights and restrictions can identify the items that need those decisions, rather than having editors inspect every single item. (This exercise of editorial judgement means that these systems are not like DRM, in which particular actions are forbidden and typically enforced by the devices involved.)
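To make the idea of automatic evaluation concrete, here is a minimal Python sketch of that triage step. Everything in it is an assumption made for illustration: the NewsItem class, the restriction codes such as "no-online" and the triage function are invented names, not part of any IPTC format.

```python
from dataclasses import dataclass, field


@dataclass
class NewsItem:
    """A news item carrying hypothetical machine-readable restriction codes."""
    item_id: str
    headline: str
    # Restriction codes are invented for this sketch, e.g. "no-online".
    restrictions: set = field(default_factory=set)


def triage(items, context):
    """Split items into auto-publishable ones and ones needing an editor.

    `context` is the set of restriction codes that matter to this outlet.
    An item is sent for review when any of its restrictions is in the context.
    """
    auto, review = [], []
    for item in items:
        if item.restrictions & context:
            review.append(item)
        else:
            auto.append(item)
    return auto, review


items = [
    NewsItem("a1", "Markets close higher"),
    NewsItem("a2", "Cup final photos", {"no-online"}),
    NewsItem("a3", "Election preview", {"no-broadcast"}),
]
auto, review = triage(items, context={"no-online"})
print("publish automatically:", [i.item_id for i in auto])
print("needs an editor:", [i.item_id for i in review])
```

The code does not decide anything on the editor's behalf. It only narrows the set of items that a person has to look at, which is the distinction from DRM drawn above.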

IPTC and Rights Expression
IPTC reviewed how the various news formats that it maintains allow for the expression of rights. In every case, there are currently semi-structured ways to express rights using natural language, which don't easily support the fully machine-readable rights expressions that member companies need. On the other hand, the IPTC has consistently decided that it didn't want to develop a machine-readable rights expression language itself - members of the IPTC have felt qualified to develop news formats, whereas legal matters are a different domain.

After reviewing several candidates, we felt that ODRL v2 was the best fit for an existing (though not quite yet complete) rights expression language. In particular, it offers the ability to create an industry-specific rights vocabulary that can be "plugged in" to the ODRL framework. The IPTC has been working with ACAP on developing this vocabulary and with the ODRL group to help refine the ODRL v2 framework itself.
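As a rough illustration of what "plugging in" a vocabulary could mean, the Python sketch below keeps a generic rule structure separate from the set of action names that an industry profile supplies. It is not ODRL syntax and not the IPTC/ACAP vocabulary: the action names, the Rule class and the "prohibitions win" evaluation order are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical news vocabulary; real terms would come from the IPTC/ACAP work.
NEWS_ACTIONS = frozenset({"publish-online", "publish-print", "archive"})


@dataclass(frozen=True)
class Rule:
    kind: str    # "permission" or "prohibition"
    action: str  # must be defined by the plugged-in vocabulary
    asset: str   # identifier of the news item the rule applies to


def make_policy(rules, vocabulary):
    """Accept the rules only if every action is defined by the vocabulary."""
    unknown = sorted({r.action for r in rules} - vocabulary)
    if unknown:
        raise ValueError(f"actions not in vocabulary: {unknown}")
    return list(rules)


def is_allowed(policy, action, asset):
    """Assumed evaluation order: prohibitions win, unmentioned actions are denied."""
    relevant = [r for r in policy if r.action == action and r.asset == asset]
    if any(r.kind == "prohibition" for r in relevant):
        return False
    return any(r.kind == "permission" for r in relevant)


policy = make_policy(
    [
        Rule("permission", "publish-print", "photo-42"),
        Rule("prohibition", "publish-online", "photo-42"),
    ],
    NEWS_ACTIONS,
)
print(is_allowed(policy, "publish-print", "photo-42"))
print(is_allowed(policy, "publish-online", "photo-42"))
```

Swapping NEWS_ACTIONS for another set changes which rules are accepted without touching the framework code, which is the appeal of the profile approach.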

Daniel's slides can be downloaded from the ODRL Wiki.  They give a nice introduction to the ODRL effort and the ODRL v2 approach.

Questions on Rights and ODRL
Daniel and I were asked several questions about rights for news in general and ODRL in particular. For example, there was some discussion about whether rights are really only applicable to photographs (the consensus was no - every media type has increasing amounts of rights being applied). There were questions about how to apply rights to the parts of an item (such as the frames within a video). The partMeta structure of NewsML-G2 handles this nicely (assuming we were to include ODRL within the partMeta structure, which I think we would). There were also questions about what industries are using ODRL today. Chiefly, the mobile phone industry uses ODRL v1 as part of the OMA DRM system. Daniel explained that there are several academic projects that are using or working on ODRL v2 (Daniel himself is from the University of Koblenz).

Get Involved
You can find out more about the ACAP ODRL profile - and even participate in the work - by visiting the special Wiki page set up by ODRL for ACAP.

Tuesday, November 16, 2010

Adding Foreign Namespace Support to the NITF XML Schema

As previously discussed on this blog, NITF has very limited support for foreign namespaces.  I've been experimenting with ways to remedy this and presented the results at the most recent face-to-face IPTC meeting (in Rome).  I have posted the NITF slides on slideshare.

During the meeting, it was decided to break the NITF 4.0 effort into two:

  • NITF 3.6 release, which would directly address the addition of foreign namespaces and could be approved as early as January 2011
  • NITF 4.0 which would add support for G2 features (such as qcodes) and will likely require significantly more work to develop and approve
It was decided to take this approach, since foreign namespace support addresses pressing needs (in fact, many people do not realize that it isn't legal already to mix and match NITF with other XML schema).  It was also felt that this change is relatively minor and so doesn't merit a major version number change (not sure I agree with that, but ...).

Therefore, I have now created an NITF 3.6 set of schema files, fixing various bugs in the previous experimental schema and adding expansion points that were missing.  So, compared with NITF 3.5, the experimental schema:

  • Added any attributes in globalNITFAttributes and commonNITFAttributes
  • Added any element into head, body, docdata, body.head, block, enriched text, after body, media, body.end

I've also versioned the ruby files to 3.6 and altered the comments to discuss NITF 3.6, rather than NITF 3.5. Finally, I've created an NITF instance that exercises the various foreign namespace capabilities.

These files are also available via the NITF 3.6 directory on the IPTC website.
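For readers who want to see what mixing namespaces looks like in practice, here is a small Python sketch that parses an NITF-like instance and lists everything that sits outside the NITF namespace. The NITF namespace URI is written from memory and should be treated as an assumption. The ext namespace, the ext:tracking element and the ext:workflow attribute are invented, and the sketch does not validate against the 3.6 schema, so the exact positions where foreign elements are allowed are not checked.

```python
import xml.etree.ElementTree as ET

NITF_NS = "http://iptc.org/std/NITF/2006-10-18/"  # assumed NITF namespace URI
EXT_NS = "http://example.com/ns/ext"              # hypothetical foreign namespace

doc = f"""<nitf xmlns="{NITF_NS}" xmlns:ext="{EXT_NS}">
  <head ext:workflow="draft">
    <title>Sample story</title>
    <ext:tracking id="abc123"/>
  </head>
  <body>
    <body.content>
      <p>Story text.</p>
    </body.content>
  </body>
</nitf>"""


def foreign_nodes(root):
    """Yield (kind, qualified name) for nodes outside the NITF namespace."""
    nitf_prefix = "{" + NITF_NS + "}"
    for elem in root.iter():
        if not elem.tag.startswith(nitf_prefix):
            yield ("element", elem.tag)
        for name in elem.attrib:
            # Unprefixed attributes carry no namespace and belong to NITF itself.
            if name.startswith("{") and not name.startswith(nitf_prefix):
                yield ("attribute", name)


root = ET.fromstring(doc)
found = list(foreign_nodes(root))
for kind, name in found:
    print(kind, name)
```

ElementTree reports names in {namespace}local form, so a prefix such as ext: never appears in the output. Only the namespace URI matters.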

Comments?  Questions?  Critiques?

Monday, November 15, 2010

IPTC and Semantic Web Technologies - Linked Data, Metadata and Ontology

At the IPTC's most recent face-to-face meeting in Rome, we reviewed our explorations of semantic web technologies for news. The news standards body has been looking at three major areas:

Linked Data
We discussed our work to turn IPTC's subject codes into Linked Data using SKOS concepts and Dublin Core properties. Michael Steidl (Managing Director of the IPTC) was planning to demo the IPTC Linked Data ... but, sadly, Internet access was not working in the hotel! He was, however, able to discuss the proposed collaboration between the IPTC and MINDS on Linked Data for news.

Linking and Mapping

Much of the discussion about IPTC's Linked Data work turned on the difficulties of mapping. In addition to representing the IPTC subject codes in RDF/XML and RDF/Turtle, there was some work done to map from the 17 top-level IPTC terms to DBpedia concepts. We quickly figured out that these top-level terms are chiefly umbrella terms and so don't map very well to individual DBpedia concepts. The meeting felt that it would be good to map the second-level terms, but the problem is that this is quite a lot of work and - as usual - it isn't clear who will do it! We then explored some of the challenges of creating and maintaining the links in Linked Data - that is where a lot of the value, but also much of the investment, lies.
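To give a feel for what one of these mappings involves, the following Python sketch writes a couple of SKOS concepts as Turtle, each linked to a DBpedia resource with skos:closeMatch. It is an illustration only: the subject code URI base is an assumption, and the codes, labels and DBpedia targets are placeholders rather than an agreed IPTC mapping.

```python
SUBJECT_BASE = "http://cv.iptc.org/newscodes/subjectcode/"  # assumed URI base
DBPEDIA_BASE = "http://dbpedia.org/resource/"

# Placeholder second-level mappings: (subject code, label, DBpedia resource).
MAPPINGS = [
    ("15054000", "soccer", "Association_football"),
    ("01011000", "music", "Music"),
]


def to_turtle(mappings):
    """Serialise each mapping as a skos:Concept with a skos:closeMatch link."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for code, label, target in mappings:
        lines.append(f"<{SUBJECT_BASE}{code}> a skos:Concept ;")
        lines.append(f'    skos:prefLabel "{label}"@en ;')
        lines.append(f"    skos:closeMatch <{DBPEDIA_BASE}{target}> .")
        lines.append("")
    return "\n".join(lines)


turtle = to_turtle(MAPPINGS)
print(turtle)
```

Generating the Turtle is the easy part. Choosing each target, and deciding whether closeMatch, exactMatch or a broader relation is appropriate, is the manual work that nobody volunteered for.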

My slides about IPTC's Linked Data work are available on slideshare.

Metadata in HTML - rNews and hNews
Many news providers have created feeds to supply news using IPTC formats such as NITF and NewsML-G2. However, there is an increasing number of consumers of news who only want to work with "pure" web technologies, i.e. HTML rather than XML. So, the IPTC has been looking at the two major paths to represent metadata in HTML - microformats and RDFa.

I discussed hNews - the microformat for news that was adopted by the community in late 2009 - which builds upon hAtom by adding a few news-specific fields (such as Source and Dateline). As well as explaining how to add microformats to your HTML templates, I provided some statistics that the Associated Press has gathered on adoption. (As of October 2010, we know of about 1,200 sites using hNews, predominantly in North America.) See my Prezi on hNews for more.
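As a rough sketch of what consuming hNews could look like, the Python below pulls three fields out of a marked-up page using only the standard library. The class names (entry-title from hAtom, source-org and dateline from hNews) are written from memory, the sample HTML is invented, and a real consumer would more likely use a dedicated microformats parser than this simplified one.

```python
from html.parser import HTMLParser

# Subset of hAtom/hNews class names handled by this sketch.
HNEWS_FIELDS = {"entry-title", "source-org", "dateline"}
# Void elements never get an end tag, so they are kept off the stack.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class HNewsParser(HTMLParser):
    """Collect the text inside elements whose class names an hNews field."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._stack = []  # one entry per open element: field name or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        name = next((c for c in classes if c in HNEWS_FIELDS), None)
        self._stack.append(name)
        if name:
            self.fields.setdefault(name, "")

    def handle_endtag(self, tag):
        if tag not in VOID and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text belongs to every field element that is currently open.
        for name in {n for n in self._stack if n}:
            self.fields[name] += data


html = """
<div class="hnews hentry">
  <h1 class="entry-title">Council approves budget</h1>
  <span class="source-org vcard"><span class="fn org">Example Wire</span></span>
  <span class="dateline">SPRINGFIELD</span>
  <div class="entry-content"><p>Story text.</p></div>
</div>
"""
parser = HNewsParser()
parser.feed(html)
fields = {name: text.strip() for name, text in parser.fields.items()}
print(fields)
```

The parser assumes reasonably well-formed HTML. Unclosed non-void elements would throw the stack out of step, which is one reason consistent markup across publishers matters.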

Evan Sandhaus (Semantic Technologist at The New York Times) described rNews - a proposal for an RDFa vocabulary for news. As the names imply, rNews and hNews are similar in intent (news-specific metadata in HTML) but somewhat different in approach. Whereas hNews went through the microformats process, an RDFa vocabulary can be created by anyone. Evan has created an initial rNews draft based somewhat on the NewsML-G2, NITF and hNews models but it is clearly heavily influenced by the needs of the New York Times.

Members of the IPTC's Semantic Web Yahoo! Group can view Evan's rNews draft and are encouraged to discuss it in that email group. At the Rome face-to-face meeting there was quite a lot of interest, but also several issues raised about the details of the first draft. The meeting generally agreed to continue looking at both hNews and rNews, with a view to making a recommendation on both in 2011.

The benefit of getting rNews and hNews adopted by the IPTC is that greater industry support translates into less work for toolmakers: if many news providers support hNews and/or rNews - and do so in very similar ways - then it is easier to build parsers and tools to extract metadata from HTML.

News Ontology
Benoît Sergent of the European Broadcasting Union discussed the work that he and his colleague Jean-Pierre Evain have been doing to create a news ontology, based upon the NewsML-G2 news model. Benoît described how EBU would like to combine the video content that it produces with content from its member organizations and other third parties. If they can represent this information using a flexible, universal model (the news ontology) they could use off-the-shelf tools (such as a triple store) to query, manipulate and recombine that content.
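The appeal of a universal model can be shown with a toy example. The Python sketch below stores statements from different providers as plain (subject, predicate, object) triples and answers one query across all of them. It stands in for a real triple store, and every identifier in it (the news:, ebu:, member:, subj: and org: names) is invented rather than taken from the draft ontology.

```python
# Triples are (subject, predicate, object); all identifiers are invented.
TRIPLES = {
    ("ebu:clip1", "rdf:type", "news:VideoItem"),
    ("ebu:clip1", "news:subject", "subj:politics"),
    ("ebu:clip1", "news:provider", "org:EBU"),
    ("member:story7", "rdf:type", "news:TextItem"),
    ("member:story7", "news:subject", "subj:politics"),
    ("member:story7", "news:provider", "org:MemberA"),
    ("member:clip9", "rdf:type", "news:VideoItem"),
    ("member:clip9", "news:subject", "subj:sport"),
    ("member:clip9", "news:provider", "org:MemberB"),
}


def match(pattern, triples=TRIPLES):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


def items_about(subject):
    """Items from any provider that share a subject, whatever their source."""
    return sorted(s for s, _, _ in match((None, "news:subject", subject)))


politics = items_about("subj:politics")
print(politics)
```

Because every provider's content is reduced to the same three-part shape, the query does not need to know which organisation or format an item came from.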

In many ways, this is the most fundamental piece of the semantic web work that the IPTC is undertaking. It is also the least accessible, for many. Members of the IPTC SemWeb group can view a draft of the news ontology and can comment in that email list.