Wednesday, January 25, 2017

Machine Learning LinkLog #1

This is the first in an occasional series of LinkLog posts. I list links to interesting articles, video and audio items on Machine Learning and allied topics. I group them into three broad categories:

  • Introductory Non Technical - Aimed at the general reader or, perhaps, technical manager who wants to learn about Machine Learning, but is not aiming to be a practioner
  • Introductory Technical - Aimed at someone who is comfortable with programming and technology, but wishes to learn how to work with Machine Learning tools or techniques
  • In Depth Technical - Aimed at someone who is comfortable with the fundamentals of Machine Learning technology, but wants to learn more about a particular aspect or wants to master "day two" problems.

I also intend to indicate whether the item is (mainly) a slidedeck, a video, a single article or a series of items. My goal is to avoid the trope of using pictures of robots to illustrate machine learning topics. However, I apologize in advance if I fail (and you'll certainly find plenty of robot pictures in the items I link to).

Introductory Non Technical

What is Machine Learning?

Short, two-and-a-half minute explainer video from Oxford Sparks ("the amazing stories of science taking place at the University of Oxford"). Accurate and accessible though potentially misleading as it implies that people aren't needed at all. (Note that the doesn't say this directly but you have to listen carefully to pick up that nuance). Avoids the use of unnecessary jargon.

Hype vs. Reality: The AI Explainer

Twenty eight slide explainer deck from Luminary Labs. Covers the different aspects of AI that are currently en vogue. Attempts to predict what is likely to actually succeed and what it probably just hype or misunderstanding.

Top 10 Hot Artificial Intelligence (AI) Technologies

An overview of the findings from a Forrester Research report about Artificial Intelligence in 2017.

Kristen Stewart co-wrote a paper on machine learning

The actress and director co-authored a paper on 'style transfers'. This is a neural-network technique to blend the content of one photo with the style of another. (It is popular with apps such as Prisma). Stewart and her team used the technique in her directoral debut "Come Swim" to create dream-like sequences in the film. The paper describes tricks she used to better control the style transfer effects. This article links in depth technical topics (neural networks and visual processing) with consumer products (feature films).

Introductory Technical

Machine Learning is Fun

A series of articles introducing various technical aspects of Machine Learning. Available in several languages.

A Flask API for serving scikit-learn models

Assumes a lot of scikit knowledge, but a good overview of how to create an API using Flask on the front end and scikit on the backend. The API is not truly RESTful (the endpoints are verbs not nouns, for example). However, it is still a useful introduction and contains the full Python code.

Hitchhiker's Guide to Data Science, Machine Learning, R, Python

A "best of" collection of links, relating to Machine Learning and Data Science, with a particular emphasis on Python and R.

In Depth Technical

Google's 43 Practical Rules of Machine Learning in Industry
Martin Zinkevich compiled 43 rules "intended to help those with a basic knowledge of machine learning get the benefit of best practices in machine learning from around Google." Here are three(!) different presentations of those rules.

Predicting with confidence: the best machine learning idea you never heard of

An article advocating the use of "conformal prediction" a technique for calculating conformance intervals for your machine learning predictions, no matter what the forecast technique or dataset you're working with.

Wednesday, January 11, 2017

Developing the Digital Marketplace for Copyrighted Works

I recently spoke at "Developing the Digital Marketplace for Copyrighted Works", organized by the Commerce Department’s Internet Policy Taskforce. The goal of the public meeting was to "facilitate constructive, cross-industry dialogue among stakeholders about ways to promote a more robust and collaborative digital marketplace for copyrighted works". My impression of the roughly 80 attendees was of a mix of publishers and lawyers - who tended to be pretty conservatively dressed - and music people - who tended to be in more sparkly outfits. Most of the discussion revolved around music, photo and video, but it turned out that a lot of the problems and potential solutions were quite similar across industries and media types. I've linked to the video of the event at the end of this post.

I've been involved in rights work at the Associated Press, including adding rights and pricing metadata to AP's Image API. I lead the IPTC's Rights Working Group. And I'm working within the W3C's POE group to turn ODRL into an official standard.

I spoke on the first panel with the topic of "Unique Identifiers and Metadata". I was teased a bit about "fake news" (this was in Alexandria. VA on December 9th 2016, so close both in time and place to the U.S. Presidential Election). Amongst other things, I spoke about how apparently simple things - "let's agree on identifiers for photos" - turned out to be quite complicated - since a text item, a photo or a video is not really a single, simple atomic thing, but more like a molecule of information. (You can watch the entire panel - which turned out to be quite lively, despite the early hour - in the video linked below).

I also moderated a round table, with the topic "What are the practical steps to adopting standards for identifying and controlling copyrighted works?". As everyone at my table introduced themselves, they mostly said "oh, I'm just here to learn, I don't have much to contribute" but, in fact, we had a very vigorous discussion, which covered *lots* of topics! I summarized them during the "Plenary" session (again, I've linked to the video below). We talked about three areas. First, was why we need standards - creators and rights holders should be compensated for their work, which could be financial compensation or it could be getting distribution and recognition. Second, we talked about the big barriers - technology, the culture of the Internet and human nature itself. Finally, we talked about concrete steps which the government and other organizations could take to get standards developed and adopted. (For the details, you'll need to watch the video. My segment runs from about 39:30 to about 45:55 but I recommend watching everyone's summary of their individual breakout sessions)

If you're interested in rights, then you should consider coming to London for the week of May 15th. That's because the BBC is hosting a Rights Day on May 15th, the IPTC will be holding its Spring Meeting (including discussing RightsML) on May 16th and 17th and W3C will hold its face-to-face meeting on May 18th and 19th. If you're interested in any or all, contact me and I will put you in touch with the right rights people.

Opening Remarks and Panel Session 1: Unique Identifiers and Metadata
Panel Session 2: Registries and Rights Expression Languages

Panel Session 3: Digital Marketplaces

Plenary Session