Monday, May 1, 2017

Machine Learning LinkLog #2

I knew that my linklog posts would be occasional, but I wasn't expecting the gap between installments to be quite this long. But you know what they say: there is no "AI" in "FAILURE". Oh, wait...

Anyway, here is the third in an occasional series of LinkLog posts. I list links to interesting articles, video and audio items on Machine Learning and allied topics. Just like in my first Machine Learning LinkLog, and the one I did on Serverless, I group the items into three broad categories:

  • Introductory Non Technical - Aimed at the general reader or, perhaps, technical manager who wants to learn about Machine Learning, but is not aiming to be a practitioner
  • Introductory Technical - Aimed at someone who is comfortable with programming and technology, but wishes to learn how to work with Machine Learning tools or techniques
  • In Depth Technical - Aimed at someone who is comfortable with the fundamentals of Machine Learning technology, but wants to learn more about a particular aspect or wants to master "day two" problems.
I also indicate whether the item is (mainly) a slidedeck, a video, a single article or a series of items.

Introductory Non Technical

Mix and match analytics: data, metadata, and machine learning for the win

From ZDNet, this article uses YouTube video recommendations to illustrate a practical application of machine learning. Along the way, it touches on video fingerprinting (via hashes) and the importance of descriptive metadata (a topic dear to my own heart).

Gary Marcus on Advancements in Machine Learning

Via MIT Technology Review, an accessible 18 minute video which gives an overview of the current state and challenges for "Deep Learning".

Machine-learning boffins 'summon demons' in AI to find exploitable bugs

A Register news item about a team of researchers using a semi-automated technique called "steered fuzzing" to comb through machine learning programs for bugs. Failures such as mispredictions or false outcomes lead to detectable crashes in the program, and these failures can potentially be exploited as security holes.

6 areas of AI and machine learning to watch closely

A quick overview article of interesting areas in AI/ML:

  • Reinforcement learning (RL)
  • Generative models
  • Networks with memory
  • Learning from less data and building smaller models
  • Hardware for training and inference
  • Simulation environments

Introductory Technical


Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK

RAKE (Rapid Automatic Keyword Extraction) is a domain-independent keyword extraction algorithm which determines key phrases by analyzing word frequency and co-occurrence with other words in the text.
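The scoring idea behind RAKE can be sketched in a few lines of Python. This is a minimal illustration of the frequency/co-occurrence scoring, not the linked NLTK-based implementation, and the tiny stopword list is just for the example:

```python
import re
from collections import defaultdict

# Tiny stopword list for illustration; the linked implementation uses NLTK's.
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is",
             "it", "its", "of", "on", "that", "the", "to", "with"}

def rake_keywords(text):
    """Score candidate phrases by summing each word's degree/frequency ratio."""
    # Split into candidate phrases at stopwords and punctuation.
    words = re.split(r"[^a-zA-Z]+", text.lower())
    phrases, current = [], []
    for w in words:
        if not w or w in STOPWORDS:
            if current:
                phrases.append(current)
                current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)

    # Word scores: degree (count of co-occurring words, including the word
    # itself) divided by frequency, so words in long phrases score higher.
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            degree[w] += len(phrase)

    # A phrase's score is the sum of its word scores.
    scored = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: -kv[1])

print(rake_keywords("Rapid automatic keyword extraction determines key "
                    "phrases by word frequency and co-occurrence"))
```

Longer candidate phrases naturally score higher, which is why RAKE tends to surface multi-word key phrases rather than single frequent words.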
An entire book covering machine learning. The online version is free, although you can donate if you like or buy a print copy. Source code for the exercises is available via GitHub (in Python).

Deep Learning Papers Reading Roadmap

A nice, regularly updated set of Deep Learning papers, with a particular emphasis on speech and image recognition.

In Depth Technical

Generating Politically-Relevant Event Data

Using convolutional nets (deep learning) for event classification, rather than traditional dictionary-based approaches. Shows good results for both English and Arabic and claims that the technique would work well for ontologies in other domains.

Transparent predictions

Since algorithms are being used more and more to make predictions that guide areas such as public policy and policing, should the algorithms be "transparent"? A detailed discussion of what this might mean, why transparency is important, and under what circumstances it might not be desirable.

See also a linked piece which uses data from LinkedIn to predict white-collar crime based on people's faces. (It looks to me like a parody; however, there are quite a few efforts to use Machine Learning techniques to identify potential criminals based on their faces, posture, etc.)

What are Dimensionality Reduction Techniques?

Dimensionality reduction is the process of reducing the number of random variables in a machine learning data set. It can be divided into feature selection and feature extraction. In many problems the measured data vectors are high-dimensional, but we can try to convert them into a smaller number of variables to work with. The article outlines several techniques that can be tried. (Links to source code in R.)
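As a concrete example of feature extraction, here is a minimal PCA sketch in Python (the linked article's code is in R; this uses a plain SVD rather than any particular library's PCA, and the data is made up for illustration):

```python
import numpy as np

def pca(X, n_components):
    """Project data onto its top principal components (feature extraction)."""
    X_centered = X - X.mean(axis=0)          # PCA requires centered data
    # SVD of the centered matrix; rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T  # reduced representation

# Illustrative data: 3 measured variables, but the third is (almost) a
# linear combination of the first two, so 2 components capture nearly
# all the variance.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.column_stack([base, base @ [0.5, -0.3] + rng.normal(scale=0.01, size=100)])

X_reduced = pca(X, 2)
print(X.shape, "->", X_reduced.shape)   # (100, 3) -> (100, 2)
```

Feature selection, by contrast, would simply keep a subset of the original three columns rather than constructing new combined variables.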

Previous LinkLogs