Friday, October 5, 2012

Named Entity Recognition

Let's say you would like to automatically identify people, places, organizations or brands within blocks of text. What you need is a system that will perform Named Entity Recognition.
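To make the task concrete, here's a toy sketch of the simplest possible approach: looking up names from a hand-made dictionary (a "gazetteer"). The `GAZETTEER` entries and the `find_entities` function are made up for illustration. The packages listed below generally rely on trained statistical models rather than plain lookup, so treat this as a picture of the input and output of NER, not as how any of them work.

```python
import re

# Hypothetical, hand-made gazetteer: entity text -> entity type.
GAZETTEER = {
    "Stanford": "ORGANIZATION",
    "Brooklyn": "LOCATION",
    "University of Sheffield": "ORGANIZATION",
}


def find_entities(text, gazetteer=GAZETTEER):
    """Return (start, end, surface, label) for each gazetteer name found in text."""
    # Put longer names first so a multi-word name wins over any shorter
    # name that starts at the same position.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b"
    )
    return [
        (m.start(), m.end(), m.group(0), gazetteer[m.group(0)])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sample = "The University of Sheffield and Stanford both publish NER code."
    for start, end, surface, label in find_entities(sample):
        print(start, end, surface, label)
```

A lookup like this only finds names it already knows about and can't tell which sense of an ambiguous name is meant, which is a big part of why people reach for the trained systems listed below.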

NER is a challenge that has been extensively studied over the last several years, and there are many software packages available that you can use to identify various types of entities. Here's a list that I recently compiled: mainly open source, mainly English-centric, and in no particular order.

An open source Java framework for various types of text processing, including NER.

Open source Java, from the University of Sheffield. Part of GATE.

Python NLTK
Very widely used open source Natural Language Tool Kit for Python.

Commercially supported Java, straight out of Brooklyn. Also, LingPipe has a nice list of competitors (some of which duplicate what I’ve listed in this post):

Java open source code from CMU.

Stanford Named Entity Recognizer
Java open source code from Stanford.

Illinois Named Entity Tagger
Java open source code from Illinois.

Probabilistic programming with imperatively-defined factor graphs. I don’t know what this means, either. It is written in Scala, though, which is the latest hipster programming language.

Apache Mahout
Java, open source, classification. (I’m not actually certain it does NER, but it looks interesting.)

A couple of academic papers I found that evaluate NER systems:


  1. Great list - thanks!

    I take it that services such as OpenCalais and AlchemyAPI are omitted from this list because they're API-based tools rather than software packages per se?

    1. Glad you like the list! Yes, I was looking for software packages that could be used to develop a service. But you're right, there are also APIs, including:

      Thomson Reuters' Calais
      AP's Tagging API

      Anyone know of any others?