Over the last year, I've been leading a project within the IPTC to build an open-source, rules-based classification engine for news. Dubbed "EXTRA" (short for EXTraction Rules Apparatus), the software will be freely available under the MIT license. The work is funded by a €50,000 grant from Google's Digital News Initiative Innovation Fund.
We have drawn up the technical requirements, hired Infalia PC to partner with us on building the software, and selected Elasticsearch's percolator as the underlying technology. We've licensed two news corpora: English from Reuters and German from the Austrian Press Agency. Linguists are using the EXTRA engine to write rules that classify those corpora with IPTC’s Media Topics.
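For readers unfamiliar with the percolator, it inverts the usual search flow: queries are stored in an index, and each incoming document is matched against them. The sketch below is only an illustration of that idea, not EXTRA's actual schema or rule format. The field names (`query`, `body`, `topic`), the helper functions, and the sample rule are all assumptions made for this example, and the mapping uses the typeless form of recent Elasticsearch versions, which may differ from the version the project runs on. It only builds the JSON request bodies, so it runs without a cluster.

```python
import json


def percolator_index_body():
    """Index definition with a percolator field that holds stored queries (rules).

    Field names are hypothetical, chosen for this sketch.
    """
    return {
        "mappings": {
            "properties": {
                "query": {"type": "percolator"},  # the stored rule
                "topic": {"type": "keyword"},     # label the rule assigns
                "body": {"type": "text"},         # field the rules search in
            }
        }
    }


def rule_document(topic_label, phrase):
    """A rule: 'if the article body contains this phrase, assign this topic'."""
    return {
        "topic": topic_label,
        "query": {"match_phrase": {"body": phrase}},
    }


def percolate_request(article_text):
    """Search body asking which stored rules match the given article."""
    return {
        "query": {
            "percolate": {
                "field": "query",
                "document": {"body": article_text},
            }
        }
    }


if __name__ == "__main__":
    # Illustrative label and phrase only, not taken from the project's rule set.
    rule = rule_document("economy, business and finance", "interest rates")
    request = percolate_request("The central bank left interest rates unchanged.")
    for body in (percolator_index_body(), rule, request):
        print(json.dumps(body, indent=2))
```

In a running cluster, the first body would be sent when creating the index, each rule would be indexed as a document, and the last body would be sent as a search; every hit is a rule that matched, and its `topic` value is the classification.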
I'm thrilled to say that the project is on track to deliver a working version of the engine, together with the sample rules, by the summer of 2017. Read more about the project at https://iptc.org/news/extra-iptc-infalia-elasticsearch-open-source-rules-based-classification-engine/ and feel free to contact me for more information.