Extraction & Summarization

Temporal taggers

A Sequential Pattern Mining Framework (SPMF)

Allen AI Open IE - Allen AI's database of facts and the software used to extract them. The database can be browsed and searched online.

Balie - Baseline Information Extraction: language identification, tokenization, sentence boundary detection, and named-entity recognition.

Boilerpipe - removes boilerplate HTML code.

ClausIE - Clause-Based Open Information Extraction

Clinical Text Analysis and Knowledge Extraction System (cTAKES) - An open-source natural language processing system for information extraction from electronic medical record clinical free-text. It processes clinical notes, identifying types of clinical named entities from various dictionaries including the Unified Medical Language System.

Historical Events Web Extractor - This Web API provides historical events extracted from Wikipedia articles.

IEPY - an open source tool (written in Python) for Information Extraction focused on Relation Extraction.

KEA - key phrase identification: an algorithm for extracting keyphrases from text documents.

KNIME - The KNIME text processing feature lets you read, process, mine, and visualize textual data in a convenient way (NLP, text mining, IR).

KnowItAll - Information extraction tools.

LingScope - an automatic negation and hedge scope detector for biomedical text.

Luhn summarization algorithm (Matthew Russell)

Maui indexing and summarization software - Maui automatically identifies main topics in text documents. Depending on the task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors, index terms or titles of Wikipedia articles.

Mavuno (Donald Metzler) - a Hadoop-based text mining toolkit.

MEAD - a public domain, portable multi-document summarization system.

MinorThird (William W. Cohen) - a collection of Java classes for storing text, annotating text, and learning to extract entities and categorize text. An older repository is available on SourceForge.

MITIE - MIT library and tools for information extraction.

MontyLingua - A light-weight barebones NLP toolkit (lemmatization, POS-tagging, chunking, and basic semantic parsing)

MultIr - a multi-instance learner for information extraction.

Neural Relation Extraction

Newspaper3k (Lucas Ou-Yang) - extraction of news, full text, and article metadata (article scraping and curation).

Pattern: A web mining module for the Python programming language.

Pubmed miner (Jyoti Rani, S. Ramachandran, Ab Rauf Shah) - text mining of PubMed abstracts (text and XML).

reach - Reading and Assembling Contextual and Holistic Mechanisms from Text. In plain English, reach is an information extraction system for the biomedical domain, which aims to read scientific literature and extract cancer signaling pathways.

ReVerb - a program that automatically identifies and extracts binary relationships from English sentences. ReVerb is designed for Web-scale information extraction, where the target relations cannot be specified in advance and speed is important.

SentiWordNet - a lexical resource for opinion mining and sentiment analysis.

Stanford Core NLP relation extractor

Stanford's DeepDive - extracts relationships about entities from unstructured text and performs reasoning over extracted facts.

Stanford's Tregex - a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions").

talon - Mailgun library to extract message quotations and signatures.

TextMarker - a rule-based tool for information extraction and text processing tasks. It provides a full-featured development environment based on the DLTK framework and a build process for UIMA Type Systems and generic UIMA Analysis Engines.

UCL/UMass BioNLP Event Extractor

UIMA Regex (Kenneth Huang) - a tool that allows you to query your UIMA annotation graph using regular expressions.