Natural Language Processing & Information Extraction

Annotation tools

Conversational agents (dialog systems and chatbots)

Coreference resolution

Datasets

Question Answering (QA), Catalogs/lists, Sentiment Analysis and Opinion Mining, ...

Distributional semantics

Document Classification & Categorization

Document Parsers & Cleaners

Extraction & Summarization

Temporal taggers

Frameworks

Knowledge Bases & Knowledge Base Completion

Language Modelling/Generation/Detection

Machine Translation

Statistical systems, Rule-based systems, Example-based systems, ...

Morphology and Stemming

Ontologies/Encyclopedias & Semantic Web

Paraphrasing

Parsing & Tagging

Constituency and Dependency Parsers, Named Entity Recognizers (NER), Part of Speech (POS) tagging, ...

Question Answering (QA)

Slot filling, Question generation

Reasoning & Inference & Rule Engines

Search Engines

Crawlers, Forward & Graph Indices, Indri & Lemur, ...

Sentiment analysis

Toolkits & Frameworks

Topic Modelling

Various "banks"

WordNet, FrameNet

Word and document embeddings

Word segmentation/tokenization

Word Sense Disambiguation (WSD)

A curated list of machine learning and NLP software

A Survey of Text Mining Architectures and the UIMA Standard. Mathias Bank, Martin Schierle

A what-to-use diagram (and a blog post) for open-source NLP software

Deep learning for NLP in PyTorch

DeepNLP-models-Pytorch - PyTorch implementations of various deep NLP models covered in CS224n (Stanford University: NLP with Deep Learning).

DKPro - a set of useful open-source UIMA components

English Parser Evaluation Corpus - dependency-parser evaluation data

GramLab - free and open-source linguistic tools for processing textual information.

Implementation of the Brown hierarchical word clustering algorithm. Percy Liang

Intelligent Archive

Practical PyTorch

Semantic Matching - an ontology-matching technique that relies on semantic information encoded in lightweight ontologies to identify semantically related nodes in graph-like structures.

Stanford's Tregex - a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions").

TextCat - a language guesser and text categorizer.

UW SPF - The University of Washington Semantic Parsing Framework

warp-ctc - A fast parallel implementation of Connectionist Temporal Classification (CTC), on both CPU and GPU.