Natural Language Processing & Information Extraction | searchivarius.org



Annotation tools
 

Conversational agents (dialog systems and chatbots)
 

Coreference resolution
 

Datasets
Question Answering (QA), Catalogs/lists, Sentiment Analysis and Opinion Mining...

Distributional semantics
 

Document Classification & Categorization
 

Document Parsers & Cleaners
 

Extraction & Summarization
Temporal taggers

Frameworks
 

Knowledge Bases & Knowledge Base Completion
 

Language Modelling/Generation/Detection
 

Machine Translation
Statistical systems, Rule-based systems, Example-based systems...

Morphology and Stemming
 

Ontologies/Encyclopedias & Semantic Web
 

Paraphrasing
 

Parsing & Tagging
Constituency and Dependency Parsers, Named Entity Recognizers (NER), Part of Speech (POS) tagging...

Question Answering (QA)
Slot filling, Question generation

Reasoning & Inference & Rule Engines
 

Search Engines
Crawlers, Forward & Graph Indices, Indri & Lemur...

Sentiment analysis
 

Toolkits & Frameworks
 

Topic Modelling
 

Various "banks"
WordNet, FrameNet

Word and document embeddings
 

Word segmentation/tokenization
 

Word Sense Disambiguation (WSD)
 

 


A curated list of machine learning and NLP software  
A Survey of Text Mining Architectures and the UIMA Standard.  Mathias Bank, Martin Schierle
A what-to-use diagram (and a blog post) for open-source NLP software  
Deep learning for NLP in PyTorch  
DeepNLP-models-Pytorch   - PyTorch implementations of various deep NLP models from CS224n (Stanford University: NLP with Deep Learning).
DKPRO   - a set of useful open-source UIMA components.
English Parser Evaluation Corpus   - dependency-parser evaluation data
GramLab   - free and open source linguistic tools for the processing of textual information.
Implementation of the Brown hierarchical word clustering algorithm.  Percy Liang
Intelligent Archive  
Practical PyTorch  
Semantic Matching   - an ontology-matching technique that relies on semantic information encoded in lightweight ontologies to identify semantically related nodes in graph-like structures.
Stanford's Tregex   - a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions").
TextCat   - a language guesser and text categorizer.
UW SPF - The University of Washington Semantic Parsing Framework  
warp-ctc   - A fast parallel implementation of Connectionist Temporal Classification (CTC), on both CPU and GPU.