Various "banks" | searchivarius.org

Annotated corpora, lexical databases, etc.

CCGbank (Julia Hockenmaier and Mark Steedman) - CCGbank is a translation of the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations.
EAT (Edinburgh Associative Thesaurus) - a word association thesaurus.
NomBank - a corpus annotating nouns as predicates, together with their arguments.
OpenCorpora: the open Russian corpus  
Penn Treebank Project - The Penn Treebank Project annotates naturally occurring text for linguistic structure, mostly in the form of skeletal parses showing rough syntactic and semantic information; the result is a bank of linguistic trees.
PropBank - a corpus of text annotated with information about basic semantic propositions.
SALSA - a large, frame-based lexicon for German with rich semantic and syntactic properties.
SemLink - a project that aims to link different lexical resources via a set of mappings.
The Abstract Meaning Representation (AMR) Bank  
Unified Verb Index - a system that merges links and web pages from four different natural language processing projects.
VerbNet - described as the largest online verb lexicon currently available for English, VerbNet is a hierarchical, domain-independent, broad-coverage verb lexicon with mappings to other lexical resources such as WordNet (Miller, 1990; Fellbaum, 1998), XTAG (XTAG Research Group, 2001), and FrameNet (Baker et al., 1998).
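To give a concrete sense of the "bank of linguistic trees" mentioned in the Penn Treebank entry, here is a minimal sketch that parses a tree written in the bracketed, Lisp-style notation used for Penn Treebank-style parses into nested Python tuples and then reads off the (word, part-of-speech tag) pairs. The example sentence and its tree are made up for illustration and are not taken from the treebank; the function names `parse_bracketed` and `pos_tags` are hypothetical. For real treebank files, an existing reader (for example NLTK's `Tree.fromstring`) is likely a better choice.

```python
import re
from typing import List, Tuple

# Hypothetical representation: a node is (label, children); a leaf word is a str.
Tree = Tuple[str, list]


def tokenize(s: str) -> List[str]:
    # Split into "(", ")" and whitespace-free symbols (labels or words).
    # Penn Treebank escapes literal parentheses in text as -LRB- / -RRB-.
    return re.findall(r"\(|\)|[^\s()]+", s)


def parse_bracketed(s: str) -> Tree:
    """Parse one bracketed tree, e.g. "(NP (DT a) (NN dog))"."""
    tokens = tokenize(s)
    pos = 0

    def parse_node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        if tokens[pos] == "(":
            label = ""  # unlabeled wrapper, as in "( (S ...) )"
        else:
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())  # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def pos_tags(tree: Tree) -> List[Tuple[str, str]]:
    """Collect (word, tag) pairs; a preterminal node has a single word child."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs: List[Tuple[str, str]] = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(pos_tags(child))
    return pairs


if __name__ == "__main__":
    example = "( (S (NP (DT The) (NN cat)) (VP (VBD sat)) (. .)) )"
    print(pos_tags(parse_bracketed(example)))
    # prints [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('.', '.')]
```

Nested tuples keep the sketch dependency-free; a real reader would also need to handle files containing many trees, comments, and malformed input.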