A collection of links for streaming algorithms and data structures Debasish Ghosh |
ANN: A Library for Approximate Nearest Neighbor Searching David M. Mount, Sunil Arya |
Annoy (Approximate Nearest Neighbors Oh Yeah) - an efficient library for k-NN search that relies on a forest of space-decomposition trees.
|
Bloom filters for C++11 Matthias Vallentin |
Cover-Tree - C++ implementation of the cover tree data structure for quick k-nearest-neighbor search. Allows single-point insertion and removal.
|
Cover-Tree page John Lang |
CuckooFilter4J Mark Gunlogson - A replacement for the Guava Bloom filter implementation that uses cuckoo filters to obtain similar performance and additional functionality (e.g. concurrency and fast deletions).
|
FAISS - Facebook's library for similarity search for CPU and GPU.
|
FastSS: Fast Similarity Search in Large Dictionaries - A generalization of the Mor-Fraenkel method to edit distances greater than 1.
|
FLANN - a library for performing fast approximate nearest neighbor searches in high-dimensional spaces.
|
gensim - topic modelling for humans |
java-LSH Thibault Debatty |
java-string-similarity Thibault Debatty - Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-Winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity.
|
jerboa - a package for prototyping randomized/streaming algorithms and data structures, primarily intended for HLT applications.
|
k-graph Wei Dong - a method for K-NN graph construction and nearest neighbor search.
|
KNN_CUDA - K-nearest neighbor search on GPU.
|
MetricKnn |
MIH: Multi-Index Hashing - Fast exact nearest neighbor search in Hamming distance on binary codes with multi-index hashing. See also a GitHub link.
|
nanoflann - a C++ header-only fork of FLANN, a library for KD-trees.
|
Natix - a multi-functional search library in C# that also supports many similarity-search algorithms.
|
NR-grep: a Fast and Flexible Pattern Matching Tool Gonzalo Navarro |
Panns: Python Approximate Nearest Neighbor Search |
Practical Algorithm Template Library - a C++ library based on the PATRICIA trie |
Rabin fingerprinting Bill Dwyer |
SecondString: Java library for approximate string matching and information retrieval - SecondString is intended primarily for researchers in information integration and other scientists. It does or will include a range of string-matching methods from a variety of communities, including statistics, artificial intelligence, information retrieval, and databases. It also includes tools for systematically evaluating performance on test data. It is not designed for use on very large data sets.
|
SeqAn - an open-source C++ library of efficient algorithms and data structures for the analysis of sequences, with a focus on biological data.
|
simbase - a vector similarity database.
|
Stream Lib - A Java library for approximate summarization of data streams, e.g., approximate counting.
|
The FLAMINGO Project - The project focuses on data cleaning and approximate string matching.
|
Tree distance functions |
WebGlimpse |