Approximate Matching & Counting | searchivarius.org


Also known as approximate searching or fuzzy searching.


Locality Sensitive Hashing (LSH)
 

 


A collection of links for streaming algorithms and data structures  Debasish Ghosh
ANN: A Library for Approximate Nearest Neighbor Searching  David M. Mount, Sunil Arya
Annoy (Approximate Nearest Neighbors Oh Yeah)   - an efficient library for k-NN search that relies on a forest of space-decomposition trees.
Bloom filters for C++11  Matthias Vallentin
Cover-Tree   - C++ implementation of the cover tree data structure for quick k-nearest-neighbor search. Allows single-point insertion and removal.
Cover-Tree page  John Lang
CuckooFilter4J  Mark Gunlogson - A replacement for the Guava Bloom filter implementation that uses cuckoo filters to obtain similar performance and additional functionality (e.g. concurrency and fast deletions).
FAISS   - Facebook's library for similarity search for CPU and GPU.
FastSS: Fast Similarity Search in Large Dictionaries   - A generalization of Mor-Fraenkel method for edit distances > 1.
FLANN   - a library for performing fast approximate nearest neighbor searches in high-dimensional spaces.
gensim - topic modelling for humans  
java-LSH  Thibault Debatty
java-string-similarity  Thibault Debatty - Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-Winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity.
jerboa   - a package for prototyping randomized/streaming algorithms and data structures, primarily intended for HLT applications.
k-graph  Wei Dong - a method for K-NN graph construction and nearest neighbor search.
KNN_CUDA   - K-nearest neighbor search on GPU.
MetricKnn  
MIH: Multi-Index Hashing   - Fast exact nearest neighbor search in Hamming distance on binary codes with Multi-index hashing. See also a GitHub link.
nanoflann   - a C++ header-only fork of FLANN, a library for KD-trees.
Natix   - a multi-functional search library in C# (which also supports many similarity-search algorithms)
NR-grep: a Fast and Flexible Pattern Matching Tool  Gonzalo Navarro
Panns: Python Approximate Nearest Neighbor Search  
Practical Algorithm Template Library - C++ library on PATRICIA trie  
Rabin fingerprinting  Bill Dwyer
SecondString: Java library for approximate string matching and information retrieval   - SecondString is intended primarily for researchers in information integration and other scientists. It does or will include a range of string-matching methods from a variety of communities, including statistics, artificial intelligence, information retrieval, and databases. It also includes tools for systematically evaluating performance on test data. It is not designed for use on very large data sets.
SeqAn   - an open source C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data.
simbase   - a vector similarity database.
Stream Lib   - A Java library for approximate summarization, i.e., for approximate counting.
The FLAMINGO Project   - The project focuses on data cleaning and approximate string matching.
Tree distance functions  
WebGlimpse