Datasets and software used in the following papers:
- SISAP 2012: Leonid Boytsov, Super-Linear Indices for Approximate Dictionary Searching. SISAP 2012: 162-176
- JEA ACM 2011: Leonid Boytsov. 2011. Indexing methods for approximate dictionary searching: Comparative analysis. J. Exp. Algorithmics 16, 1, Article 1 (May 2011).
You can download the articles on this page.
There are virtually no restrictions on using the software and data that I designed (see details here). I would appreciate it if you cited my work.
However, the archives contain several third-party packages, which I used for comparison. These packages may be subject to different licenses and may be covered by patents (I believe that agrep and the underlying shift-and algorithm are patented). They include at least the following:
- G. Navarro's implementation of the lazy Levenshtein automaton (the folder NavarroDFA)
- NR-grep (developed by G. Navarro)
- agrep (developed by Sun Wu and Udi Manber)
- FastSS
Download and build instructions.
The data and sources used in the JEA ACM 2011 paper are also available here (check the tab "Source Materials").
- Download and unpack the source file;
- Download the datasets into the directory with the unpacked sources;
- Check the README file for building/testing instructions.
Source files.
Data sets.