This is the personal web page/blog of Leonid Boytsov. He is currently a research engineer at the Bosch Center for Artificial Intelligence (BCAI). Before joining BCAI, he worked on speech recognition for the medical domain (at M*Modal/3M). Overall, he has been a professional computer scientist since 1996 (full time since 1997). Leonid remembers dependency parsing & rotary phones. He started out as a full-stack developer, but has been gradually drifting towards the land of computer science research. This drift started from an interest in search applications and algorithms.
An important by-product of Leonid's research is an efficient and flexible library for k-NN search, codenamed NMSLIB, created in collaboration with several other folks. Thanks to the contribution of Yury Malkov, the library was adopted by Amazon. The core retrieval method, HNSW, contributed by Yury, was also reimplemented in the Facebook library FAISS. A brief description of this collaboration can be found on this LTI news page. This work was discussed in a podcast with Radim Řehůřek (author of Gensim) in March 2018. Feel free to check our code on GitHub.
NMSLIB integrates with another retrieval toolkit called FlexNeuART, which in August, November, and December 2020 produced the best neural and traditional submissions on the MS MARCO document ranking leaderboard. Notably, the strongest traditional run outperformed a number of neural systems.
Leonid also co-authored an extremely efficient algorithm for light-weight compression of sorted integer numbers. We show that this algorithm can decompress at the speed of reading from memory. You can find the software on GitHub. This software grew out of the now-popular library FastPFor. FastPFor has Python bindings.
Leonid was a graduate research assistant (aka PhD student) at the Language Technologies Institute at Carnegie Mellon University (under the supervision of Professor Eric Nyberg). In his thesis "Efficient and Accurate Non-Metric k-NN Search with Applications to Text Matching" he explored how various linguistic, neural, and lexical features can be incorporated directly into a candidate generation component (via k-NN search). He assisted in teaching the following courses & seminars: Algorithms for NLP (11-711) in 2013, and Software Engineering I (11-791) and Data Science Seminar (11-631) in 2014.
Leonid likes collecting material related to search technologies and other AI-related topics (e.g., algorithms, software, interesting papers, and even historical anecdotes). Note that his opinions and views do not necessarily represent the opinions of his employer, his dissertation advisor, or the Language Technologies Institute.
Featured blog posts: