log in | about 
 

One great feature of Solr is that you can employ different query parsers, even in the same query. There is a standard Solr/Lucene parser and there are number of extensions. One useful extension is the ExtendedDisMax parser. In this parser, it is possible to specify a percentage of query words (or blocks) that should appear in a document. This is some kind of fuzzy matching.

Consider an example of a two-word query "telegraph invent". To retrieve documents using a 80% threshold for the number of matching words, one can specify the following search request:

_query_: "{!edismax mm=80%} telegraph invent "

There is, however, a catch. One may expect that 80% of matching words in a two-word query means that retrieved documents contain both query words. However, this appears not be the case. Somewhat counter-intuitively, the minimum required number of matching keywords is computed by rounding down rather than by a more standard rounding half up. (or half way down)

What if you want to enforce the minimum number of words appearing in a document in a more transparent way? Turns out that you can still do this. To this end, one needs to specify the minimum number of words explicitly, rather than via a percentage. The above example would need to be rewritten as follows:

_query_: "{!edismax mm=2} telegraph invent "

It should apparently be possible to specify even more complex restricting conditions where, e.g., percentages and absolute thresholds are combined. More details on this can be found here. However, combining conditionals did not work out for me (I got a syntax error).