log in | about 
 

This is written in response to a Quora question. It asks about differences and similarities between the IBM Watson QA system and a deep neural network system described in the DeepMind paper "Teaching Machines to Read and Comprehend Hermann et al 2015".

First I note that we don’t know exactly how IBM Watson works. However, we can clearly see that two systems solve two different problems. While both systems search for an entity that is answer to a question (henceforth answer entity), in the case of the IBM Watson, the answer entity is sought for in a large array of unstructured information. In contrast, the DeepMind approach only need to select an entity from a given document, which is guaranteed to contain the answer.

Finding an answer in a small document is usually an easier problem. My claim is backed up by data in Table 2 of the above-mentioned DeepMind paper. From this table we can see, that in 85% of all cases a correct answer is among top 10 most frequent entities. An easier problem does not mean an easy solution and, in fact, I do find Deepmind accuracy numbers to be impressive. DeepMind neural models beat a simpler word distance model by 20% without doing any feature engineering.

That said, finding an answer entity in a large collection is a more challenging problem, but it is essentially reduced to the problem of finding an answer in a much smaller (pseudo) document. Such a document is created using (mostly) information retrieval techniques by obtaining document snippets that are textually similar to a question. There are alternative approaches to finding relevant pieces of information such as knowledge bases (see also my older post here), but they do not seem to produce many answers.

Reducing the problem to finding an answer in a smaller collection is not enough. A more accurate extraction relies on a number of models and heuristics that we do not know (they are IBM's secret sauce). However, judging by IBM Watson publications, the key heuristic seems to be an answer type. For example, if the question is about a person, the system would extract people's name and focus on analyzing most frequent once. In that, we want to exclude person names that are already mentioned in the question text. To reiterate, the complete answer selection model is, of course, much more complicated and includes many more features such as textual similarity.

To conclude, I want to note that the original IBM Watson approach employs a lot of feature engineering. However, because the search in a large collection is reduced to a search in a much smaller subset of potentially relevant snippets, good results may be obtained by replacing manual feature engineering (and heuristics some of which are known from 60s) with neural network approaches similar to what is described in the DeepMind paper.