MNIST is super easy and few people know it!

Blog

Directory

Submitted by srchvrs on Sun, 02/16/2020 - 14:01

One can never be too surprised by the phenomenal success of the MNIST dataset, which is used in so many image publications. But do people realize how easy this dataset is? One clear measure of hardness is performance of a simplistic k-NN classifier with vanilla L2 metric directly on pixels. As a variant: performance of the k-NN classifier with some basic unsupervised transformations such as the principal component analysis (PCA) or denoising.

I created a small poll to assess what people think about MNIST's k-NN search accuracy. I thank everybody for participation: Fortunately, more than one hundred people responded (most of them are machine learning practitioners and enthusiasts I assume). So, I think the results are rather reliable.

In summary, nearly 40% of the respondents think that the accuracy would be at most 80%, 45% think the accuracy is 95%. Unfortunately, I did not create the option for 90%. I think it would have had quite a few responses as well. That said the vanilla k-NN search on pixels has 97% accuracy and the combination of the PCA and the k-NN classifier has nearly 98% accuracy (here is a notebook to back up 98% claim.). In fact, with a bit of additional pre-processing such as deskewing and denoising, one can get a nearly 99% accuracy.

Turns out that few people realize how effective the k-NN classifier is on MNIST: only 17% voted for 98%. That said, it does not mean that the k-NN classifier is such a good method overall (it can be good for tabular data, see, e.g., this paper by Shlomo Geva, but not for complex image data, check, e.g., out numbers for CIFAR and IMAGENET). It means, however, that MNIST is very easy. Understandably, people need some toy dataset to play and quickly get results with. One better alternative is the fashion MNIST. However, it is not too hard either. A vanilla k-NN classifier has about 85% accuracy and it is probably possible to push the accuracy close to 90% with a bit of preprocessing. Thus, we may need a comparably small, but much more difficult dataset to replace both of them.

srchvrs's blog

You are here