This post on aha moments related to statistical learning duplicates my Quora answer. Feel free to vote and comment there.
I had the same moment a couple of times. It is not an aha moment, though; it is a duh moment. Machine learning, which is more appropriately called statistical learning, is very much a rear-view mirror approach. It learns statistical patterns from data, but nothing else. Such learning creates a sort of lossy compressed representation of the "past". This compressed representation can be used to predict the "future" as long as the future has the same statistical patterns as the past. As obvious as it may seem, a clear understanding of this fact helps greatly. In my opinion, this holds at least for basic supervised learning.
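A minimal sketch of the rear-view mirror effect, assuming a toy one-dimensional regression problem. The helper names (`make_data`, `fit_line`, `mse`), the slopes, and the noise level are all made up for illustration. A line is fit to "past" data and then scored on two "futures": one that follows the same pattern and one where the pattern has flipped.

```python
import numpy as np

def make_data(rng, slope, n=200):
    """Draw n points from y = slope * x + small Gaussian noise (toy assumption)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = slope * x + rng.normal(0.0, 0.1, size=n)
    return x, y

def fit_line(x, y):
    """Least-squares fit of y ~ a * x + b; returns (a, b)."""
    a, b = np.polyfit(x, y, deg=1)
    return a, b

def mse(params, x, y):
    """Mean squared error of the fitted line on (x, y)."""
    a, b = params
    return float(np.mean((a * x + b - y) ** 2))

rng = np.random.default_rng(0)
past = make_data(rng, slope=2.0)             # what the model gets to see
future_same = make_data(rng, slope=2.0)      # future with the same pattern
future_shifted = make_data(rng, slope=-2.0)  # future where the pattern changed

model = fit_line(*past)
err_same = mse(model, *future_same)
err_shifted = mse(model, *future_shifted)

print(f"error on a future like the past: {err_same:.3f}")
print(f"error on a shifted future:       {err_shifted:.3f}")
```

The model is exactly as good as before in both cases; the only thing that changed is whether the future still looks like the past.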
Another duh moment is that we have been using "machine learning" since the dawn of civilization to explain natural phenomena. Scientists observed data and came up with rules to explain why one event follows another. We clearly started with basic logical rules (e.g., one can predict that it will snow tomorrow from how the skies look today) and progressed to more sophisticated ones that involved math.
Interestingly, human learning has essentially the same flaw as so-called machine learning: human theories can easily overfit the data. Given enough degrees of freedom, almost anything can be explained. Yet convoluted theories are rarely true. This is probably one reason why we prefer simple, elegant ones: the preference acts as a sort of regularizer that prevents theories from overfitting the data.
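A minimal sketch of "enough degrees of freedom explain anything", assuming a toy setup of my own choosing: the true relationship is simply y = x, and ten observations are corrupted by a fixed alternating ±0.3 error. The error pattern is deliberately unfavorable and not random, which keeps the example reproducible. A straight line (the "simple theory") is compared with a degree-9 polynomial (the "convoluted theory").

```python
import numpy as np

# Ten observations of the true law y = x, corrupted by a fixed
# alternating +/-0.3 error (an illustrative assumption, not random noise).
x_train = np.linspace(-1.0, 1.0, 10)
y_train = x_train + 0.3 * (-1.0) ** np.arange(10)

# "Reality" between and around the observations: the uncorrupted law.
x_test = np.linspace(-1.0, 1.0, 1001)
y_test = x_test

def errors(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    test = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    return train, test

train_simple, test_simple = errors(1)    # two free parameters
train_complex, test_complex = errors(9)  # ten free parameters, one per observation

print(f"degree 1: train={train_simple:.4f} test={test_simple:.4f}")
print(f"degree 9: train={train_complex:.4f} test={test_complex:.4f}")
```

With one free parameter per observation, the degree-9 polynomial passes through every data point and so "explains" the observations perfectly, errors included. The straight line explains them worse but stays much closer to the underlying law. Restricting the degree is the crudest possible regularizer, which is all this sketch claims to show.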
One well-known example of overfitting is the geocentric system, which did not quite agree with observations in the first place. However, it was patched by introducing a complex scheme of planetary motion (epicycles). As a result, the theory predicted planetary movements better than the alternatives, in particular, better than the simpler heliocentric system (which was itself somewhat flawed in the beginning because it assumed perfectly circular motion). Many more examples (sadly) arise in a social context, when people try to explain too much while knowing too little. Most of our beliefs and conspiracy theories are probably nothing more than overfitting.