
Teaching an old cat new dog tricks

They say you cannot teach an old dog new tricks, but what about cats? Let me introduce our Bengal cat Masyanya Markovna Bonus, whose patronymic Markovna refers to the memoryless Markov stochastic process. Once fed, she quickly transitions to a new state (simultaneously discarding the history) and starts demanding food again.

Masyanya Markovna is ten years old. By cat standards, this is a near-retirement age. This is often the time when cats start developing health problems. For example, our poor kitty recently lost all her teeth. Yet she stays mentally sharp and eager to learn.

We are not the first owners of Masyanya: she has been living with us for only three years. It was not until two years ago that my wife noticed how our ever-hungry animal was reaching out for food. Jokingly, she suggested we should teach her to do the ‘Stand!’ trick. We knew cats could be trained, but our cat was already fairly old. Before she joined our family, she had not been trained to do any tricks. Nor did I have any animal-training experience.

To our astonishment, the experiment was successful. Encouraged by this, we decided to try some new dog tricks. It was a slow process bearing some similarity to training a deep artificial neural network. In short, it can be easy to teach the kitty to do one trick, but quite problematic to teach her another one. Cats (somewhat like sophisticated machine learning algorithms) overtrain easily. As a result, no matter what the command is, the cat may want to perform the trick she learned first. However, once you are done with two tricks, you can teach many more.

Another parallel to machine learning: Cats are sensitive to priors (at least this was our case). For example, they may perform the trick quite well in a familiar setting (e.g., the living room), but completely refuse to do anything in another setting. They react to the voice commands, but they are also very sensitive to the body language.

To substantiate our story, we post the video of the latest performance, where our awesome cat does several dog tricks:

What is the takeaway message? One is quite clear: Cats are highly trainable, even mature ones. But, perhaps more importantly, if mature cats can learn new tricks, people should not be afraid to do the same. Even though lots of people may try to discourage you, uncovering and nurturing your talent should not stop in your 3rd or 4th decade.

This series of cat posts is co-authored with Anna Belova.



If data is new code, we need new design patterns

As the number of data-driven applications increases, data is becoming an important part of the code base. It is such a clear trend that some people have even rushed to announce the demise of code. "Code is a commodity", claims Henry Verdier, and "Data is the new code". While this seems to be an exaggeration, our increasing dependence on data has consequences. In fact, as Sculley and colleagues argue in their recently published paper "Machine Learning: The High-Interest Credit Card of Technical Debt" (D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young), the cost of data dependencies outweighs the cost of code dependencies in projects that rely heavily on data-driven (aka machine learning) approaches. Forget about the never-ending "functional vs. object-oriented" debate. Let us tackle the issue of data dependencies first.

Unfortunately, it is not easy to do. Sculley and colleagues argue that in traditional software engineering, the number of interdependencies can be greatly reduced via encapsulation and modular design. This is possible because we write modules and functions to satisfy certain strict requirements. We know there are certain logical invariants and can check functionality via unit and integration tests. For certain applications, we can even formally verify correctness. However, as the authors note, "... it is difficult to enforce strict abstraction boundaries for machine learning systems by requiring these systems to adhere to specific intended behavior. Indeed, arguably the most important reason for using a machine learning system is precisely that the desired behavior cannot be effectively implemented in software logic without dependency on external data."

At the same time, it is not easy to decompose the data, because there are no truly independent features. It is hard to isolate an improvement (which features contributed most?) and it is hard to debug problems. Overall, machine learning systems are much more fragile, because small local changes (e.g., in regularization parameters or convergence thresholds) may and often do have ripple effects. Sculley and colleagues call this phenomenon the CACE principle: Changing Anything Changes Everything.

Clearly, as data-driven applications become even more common, there will be an established set of best practices and design patterns tailored specifically to the management of data. Some of the emerging patterns are already discussed by Sculley and colleagues. Among other things, they recommend reducing the amount of glue code and removing low-impact features and experimental code paths.

There are a number of tools (in many languages) to identify code dependencies. Sculley and colleagues argue that data dependencies can be analyzed as well, in an automatic or semi-automatic manner. At the very least, one can catalog all the features used in the company. Different learning modules can report their usage of the features to a central repository. When a version of a feature changes, or the feature becomes deprecated, it is possible to find all relevant consumers quickly. Such a feature management tool greatly reduces the risk of having a stealthy consumer, e.g., one that reads features from log files, whose behavior is adversely affected by deprecation or change of certain input signals.
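The idea of a central feature catalog can be illustrated with a minimal in-memory sketch. It is not the tool described in the paper: the class name FeatureCatalog and the methods registerUsage and consumersOf are hypothetical, and a real system would need persistence, versioning, and a way to discover consumers that never report themselves.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory catalog mapping each feature to the modules that consume it. */
final class FeatureCatalog {
    private final Map<String, Set<String>> consumersByFeature = new HashMap<>();

    /** Called by a learning module to report that it reads the given feature. */
    void registerUsage(String feature, String consumer) {
        consumersByFeature.computeIfAbsent(feature, k -> new TreeSet<>()).add(consumer);
    }

    /** Returns a read-only view of all registered consumers of the feature (empty if none). */
    Set<String> consumersOf(String feature) {
        Set<String> found = consumersByFeature.get(feature);
        if (found == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(found);
    }
}
```

Before deprecating a feature, one would call consumersOf with the feature name and notify every module in the result. A consumer that reads the feature from log files without registering would still go unnoticed, which is exactly the stealthy case mentioned above.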

Machine learning is a powerful tool allowing us to quickly build complex systems based on previously observed data patterns instead of laboriously handcrafting the patterns manually. Yet its performance hinges on the assumption that previously observed statistical properties of the data remain unchanged in the future. A situation where this assumption is violated is called concept drift. As a result of concept drift, the performance of a predictive model deteriorates with time. The more sophisticated the model is, the more likely it is to suffer from this drift. In particular, the error rate of a simple linear model may become equivalent to that of a more sophisticated model!

Unfortunately, in the real world the main machine learning assumption does not hold. One domain where concept drift is especially stark is spam detection. Current anti-spam software is good, but it would not be good without constant retraining and the introduction of new features. Again, Sculley and colleagues discuss this problem and propose a couple of mitigation strategies.

To conclude, I again emphasize that data-driven applications are different from classic software projects. It is expected that new best practices and design patterns will evolve and mature in the future to deal with problems like data dependencies and the ever-changing statistical properties of the external world. The paper "Machine Learning: The High-Interest Credit Card of Technical Debt" surveys some of the design practices already used successfully by Google folks. I would recommend reading this paper and following some of the references to everyone interested in building large interconnected machine learning systems.



Michael Jordan on the Delusions of Big Data, P=NP, and singularity.

Michael Jordan gave a very interesting interview on big data, singularity, Turing test, P=NP, and artificial intelligence. Yann LeCun's reaction: "Michael Jordan, like some of us, has strong opinions about certain things."

Below I summarize what seem to be the main points made by Michael Jordan (item 7 is my favorite):

  1. We don’t know how neurons learn.
  2. Neurons in artificial neural networks do not mimic real neurons: "Anyone in electrical engineering would recognize those kinds of nonlinear systems. Calling that a neuron is clearly, at best, a shorthand. It’s really a cartoon."
  3. The number of hypotheses is enormous. Thus, the multiple comparison/testing problem can be a real issue in large-scale data mining.
  4. He predicts a "... big-data winter. After a bubble, when people invested and a lot of companies over-promised without providing serious analysis, it will bust."
  5. Solving the problems will take decades during which we will improve steadily. Yet, there has not been a significant technological breakthrough made.
  6. "Despite recent claims to the contrary, we are no further along with computer vision than we were with physics when Isaac Newton sat under his apple tree."
  7. If I had an unrestricted $1 billion grant, I would work on natural language processing.
  8. Singularity is not an academic discipline.
  9. On the Turing test: "there will be a slow accumulation of capabilities, including in domains like speech and vision and natural language." In that sense, the Turing test does not seem to be "... a very clear demarcation."


Undocumented invalidation of UIMA iterators

As mentioned previously, in the world of natural language processing (at least, on some of its continents), everything is an annotation. In the Apache UIMA framework, there are capabilities to efficiently iterate over these annotations. For example, it is possible to retrieve a POS tag of every document word, or words belonging to (or covered by) a single sentence (annotation).

The iteration functionality is supported via the class FSIterator<T extends FeatureStructure>. These iterators can be "invalidated" in a rather interesting fashion, which does not seem to be documented properly.

Specifically, if one iterates over annotations and deletes them on the fly:

  FSIterator<Annotation> it = ...
  while (it.isValid()) {
    Annotation an = it.get();
    an.removeFromIndexes();  // modifies the index that is being iterated over
    it.moveToNext();         // advance the iterator (not the annotation)
  }

the behavior seems to be undefined. Sometimes you may get a ConcurrentModificationException and sometimes you do not retrieve all indexed annotations. This issue is not limited to deletion. If you iterate over annotations, create new ones, and add them to the index (using the function addToIndexes), you are also likely to trigger a ConcurrentModificationException. This happens even if you iterate over annotations of one type and create annotations of another type.

This is rather expected behavior, because, normally, you cannot iterate over an index and modify this index at the same time (though some fancy implementations of containers do support this). However, many UIMA users (including me) have managed to fall into this trap. The UIMA docs seem to be silent about this issue. The only confirmation of the described effect that I could find was in this obscure mailing list. Yet, I think an appropriate warning should be printed in a large red font.
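A common way to sidestep this kind of invalidation is to iterate over a snapshot and modify the index only through the original container. The sketch below uses plain java.util collections as a stand-in for the UIMA index so that it runs without UIMA; the class SnapshotRemoval and the method removeMatching are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper: removes items while iterating over a copy, not the live index. */
final class SnapshotRemoval {
    /** Removes every item accepted by shouldRemove and returns how many were removed. */
    static <T> int removeMatching(List<T> index, Predicate<T> shouldRemove) {
        // Copy first: the loop below never touches an iterator of the live index.
        List<T> snapshot = new ArrayList<>(index);
        int removed = 0;
        for (T item : snapshot) {
            if (shouldRemove.test(item) && index.remove(item)) {
                removed++;
            }
        }
        return removed;
    }
}
```

In UIMA, the analogous approach would be to copy the annotations into a list while iterating and to call removeFromIndexes (or addToIndexes for newly created annotations) only after the loop has finished.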



This is the sort of English up with which I will not put!

There is a common belief that English sentences should not end with prepositions. I have heard that Californian teachers are especially vigorous in beating this nonsense into students' heads. There is a famous anecdote about the Nobel Prize winner Winston Churchill, who was offended by an editor clumsily rearranging one of his sentences, which ended with a preposition. Being proud of his style, Winston Churchill wrote in reply (note that there are several variants of this phrase circulating): "This is the sort of English up with which I will not put."

This joke is not as good as it may seem at first glance, because, in this sentence, up is a verb particle, not a preposition! Simply speaking, the verb is the whole phrase put up. Verb particles can be moved, e.g.: both "switch off the lights" and "switch the lights off" are grammatical. However, I suspect that it is ungrammatical to move particles the way Winston Churchill did in his humorous reply to the editor.

Anyway, "stranded" prepositions are perfectly fine in English. Yet, I have been wondering why this is considered ungrammatical by so many people. It turns out that Romance languages in general, and Latin in particular, do not have preposition stranding. Teachers believed that constructs impossible in Latin should not be allowed in English. As a result, for hundreds of years, they have been telling us that "nobody to play with" is ungrammatical.

Disclaimer: I know that there are some good arguments against the veracity of Churchill's story.

Credits: This post resulted from observations of El Nico Fauceglia and remarks by a linguist who wanted to remain anonymous. Anna Belova told me the Churchill anecdote.


