Adversarial AI vs Evil AI in layman's terms

This is written in response to a Quora question asking to explain in layman's terms the difference between adversarial and evil AI. Feel free to vote on my answer on Quora.

This is an excellent question! For starters, in my opinion, current AI relies heavily on statistical learning methods, which are rather basic. For this reason, it is nowhere near producing truly intelligent machines, let alone machines that have feelings, emotions, free will, etc. There are algorithmic and hardware limitations, which I cover in my blog post (also available as a Quora answer).

Modern AI cannot be evil in the traditional, human sense of the word. However, it can cause a lot of harm, as can any other immature technology. For example, despite the famous claim by the Turing award winner G. Hinton that we would have to stop training radiologists roughly today, there is mounting evidence that deep learning methods for image analysis do not always work well.

Furthermore, statistical methods (aka AI) are becoming ubiquitous decision-making tools (in money lending, job searching, and even jailing people). However, statistical learning methods are not inherently fair and can be biased against certain groups of people. From this perspective, AI can be considered evil. Of course, humans are biased too, but human opinions are diverse, and we humans tend to improve. Having a single, black-box, uncontrollable decision algorithm that becomes more and more biased is a scary prospect.

Modern AI is unreliable and immature: it works only in very constrained environments. Why is that? Because statistical learning is a rear-view-mirror approach that makes future decisions based on patterns observed in the past (i.e., the training data). Once the actual (test) data diverges from the training data in its statistical properties, the performance of modern AI degrades quite sharply.

In fact, it is possible to tweak the data slightly to decrease the performance of an AI system. This is called an adversarial attack. For example, there is research showing that adding distractor phrases does not confuse humans much, but it completely “destroys” the performance of a natural language understanding system. For reference, the modern history of adversarial examples started with the famous paper by Szegedy et al., 2013, which showed that image perturbations too small to be noticed by humans completely confuse deep neural networks.
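
To make the image case concrete, below is a minimal sketch (in PyTorch) of the fast gradient sign method (FGSM) of Goodfellow et al., 2014, a simple successor to Szegedy et al.'s original attack. The model and loss_fn objects are assumed to be a standard image classifier and its training loss; this is an illustration, not a production attack:

    import torch

    def fgsm_attack(model, loss_fn, image, label, eps=0.03):
        # Compute the gradient of the loss with respect to the input pixels.
        image = image.clone().detach().requires_grad_(True)
        loss_fn(model(image), label).backward()
        # Nudge every pixel by at most eps in the direction that increases
        # the loss: the change is imperceptible to a human, yet it often
        # flips the model's prediction.
        adversarial = image + eps * image.grad.sign()
        return adversarial.clamp(0.0, 1.0).detach()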

In summary, adversarial AI has nothing to do with evil AI. It is concerned primarily with devising (adversarial) examples that fool modern statistical learning methods, as well as with methods to defend against such attacks. Clearly, we want models that can withstand adversarial attacks. This is a difficult objective, and a lot of researchers specialize in so-called adversarial AI.



Robert Mercer's contribution to the development of machine translation technologies

This is written in response to a Quora question, which asks about Robert Mercer's contribution to the development of machine translation technologies. Feel free to vote there for my answer on Quora!

Robert Mercer (together with Peter Brown and a few other folks) played a pivotal role in the creation of the first modern translation models. They built the first large-scale noisy-channel translation system, published the first paper on the subject, and created the series of now-famous IBM translation models, spearheading a new research direction (which is huge nowadays).
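
For reference, "noisy channel" means that translation is framed as Bayesian decoding: to translate, say, a French sentence f into English, one searches for the English sentence e that is most probable given f, which by Bayes' rule factors into two separately trainable pieces:

    \hat{e} = \arg\max_e P(e \mid f) = \arg\max_e P(f \mid e) \, P(e)

where P(e) is a language model and P(f | e) is a translation model estimated from parallel text.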

Recently, Robert received the ACL Lifetime Achievement Award for his pioneering work on machine translation. He was interviewed on the topic, and there is a nice transcript of the story that uncovers a lot of historical details: Twenty Years of Bitext.



How do we make the architecture more efficient for machine learning systems, such as TensorFlow, without just adding more CPUs, GPUs, or ASICs?

This is written in response to a Quora question, which asks about improving the efficiency of machine learning models without increasing hardware capacity. Feel free to vote there for my answer on Quora!

Efficiency in machine learning in general, and deep learning in particular, is a huge topic. Depending on the goal, different tricks can be applied.

  1. If the model is too large, or you have an ensemble of models, you can train a much smaller student model that mimics the behavior of the large model. For classification, the student can be trained to directly predict the teacher's probability distribution. The classic paper is "Distilling the Knowledge in a Neural Network" by Hinton et al., 2015 (a sketch of the loss follows this list).

  2. Use a simpler and/or smaller model that parallelizes well. For example, one reason transformer models are effective is that they are easier/faster to train compared to LSTMs.

  3. If the model does not fit into memory, you can train it using mixed precision: "Mixed Precision Training" by Narang et al., 2018 (also sketched after this list).

  4. Another trick, which comes at the expense of extra run-time, consists in discarding some of the tensors during training and recomputing them when necessary (see the last sketch after this list): "Low-Memory Neural Network Training: A Technical Report" by Sohoni et al., 2019. There is a Google library for this: "Introducing GPipe, an Open Source Library for Efficiently Training Large-scale Neural Network Models."

  5. There is a ton of work on quantization (see, e.g., "Fixed Point Quantization of Deep Convolutional Networks" by Lin et al., 2016) and on pruning of neural networks ("The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks" by Frankle and Carbin). I do not remember a reference, but it is also possible to train quantized models directly so that they use less memory.
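
To illustrate point 1, here is a minimal sketch of the distillation loss from Hinton et al., 2015, written in PyTorch. The temperature T and mixing weight alpha are hyperparameters, and the logit tensors are assumed to come from a trained teacher and the student being trained:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        # Soft targets: push the student toward the teacher's softened
        # output distribution (softmax with temperature T).
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)  # rescales gradients, as suggested in the paper
        # Hard targets: the usual cross-entropy with the true labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard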
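Similarly, for point 3, here is a minimal sketch of a training loop using PyTorch's automatic mixed-precision utilities; the model, optimizer, and loader objects are assumed to already exist:

    import torch

    scaler = torch.cuda.amp.GradScaler()
    for inputs, labels in loader:
        optimizer.zero_grad()
        # Run the forward pass in float16 where it is numerically safe.
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        # Scale the loss to avoid float16 gradient underflow; the scaler
        # unscales the gradients before the optimizer step.
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()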
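Finally, for point 4, the recomputation trick is available directly in PyTorch (this is not the GPipe library itself, just the built-in torch.utils.checkpoint utility, which implements the same idea); a minimal sketch with an arbitrary toy sub-network:

    import torch
    from torch.utils.checkpoint import checkpoint

    block = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU())
    x = torch.randn(32, 512, requires_grad=True)

    # The activations inside `block` are not kept during the forward pass;
    # they are recomputed on the fly during backward, trading extra
    # run-time for a smaller memory footprint.
    y = checkpoint(block, x)
    y.sum().backward()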



Benefits of GRUs over LSTMs

This is written in response to a Quora question, which asks about the benefits of GRU over LSTMs. Feel free to vote there for my answer on Quora!

The primary advantage is the speed of training and inference: the GRU has two gates instead of three (and, consequently, fewer parameters). However, the simpler design comes at the expense of inferior capabilities (in theory). There is a paper arguing that LSTMs can count, but GRUs cannot.
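
As a quick sanity check of the parameter savings, one can compare PyTorch's built-in recurrent layers (the layer sizes below are arbitrary):

    import torch

    lstm = torch.nn.LSTM(input_size=256, hidden_size=256)
    gru = torch.nn.GRU(input_size=256, hidden_size=256)

    count = lambda m: sum(p.numel() for p in m.parameters())
    # PyTorch stores four weight blocks for the LSTM (three gates plus the
    # cell candidate) and three for the GRU (two gates plus the candidate),
    # so the GRU ends up with roughly 25% fewer parameters.
    print(count(lstm), count(gru))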

The loss of computational expressivity may not matter much in practice. In fact, there is recent work showing that a trimmed, single-gate LSTM can be quite effective: "The unreasonable effectiveness of the forget gate" by Westhuizen and Lasenby, 2018.



What are some direct implications of Wittgenstein’s work on natural language processing?

This is written in response to a Quora question, which asks about direct implications of Wittgenstein’s work on natural language processing. Feel free to vote there for my answer on Quora!

How could Wittgenstein have influenced modern NLP? Yorick Wilks, cited by the question asker, hints at three possible aspects:

  1. Distributional semantics
  2. Symbolic representations and computations
  3. Empiricism

Wittgenstein likely played an important role in the establishment of distributional semantics. We mostly cite Firth’s famous "You shall know a word by the company it keeps", but this was preceded by Wittgenstein’s "For a large class of cases—though not for all—in which we employ the word ‘meaning’ it can be defined thus: the meaning of a word is its use in the language." This formulation appeared in his “Philosophical Investigations”, published posthumously in 1953, but he began championing the idea as early as the 1930s. It likely influenced later thinkers, possibly including Firth.

Let’s move on to symbolic representations. In his earlier work, Wittgenstein postulates that the world is a totality of facts, i.e., logical propositions (a view called logical atomism). It is not totally clear what the practical consequences of this statement would be (were it implemented as an NLP paradigm). In addition, Wittgenstein rejected logical atomism later in life. He also declared that it is not possible/productive to define words by mental representations or references to real objects: instead, one should focus exclusively on word use. This sounds very "anti-ontology" to me.

Last but not least, modern NLP has a statistical foundation. However, Wittgenstein never advocated an empirical approach to language understanding; if anything, I have found evidence that he dismissed even weak empiricism.


