log in | about 

This is written in response to a Quora question, which asks about the benefits of GRU over LSTMs. Feel free to vote there for my answer on Quora!

The primary advantage is the speed of training and inference: GRU has two gates instead of three (and fewer parameters). However, a simpler design comes at the expense of inferior capabilities (in theory). There is a paper arguing that LSTMs can count, but GRU can not.

The loss of computational expressivity may not matter much in practice. In fact, there is recent work showing that a trimmed, single-gate LSTM can be quite effective in practice: "The unreasonable effectiveness of the forget gate" by Westhuizen and Lasenby, 2018.