This is written in response to a Quora question, which asks about the benefits of GRU over LSTMs. Feel free to vote there for my answer on Quora!
The primary advantage is the speed of training and inference: GRU has two gates instead of three (and fewer parameters). However, a simpler design comes at the expense of inferior capabilities (in theory). There is a paper arguing that LSTMs can count, but GRU can not.