I have been helping my co-authors Pavel Efimov and Pavel Braslavski (who did nearly all the work) to analyze and describe this data set. We have conducted a very thorough analysis and evaluated several powerful models. The full analysis is available online, but here I would like to highlight the following:
SberQuAD was created similarly to Stanford SQuAD. Yet, despite the similarities, all the models perform worse on SberQuAD than on SQuAD, which can be attributed to having only a single answer variant and fewer answers that are named entities. A lot of answers in SberQuAD still often contain an entity, but it is normally only a part of an answer. This stands in contrast to SQuAD where roughly half of the answers are named entities.