Data Sets and State-of-the-art (SOTA): Audio and Speech | searchivarius.org

Common Voice  
Database of Sound Insights   - Over 1,000,000 minutes of real conversation audio between doctors and patients, paired with verbatim transcripts created by Verilogue.
LibriSpeech   - Large-scale (1000 hours) corpus of read English speech.
Open speech corpora list (Josh Meyer)
OpenSLR: Open Speech and Language Resources (Daniel Povey)
TED-LIUM   - English-language TED talks with transcriptions, sampled at 16 kHz. It contains about 118 hours of speech.
The FEARLESS STEPS Challenge: Massive Naturalistic Audio   - 11K hours of multi-channel recordings (Apollo mission).