Data Sets and State-of-the-art (SOTA) » Audio and Speech | searchivarius.org

Common Voice
Database of Sound Insights - over 1,000,000 minutes of real conversation audio between doctors and patients, paired with verbatim transcripts created by Verilogue.
LibriSpeech - Large-scale (1000 hours) corpus of read English speech.
Open speech corpora list (Josh Meyer)
OpenSLR: Open Speech and Language Resources (Daniel Povey)
TED-LIUM - English-language TED talks with transcriptions, sampled at 16 kHz. It contains about 118 hours of speech.
The FEARLESS STEPS Challenge: Massive Naturalistic Audio - 11K hours of multi-channel recordings (Apollo mission).