Update README.md

Chribit · web-flow · commit c0cf49a8e3f7 · 2022-02-11T23:20:11.000+01:00
diff --git a/README.md b/README.md
@@ -20,7 +20,7 @@ A collection of resources for natural language processing. Mostly links to datas
   <br> **Content**:
     - 24 hours _english_ by 1 voice
     
-3. VCTK Dataset
+3. CSTR VCTK Corpus
   <br> **Link**: https://datashare.ed.ac.uk/handle/10283/3443
   <br> **Content**:
     - ~400 sentences _english_ each by 110 voices
@@ -41,6 +41,7 @@ A collection of resources for natural language processing. Mostly links to datas
   <br> **Content**:
     - extracted from LibriVox (see 4.)
     - ~1000 hours _english_
+    - ~585 hours in higher quality at https://openslr.org/60/
     
 6. Vox Forge
   <br> **Link**: http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/
@@ -84,3 +85,74 @@ A collection of resources for natural language processing. Mostly links to datas
     - 1529 sentences _japanese_
     - 198 sentences _italian_
     - and many more
+    
+11. Spoken Wikipedia Corpora
+  <br> **Link**: https://nats.gitlab.io/swc/
+  <br> **Content**:
+    - spoken wikipedia articles
+    - 386 hours _german_ by 339 voices
+    - 395 hours _english_ by 395 voices
+    - 224 hours _dutch_ by 145 voices
+    
+12. M-AILABS Speech Dataset
+  <br> **Link**: https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/
+  <br> **Content**:
+    - mostly extracted from LibriVox (see 4.)
+    - 237 hours _german_
+    - 45 hours _british english_
+    - 102 hours _american english_
+    - 108 hours _spanish_
+    - 127 hours _italian_
+    - 87 hours _ukrainian_
+    - 46 hours _russian_
+    - 190 hours _french_
+    - 53 hours _polish_
+    - contains _mixed_ data, i.e. female and male speakers
+    
+13. VCTK Noisy Speech Database
+  <br> **Link**: https://datashare.ed.ac.uk/handle/10283/2791
+  <br> **Content**:
+    - noisy and clean audio files by up to 56 voices
+    - includes written transcripts
+    - unknown number of hours
+
+14. American English Speech Corpus
+  <br> **Link**: https://www.magicdatatech.com/datasets/mdt-tts-e018-american-english-speech-corpus-for-tts-1631179203
+  <br> **Content**:
+    - ~2 hours _american english_ by 1 female voice
+    
+15. American Male Voice Dataset
+  <br> **Link**: https://www.magicdatatech.com/datasets/mdt-tts-e009-american-male-voice-tts-dataset
+  <br> **Content**:
+    - 15 hours _american english_ by 1 male voice
+    
+16. Facebook Vox Populi
+  <br> **Link**: https://github.com/facebookresearch/voxpopuli
+  <br> **Content**:
+    - download instructions in the repository's README
+    - in 16 european languages including _english_, _german_, _french_ and _spanish_
+    - 1800 hours of transcribed audio by an unknown number of voices
+    
+17. Multilingual Libri Speech
+  <br> **Link**: https://openslr.org/94/
+  <br> **Content**:
+    - unclear if transcripts are provided
+    - extracted from LibriVox (see 4.)
+
+18. Kensho SPGI Speech
+  <br> **Link**: https://datasets.kensho.com/datasets/spgispeech
+  <br> **Content**:
+    - transcribed company earnings calls
+    - ~5000 hours _international business english_ by ~50000 voices
+    
+19. Free Spoken Digit Dataset
+  <br> **Link**: https://github.com/Jakobovski/free-spoken-digit-dataset
+  <br> **Content**:
+    - 3000 recordings _english_ by 6 voices
+    - 50 recordings per digit per voice
+
+20. Flickr Audio Captions Corpus
+  <br> **Link**: https://groups.csail.mit.edu/sls/downloads/flickraudio/index.cgi
+  <br> **Content**:
+    - 40000 spoken image captions _english_ of 8000 images
+    - download original captions here: https://www.kaggle.com/adityajn105/flickr8k