earnings21/README.md (13 additions, 3 deletions)
@@ -1,5 +1,11 @@
 [](LICENSE.md)
 
+# Earnings 21
+
+The Earnings 21 dataset ( also referred to as earnings21 ) is a 39-hour corpus of earnings calls containing entity dense speech from nine different financial sectors. This corpus is intended to benchmark automatic speech recognition (ASR) systems in the wild with special attention towards named entity recognition (NER).
+
+This work has been recently accepted to Interspeech 2021!
+
 # Table of Contents
 
 * [File Format Overview](#file-format-overview)
@@ -8,7 +14,8 @@
   + [wer_tag JSON](#wer_tag-json)
     - [Example](#example-wer_tag-json)
 * [Entity Labels](#entity-labels)
-* [Results](#results)
+* [Results and Eval-10](#results)
+  + [Eval-10: A Representative Earnings-21 Subset](#eval-10-a-representative-earnings-21-subset)
 * [WER Calculation](#wer-calculation)
 * [Cite this Dataset](#cite-this-dataset)
 
@@ -52,7 +59,7 @@ NexGEn|0||||MC|['7:ORG']|['7']
 ## wer_tag JSON
 The wer_tags sidecar JSON is used in combination with an nlp file and exclusively when that file is using the wer_tags column. It is used to provide entity information about each entity ID. It is formatted such that the JSON acts as a list of objects that map the ID of an entity to an object specifying the entity_type as the entity label. The object is formatted such that:
 
-```
+```json
 "ID":{
     "entity_type" : "LABEL"
 }
@@ -61,7 +68,7 @@ The wer_tags sidecar JSON is used in combination with an nlp file and exclusivel
 ### Example wer_tag JSON
 `example.wer_tags.json`
 
-```
+```json
 {
     "0":{
         "entity_type" : "YEAR"
@@ -115,6 +122,9 @@ In the following table, we provide a list of all possible entity tags we provide
 # Results
 Tables found in the paper along with all entity class WER can be found within the `transcripts` directory.
 
+## Eval-10: A Representative Earnings-21 Subset
+Along with the results found in the paper, we've included a subset denoted as Eval-10 which is a representative 10 hour sample of the full Earnings-21 corpus. This subset is not meant to replace the full dataset but rather allow for researchers to quickly evaluate their systems before running results on the full dataset. WER calculations for all systems on this subset can be found within the same table found in the `transcripts` directory.
+
 # WER Calculation
 All of our analysis on this dataset is done through the use of our newly released [fstalign](https://github.com/revdotcom/fstalign/tree/master) tool. We strongly recommend the use of this tool to quickly get started using the *Earnings-21* dataset.
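
The wer_tags sidecar format touched by this diff maps each entity ID to an object carrying an `entity_type` label. As a minimal sketch of how such a file might be consumed, the Python below parses a sidecar string and looks up the labels for a list of IDs. The helper names `load_entity_types` and `labels_for` are hypothetical (they are not part of the dataset or of fstalign), and the sample IDs and labels are illustrative values rather than real corpus contents.

```python
import json

# Illustrative sidecar text in the shape the README describes:
# entity ID -> {"entity_type": LABEL}. These values are examples, not corpus data.
SIDECAR_TEXT = """
{
    "0": {"entity_type": "YEAR"},
    "7": {"entity_type": "ORG"}
}
"""


def load_entity_types(sidecar_text):
    """Hypothetical helper: map each entity ID (str) to its entity_type label."""
    raw = json.loads(sidecar_text)
    return {entity_id: info["entity_type"] for entity_id, info in raw.items()}


def labels_for(wer_tag_ids, entity_types):
    """Hypothetical helper: labels for a list of IDs; unknown IDs map to None."""
    # JSON object keys are always strings, so normalise each ID before the lookup.
    return [entity_types.get(str(entity_id)) for entity_id in wer_tag_ids]


entity_types = load_entity_types(SIDECAR_TEXT)
print(labels_for(["7"], entity_types))  # prints ['ORG']
```

To read a real sidecar such as `example.wer_tags.json`, the same function can be given the file's contents, for example `load_entity_types(open(path).read())`.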