Commit 4bbcd88

Earnings-21: Reviewer Feedback (#17)
* improving description and adding eval 10 section
* formatting
* reordering toc
* fixing extra token
1 parent 3db5fa9 commit 4bbcd88

1 file changed: +13 -3 lines changed

earnings21/README.md

@@ -1,5 +1,11 @@
 [![License: CC BY-SA 4.0](https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg)](LICENSE.md)

+# Earnings 21
+
+The Earnings 21 dataset (also referred to as earnings21) is a 39-hour corpus of earnings calls containing entity-dense speech from nine different financial sectors. This corpus is intended to benchmark automatic speech recognition (ASR) systems in the wild, with special attention to named entity recognition (NER).
+
+This work was recently accepted to Interspeech 2021!
+
 # Table of Contents

 * [File Format Overview](#file-format-overview)
@@ -8,7 +14,8 @@
 + [wer_tag JSON](#wer_tag-json)
 - [Example](#example-wer_tag-json)
 * [Entity Labels](#entity-labels)
-* [Results](#results)
+* [Results and Eval-10](#results)
++ [Eval-10: A Representative Earnings-21 Subset](#eval-10-a-representative-earnings-21-subset)
 * [WER Calculation](#wer-calculation)
 * [Cite this Dataset](#cite-this-dataset)

@@ -52,7 +59,7 @@ NexGEn|0||||MC|['7:ORG']|['7']
 ## wer_tag JSON
 The wer_tags sidecar JSON is used in combination with an nlp file, and only when that file uses the wer_tags column. It provides entity information for each entity ID: the JSON maps each entity ID to an object whose entity_type field holds the entity label. Each entry is formatted as follows:

-```
+```json
 "ID":{
 "entity_type" : "LABEL"
 }
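
To make the ID-to-label mapping concrete, here is a minimal Python sketch of reading a sidecar in this format. The helper name `load_entity_labels` is hypothetical and not part of the repository. The `"0"` entry mirrors the README's own example, while the `"7"` entry with label `ORG` is only an assumption inferred from the `['7:ORG']|['7']` row quoted in the hunk header above.

```python
import json

# Hypothetical sidecar content in the "ID": {"entity_type": "LABEL"} format.
# The "7" -> "ORG" entry is an assumption made for illustration.
SIDECAR_TEXT = """
{
  "0": {"entity_type": "YEAR"},
  "7": {"entity_type": "ORG"}
}
"""


def load_entity_labels(text):
    """Return a dict mapping each entity ID (a string) to its entity_type label."""
    data = json.loads(text)
    return {entity_id: entry["entity_type"] for entity_id, entry in data.items()}


labels = load_entity_labels(SIDECAR_TEXT)
print(labels["7"])
```

An ID listed in an nlp file's wer_tags column could then be resolved with a plain dictionary lookup such as `labels["7"]`.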
@@ -61,7 +68,7 @@ The wer_tags sidecar JSON is used in combination with an nlp file and exclusivel
 ### Example wer_tag JSON
 `example.wer_tags.json`

-```
+```json
 {
 "0":{
 "entity_type" : "YEAR"
@@ -115,6 +122,9 @@ In the following table, we provide a list of all possible entity tags we provide
 # Results
 The tables from the paper, along with WER for every entity class, can be found in the `transcripts` directory.

+## Eval-10: A Representative Earnings-21 Subset
+Along with the results from the paper, we've included a subset, denoted Eval-10, which is a representative 10-hour sample of the full Earnings-21 corpus. This subset is not meant to replace the full dataset, but rather to let researchers quickly evaluate their systems before running on the full dataset. WER calculations for all systems on this subset can be found in the same table in the `transcripts` directory.
+
 # WER Calculation
 All of our analysis on this dataset is done using our newly released [fstalign](https://github.com/revdotcom/fstalign/tree/master) tool. We strongly recommend using this tool to quickly get started with the *Earnings-21* dataset.
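
For readers who want to sanity-check the metric itself before setting up fstalign, WER can be sketched as the word-level edit distance divided by the number of reference words. The snippet below is a generic illustration, not fstalign's implementation: the function name `word_error_rate` is hypothetical, and it assumes whitespace tokenization with no text normalization or entity-tag handling.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of four reference words.
print(word_error_rate("revenue grew ten percent", "revenue grew percent"))
```

Scores from a sketch like this should not be expected to match the tables in the `transcripts` directory, since those were produced with fstalign.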
