The goal of this project was to familiarize myself with PyTorch DistributedDataParallel (DDP) and FullyShardedDataParallel (FSDP) training. It has been tested with two GPUs on a single node.
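For orientation, here is a minimal sketch of a single-node, two-GPU DDP setup, assuming a `torchrun` launch; the linear layer is a placeholder, not this project's actual model.

```python
# Minimal DDP setup sketch for one node with two GPUs; assumes a launch like
#   torchrun --nproc_per_node=2 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")       # torchrun sets RANK/WORLD_SIZE
    local_rank = int(os.environ["LOCAL_RANK"])    # one process per GPU
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(768, 2).cuda(local_rank)  # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])   # grads sync on backward

    # ... training loop; give each rank its own data shard via
    # torch.utils.data.distributed.DistributedSampler ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```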
Multiple Choice Question Answering: given a question, select the correct answer from a given set of possible answers.
- Models were trained on RoBERTa-BASE.
- The learning rate was 3e-6 (using RoBERTaMultipleChoice) and 1e-5 (using RoBERTaSequenceClassification), with the AdamW optimizer (a setup sketch follows this list).
- The number of epochs was 5.
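A minimal sketch of the corresponding model and optimizer setup; this assumes the names above refer to the Hugging Face `transformers` classes `RobertaForMultipleChoice` and `RobertaForSequenceClassification`, and `num_labels=2` for the classification variant is likewise an assumption (per-pair binary scoring).

```python
# Model/optimizer setup sketch, assuming the Hugging Face `transformers`
# classes are what the names above refer to.
from torch.optim import AdamW
from transformers import RobertaForMultipleChoice, RobertaForSequenceClassification

# Multiple-choice head: one forward pass scores all candidate answers jointly.
mc_model = RobertaForMultipleChoice.from_pretrained("roberta-base")
mc_optimizer = AdamW(mc_model.parameters(), lr=3e-6)

# Sequence-classification head: each (question, answer) pair is scored
# independently; num_labels=2 (binary) is an assumed configuration.
sc_model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
sc_optimizer = AdamW(sc_model.parameters(), lr=1e-5)
```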
Results are reported for the checkpoint with the best validation loss across epochs:
- Accuracy using RoBERTaSequenceClassification: 85.60 %
- Accuracy using RoBERTaMultipleChoice: 83.63 %
Note: Due to compute limitations, the models were trained with an effective batch size of 8, and each epoch took about 150 minutes. Better results can therefore be expected from training for more epochs.
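Since the per-GPU batch size was capped well below 8 (see the list below), reaching an effective batch size of 8 suggests gradient accumulation; the factorization 2 GPUs x per-GPU batch 2 x 2 accumulation steps = 8 is an assumption inferred from the numbers in this README, but the pattern would look like this:

```python
# Gradient-accumulation sketch for an effective batch size of 8; the
# factorization (2 GPUs x per-GPU batch 2 x accum_steps 2) is assumed.
def train_epoch(ddp_model, optimizer, train_loader, accum_steps=2):
    optimizer.zero_grad()
    for step, batch in enumerate(train_loader):
        loss = ddp_model(**batch).loss / accum_steps  # scale so summed grads average out
        loss.backward()  # DDP all-reduces grads here; ddp_model.no_sync() could
                         # skip the all-reduce on non-update steps for efficiency
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```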
- Maximum per-GPU batch size achievable with DDP: 2
- Maximum per-GPU batch size achievable with FSDP: 4 (see the wrapping sketch after this list)
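For comparison, a minimal FSDP wrapping sketch: sharding parameters, gradients, and optimizer state across the two GPUs is what frees enough memory for the larger per-GPU batch. The placeholder model and the size-based wrap policy are assumptions, and the process group is assumed to be initialized as in the DDP sketch above.

```python
# Minimal FSDP wrapping sketch; assumes dist.init_process_group has already
# run (as in the DDP sketch above). The transformer is a placeholder for
# RoBERTa-BASE, and the size-based wrap policy is an assumed choice.
import functools
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

model = torch.nn.Transformer(d_model=768).cuda()  # placeholder model
wrap_policy = functools.partial(size_based_auto_wrap_policy, min_num_params=1_000_000)
fsdp_model = FSDP(model, auto_wrap_policy=wrap_policy)

# The optimizer must be built after wrapping, over the flattened shards.
optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=3e-6)
```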
- PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel (paper)
- Advanced Model Training with Fully Sharded Data Parallel (PyTorch tutorial)
- Getting Started with Fully Sharded Data Parallel (PyTorch tutorial)
- A Comprehensive Tutorial to Pytorch DistributedDataParallel (Medium)
- Getting Started with Distributed Data Parallel (PyTorch tutorial)
- PyTorch FSDP docs: https://pytorch.org/docs/stable/fsdp.html
- PyTorch distributed docs: https://pytorch.org/docs/stable/distributed.html
- Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering (paper)