ThanmayJ/multiple-choice-qa

I built this project to familiarize myself with PyTorch DistributedDataParallel (DDP) and FullyShardedDataParallel (FSDP) training. It has been tested with two GPUs on a single node.
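
For reference, a minimal DDP setup for this kind of script might look like the following. This is a sketch only, assuming a `torchrun` launch; the function and variable names are illustrative and not taken from this repo:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from transformers import RobertaForMultipleChoice

def setup_ddp():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return local_rank

local_rank = setup_ddp()
model = RobertaForMultipleChoice.from_pretrained("roberta-base").to(local_rank)
# Each process holds a full model replica; gradients are all-reduced in backward()
model = DDP(model, device_ids=[local_rank])
```

With two GPUs on one node, such a script would be launched as `torchrun --nproc_per_node=2 train.py`.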

Task

Multiple Choice Question Answering: given a question, select the correct answer from a fixed set of candidate answers.
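
For the multiple-choice formulation, each question is paired with every candidate answer and the model scores all pairs jointly. A minimal sketch using Hugging Face's `RobertaForMultipleChoice` (the question and choices here are illustrative, not from the dataset):

```python
import torch
from transformers import RobertaTokenizer, RobertaForMultipleChoice

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMultipleChoice.from_pretrained("roberta-base")

question = "What is the capital of France?"
choices = ["Berlin", "Paris", "Madrid", "Rome"]

# Pair the question with each candidate answer
enc = tokenizer([question] * len(choices), choices,
                return_tensors="pt", padding=True)
# The model expects inputs of shape (batch_size, num_choices, seq_len)
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}
logits = model(**inputs).logits          # shape: (1, num_choices)
pred = logits.argmax(dim=-1).item()      # index of the predicted answer
```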

Dataset(s) used

Hyperparameters

  • Models were fine-tuned from RoBERTa-base
  • Learning rate: 3e-6 (using RoBERTaMultipleChoice) and 1e-5 (using RoBERTaSequenceClassification), with the AdamW optimizer (see the sketch after this list)
  • Number of epochs: 5
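
As a rough illustration of these settings, the optimizer and training loop might be set up as follows. This is a sketch only; `model` and `train_loader` are assumed to exist, and the repo's actual loop may differ:

```python
from torch.optim import AdamW

# 3e-6 for the multiple-choice variant, 1e-5 for sequence classification
optimizer = AdamW(model.parameters(), lr=3e-6)

for epoch in range(5):
    for batch in train_loader:
        optimizer.zero_grad()
        # Hugging Face models return the loss when labels are in the batch
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
```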

Results

Results are reported for the checkpoint with the best validation loss across epochs:

  • Accuracy using RoBERTaSequenceClassification: 85.60 %
  • Accuracy using RoBERTaMultipleChoice: 83.63 %

Note: Due to compute limitations, the models were trained with an effective batch size of 8, and each epoch took about 150 minutes. Better results can therefore be expected from training for more epochs.

Memory Usage

  • Maximum per-GPU batch size obtainable with DDP: 2
  • Maximum per-GPU batch size obtainable with FSDP: 4 (see the sketch after this list)
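
FSDP admits the larger batch because it shards parameters, gradients, and optimizer state across ranks, so each GPU holds only a fraction of the model state and more memory remains for activations. A minimal wrapping sketch, assuming the process group is already initialized as in the DDP snippet above (the wrap-policy threshold is an illustrative choice, not this repo's setting):

```python
import functools
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

# Shard any submodule with more than ~1M parameters across the GPUs
wrap_policy = functools.partial(size_based_auto_wrap_policy,
                                min_num_params=1_000_000)
model = FSDP(model.cuda(), auto_wrap_policy=wrap_policy)
# Training then proceeds with the same loop as in the DDP case
```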
