Translate code documentation between human languages. Models are evaluated by BLEU score.
The dataset we use is crawled and filtered from the Microsoft Documentation, whose sources are located at https://github.com/MicrosoftDocs/.
The source and target languages are stored in two parallel files, with each line in the source file aligned to the corresponding line in the target file. Data statistics of the Microsoft multilingual documentation dataset are shown in the table below:
Language Pairs | #Train | #Dev | #Test |
---|---|---|---|
Danish <-> English | 43K | 1K | 1K |
Latvian <-> English | 19K | 1K | 1K |
Norwegian <-> English | 44K | 1K | 1K |
Chinese <-> English | 50K | 1K | 1K |
Preprocess the dataset (prepend a language tag to each source line and combine all language pairs):

```shell
cd data
python preprocessing.py
```
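This step produces the combined files referenced below (e.g. train.all.src and train.all.tgt). As a rough sketch of the idea, the snippet below prepends a language tag to each source line and concatenates all language pairs; the per-pair input file names and the exact tag format are assumptions for illustration, not the repository's actual layout.

```python
# Illustrative sketch only: merge all language pairs into train.all.src / train.all.tgt,
# prepending a language tag to each source line. The per-pair input names and the
# "<xx>" tag convention are assumptions; preprocessing.py defines the real behavior.
import os

pairs = ["da-en", "lv-en", "no-en", "zh-en"]  # assumed pair identifiers
os.makedirs("processed", exist_ok=True)

with open("processed/train.all.src", "w", encoding="utf-8") as all_src, \
     open("processed/train.all.tgt", "w", encoding="utf-8") as all_tgt:
    for pair in pairs:
        src_lang, tgt_lang = pair.split("-")
        with open(f"train.{pair}.src", encoding="utf-8") as src_f, \
             open(f"train.{pair}.tgt", encoding="utf-8") as tgt_f:
            for src_line, tgt_line in zip(src_f, tgt_f):
                # the tag tells the multilingual model which pair a line belongs to
                # (here the target language, as in multilingual NMT; the actual convention may differ)
                all_src.write(f"<{tgt_lang}> {src_line.strip()}\n")
                all_tgt.write(tgt_line.strip() + "\n")
```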
We provide a script to evaluate predictions for this task, which reports the BLEU score:
```shell
python evaluator/evaluator.py evaluator/output.txt -p evaluator/gold.txt
```

{bleu-4: 67.75}
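The reported score should come from evaluator.py. For a quick independent sanity check, a corpus-level BLEU can also be computed with an off-the-shelf package such as sacrebleu (not a dependency of this repository); its tokenization and smoothing may differ slightly from the official evaluator.

```python
# Rough sanity check with sacrebleu (pip install sacrebleu); the official number
# must come from evaluator/evaluator.py, whose tokenization/smoothing may differ.
# Assumes output.txt holds model predictions and gold.txt the references.
import sacrebleu

with open("evaluator/output.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("evaluator/gold.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

score = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"bleu-4: {score.score:.2f}")
```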
We also provide a pipeline that either trains a Transformer from scratch or fine-tunes XLM-RoBERTa (--using_pretrain_model) on this task. It requires:
- python 3.6 or 3.7
- torch==1.6.0
- transformers>=2.5.0
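Conceptually, the pretrained variant pairs an XLM-RoBERTa encoder with a Transformer decoder trained from scratch. The sketch below illustrates that wiring with Hugging Face transformers and torch.nn; it is a minimal illustration under those assumptions, not the Seq2Seq model defined in run.py, and the "<da>" tag format is likewise an assumption.

```python
# Minimal sketch of the encoder-decoder wiring (illustration only; run.py defines
# its own Seq2Seq model, training loop, and beam search).
import torch
import torch.nn as nn
from transformers import XLMRobertaModel, XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
encoder = XLMRobertaModel.from_pretrained("xlm-roberta-base")
hidden = encoder.config.hidden_size

decoder_layer = nn.TransformerDecoderLayer(d_model=hidden, nhead=12)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
lm_head = nn.Linear(hidden, encoder.config.vocab_size)

src = tokenizer("<da> Select the Save button.", return_tensors="pt")  # tag format is an assumption
tgt = tokenizer("Vælg knappen Gem.", return_tensors="pt")

memory = encoder(**src)[0]                       # (batch, src_len, hidden)
tgt_emb = encoder.embeddings(tgt["input_ids"])   # reuse pretrained embeddings on the target side
tgt_len = tgt_emb.size(1)
causal_mask = torch.triu(torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)

out = decoder(tgt_emb.transpose(0, 1), memory.transpose(0, 1), tgt_mask=causal_mask)
logits = lm_head(out)                            # (tgt_len, batch, vocab): next-token prediction
```

Fine-tuning on the combined multilingual data is then launched with run.py: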
```shell
# Fine-tune on the combined multilingual training data
lr=5e-5
batch_size=32
beam_size=5
source_length=256
target_length=256
data_dir=../data/processed
output_dir=saved_models/multi_model
train_file=$data_dir/train.all.src,$data_dir/train.all.tgt
dev_file=$data_dir/dev.all.src,$data_dir/dev.all.tgt
eval_steps=5000
train_steps=50000
pretrained_model=xlm-roberta-base # or another RoBERTa-style checkpoint, e.g. a path to CodeBERT or roberta-base

CUDA_VISIBLE_DEVICES=0,1 python run.py \
  --do_train \
  --do_eval \
  --using_pretrain_model \
  --model_type roberta \
  --model_name_or_path $pretrained_model \
  --config_name xlm-roberta-base \
  --tokenizer_name xlm-roberta-base \
  --train_filename $train_file \
  --dev_filename $dev_file \
  --output_dir $output_dir \
  --max_source_length $source_length \
  --max_target_length $target_length \
  --beam_size $beam_size \
  --train_batch_size $batch_size \
  --eval_batch_size $batch_size \
  --learning_rate $lr \
  --train_steps $train_steps \
  --eval_steps $eval_steps
```
```shell
# Inference on the dev/test sets with the best checkpoint
beam_size=5
batch_size=32
source_length=256
target_length=256
output_dir=saved_models/multi_model
data_dir=../data/processed
dev_file=$data_dir/dev.all.src,$data_dir/dev.all.tgt
test_file=$data_dir/test.all.src,$data_dir/test.all.tgt
test_model=$output_dir/checkpoint-best-bleu/pytorch_model.bin # checkpoint used for testing

CUDA_VISIBLE_DEVICES=0,1 python run.py \
  --do_test \
  --model_type roberta \
  --model_name_or_path xlm-roberta-base \
  --config_name xlm-roberta-base \
  --tokenizer_name xlm-roberta-base \
  --load_model_path $test_model \
  --dev_filename $dev_file \
  --test_filename $test_file \
  --output_dir $output_dir \
  --max_source_length $source_length \
  --max_target_length $target_length \
  --beam_size $beam_size \
  --eval_batch_size $batch_size
```
The results on the multilingual test dataset are shown below:
Translation Direction | Transformer | Pretrained Transformer |
---|---|---|
English -> Danish | 53.31 | 67.09 |
English -> Latvian | 37.85 | 51.92 |
English -> Norwegian | 53.84 | 68.00 |
English -> Chinese | 59.90 | 70.60 |
Danish -> English | 58.73 | 67.02 |
Latvian -> English | 50.37 | 68.30 |
Norwegian -> English | 57.33 | 71.84 |
Chinese -> English | 50.00 | 64.47 |