Can't run finetuning script (wrong paths?) #31

Open
@th789

Hello Meditron team,

Thank you so much for sharing your work! I'd like to follow your instructions to fine-tune the Meditron model, but I get an error (possibly due to wrong paths). Specifically, I ran the following:

  1. Navigate to the meditron folder: cd path/meditron
  2. Run the script: python finetuning/sft.py --checkpoint=meditron --size=7 --run_name=pubmedqa --data bigbio/pubmedqa

But I get the following error:

python finetuning/sft.py --checkpoint=meditron --size=7 --run_name=pubmedqa --data bigbio/pubmedqa
Tokenizing data!
Traceback (most recent call last):
  File "/n/home07/than157/desktop/llm-med/meditron/Megatron-LLM/tools/preprocess_instruct_data.py", line 28, in <module>
    from megatron.tokenizer import build_tokenizer
ModuleNotFoundError: No module named 'megatron.tokenizer'
Traceback (most recent call last):
  File "/n/home07/than157/desktop/llm-med/meditron/finetuning/sft.py", line 268, in <module>
    main(args)
  File "/n/home07/than157/desktop/llm-med/meditron/finetuning/sft.py", line 206, in main
    data_prefix = tokenize_data(
  File "/n/home07/than157/desktop/llm-med/meditron/finetuning/sft.py", line 85, in tokenize_data
    execute(cmd)
  File "/n/home07/than157/desktop/llm-med/meditron/finetuning/sft.py", line 41, in execute
    assert proc.wait() == 0
AssertionError
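
The first traceback shows that the tokenizing step fails on `from megatron.tokenizer import build_tokenizer`, and the `AssertionError` in `sft.py` only reports that this subprocess exited with a non-zero status. One possible cause, which is not confirmed here, is that Python resolves `megatron` to something other than the Megatron-LLM checkout, or does not find it at all. A minimal diagnostic sketch, using only the standard library (the helper name `describe_module` is hypothetical and not part of the repository), prints where each module would be imported from:

```python
import importlib.util
import sys


def describe_module(name):
    """Return where `name` would be imported from, or None if it cannot be found.

    Hypothetical helper for diagnosis only; it is not part of meditron or Megatron-LLM.
    """
    try:
        # For a dotted name, find_spec imports the parent package first.
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when the parent package is missing or fails to import.
        return None
    if spec is None:
        return None
    # Namespace packages have no single origin, so fall back to their search paths.
    return spec.origin or str(list(spec.submodule_search_locations or []))


if __name__ == "__main__":
    for name in ("megatron", "megatron.tokenizer"):
        print(f"{name}: {describe_module(name)}")
    # The first entry of sys.path is the script's directory (or the working
    # directory in an interactive session), which affects what `megatron` resolves to.
    print("sys.path[0]:", sys.path[0])
```

Running this from the same directory and environment as the failing command should show whether `megatron` points inside the Megatron-LLM folder and whether `megatron.tokenizer` is visible from there.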

I've spent hours trying to figure out the right paths, but to no avail. I would be so grateful if you could help me with the following so I can run your script:

  1. How can I fix the error above?
  2. How should I set CHECKPOINTS in sft.py to fine-tune the meditron-7b model that I downloaded from Hugging Face?

Thank you very much!
