Reminder

System Info

Fine-tuning Qwen2-VL, I ran into the following behavior:
(1) Single-node multi-GPU training on image-text pairs or pure text, whether LoRA or full fine-tuning: works.
(2) Multi-node multi-GPU training on image-text pairs or pure text, whether LoRA or full fine-tuning: works.
(3) Single-node multi-GPU training on mixed data, LoRA, 7B: works.
(4) Single-node multi-GPU training on mixed data, full fine-tuning of the 7B model with ZeRO-3 + offload: fails.
(5) Multi-node multi-GPU training on mixed data, LoRA: fails.
(6) Multi-node multi-GPU training on mixed data, full fine-tuning with ZeRO-3 + offload: fails.

In the failing cases, training hangs as soon as it starts.

Also, since each GPU only has 32 GB of memory, ZeRO-2 cannot fit the model, so ZeRO-3 is the only option (a sketch of the kind of ZeRO-3 + offload config this involves is included after this report).

Reproduction

...

Expected behavior

No response

Others

No response
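For reference, the "full fine-tuning with ZeRO-3 + offload" runs above use a DeepSpeed stage-3 configuration with parameters and optimizer state offloaded to CPU. A minimal sketch of such a config, written here as a Python dict and dumped to JSON (illustrative only; the actual config file used for these runs may differ):

```python
import json

# Minimal DeepSpeed ZeRO-3 + CPU offload config (illustrative sketch;
# "auto" values are filled in by the HF Trainer integration at runtime).
ds_z3_offload = {
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}

# Written out so it can be passed to the trainer via the `deepspeed` argument.
with open("ds_z3_offload_config.json", "w") as f:
    json.dump(ds_z3_offload, f, indent=2)
```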
ZeRO-3 is currently not supported with mixed data.
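A common cause of this kind of first-step hang, and likely why mixed batches are problematic under ZeRO-3: stage 3 shards every module's parameters and all-gathers them when that module's forward runs, so all ranks must execute the same modules in the same order. If one rank gets a text-only batch and skips the vision tower while another rank's batch contains images, the collectives no longer line up and every rank blocks. One common workaround pattern is to make sure every example carries at least a placeholder image so the vision encoder runs on every rank. A hypothetical sketch (the `images` field name and the placeholder size are assumptions for illustration, not LLaMA-Factory's actual fix):

```python
from PIL import Image

def ensure_placeholder_image(example: dict, size: int = 28) -> dict:
    """Attach a tiny black placeholder image to examples that have none,
    so every rank runs the vision tower under ZeRO-3 (hypothetical sketch)."""
    if not example.get("images"):
        example["images"] = [Image.new("RGB", (size, size))]
    return example
```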
Then a follow-up question: what should I do when a 72B model does not fit in a single GPU's memory? ZeRO-2 would OOM, since a single GPU cannot hold a complete copy of the model. @hiyouga
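Rough arithmetic behind the ZeRO-2 vs. ZeRO-3 trade-off raised here: ZeRO-2 shards gradients and optimizer states but keeps a full copy of the parameters on every GPU, while ZeRO-3 shards the parameters as well. For a 72B model in bf16, the full parameter copy alone is already far larger than a 32 GB card (the 16-GPU count below is purely illustrative, not from this issue):

```python
# Back-of-envelope memory estimate (parameters only; gradients, optimizer
# states and activations come on top of this).
params = 72e9               # 72B parameters
bf16_bytes = 2              # bytes per parameter in bf16

full_copy_gib = params * bf16_bytes / 1024**3
print(f"ZeRO-2: full parameter copy per GPU = {full_copy_gib:.0f} GiB")               # ~134 GiB, far above 32 GiB

world_size = 16             # illustrative GPU count
print(f"ZeRO-3: parameter shard per GPU = {full_copy_gib / world_size:.1f} GiB")      # ~8.4 GiB
```

This is why ZeRO-3 (with the mixed-data caveat discussed above) is the only practical option on 32 GB cards.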
fixed