
After SFT or DPO, the GGUF exported from the merged LoRA model gives poor answers on Ollama #6020

Closed
1 task done
NeilL0412 opened this issue Nov 13, 2024 · 7 comments · Fixed by #6899
Labels
good first issue (Good for newcomers) · solved (This problem has been already solved)

Comments

@NeilL0412

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-5.15.0-122-generic-x86_64-with-glibc2.35
  • Python version: 3.11.0
  • PyTorch version: 2.4.1+cu121 (GPU)
  • Transformers version: 4.45.2
  • Datasets version: 2.21.0
  • Accelerate version: 0.34.2
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: GRID A100X-40C

Reproduction

The model used is Qwen/Qwen2.5-1.5B-Instruct.

  • SFT
    The model went through 6 rounds of SFT fine-tuning, 30 epochs each. The answers from the merged LoRA model are acceptable, but after converting it to GGUF the answer quality is much worse.
    Below is the output of the merged LoRA model:
    [screenshot: 微信图片_20241113232052]

Below is the exported GGUF running on Ollama:
[screenshot: 微信图片_20241113231955]

The fine-tuning parameters were roughly as follows:

llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path Qwen/Qwen2.5-1.5B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen \
    --flash_attn auto \
    --dataset_dir data \
    --dataset it_qa \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 30.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 20 \
    --optim adamw_torch \
    --packing False \
    --mask_history True \
    --report_to none \
    --output_dir saves/Qwen2.5-1.5B-Instruct/lora/train_2024-11-13-23-25-43 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --adapter_name_or_path saves/Qwen2.5-1.5B-Instruct/lora/Qwen2.5-1.5b-FT5 \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all

Parameters for merging the LoRA adapter:

### Note: DO NOT use quantized model or quantization_bit when merging lora adapters

### model
model_name_or_path: Qwen/Qwen2.5-1.5B-Instruct
adapter_name_or_path: /app/neil/llm/LLaMA-Factory/saves/Qwen2.5-1.5B-Instruct/lora/Qwen2.5-1.5b-FT6
template: qwen
finetuning_type: lora

### export
export_dir: mymodel/qwen2.5/qwen2.5-1.5b/FT6/lora
export_size: 2
export_device: gpu
export_legacy_format: false
  • DPO
    After the first 6 rounds of SFT fine-tuning, the 7th round was DPO reinforcement learning.
    Below is the merged LoRA model's answer:
    [screenshot: 微信图片_20241113232052]

Below is the GGUF running on Ollama; it always answers in this multiple-choice Q&A format:
[screenshot: 微信图片_20241113235131]

Below are the DPO training parameters:

llamafactory-cli train \
    --stage dpo \
    --do_train True \
    --model_name_or_path Qwen/Qwen2.5-1.5B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen \
    --flash_attn auto \
    --dataset_dir data \
    --dataset it_qa_dpo \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 30.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 20 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir saves/Qwen2.5-1.5B-Instruct/lora/train_2024-11-13-23-25-43 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --adapter_name_or_path saves/Qwen2.5-1.5B-Instruct/lora/Qwen2.5-1.5b-FT6 \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all \
    --pref_beta 0.1 \
    --pref_ftx 0 \
    --pref_loss sigmoid

The export parameters are about the same as for SFT, so I won't paste them here (a rough sketch of the export and GGUF conversion steps is below).
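
For reference, here is a minimal sketch of how the merge/export and GGUF conversion steps can be run. The YAML file name, output paths, and the llama.cpp checkout location are placeholders, and the convert script name can differ between llama.cpp versions, so treat this as an outline rather than the exact commands used here:

# Merge the LoRA adapter into the base model using the export config above
# (assumes the YAML above is saved as merge_lora.yaml)
llamafactory-cli export merge_lora.yaml

# Convert the merged Hugging Face checkpoint to GGUF with llama.cpp
python llama.cpp/convert_hf_to_gguf.py mymodel/qwen2.5/qwen2.5-1.5b/FT6/lora \
    --outfile qwen2.5-1.5b-ft6-f16.gguf \
    --outtype f16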

Could anyone take a look at what is going on here, and share the possible causes of this kind of problem and how to fix it? Thanks.

Expected behavior

No response

Others

No response

The github-actions bot added the "pending" (This problem is yet to be addressed) label on Nov 13, 2024
@qq1273834091

I have the same problem and I'm not sure whether it's something I did wrong. Inference with vLLM works fine, but after generating the GGUF file and importing it into Ollama, the answers are always off.

@xjtupy

xjtupy commented Jan 14, 2025

(quotes @qq1273834091's comment above)

I'm running into the same problem. Did you find a solution?

@NeilL0412
Author

(quotes the exchange above)

Found the problem: it's the template. Whatever template you used during fine-tuning is the template you have to write into the Ollama Modelfile. I fine-tuned Qwen, so you can use the following as a reference (I haven't looked closely at the stuff in the comments).

FROM ./Qwen2.5-3B-Instruct-F16.gguf

TEMPLATE """
{{- if .Messages }}
{{- if or .System .Tools }}<|im_start|>system
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end }}<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ if .Content }}{{ .Content }}
{{- else if .ToolCalls }}<tool_call>
{{ range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{ end }}</tool_call>
{{- end }}{{ if not $last }}<|im_end|>
{{ end }}
{{- else if eq .Role "tool" }}<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{ end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{ end }}
{{- end }}
{{- else }}
{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ end }}{{ .Response }}{{ if .Response }}<|im_end|>{{ end }}
"""

### Tuning ##

PARAMETER num_ctx 32768

### min_p sampling ##
# min_p works best with a bit of temperature
PARAMETER temperature 0.2
# 1.0 disables top_p, so we can use min_p
PARAMETER top_p 1.0
PARAMETER min_p 0.9
### min_p sampling ##

PARAMETER num_batch 1024
PARAMETER num_keep 256

#  64 fits RYS-XLarge-72b IQ4_XS at 21k
#  PARAMETER num_batch 64

## For codegen ##
#  PARAMETER num_keep 512
#  PARAMETER num_keep 1024
#  PARAMETER top_p 0.9 # default
#  PARAMETER top_k 20 # default
#  PARAMETER repetition_penalty 1.05 # default
#  PARAMETER presence_penalty 0.2
#  PARAMETER frequency_penalty 0.2
#  PARAMETER repeat_last_n 50

# VRAM increased by:
# num_batch
# num_ctx
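
For completeness, a quick way to register and smoke-test this Modelfile in Ollama (the model name below is arbitrary):

# Build an Ollama model from the Modelfile above
ollama create qwen2.5-1.5b-ft6 -f Modelfile

# Ask one question to confirm the chat template is being applied correctly
ollama run qwen2.5-1.5b-ft6 "你好,请简单介绍一下你自己"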

@xjtupy
Copy link

xjtupy commented Jan 16, 2025

(quotes the exchange above, including the Qwen Modelfile)

I fine-tuned Meta-Llama-3.1-8B-Instruct with the llama3 template, so for the Modelfile I can just copy the llama3 template directly, right?

@hiyouga added the "good first issue" and "solved" labels and removed the "pending" label on Jan 16, 2025
@NeilL0412
Author

(quotes the exchange above, including the Qwen Modelfile and the question about copying the llama3 template)

Either way works, as long as the template matches the one used in training. Try a few options: look for the corresponding template code in the template module of the source as well, and you can also refer to this person's llm-templates.

@xjtupy
Copy link

xjtupy commented Jan 16, 2025

(quotes the exchange above, including the Qwen Modelfile and the previous answer)

I've tried several templates and none of them match the prediction quality of the official LLaMA-Factory code. Do you know how the official llama3 template is defined? @NeilL0412 @hiyouga

@NeilL0412
Author

(quotes the exchange above, including the Qwen Modelfile and the question about the official llama3 template)

Look it up in the Ollama library, or find the code for whatever template you used during fine-tuning in template.py in the source and convert it into the Modelfile TEMPLATE format. If that still doesn't work, I'm out of ideas; go read the docs carefully…
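
One straightforward way to see how the official llama3 template is defined is to pull the library model and dump its template and Modelfile directly (flag names as in recent Ollama releases; adjust the model tag to the variant you actually use):

# Inspect the chat template shipped with the official library model
ollama pull llama3.1
ollama show llama3.1 --template

# The full Modelfile (parameters, stop tokens, etc.) can also be dumped
ollama show llama3.1 --modelfile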
