Qwen2.5-0.5B-Instruct: 8-bit quantized model produces garbled inference output #3091

Closed
jfduma opened this issue Nov 19, 2024 · 7 comments

Labels
bug Something isn't working

Comments

jfduma commented Nov 19, 2024

Platform (include target platform as well if cross-compiling):

Ubuntu 20.04, CUDA

Exported the qwen2.5-0.5b model with the latest MNN 3.0. 4-bit quantization works correctly, but 8-bit quantization produces garbled output, regardless of whether "precision": "fp16" is set in config.json.
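For reference, a minimal sketch of the config.json that llm_demo reads. Only the "llm_weight" key and the "precision" value are named elsewhere in this thread; the remaining keys are assumptions about what the exporter writes and may not match the generated file exactly:

{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "cpu",
  "thread_num": 4,
  "precision": "fp16"
}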

########### 4bit ############
python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export mnn --dst_path mnn-output/qwen2.5_0.5b_instruct_mnn --quant_bit 4 --mnnconvert mnn/build/MNNConvert

./mnn/build/llm_demo mnn-output/qwen2.5_0.5b_instruct_mnn/config.json
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0
config path is mnn-output/qwen2.5_0.5b_instruct_mnn/config.json
Can't open file:.tempcache
Load Cache file error.

is_single_ = 1

load tokenizer
tokenizer_type = 3
load tokenizer Done
load mnn-output/qwen2.5_0.5b_instruct_mnn/llm.mnn ... Load Module Done!
Clone Decode Module Done!
main, 180, cost time: 2222.191162 ms
Prepare for resize opt Begin
Prepare for resize opt End
Fix: 1070 - Total: 1070, rate = 1.000000
main, 184, cost time: 249.036011 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 188, cost time: 0.010000 ms

Q: hi

A: Hello! How can I assist you today? Is there something specific you would like to know or discuss about anything in particular? I'm here to help answer questions and provide information on various topics. Please feel free to ask me any questions, and I'll do my best to help you.

############# 8bit ################

python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export mnn --dst_path mnn-output/qwen2.5_0.5b_instruct_mnn --quant_bit 8 --mnnconvert mnn/build/MNNConvert

./mnn/build/llm_demo mnn-output/qwen2.5_0.5b_instruct_mnn/config.json  (garbled regardless of whether "precision": "fp16" is set)

The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0
config path is mnn-output/basemodel_0.5b_instruct_q88_300/config.json
Can't open file:.tempcache
Load Cache file error.

is_single_ = 1

load tokenizer
tokenizer_type = 3
load tokenizer Done
load mnn-output/basemodel_0.5b_instruct_q88_300/llm.mnn ... Load Module Done!
Clone Decode Module Done!
main, 180, cost time: 2159.822021 ms
Prepare for resize opt Begin
Prepare for resize opt End
Fix: 1070 - Total: 1070, rate = 1.000000
main, 184, cost time: 246.123016 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 188, cost time: 0.010000 ms

Q: hi

A: s

p

-ho P.

O

jxt1234 (Collaborator) commented Nov 19, 2024

We'll look into it.

jxt1234 (Collaborator) commented Nov 19, 2024

If you export to ONNX first and then convert to MNN, does 8-bit quantization work correctly?

jxt1234 added the bug label on Nov 19, 2024
jxt1234 (Collaborator) commented Nov 19, 2024

CUDA isn't supported for this yet, right? Are you running on CPU?

jfduma (Author) commented Nov 20, 2024

Inference runs on the CPU. The main MNN project was compiled with the MNN_CUDA macro, but the MNN Android module was compiled without it.
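For context, a build along these lines would reproduce the setup described above. MNN_CUDA is the macro mentioned; MNN_BUILD_LLM is assumed here to be the option that produces llm_demo, so verify the exact flags against the MNN build documentation:

cd mnn && mkdir -p build && cd build
cmake .. -DMNN_CUDA=ON -DMNN_BUILD_LLM=ON
make -j$(nproc)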

jfduma (Author) commented Nov 20, 2024

> If you export to ONNX first and then convert to MNN, does 8-bit quantization work correctly?

Exporting to ONNX first and then converting to MNN works correctly; the commands are below. I created the config.json myself and removed the "llm_weight": "llm.mnn.weight" entry, after which it runs normally.

python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export mnn --dst_path mnn-output/qwen2.5_0.5b_instruct_onnx

mnn/build/MNNConvert --modelFile mnn-output/qwen2.5_0.5b_instruct_onnx/onnx/llm.onnx --framework ONNX --MNNModel mnn-output/qwen2.5_0.5b_instruct_onnx/llm.mnn --weightQuantBits 8 --transformerFuse=1 --allowCustomOp

./mnn/build/llm_demo mnn-output/qwen2.5_0.5b_instruct_onnx/config.json
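A sketch of the hand-written config.json used for this run, with the "llm_weight": "llm.mnn.weight" entry deleted as described (the ONNX-converted llm.mnn apparently carries its weights internally). The remaining keys are assumed to match the standard exported config:

{
  "llm_model": "llm.mnn",
  "backend_type": "cpu",
  "thread_num": 4
}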

jxt1234 (Collaborator) commented Nov 20, 2024

Got it, we'll investigate.

jxt1234 (Collaborator) commented Dec 2, 2024

This has been fixed.
