Qwen2.5-0.5B-Instruct: garbled inference output with 8-bit quantization #3091
Comments
We'll take a look.
If you export to ONNX first and then convert to MNN, does 8-bit quantization work correctly?
Running on CUDA isn't supported yet, right? Are you running on CPU?
Model inference runs on CPU. The MNN_CUDA macro was enabled when building the main MNN project, but it was not enabled when building the MNN Android module.
Exporting to ONNX first and then converting to MNN works correctly. The commands are as follows. I created a config.json and removed the "llm_weight": "llm.mnn.weight" entry, and it runs normally.
python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export mnn --dst_path mnn-output/qwen2.5_0.5b_instruct_onnx
mnn/build/MNNConvert --modelFile mnn-output/qwen2.5_0.5b_instruct_onnx/onnx/llm.onnx --framework ONNX --MNNModel mnn-output/qwen2.5_0.5b_instruct_onnx/llm.mnn --weightQuantBits 8 --transformerFuse=1 --allowCustomOp
./mnn/build/llm_demo mnn-output/qwen2.5_0.5b_instruct_onnx/config.json
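For reference, a minimal sketch of such a config.json (field names follow MNN's usual llm config layout; the concrete values here are assumptions, not copied from the issue):

{
  "llm_model": "llm.mnn",
  "backend_type": "cpu",
  "thread_num": 4
}

Note that the "llm_weight" entry is absent, so llm_demo presumably reads the weights embedded in the single llm.mnn file produced by MNNConvert rather than from a separate llm.mnn.weight file.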
Got it, we'll investigate.
This has been fixed.
Platform (include target platform as well if cross-compiling):
ubuntu 20.04 cuda
Exported the Qwen2.5-0.5B model with the latest MNN 3.0 release: 4-bit quantization works correctly, but 8-bit quantization produces garbled output, whether or not "precision": "fp16" is set.
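Setting "precision": "fp16" here means editing the generated config.json before running llm_demo. A minimal sketch of that edit (surrounding fields assumed, following MNN's usual llm config layout):

{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "cpu",
  "precision": "fp16"
}

The 8-bit output below is garbled both with and without the "precision" line.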
########### 4bit ############
python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export mnn --dst_path mnn-output/qwen2.5_0.5b_instruct_mnn --quant_bit 4 --mnnconvert mnn/build/MNNConvert
./mnn/build/llm_demo mnn-output/qwen2.5_0.5b_instruct_mnn/config.json
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0
config path is mnn-output/qwen2.5_0.5b_instruct_mnn/config.json
Can't open file:.tempcache
Load Cache file error.
is_single_ = 1
load tokenizer
tokenizer_type = 3
load tokenizer Done
load mnn-output/qwen2.5_0.5b_instruct_mnn/llm.mnn ... Load Module Done!
Clone Decode Module Done!
main, 180, cost time: 2222.191162 ms
Prepare for resize opt Begin
Prepare for resize opt End
Fix: 1070 - Total: 1070, rate = 1.000000
main, 184, cost time: 249.036011 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 188, cost time: 0.010000 ms
Q: hi
A: Hello! How can I assist you today? Is there something specific you would like to know or discuss about anything in particular? I'm here to help answer questions and provide information on various topics. Please feel free to ask me any questions, and I'll do my best to help you.
############# 8bit ################
python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export mnn --dst_path mnn-output/qwen2.5_0.5b_instruct_mnn --quant_bit 8 --mnnconvert mnn/build/MNNConvert
./mnn/build/llm_demo mnn-output/qwen2.5_0.5b_instruct_mnn/config.json (same result whether or not "precision": "fp16" is set)
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0
config path is mnn-output/basemodel_0.5b_instruct_q88_300/config.json
Can't open file:.tempcache
Load Cache file error.
is_single_ = 1
load tokenizer
tokenizer_type = 3
load tokenizer Done
load mnn-output/basemodel_0.5b_instruct_q88_300/llm.mnn ... Load Module Done!
Clone Decode Module Done!
main, 180, cost time: 2159.822021 ms
Prepare for resize opt Begin
Prepare for resize opt End
Fix: 1070 - Total: 1070, rate = 1.000000
main, 184, cost time: 246.123016 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 188, cost time: 0.010000 ms
Q: hi
A: s
p
-ho P.
O