Questions about evaluation results #46
Thanks for your work!

I'm curious about the evaluation results you reported for Qwen-2.5-Math-7B-Instruct and hkust-nlp/Qwen-2.5-Math-7B-SimpleRL-Zero. I used your script to evaluate both models and got the following results:

[results table not preserved]

The gaps are large enough that they would diminish the reported improvement of Qwen2.5-7B-SimpleRL-Zero when compared against Qwen-2.5-Math-7B-Instruct.

Comments

I got similar results.

Could you please provide your detailed evaluation parameters so we can see where the problem lies?

Thanks for your response. I just used the evaluation script as described in the README and got results that differ from those reported in the Notion page. #Qwen2.5-Math-Instruct Series #Qwen2.5-Math-7B-Instruct

I'm using the same script! @Zeng-WH

I got similar results. @Zeng-WH
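Much of the back-and-forth above comes down to which evaluation parameters were used. As a point of reference, below is a minimal sketch of the settings worth reporting alongside scores: the prompt template and the decoding parameters. It assumes a vLLM-based setup; the model path, template, and parameter values are illustrative assumptions, not the repository's actual evaluation script.

```python
# Minimal illustrative sketch (NOT the repository's eval script): when comparing
# numbers across runs, record every knob that can shift benchmark scores --
# prompt template, decoding parameters, and max generation length.
from vllm import LLM, SamplingParams

# Hypothetical model path; swap in a local SimpleRL-Zero checkpoint to compare.
MODEL_PATH = "Qwen/Qwen2.5-Math-7B-Instruct"

# Decoding parameters: greedy vs. sampled decoding alone can move
# MATH/GSM8K accuracy by several points, so report them explicitly.
sampling_params = SamplingParams(
    temperature=0.0,   # greedy decoding
    top_p=1.0,
    max_tokens=3000,   # long enough for full chain-of-thought solutions
)

# Prompt template also matters: chat-style vs. plain completion prompts
# often explain gaps between reported and reproduced scores.
PROMPT_TEMPLATE = (
    "<|im_start|>system\nPlease reason step by step, and put your final "
    "answer within \\boxed{{}}.<|im_end|>\n"
    "<|im_start|>user\n{question}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

def generate_solutions(questions):
    """Generate one solution per question with the parameters above."""
    llm = LLM(model=MODEL_PATH)
    prompts = [PROMPT_TEMPLATE.format(question=q) for q in questions]
    outputs = llm.generate(prompts, sampling_params)
    return [o.outputs[0].text for o in outputs]

if __name__ == "__main__":
    print(generate_solutions(["What is 12 * 13?"]))
```

When reproduced numbers differ from reported ones, pinning down the decoding mode (greedy vs. sampling), the max generation length, and the exact chat template is usually the first step in locating the discrepancy.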