Questions about evaluation results #46

Open
yingyibiao opened this issue Mar 7, 2025 · 5 comments

Comments


yingyibiao commented Mar 7, 2025

Thanks for your work!

I'm curious about the evaluation results you reported for Qwen-2.5-Math-7B-Instruct and hkust-nlp/Qwen-2.5-Math-7B-SimpleRL-Zero:

[Image: reported benchmark results for Qwen-2.5-Math-7B-Instruct and Qwen-2.5-Math-7B-SimpleRL-Zero]

I used your script to evaluate both Qwen-2.5-Math-7B-Instruct and hkust-nlp/Qwen-2.5-Math-7B-SimpleRL-Zero and got the following results:

[Image: results reproduced locally with the released evaluation script]

The gaps are large enough that they would erase the reported improvement of Qwen2.5-7B-SimpleRL-Zero over Qwen-2.5-Math-7B-Instruct.

@rabbitjy

I got similar results.

Zeng-WH (Collaborator) commented Mar 13, 2025

Could you please provide the detailed evaluation parameters so we can see where the problem lies?


rabbitjy commented Mar 13, 2025

Thanks for your response.

I just used the evaluation script described in the README and got results that differ from the ones in the Notion post.


#Qwen2.5-Math-Instruct Series
PROMPT_TYPE="qwen25-math-cot"

#Qwen2.5-Math-7B-Instruct
export CUDA_VISIBLE_DEVICES="0"
MODEL_NAME_OR_PATH="Qwen/Qwen2.5-Math-7B-Instruct"
OUTPUT_DIR="Qwen2.5-Math-7B-Instruct-Math-Eval"
bash sh/eval.sh $PROMPT_TYPE $MODEL_NAME_OR_PATH $OUTPUT_DIR
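
For what it's worth, gaps of this size usually come down to decoding settings (greedy vs. sampled decoding, max generated tokens) or the prompt template rather than the launch command itself. Below is a minimal sketch of how one might pin those settings when reproducing a single query, assuming the harness runs on vLLM; the model name, prompt, and parameter values are illustrative, not the ones behind the reported numbers.

# Illustrative sketch only: pin the decoding settings so two runs are directly comparable.
# Assumes vLLM as the inference backend; prompt and max_tokens are placeholder values.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-Math-7B-Instruct")
greedy = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=3072)  # greedy decoding
prompt = "Solve for x: 3x + 5 = 20. Put the final answer in \\boxed{}."
outputs = llm.generate([prompt], greedy)
print(outputs[0].outputs[0].text)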

@yingyibiao (Author)


I'm using the same script! @Zeng-WH

@Zhuofeng-Li

I got similar results. @Zeng-WH
