Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sharegpt dataset convert error #6878

Closed
1 task done
JJJYmmm opened this issue Feb 10, 2025 · 0 comments · Fixed by #6879
Closed
1 task done

sharegpt dataset convert error #6878

JJJYmmm opened this issue Feb 10, 2025 · 0 comments · Fixed by #6879
Labels
solved This problem has been already solved

Comments

@JJJYmmm
Copy link
Contributor

JJJYmmm commented Feb 10, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-5.15.0-131-generic-x86_64-with-glibc2.35
  • Python version: 3.10.16
  • PyTorch version: 2.6.0+cu126 (GPU)
  • Transformers version: 4.49.0.dev0
  • Datasets version: 2.21.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA H100 80GB HBM3
  • GPU number: 8
  • GPU memory: 79.19GB

Reproduction

Just use sharegpt_hyper dataset would cause the error.

[rank0]:   File "/home/xxx/LLaMA-Factory/src/llamafactory/data/aligner.py", line 15
3, in convert_sharegpt                                                                  
[rank0]:     {"role": tag_mapping[message[dataset_attr.role_tag]], "content": message[da
taset_attr.content_tag]}                                                                
[rank0]: KeyError: 'user'

code here, when broken_data = True, it doesn't break and cause key error finally.

Others

No response

@JJJYmmm JJJYmmm added bug Something isn't working pending This problem is yet to be addressed labels Feb 10, 2025
@JJJYmmm JJJYmmm mentioned this issue Feb 10, 2025
2 tasks
@hiyouga hiyouga added solved This problem has been already solved and removed bug Something isn't working pending This problem is yet to be addressed labels Feb 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
solved This problem has been already solved
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants