TP + FP8 - NotImplementedError for certain operations #2629

Open
nathan-az opened this issue Apr 23, 2025 · 2 comments · May be fixed by pytorch/ao#2154

Comments

@nathan-az
Contributor

FP8 training is now supported (#2546), but it has issues with tensor parallelism, and the combination is currently gated. The MVP for this feature should include:

  • Plug-and-play support for enable_fp8_training when a tensor_parallel_plan is set
  • Compatibility with torch.compile

This issue tracks that support and the request. The scope of what needs to be done to support this isn't clear to me; a rough sketch of the intended combination is below. @andrewor14 feel free to comment if there are other requirements for the MVP, or if you want to clarify the scope.
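
For context, a minimal sketch of the TP + FP8 combination being requested, assuming torchao's convert_to_float8_training followed by a plain torch.distributed tensor-parallel plan. The FeedForward module, its w1/w2 layer names, and the colwise/rowwise plan are illustrative placeholders, not torchtune's actual recipe code or the exact path that raised the error.

```python
# Hedged sketch of combining torchao FP8 training with tensor parallelism.
# Module names and the TP plan are illustrative, not torchtune recipe code.
import os

import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)
from torchao.float8 import convert_to_float8_training


class FeedForward(nn.Module):
    """Toy two-layer MLP standing in for a transformer MLP block."""

    def __init__(self, dim: int = 4096, hidden: int = 11008):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w2(torch.relu(self.w1(x)))


def main():
    # Assumes launch via torchrun on a multi-GPU node with NCCL available.
    local_rank = int(os.environ["LOCAL_RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    torch.cuda.set_device(local_rank)

    # 1D device mesh spanning all ranks, used for tensor parallelism.
    mesh = init_device_mesh("cuda", (world_size,))

    model = FeedForward().cuda()

    # Swap eligible nn.Linear modules for torchao's Float8Linear (FP8 training).
    convert_to_float8_training(model)

    # Apply a simple colwise/rowwise TP plan on top of the FP8 modules.
    parallelize_module(
        model,
        mesh,
        {"w1": ColwiseParallel(), "w2": RowwiseParallel()},
    )

    # A forward/backward pass through this combination is roughly where the
    # kind of NotImplementedError this issue describes can surface.
    x = torch.randn(8, 4096, device="cuda", requires_grad=True)
    model(x).sum().backward()


if __name__ == "__main__":
    main()
```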

@andrewor14
Contributor

Thanks @nathan-az. Just to clarify, torch.compile is already supported; it's just not compatible with tensor parallelism yet.

@nathan-az nathan-az changed the title Support tensor parallelism with FP8 training TP + FP8 - NotImplementedError for certain operations May 7, 2025
@nathan-az
Contributor Author

I've created a separate issue for FP8 + TP + torch.compile support so the two can be tackled separately.
