Skip to content

[Wan LoRAs] make T2V LoRAs compatible with Wan I2V #11107

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Mar 19, 2025

Conversation

linoytsaban
Copy link
Collaborator

@linoytsaban linoytsaban commented Mar 18, 2025

This PR was authored by @hlky to make T2V Wan LoRAs compatible with Wan I2V 🙌🏻

@linoytsaban linoytsaban requested a review from hlky March 18, 2025 13:21
@hlky
Copy link
Contributor

hlky commented Mar 18, 2025

@bot /style

Copy link
Contributor

Style fixes have been applied. View the workflow run here.

@linoytsaban linoytsaban marked this pull request as ready for review March 18, 2025 13:25
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Contributor

@hlky hlky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @linoytsaban. I've left some comments with changes we discussed here https://huggingface.slack.com/archives/D08GXKYKGAK/p1742301801696229

@hlky hlky requested a review from sayakpaul March 19, 2025 11:17
Copy link
Member

@sayakpaul sayakpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Left some comments and thanks for catching the nasty key names.

Let's see some results with this?

num_blocks = len({k.split("blocks.")[1].split(".")[0] for k in state_dict})
is_i2v_lora = any("k_img" in k for k in state_dict) and any("v_img" in k for k in state_dict)
if not is_i2v_lora:
return state_dict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't perform any extra operation if it's is_i2v_lora?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for loading T2V lora into I2V model, so if it's already I2V lora we return the state dict as-is.

Comment on lines 4264 to 4265
if transformer.config.image_dim is None:
return state_dict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be moved out at the top of this function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be slightly faster than checking the keys first, this is checking whether the transformer is I2V. T2V transformer config has image_dim as None.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be not None tho no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

T2V has transformer.config.image_dim = None

We have T2V loaded -> we are loading T2V lora -> if transformer.config.image_dim is None -> no state dict modification required -> return as-is

We have I2V loaded -> we are loading T2V lora -> if transformer.config.image_dim is None does not match as image_dim is not None -> continue with state dict modifications

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're absolutely right🙌🏻

@hlky
Copy link
Contributor

hlky commented Mar 19, 2025

output.29.mp4

Wan-AI/Wan2.1-I2V-14B-720P-Diffusers with https://huggingface.co/benjamin-paine/steamboat-willie-14b

It's experimental, we suspect style does not transfer well and will try a motion-oriented lora, however results are comparable to replicate which recently added support for this

@linoytsaban linoytsaban requested review from hlky and sayakpaul March 19, 2025 13:52
Co-authored-by: hlky <hlky@hlky.ac>
@linoytsaban
Copy link
Collaborator Author

@bot /style

@hlky
Copy link
Contributor

hlky commented Mar 19, 2025

@bot /style

@hlky
Copy link
Contributor

hlky commented Mar 19, 2025

https://github.com/huggingface/diffusers/actions/runs/13948751559/job/39042360914
You don't have admin permissions for some reason @linoytsaban

Copy link
Contributor

Style fixes have been applied. View the workflow run here.

Copy link
Member

@sayakpaul sayakpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sleek!

I would suggest adding a snippet about this in our Wan docs.

Should be good to merge after that!

@linoytsaban
Copy link
Collaborator Author

I would suggest adding a snippet about this in our Wan docs.

@sayakpaul Yeah we should definitely have some documentation for it, but I'm thinking maybe in a separate PR to add it somewhere else? I think for other pipelines we didnt include lora examples in the pipeline page so maybe makes more sense as part of LoRA documentation dedicated to videos?

@hlky
Copy link
Contributor

hlky commented Mar 19, 2025

Yeah I think docs can be added in a follow up as we'll be overhauling all the pipeline doc pages similar to #11112 anyway and we don't have a great example for this right now.

@sayakpaul
Copy link
Member

I respectfully disagree here. Just a snippet and saying that T2V LoRAs are compatible with I2V LoRAs won't be too much.

It is quite easy to forget it if we were to wait till the overhaul to complete.

we don't have a great example for this right now.

Well, if we don't know with examples if it works as expected, then it could wait until we do I guess. If we're matching the Replicate reference in terms of quality, I am fine with that.

@hlky
Copy link
Contributor

hlky commented Mar 19, 2025

There are many things that are not explicitly documented. I don't think it should be a blocker to merge, and any documentation should be complete, simply stating that T2V loras are compatible with I2V model is not useful without the example, and we don't have a good example right now, nobody does as it's an experimental thing from the past 1-2 days. If we merge it will allow us and the community to conduct further experimentation/testing without switching branches etc.

@sayakpaul sayakpaul merged commit a34d97c into huggingface:main Mar 19, 2025
@sayakpaul
Copy link
Member

It's a matter of disagreement here, then. I mentioned an example (even if it matches the quality of the Replicate reference) should be fine. Many of the examples we provide in our docs don't always lead to good reasonable results from the get-go.

We had similar situations with using Flux Control LoRAs and Turbo LoRAs but we at least added a small section in the Flux docs during the PR.

If we merge it will allow us and the community to conduct further experimentation/testing without switching branches etc.

This can never ideally be a blocker. If a feature is experimental, community understands that and is often okay to switch branches.

I would appreciate it if we don't rush into these merges without the little documentation I requested for going forward.

@DN6 DN6 added the roadmap Add to current release roadmap label Mar 20, 2025
@DN6 DN6 moved this from In Progress to Done in Diffusers Roadmap 0.34 Mar 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
roadmap Add to current release roadmap
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants