CogView4 Control Block #10809

zRzRzRzRzRzRzR · 2025-02-17T08:08:45Z

What does this pull request do?

This PR adds a Control module to CogView4, modeled on the Flux implementation.

Who can review?

@arrow

yiyixuxu

thanks for the PR!
do we already have a checkpoint for cogview3 control lora or is this mainly to support training?

src/diffusers/models/transformers/transformer_cogview4.py

yiyixuxu · 2025-02-20T19:52:19Z

src/diffusers/pipelines/cogview4/pipeline_cogview4_control.py

+"""
+
+
+def calculate_shift(


we can add a `# Copied from` comment here

I tried to add, it should be in this format.

then maybe you can update the one in cogview4 and then add a #Copied from?
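For readers unfamiliar with the convention, here is a minimal sketch of a `# Copied from` marker on a module-level helper. The dotted path and the linear-interpolation body follow the Flux-style helper and are assumptions; the CogView4 `calculate_shift` may have a different body, which would explain why the marker can only be added once the source function matches. The default values are illustrative only.

```python
# `make fix-copies` is expected to keep a marked function identical to its source,
# so the marker line sits directly above the definition.
# Copied from diffusers.pipelines.flux.pipeline_flux.calculate_shift
def calculate_shift(
    image_seq_len: int,
    base_seq_len: int = 256,
    max_seq_len: int = 4096,
    base_shift: float = 0.5,
    max_shift: float = 1.15,
) -> float:
    # Linearly interpolate the shift between base_shift and max_shift
    # as the image sequence length grows from base_seq_len to max_seq_len.
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    return image_seq_len * m + b
```

If the copy and its source diverge, the copy check fails, so the practical order is to update the source function first and add the marker afterwards.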

yiyixuxu · 2025-02-20T19:54:46Z

src/diffusers/pipelines/cogview4/pipeline_cogview4_control.py

+        >>> import torch
+        >>> from diffusers import CogView4Pipeline
+
+        >>> pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)


need to update the pipeline
do we have a checkpoint?

Not yet. We are currently training a control checkpoint.

yiyixuxu · 2025-02-20T20:16:06Z

src/diffusers/pipelines/cogview4/pipeline_cogview4_control.py

+            self.scheduler.config.get("base_shift", 0.25),
+            self.scheduler.config.get("max_shift", 0.75),
+        )
+        _, num_inference_steps = retrieve_timesteps(self.scheduler, num_inference_steps, device, sigmas=sigmas, mu=mu)


Suggested change

-        _, num_inference_steps = retrieve_timesteps(self.scheduler, num_inference_steps, device, sigmas=sigmas, mu=mu)
+        timesteps, num_inference_steps = retrieve_timesteps(self.scheduler, num_inference_steps, device, timesteps, sigmas=sigmas, mu=mu)

we updated our scheduler to work with cogview4 - is there any reason we still cannot use `scheduler.set_timesteps` to set timesteps?

I have modified the code, and now it is the same as the cogview4 model.
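The point of the suggestion can be illustrated with a small sketch: the helper forwards to the scheduler's `set_timesteps` and returns both the schedule and its length, so the caller should keep the returned timesteps rather than discard them. `ToyScheduler` and `retrieve_timesteps_sketch` are hypothetical stand-ins written for this illustration, not the diffusers implementations, and the evenly spaced schedule is an assumption.

```python
from typing import List, Optional, Tuple


class ToyScheduler:
    """Hypothetical stand-in for a scheduler exposing set_timesteps."""

    def __init__(self, num_train_timesteps: int = 1000):
        self.num_train_timesteps = num_train_timesteps
        self.timesteps: List[int] = []

    def set_timesteps(
        self,
        num_inference_steps: Optional[int] = None,
        timesteps: Optional[List[int]] = None,
    ) -> None:
        if timesteps is not None:
            # A custom schedule overrides the default spacing.
            self.timesteps = list(timesteps)
        else:
            step = self.num_train_timesteps / num_inference_steps
            self.timesteps = [
                round(self.num_train_timesteps - i * step)
                for i in range(num_inference_steps)
            ]


def retrieve_timesteps_sketch(
    scheduler: ToyScheduler,
    num_inference_steps: Optional[int] = None,
    timesteps: Optional[List[int]] = None,
) -> Tuple[List[int], int]:
    """Configure the scheduler, then return (timesteps, number of steps)."""
    if timesteps is None and num_inference_steps is None:
        raise ValueError("Pass either num_inference_steps or timesteps.")
    scheduler.set_timesteps(num_inference_steps=num_inference_steps, timesteps=timesteps)
    return scheduler.timesteps, len(scheduler.timesteps)


# Keep both return values, as in the suggested change.
timesteps, num_inference_steps = retrieve_timesteps_sketch(ToyScheduler(), num_inference_steps=4)
```

With a custom schedule, the returned step count reflects the schedule actually set, which is why the pipeline reassigns `num_inference_steps` from the helper's return value.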

zRzRzRzRzRzRzR · 2025-02-21T06:43:27Z

> thanks for the PR! do we already have a checkpoint for cogview3 control lora or is this mainly to support training?

This is mainly to support training. Note that it supports Control training for the CogView4 model, not the CogView3 model.

yiyixuxu

I left some nits but looks good to me
do you want to wait to merge after you have a checkpoint? this way we could add proper doc & test too

yiyixuxu · 2025-02-24T16:56:38Z

src/diffusers/pipelines/cogview4/pipeline_cogview4_control.py

+"""
+
+
+def calculate_shift(


then maybe you can update the one in cogview4 and then add a #Copied from?

zRzRzRzRzRzRzR · 2025-03-12T10:25:37Z

I have added a script for converting the ControlNet model from the Megatron format to the diffusers format.

a-r-r-o-w

Thanks for the amazing work and upcoming model! Just some minor changes remaining before we can merge

a-r-r-o-w · 2025-03-12T21:17:55Z

examples/cogview4-control/train_control_cogview4.py

+                )[0]
+                # these weighting schemes use a uniform timestep sampling
+                # and instead post-weight the loss
+                weighting = compute_loss_weighting_for_sd3(weighting_scheme=args.weighting_scheme, sigmas=sigmas)


I've done a very limited set of experiments in finetrainers but I found that using the shifted sigmas (scale_factors) instead of sigmas consistently has a lower loss and converges to a given style/character faster. We don't have to modify it here but I'm just making a note (and perhaps I'm doing it wrong as well, since I haven't done particularly long runs to verify this)

How do you think this part could be changed to make convergence faster?
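To make the discussion concrete, here is a simplified scalar sketch of post-weighting a flow-matching loss. The scheme names and formulas mirror what SD3-style training scripts commonly use and are assumptions here, not the exact helper called in the diff. The `sigma` argument can be either the raw sigma or a shifted one, so the experiment described above amounts to changing which value is passed in.

```python
import math
from typing import Sequence


def loss_weighting_sketch(weighting_scheme: str, sigma: float) -> float:
    """Per-sample loss weight for a given noise level (assumed formulas)."""
    if weighting_scheme == "sigma_sqrt":
        # Emphasize low-noise samples.
        return sigma ** -2.0
    if weighting_scheme == "cosmap":
        bot = 1 - 2 * sigma + 2 * sigma ** 2
        return 2 / (math.pi * bot)
    # Any other scheme: uniform weighting.
    return 1.0


def weighted_mse(
    pred: Sequence[float],
    target: Sequence[float],
    sigma: float,
    weighting_scheme: str = "none",
) -> float:
    """Mean squared error scaled by the weight for this sample's sigma."""
    weight = loss_weighting_sketch(weighting_scheme, sigma)
    squared_error = sum((p - t) ** 2 for p, t in zip(pred, target))
    return weight * squared_error / len(pred)
```

Timesteps are still sampled uniformly in this setup; only the loss is rescaled after the fact, which matches the code comment in the hunk above.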

a-r-r-o-w · 2025-03-12T21:21:49Z

src/diffusers/models/transformers/transformer_cogview4.py

+        if attention_mask is not None:
+            text_attention_mask = attention_mask.float().to(query.device)
+            actual_text_seq_length = text_attention_mask.size(1)
+            new_attention_mask = torch.zeros((batch_size, text_seq_length + image_seq_length), device=query.device)
+            new_attention_mask[:, :actual_text_seq_length] = text_attention_mask
+            new_attention_mask = new_attention_mask.unsqueeze(2)
+            attention_mask_matrix = new_attention_mask @ new_attention_mask.transpose(1, 2)
+            attention_mask = (attention_mask_matrix > 0).unsqueeze(1).to(query.dtype)
+


I believe this does not impact the original CogView4 model since we don't pass an attention mask in that (so it should have same speed as before)

This part is for training, not for inference.

Without this construction, the attention mask is all 1s, which led to poor performance of the trained model in our previous experiments.
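The idea can be sketched without tensors for a single sample: extend the text padding mask over the joint text-plus-image sequence, then take an outer product so a query/key pair is allowed only when both tokens are valid. This sketch assumes image tokens are always valid and only padded text tokens are masked, which is an assumption that differs from the zero-initialized buffer in the quoted hunk. The function name is hypothetical.

```python
from typing import List, Sequence


def build_joint_attention_mask(
    text_mask: Sequence[int], image_seq_length: int
) -> List[List[float]]:
    """Pairwise mask over [text tokens, image tokens] for one sample.

    text_mask uses 1 for real tokens and 0 for padding.
    """
    # Image tokens are assumed to be always valid (never padded).
    joint = [float(m) for m in text_mask] + [1.0] * image_seq_length
    # Outer product: entry (i, j) is 1.0 only if tokens i and j are both valid.
    return [[1.0 if q * k > 0 else 0.0 for k in joint] for q in joint]


# Two real text tokens, one padded text token, two image tokens.
mask = build_joint_attention_mask([1, 1, 0], image_seq_length=2)
```

In the tensor version, the same matrix is built per batch element and given an extra head dimension so it broadcasts across attention heads.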

a-r-r-o-w · 2025-03-12T21:25:37Z

src/diffusers/models/transformers/transformer_cogview4.py

@@ -289,7 +304,7 @@ def forward(self, hidden_states: torch.Tensor) -> Tuple[torch.Tensor, torch.Tens
        return (freqs.cos(), freqs.sin())


-class CogView4Transformer2DModel(ModelMixin, ConfigMixin, PeftAdapterMixin):
+class CogView4Transformer2DModel(ModelMixin, ConfigMixin, PeftAdapterMixin, CacheMixin):


To support CacheMixin, the pipeline implementations should set the _current_timestep attribute as well. Currently it is not set, so this will error out.

Could you update both the CogView4 pipelines with similar changes to mentions of _current_timestep in CogVideoX:

diffusers/src/diffusers/pipelines/cogvideo/pipeline_cogvideox.py

Line 713 in 20e4b6a

self._current_timestep = t

I have added this part of the code. Is this correct?
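The requested change can be sketched as follows: the pipeline records the timestep being denoised in `_current_timestep` at the start of each loop iteration, so that caching hooks can read it, and clears it when the loop finishes. `ToyPipeline` and its `_denoise_step` are hypothetical stand-ins for illustration; only the `self._current_timestep = t` assignment comes from the referenced CogVideoX line, and resetting to `None` after the loop is an assumption.

```python
from typing import List, Optional, Sequence


class ToyPipeline:
    """Hypothetical pipeline showing where _current_timestep is set."""

    def __init__(self) -> None:
        self._current_timestep: Optional[int] = None
        self.seen: List[Optional[int]] = []

    @property
    def current_timestep(self) -> Optional[int]:
        return self._current_timestep

    def _denoise_step(self, latents: float, t: int) -> float:
        # Stand-in for the transformer call; a cache hook could read
        # self.current_timestep here to decide whether to reuse results.
        self.seen.append(self.current_timestep)
        return latents * 0.9

    def __call__(self, latents: float, timesteps: Sequence[int]) -> float:
        for t in timesteps:
            # Record the timestep before calling the model.
            self._current_timestep = t
            latents = self._denoise_step(latents, t)
        # Clear the attribute once denoising is done (assumed convention).
        self._current_timestep = None
        return latents


pipe = ToyPipeline()
result = pipe(1.0, [999, 500, 1])
```

If the attribute is never set, any code that reads the current timestep during the forward pass only sees `None`, which is the failure mode the review comment warns about.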

src/diffusers/pipelines/cogview4/__init__.py

a-r-r-o-w · 2025-03-12T21:27:19Z

src/diffusers/pipelines/cogview4/pipeline_cogview4_control.py

+    Examples:
+        ```python
+        >>> import torch
+        >>> from diffusers import CogView4Pipeline


Suggested change

-        >>> from diffusers import CogView4Pipeline
+        >>> from diffusers import CogView4ControlPipeline

HuggingFaceDocBuilderDev · 2025-03-13T18:42:55Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

yiyixuxu · 2025-03-13T18:44:56Z

@bot /style

zRzRzRzRzRzRzR · 2025-03-15T08:43:31Z

@bot /style

yiyixuxu · 2025-03-15T14:27:37Z

@bot /style


zRzRzRzRzRzRzR added 2 commits February 17, 2025 15:51

1

a97fca2

change to channel 1

c30ca7a

zRzRzRzRzRzRzR changed the title ~~CogView4 Contorl Block~~ CogView4 Control Block Feb 17, 2025

zRzRzRzRzRzRzR added 18 commits February 18, 2025 14:43

cogview4 control training

5c25cd2

add CacheMixin

44bfd4c

1

a9f448e

remove initial_input_channels change for val

2cbdf35

1

df83bf2

update

8bba67a

use 3.5

b9d864b

new loss

5d2e994

Merge branch 'huggingface:main' into cogview4_control

ebeb1e4

1

95e8504

Merge branch 'cogview4_control' of https://github.com/zRzRzRzRzRzRzR/…

940c23b

…diffusers into cogview4_control

use imagetoken

7a68a3e

for megatron convert

2a81772

1

1d91a24

train con and uc

dff4b29

Merge branch 'huggingface:main' into cogview4_control

050b97c

2

b007be0

remove guidance_scale

25f4e4b

yiyixuxu reviewed Feb 20, 2025

zRzRzRzRzRzRzR added 2 commits February 21, 2025 14:40

Update pipeline_cogview4_control.py

7ffecbc

fix

b4e11e7

zRzRzRzRzRzRzR added 4 commits February 21, 2025 14:56

Merge branch 'huggingface:main' into cogview4_control

efa0f41

use cogview4 pipeline with timestep

f55e3cc

Merge branch 'cogview4_control' of https://github.com/zRzRzRzRzRzRzR/…

9410e46

…diffusers into cogview4_control

update shift_factor

29b0c81

yiyixuxu reviewed Feb 24, 2025

OleehyO force-pushed the cogview4_control branch from c339be0 to 264060e Compare February 28, 2025 09:11

OleehyO and others added 3 commits March 4, 2025 10:26

[fix] Add attention mask for padded token

9a10ceb

Merge branch 'huggingface:main' into cogview4_control

b6e10e7

update

692e5cc

zRzRzRzRzRzRzR mentioned this pull request Mar 6, 2025

Training scripts THUDM/CogView4#33

Closed

zRzRzRzRzRzRzR added 6 commits March 6, 2025 20:02

remove padding type

fc3830c

Update train_control_cogview4.py

98a2417

resolve conflicts with huggingface#10981

c774f45

Merge branch 'main' into cogview4_control

687faa4

add control convert

8abca19

Merge branch 'cogview4_control' of https://github.com/zRzRzRzRzRzRzR/…

cbfeb0b

…diffusers into cogview4_control

a-r-r-o-w approved these changes Mar 12, 2025

zRzRzRzRzRzRzR added 3 commits March 13, 2025 15:09

use control format

347dd17

fix

775bb8c

add missing import

985baa9

zRzRzRzRzRzRzR added 2 commits March 15, 2025 15:56

Merge branch 'huggingface:main' into cogview4_control

c2a1985

update with cogview4 formate

88abb39

make style

3e3387e

yiyixuxu merged commit 82188ce into huggingface:main Mar 15, 2025
11 of 12 checks passed

DN6 added this to Diffusers Roadmap 0.34 Mar 20, 2025

github-project-automation bot moved this to In Progress in Diffusers Roadmap 0.34 Mar 20, 2025

DN6 moved this from In Progress to Done in Diffusers Roadmap 0.34 Mar 20, 2025

yiyixuxu removed this from Diffusers Roadmap 0.34 Apr 18, 2025
