fix copy inconsistencies by scxue · Pull Request #11 · lawrence-cj/diffusers

scxue · 2024-01-15T04:58:02Z

fix copy inconsistencies

* Update README.md
* minor fix
* [CI-Lint] Fix code style issues with pre-commit 9839049f792c5e059dc934116077d9604da5c53f

---------

Co-authored-by: GitHub Action <action@github.com>

* Initial LTX 2.0 transformer implementation
* Add tests for LTX 2 transformer model
* Get LTX 2 transformer tests working
* Rename LTX 2 compile test class to have LTX2
* Remove RoPE debug print statements
* Get LTX 2 transformer compile tests passing
* Fix LTX 2 transformer shape errors
* Initial script to convert LTX 2 transformer to diffusers
* Add more LTX 2 transformer audio arguments
* Allow LTX 2 transformer to be loaded from local path for conversion
* Improve dummy inputs and add test for LTX 2 transformer consistency
* Fix LTX 2 transformer bugs so consistency test passes
* Initial implementation of LTX 2.0 video VAE
* Explicitly specify temporal and spatial VAE scale factors when converting
* Add initial LTX 2.0 video VAE tests
* Add initial LTX 2.0 video VAE tests (part 2)
* Get diffusers implementation on par with official LTX 2.0 video VAE implementation
* Initial LTX 2.0 vocoder implementation
* Use RMSNorm implementation closer to original for LTX 2.0 video VAE
* start audio decoder.
* init registration.
* up
* simplify and clean up
* up
* Initial LTX 2.0 text encoder implementation
* Rough initial LTX 2.0 pipeline implementation
* up
* up
* up
* up
* Add imports for LTX 2.0 Audio VAE
* Conversion script for LTX 2.0 Audio VAE Decoder
* Add Audio VAE logic to T2V pipeline
* Duplicate scheduler for audio latents
* Support num_videos_per_prompt for prompt embeddings
* LTX 2.0 scheduler and full pipeline conversion
* Add script to test full LTX2Pipeline T2V inference
* Fix pipeline return bugs
* Add LTX 2 text encoder and vocoder to ltx2 subdirectory __init__
* Fix more bugs in LTX2Pipeline.__call__
* Improve CPU offload support
* Fix pipeline audio VAE decoding dtype bug
* Fix video shape error in full pipeline test script
* Get LTX 2 T2V pipeline to produce reasonable outputs
* Make LTX 2.0 scheduler more consistent with original code
* Fix typo when applying scheduler fix in T2V inference script
* Refactor Audio VAE to be simpler and remove helpers (#7)
* remove resolve causality axes stuff.
* remove a bunch of helpers.
* remove adjust output shape helper.
* remove the use of audiolatentshape.
* move normalization and patchify out of pipeline.
* fix
* up
* up
* Remove unpatchify and patchify ops before audio latents denormalization (#9)

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Add support for I2V (#8)
* start i2v.
* up
* up
* up
* up
* up
* remove uniform strategy code.
* remove unneeded code.
* Denormalize audio latents in I2V pipeline (analogous to T2V change) (#11)
* test i2v.
* Move Video and Audio Text Encoder Connectors to Transformer (#12)
* Denormalize audio latents in I2V pipeline (analogous to T2V change)
* Initial refactor to put video and audio text encoder connectors in transformer
* Get LTX 2 transformer tests working after connector refactor
* precompute run_connectors,.
* fixes
* Address review comments
* Calculate RoPE double precisions freqs using torch instead of np
* Further simplify LTX 2 RoPE freq calc
* Make connectors a separate module (huggingface#18)
* remove text_encoder.py
* address yiyi's comments.
* up
* up
* up
* up

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>

* up (huggingface#19)
* address initial feedback from lightricks team (huggingface#16)
* cross_attn_timestep_scale_multiplier to 1000
* implement split rope type.
* up
* propagate rope_type to rope embed classes as well.
* up
* When using split RoPE, make sure that the output dtype is same as input dtype
* Fix apply split RoPE shape error when reshaping x to 4D
* Add export_utils file for exporting LTX 2.0 videos with audio
* Tests for T2V and I2V (#6)
* add ltx2 pipeline tests.
* up
* up
* up
* up
* remove content
* style
* Denormalize audio latents in I2V pipeline (analogous to T2V change)
* Initial refactor to put video and audio text encoder connectors in transformer
* Get LTX 2 transformer tests working after connector refactor
* up
* up
* i2v tests.
* up
* Address review comments
* Calculate RoPE double precisions freqs using torch instead of np
* Further simplify LTX 2 RoPE freq calc
* revert unneded changes.
* up
* up
* update to split style rope.
* up

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>

* up
* use export util funcs.
* Point original checkpoint to LTX 2.0 official checkpoint
* Allow the I2V pipeline to accept image URLs
* make style and make quality
* remove function map.
* remove args.
* update docs.
* update doc entries.
* disable ltx2_consistency test
* Simplify LTX 2 RoPE forward by removing coords is None logic
* make style and make quality
* Support LTX 2.0 audio VAE encoder
* Apply suggestions from code review Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Remove print statement in audio VAE
* up
* Fix bug when calculating audio RoPE coords
* Ltx 2 latent upsample pipeline (huggingface#12922)
* Initial implementation of LTX 2.0 latent upsampling pipeline
* Add new LTX 2.0 spatial latent upsampler logic
* Add test script for LTX 2.0 latent upsampling
* Add option to enable VAE tiling in upsampling test script
* Get latent upsampler working with video latents
* Fix typo in BlurDownsample
* Add latent upsample pipeline docstring and example
* Remove deprecated pipeline VAE slicing/tiling methods
* make style and make quality
* When returning latents, return unpacked and denormalized latents for T2V and I2V
* Add model_cpu_offload_seq for latent upsampling pipeline

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>

* Fix latent upsampler filename in LTX 2 conversion script
* Add latent upsample pipeline to LTX 2 docs
* Add dummy objects for LTX 2 latent upsample pipeline
* Set default FPS to official LTX 2 ckpt default of 24.0
* Set default CFG scale to official LTX 2 ckpt default of 4.0
* Update LTX 2 pipeline example docstrings
* make style and make quality
* Remove LTX 2 test scripts
* Fix LTX 2 upsample pipeline example docstring
* Add logic to convert and save a LTX 2 upsampling pipeline
* Document LTX2VideoTransformer3DModel forward pass

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>

…gingface#13815)

* feat(pipelines): add DreamLite text-to-image and image-edit pipelines

  Add ByteDance's DreamLite model family to diffusers. DreamLite is a UNet-based diffusion model that supports both text-to-image generation and reference-image editing through a shared 3-branch dual-CFG design.

  Two pipelines are shipped:
  * DreamLitePipeline - full 3-branch dual CFG (negative, reference, prompt); supports T2I and I2I editing at 1024x1024.
  * DreamLiteMobilePipeline - distilled single-branch variant for on-device inference; no CFG.

  New model code (all isolated under *_dreamlite.py / unet_dreamlite.py to avoid touching shared upstream files):
  * models/transformers/transformer_2d_dreamlite.py - DreamLite 2D transformer block.
  * models/unets/unet_dreamlite.py - DreamLiteUNetModel.
  * models/unets/unet_2d_blocks_dreamlite.py - DreamLite-specific down/up/mid blocks.
  * models/resnet_dreamlite.py - DreamLite ResNet variants.
  * models/attention_processor.py - add DreamLiteAttnProcessor2_0 (pure addition, no existing processor modified).

  Pipeline + tests + docs:
  * pipelines/dreamlite/{__init__.py, pipeline_dreamlite.py, pipeline_dreamlite_mobile.py, pipeline_output.py}.
  * tests/pipelines/dreamlite/{test_pipeline_dreamlite.py, test_pipeline_dreamlite_mobile.py} with the standard PipelineTesterMixin suite; setUp/tearDown auto-patches encode_prompt with a fake so MagicMock text encoders work without per-test boilerplate.
  * Skip 8 mixin tests that don't apply to DreamLite (MagicMock serialisation, custom attention processor, encode_prompt return shape, batch_size > 1 sweep), mirroring SD3 / Flux conventions.
  * docs/source/en/api/pipelines/dreamlite.md + _toctree.yml entry (alphabetically between DiT and EasyAnimate).
  * Register exports in 6 __init__.py files.

  Two real bugs surfaced by the mixin test suite are fixed in this commit:
  * num_images_per_prompt > 1: prompt_embeds and text_attention_mask are now repeated along the batch dimension in both pipelines' T2I and I2I branches before being passed to the UNet.
  * vae=None: __init__ now guards the encoder_block_out_channels lookup so encode_prompt can be tested in isolation per PipelineTesterMixin convention.

  SlowTests real-checkpoint resolution is set to 1024x1024 (the only size DreamLite is trained for).

  Test result: 27 passed, 50 skipped, 0 failed on CPU fast suite. make style && make quality: clean.

* docs+tests(pipelines/dreamlite): pin Hub repos to `diffusers` branch

  The `carlofkl/DreamLite-{base,mobile}` Hub repos host two flavours of the same checkpoint:
  * `main` branch - keeps `model_index.json` pointing at ByteDance's internal package path so the original (non-diffusers) reference code can still load these weights.
  * `diffusers` branch - rewrites the `unet` entry of `model_index.json` to `["diffusers", "DreamLiteUNetModel"]` so this integration loads correctly from `diffusers`.

  This commit pins every `from_pretrained(...)` call shipped with the diffusers integration (docs examples, pipeline docstrings, SlowTests) to `revision="diffusers"`. Local-override env vars (DREAMLITE_BASE_PATH / DREAMLITE_MOBILE_PATH) still bypass the revision pin.

* chore(pipelines/dreamlite): sync `# Copied from` blocks + dummy objects after rebase

  Mechanical changes after rebasing onto current `main`:
  * `pipeline_dreamlite.py::retrieve_timesteps` - re-synced from `diffusers.pipelines.flux.pipeline_flux.retrieve_timesteps` (PEP 604 type hints, expanded docstring, plus the new `accepts_timesteps` / `accept_sigmas` introspection guards). DreamLite's default code path uses `num_inference_steps` (uniform schedule) and never passes custom `timesteps` / `sigmas`, so the added guards are dead-code for this pipeline; behaviour is unchanged.
  * `dummy_pt_objects.py` / `dummy_torch_and_transformers_objects.py` - registered the dummy classes auto-generated by `make fix-copies` for `DreamLiteTransformer2DModel`, `DreamLiteUNetModel`, `DreamLitePipeline`, `DreamLiteMobilePipeline`, `DreamLitePipelineOutput`.

  Generated by `make fix-copies`. No hand edits.

* docs(dreamlite): register attention processor + split combined docstring entries

  - Register DreamLiteAttnProcessor2_0 in docs/source/en/api/attnprocessor.md (fixes check_support_list.py).
  - Split combined 'height / width' and 'guidance_scale / image_guidance_scale' entries in the two pipeline docstrings; add a complete Args block to DreamLiteTransformer2DModel.forward (fixes check_forward_call_docstrings.py).

  No behavioral change.

* refactor(dreamlite): address review feedback from huggingface#13815

  - Inline the down/up block factories and define DreamLiteCrossAttn{,NoSelfAttn}{Down,Up}Block2D directly (review #1, #2)
  - Rename DownBlock2DDreamLite/UpBlock2DDreamLite to DreamLiteDownBlock2D/DreamLiteUpBlock2D to match diffusers naming conventions (review #3, #4)
  - Merge unet_2d_blocks_dreamlite.py into unet_dreamlite.py to mirror recent transformer model files (review #5)
  - Wire max_sequence_length into the tokenizer call for generate mode (review #6)
  - Replace hard-coded drop_idx values (64/34) with self.prompt_template_encode_*_start_idx attributes plus a comment explaining how the offsets are derived (review #7, #8)
  - Drop the manual Image.resize call and rely on VaeImageProcessor's LANCZOS default in preprocess(image, height, width) (review #9)
  - Use self.guidance_scale / self.image_guidance_scale properties in the CFG combine instead of the underscore-prefixed attributes (review #10, #11)
  - Inline retrieve_latents / retrieve_timesteps / calculate_shift in the mobile pipeline with `# Copied from` markers, removing the cross-pipeline imports (review #12)
  - Add `# Copied from` marker to _extract_masked_hidden in the mobile pipeline (review huggingface#13)

* refactor(dreamlite): address dg845 follow-up review

  - Merge resnet_dreamlite.py (DepthwiseSeparableConv + ResnetBlock2DDreamLite) into unet_dreamlite.py and delete the standalone module (review #1)
  - Move DreamLiteAttnProcessor2_0 from attention_processor.py into unet_dreamlite.py to keep all DreamLite-specific code in one place; update docs autodoc reference accordingly (review #2)
  - Drop the PyTorch 2.0 hasattr/ImportError check in DreamLiteAttnProcessor2_0.__init__ (diffusers already requires torch>=2.0; matches Wan deprecation) (review #3)
  - Drop the deprecated `scale` argument handling from DreamLiteAttnProcessor2_0.__call__ (new model, no legacy callers) (review #4)
  - Switch SDPA call to dispatch_attention_fn so all diffusers attention backends (FlashAttention, FlashAttention-3, sageattention, etc.) are selectable (review #5)
  - Rename block dispatch keys in _get_{down,mid,up}_block_dreamlite to match the Python class names (DreamLiteCrossAttn{Down,Up}Block2D / DreamLiteCrossAttnNoSelfAttn{Down,Up}Block2D / DreamLiteUNetMidBlock2DCrossAttn / DreamLite{Down,Up}Block2D); default down/up/mid block_types in DreamLiteUNetModel and the test fixtures are updated to the new keys (review #6, #7); the carlofkl/DreamLite-{base,mobile} (diffusers branch) Hub configs are being updated in lock-step
  - Localize retrieve_latents inside pipeline_dreamlite.py with a `# Copied from` marker, removing the cross-pipeline import; mirrors the mobile pipeline (review #8)
  - Add a check_inputs() method to both DreamLitePipeline and DreamLiteMobilePipeline (mobile uses `# Copied from`); call it from __call__; pulls the image-type validation out of prepare_image_latents and adds prompt-type and h/w-divisibility checks (review #9)

* fix(dreamlite): correct Q/K/V layout for dispatch_attention_fn

  dispatch_attention_fn expects (batch, seq, heads, head_dim) and handles the transpose internally; the previous code passed (batch, heads, seq, head_dim), which collided with the dispatch's internal transpose and broke inference (RuntimeError: tensor size mismatch at non-singleton dimension 1).

* test(dreamlite): swap MagicMock for tiny real Qwen3-VL fixture

  Address dg845's review: rebuild the DreamLite fast-test fixture around a real (tiny) Qwen3VLForConditionalGeneration + Qwen3VLProcessor so the standard PipelineTesterMixin save/load, dtype, and offload tests run end-to-end against the actual encode_prompt code path. Override DreamLiteUNetModel.set_default_attn_processor to reinstall the GQA processor so mixin utilities that round-trip through it keep working.

* Apply style fixes

* fix(dreamlite): address blocking review issues from huggingface#13815

  - Override _no_split_modules / _repeated_blocks on DreamLiteUNetModel with the actual DreamLite class names (BasicTransformerBlockDreamLite, ResnetBlock2DDreamLite, DreamLiteCrossAttnUpBlock2D, DreamLiteUpBlock2D) so device_map="auto" and compile_repeated_blocks() match correctly.
  - Keep attention masks as bool tensors in DreamLiteTransformer2DModel instead of converting them to dense additive float biases. The dense format hard-raises on flash / _flash_3 / _sage backends in dispatch_attention_fn (which requires dtype == torch.bool).
  - Add explicit parentheses around each clause in check_inputs's mixed and/or condition (both pipelines) for readability.
  - Replace nn.Module.__init__(self) with ModelMixin.__init__(self) in DreamLiteUNetModel.__init__ so mixin state (e.g. _gradient_checkpointing_func) is properly initialised. ConfigMixin / PushToHubMixin don't define their own __init__, so this covers the full chain without re-running UNet2DConditionModel.__init__.

* fix(dreamlite): forward all processor outputs to Qwen3VL text encoder

  Recent versions of Qwen3VLProcessor add an mm_token_type_ids output, and Qwen3VLModel.compute_3d_position_ids raises ValueError whenever multimodal inputs are present (image_grid_thw is not None) but mm_token_type_ids is None. encode_prompt previously forwarded only input_ids / attention_mask / pixel_values / image_grid_thw, dropping the new field and breaking the fast pipeline tests against transformers main. Switch to ``self.text_encoder(**tk_out, output_hidden_states=True)`` (matching NucleusMoEImagePipeline) so all processor outputs are forwarded automatically and future additions don't regress this path.

* Apply style fixes

* docs(dreamlite): address final review nits from huggingface#13815

  - Replace broken cat.png URL in editing examples (both base and mobile) with the standard `huggingface/documentation-images` source used elsewhere in the diffusers docs.
  - Promote the recommended guidance_scale=3.5 / image_guidance_scale=1.5 to the default values of DreamLitePipeline.__call__, and drop the now-redundant explicit args from the docs examples.
  - Switch the EXAMPLE_DOC_STRING examples in both pipelines from torch.float16 to torch.bfloat16 for consistency with the rest of the docs.

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
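The Q/K/V layout fix in the DreamLite commit above comes down to which axis an attention helper treats as the sequence. The sketch below is illustrative only: `attend_bshd` is a hypothetical stand-in that mimics the (batch, seq, heads, head_dim) convention the commit attributes to `dispatch_attention_fn`, built on PyTorch's `scaled_dot_product_attention`. It is not the diffusers implementation. With equal-length query and key sequences the mistake is silent in this toy, whereas the commit reports a RuntimeError in the real pipeline.

```python
import torch
import torch.nn.functional as F


def attend_bshd(query, key, value):
    """Hypothetical helper: takes (batch, seq, heads, head_dim) tensors."""
    # Transpose to the (batch, heads, seq, head_dim) layout that SDPA expects.
    q, k, v = (t.transpose(1, 2) for t in (query, key, value))
    out = F.scaled_dot_product_attention(q, k, v)
    # Hand the result back in the caller's (batch, seq, heads, head_dim) layout.
    return out.transpose(1, 2)


torch.manual_seed(0)
batch, seq, heads, head_dim = 2, 5, 4, 8
q = torch.randn(batch, seq, heads, head_dim)
k = torch.randn(batch, seq, heads, head_dim)
v = torch.randn(batch, seq, heads, head_dim)

# Correct usage: pass (batch, seq, heads, head_dim) and let the helper transpose.
out = attend_bshd(q, k, v)

# Reference: call SDPA directly on the (batch, heads, seq, head_dim) layout.
ref = F.scaled_dot_product_attention(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
).transpose(1, 2)

# Buggy usage: pre-transposing makes the helper attend over the heads axis.
wrong = attend_bshd(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
).transpose(1, 2)
```

Here `out` matches `ref`, while `wrong` has the same shape but different values, which is why a layout mismatch of this kind is easy to miss without a consistency test.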
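The commit above keeps attention masks as bool tensors rather than dense additive float biases. As a rough illustration of why the two forms are interchangeable where both are accepted, the sketch below compares them under PyTorch's `scaled_dot_product_attention`. The shapes and the padding pattern are made up for the example, and it does not exercise the flash or sage backends that the commit says require `torch.bool`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, heads, seq, head_dim = 2, 4, 6, 8
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)

# Key-padding mask: True means "attend". The last two keys of sample 1 are padding.
keep = torch.ones(batch, seq, dtype=torch.bool)
keep[1, -2:] = False
bool_mask = keep[:, None, None, :]  # broadcasts over heads and query positions

# Dense additive form: 0 where attention is allowed, -inf where it is not.
float_bias = torch.zeros(batch, 1, 1, seq).masked_fill(~bool_mask, float("-inf"))

out_bool = F.scaled_dot_product_attention(q, k, v, attn_mask=bool_mask)
out_bias = F.scaled_dot_product_attention(q, k, v, attn_mask=float_bias)
```

Both calls should produce the same output, so keeping the bool form loses nothing numerically while staying compatible with backends that only accept bool masks.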
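The processor-forwarding fix above can be pictured with plain Python. In this sketch `fake_processor` and `fake_text_encoder` are hypothetical stand-ins, not the Qwen3-VL classes. They only mimic the behaviour the commit describes: the processor emits an extra `mm_token_type_ids` field, and the encoder raises `ValueError` when `image_grid_thw` is present but that field is missing.

```python
def fake_processor(text, image=None):
    """Hypothetical processor returning a dict of model inputs."""
    out = {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}
    if image is not None:
        out["pixel_values"] = [[0.0]]
        out["image_grid_thw"] = [(1, 1, 1)]
        out["mm_token_type_ids"] = [0, 1, 0]  # the newer field
    return out


def fake_text_encoder(input_ids, attention_mask, pixel_values=None,
                      image_grid_thw=None, mm_token_type_ids=None,
                      output_hidden_states=False):
    """Hypothetical encoder that needs mm_token_type_ids for multimodal input."""
    if image_grid_thw is not None and mm_token_type_ids is None:
        raise ValueError("mm_token_type_ids is required for multimodal inputs")
    return {"hidden_states": [input_ids] if output_hidden_states else None}


tk_out = fake_processor("a cat", image=object())

# Old style: forward a hand-picked subset, which silently drops the new field.
try:
    fake_text_encoder(
        input_ids=tk_out["input_ids"],
        attention_mask=tk_out["attention_mask"],
        pixel_values=tk_out["pixel_values"],
        image_grid_thw=tk_out["image_grid_thw"],
        output_hidden_states=True,
    )
    explicit_ok = True
except ValueError:
    explicit_ok = False

# New style: unpack everything the processor returned.
forwarded = fake_text_encoder(**tk_out, output_hidden_states=True)
```

Unpacking with `**tk_out` means any field a later processor version adds reaches the encoder without another code change, provided the encoder accepts it as a keyword.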

fix copy inconsistencies

236b75d

lawrence-cj merged commit a8c2228 into lawrence-cj:feat/sa-solver Jan 15, 2024

lawrence-cj pushed a commit that referenced this pull request Dec 23, 2024

Linter bot example (#11)

b831634



fix copy inconsistencies #11

lawrence-cj merged 1 commit into lawrence-cj:feat/sa-solver from scxue:feat/sa-solver
