fix copy inconsistencies #11

Merged
lawrence-cj merged 1 commit into lawrence-cj:feat/sa-solver from scxue:feat/sa-solver
Jan 15, 2024
scxue commented Jan 15, 2024

fix copy inconsistencies

lawrence-cj merged commit a8c2228 into lawrence-cj:feat/sa-solver Jan 15, 2024
lawrence-cj pushed a commit that referenced this pull request Dec 23, 2024
* Update README.md

* minor fix

* [CI-Lint] Fix code style issues with pre-commit 9839049f792c5e059dc934116077d9604da5c53f

---------

Co-authored-by: GitHub Action <action@github.com>
lawrence-cj pushed a commit that referenced this pull request Jan 12, 2026
* Initial LTX 2.0 transformer implementation

* Add tests for LTX 2 transformer model

* Get LTX 2 transformer tests working

* Rename LTX 2 compile test class to have LTX2

* Remove RoPE debug print statements

* Get LTX 2 transformer compile tests passing

* Fix LTX 2 transformer shape errors

* Initial script to convert LTX 2 transformer to diffusers

* Add more LTX 2 transformer audio arguments

* Allow LTX 2 transformer to be loaded from local path for conversion

* Improve dummy inputs and add test for LTX 2 transformer consistency

* Fix LTX 2 transformer bugs so consistency test passes

* Initial implementation of LTX 2.0 video VAE

* Explicitly specify temporal and spatial VAE scale factors when converting

* Add initial LTX 2.0 video VAE tests

* Add initial LTX 2.0 video VAE tests (part 2)

* Get diffusers implementation on par with official LTX 2.0 video VAE implementation

* Initial LTX 2.0 vocoder implementation

* Use RMSNorm implementation closer to original for LTX 2.0 video VAE

* start audio decoder.

* init registration.

* up

* simplify and clean up

* up

* Initial LTX 2.0 text encoder implementation

* Rough initial LTX 2.0 pipeline implementation

* up

* up

* up

* up

* Add imports for LTX 2.0 Audio VAE

* Conversion script for LTX 2.0 Audio VAE Decoder

* Add Audio VAE logic to T2V pipeline

* Duplicate scheduler for audio latents

* Support num_videos_per_prompt for prompt embeddings

* LTX 2.0 scheduler and full pipeline conversion

* Add script to test full LTX2Pipeline T2V inference

* Fix pipeline return bugs

* Add LTX 2 text encoder and vocoder to ltx2 subdirectory __init__

* Fix more bugs in LTX2Pipeline.__call__

* Improve CPU offload support

* Fix pipeline audio VAE decoding dtype bug

* Fix video shape error in full pipeline test script

* Get LTX 2 T2V pipeline to produce reasonable outputs

* Make LTX 2.0 scheduler more consistent with original code

* Fix typo when applying scheduler fix in T2V inference script

* Refactor Audio VAE to be simpler and remove helpers (#7)

* remove resolve causality axes stuff.

* remove a bunch of helpers.

* remove adjust output shape helper.

* remove the use of audiolatentshape.

* move normalization and patchify out of pipeline.

* fix

* up

* up

* Remove unpatchify and patchify ops before audio latents denormalization (#9)

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Add support for I2V (#8)

* start i2v.

* up

* up

* up

* up

* up

* remove uniform strategy code.

* remove unneeded code.

* Denormalize audio latents in I2V pipeline (analogous to T2V change) (#11)

* test i2v.

* Move Video and Audio Text Encoder Connectors to Transformer (#12)

* Denormalize audio latents in I2V pipeline (analogous to T2V change)

* Initial refactor to put video and audio text encoder connectors in transformer

* Get LTX 2 transformer tests working after connector refactor

* precompute run_connectors.

* fixes

* Address review comments

* Calculate RoPE double-precision freqs using torch instead of np

* Further simplify LTX 2 RoPE freq calc

* Make connectors a separate module (huggingface#18)

* remove text_encoder.py

* address yiyi's comments.

* up

* up

* up

* up

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>

* up (huggingface#19)

* address initial feedback from lightricks team (huggingface#16)

* cross_attn_timestep_scale_multiplier to 1000

* implement split rope type.

* up

* propagate rope_type to rope embed classes as well.

* up

* When using split RoPE, make sure that the output dtype is same as input dtype

* Fix apply split RoPE shape error when reshaping x to 4D

* Add export_utils file for exporting LTX 2.0 videos with audio

* Tests for T2V and I2V (#6)

* add ltx2 pipeline tests.

* up

* up

* up

* up

* remove content

* style

* Denormalize audio latents in I2V pipeline (analogous to T2V change)

* Initial refactor to put video and audio text encoder connectors in transformer

* Get LTX 2 transformer tests working after connector refactor

* up

* up

* i2v tests.

* up

* Address review comments

* Calculate RoPE double-precision freqs using torch instead of np

* Further simplify LTX 2 RoPE freq calc

* revert unneeded changes.

* up

* up

* update to split style rope.

* up

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>

* up

* use export util funcs.

* Point original checkpoint to LTX 2.0 official checkpoint

* Allow the I2V pipeline to accept image URLs

* make style and make quality

* remove function map.

* remove args.

* update docs.

* update doc entries.

* disable ltx2_consistency test

* Simplify LTX 2 RoPE forward by removing coords is None logic

* make style and make quality

* Support LTX 2.0 audio VAE encoder

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Remove print statement in audio VAE

* up

* Fix bug when calculating audio RoPE coords

* Ltx 2 latent upsample pipeline (huggingface#12922)

* Initial implementation of LTX 2.0 latent upsampling pipeline

* Add new LTX 2.0 spatial latent upsampler logic

* Add test script for LTX 2.0 latent upsampling

* Add option to enable VAE tiling in upsampling test script

* Get latent upsampler working with video latents

* Fix typo in BlurDownsample

* Add latent upsample pipeline docstring and example

* Remove deprecated pipeline VAE slicing/tiling methods

* make style and make quality

* When returning latents, return unpacked and denormalized latents for T2V and I2V

* Add model_cpu_offload_seq for latent upsampling pipeline

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>

* Fix latent upsampler filename in LTX 2 conversion script

* Add latent upsample pipeline to LTX 2 docs

* Add dummy objects for LTX 2 latent upsample pipeline

* Set default FPS to official LTX 2 ckpt default of 24.0

* Set default CFG scale to official LTX 2 ckpt default of 4.0

* Update LTX 2 pipeline example docstrings

* make style and make quality

* Remove LTX 2 test scripts

* Fix LTX 2 upsample pipeline example docstring

* Add logic to convert and save a LTX 2 upsampling pipeline

* Document LTX2VideoTransformer3DModel forward pass

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
lawrence-cj pushed a commit that referenced this pull request Jun 15, 2026
…gingface#13815)

* feat(pipelines): add DreamLite text-to-image and image-edit pipelines

Add ByteDance's DreamLite model family to diffusers. DreamLite is a
UNet-based diffusion model that supports both text-to-image generation
and reference-image editing through a shared 3-branch dual-CFG design.
Two pipelines are shipped:

* DreamLitePipeline           - full 3-branch dual CFG (negative,
                                reference, prompt); supports T2I and
                                I2I editing at 1024x1024.
* DreamLiteMobilePipeline     - distilled single-branch variant for
                                on-device inference; no CFG.

New model code (all isolated under *_dreamlite.py / unet_dreamlite.py
to avoid touching shared upstream files):

* models/transformers/transformer_2d_dreamlite.py - DreamLite 2D
  transformer block.
* models/unets/unet_dreamlite.py                  - DreamLiteUNetModel.
* models/unets/unet_2d_blocks_dreamlite.py        - DreamLite-specific
  down/up/mid blocks.
* models/resnet_dreamlite.py                      - DreamLite ResNet
  variants.
* models/attention_processor.py                   - add
  DreamLiteAttnProcessor2_0 (pure addition, no existing processor
  modified).

Pipeline + tests + docs:

* pipelines/dreamlite/{__init__.py, pipeline_dreamlite.py,
  pipeline_dreamlite_mobile.py, pipeline_output.py}.
* tests/pipelines/dreamlite/{test_pipeline_dreamlite.py,
  test_pipeline_dreamlite_mobile.py} with the standard
  PipelineTesterMixin suite; setUp/tearDown auto-patches encode_prompt
  with a fake so MagicMock text encoders work without per-test
  boilerplate.
* Skip 8 mixin tests that don't apply to DreamLite (MagicMock
  serialisation, custom attention processor, encode_prompt return
  shape, batch_size > 1 sweep), mirroring SD3 / Flux conventions.
* docs/source/en/api/pipelines/dreamlite.md + _toctree.yml entry
  (alphabetically between DiT and EasyAnimate).
* Register exports in 6 __init__.py files.

Two real bugs surfaced by the mixin test suite are fixed in this
commit:

* num_images_per_prompt > 1: prompt_embeds and text_attention_mask
  are now repeated along the batch dimension in both pipelines'
  T2I and I2I branches before being passed to the UNet.
* vae=None: __init__ now guards the encoder_block_out_channels
  lookup so encode_prompt can be tested in isolation per
  PipelineTesterMixin convention.
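
The first fix above can be pictured with a small pure-Python sketch. The helper `repeat_per_prompt` is hypothetical and uses nested lists in place of tensors; the real pipelines operate on torch tensors, and the grouped ordering shown here (all copies of one prompt kept adjacent) is an assumption, since the commit message does not state it.

```python
def repeat_per_prompt(batch, num_images_per_prompt):
    """Repeat every batch entry so each prompt yields several images.

    Hypothetical helper: `batch` is a list with one entry per prompt.
    Copies of the same prompt are kept adjacent (assumed ordering).
    """
    if num_images_per_prompt < 1:
        raise ValueError("num_images_per_prompt must be >= 1")
    repeated = []
    for entry in batch:
        repeated.extend([entry] * num_images_per_prompt)
    return repeated


# Two prompts, two images each: both inputs must grow to batch size 4,
# otherwise the embeddings and their mask no longer line up.
prompt_embeds = [["cat-emb"], ["dog-emb"]]
text_attention_mask = [[1, 1, 0], [1, 0, 0]]

embeds = repeat_per_prompt(prompt_embeds, 2)
mask = repeat_per_prompt(text_attention_mask, 2)
```

The point of the sketch is only that `prompt_embeds` and `text_attention_mask` are expanded together, by the same factor and in the same order.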

SlowTests real-checkpoint resolution is set to 1024x1024 (the only
size DreamLite is trained for).

Test result: 27 passed, 50 skipped, 0 failed on CPU fast suite.
make style && make quality: clean.

* docs+tests(pipelines/dreamlite): pin Hub repos to `diffusers` branch

The `carlofkl/DreamLite-{base,mobile}` Hub repos host two flavours of the
same checkpoint:

* `main` branch      - keeps `model_index.json` pointing at ByteDance's
                       internal package path so the original (non-diffusers)
                       reference code can still load these weights.
* `diffusers` branch - rewrites the `unet` entry of `model_index.json` to
                       `["diffusers", "DreamLiteUNetModel"]` so this
                       integration loads correctly from `diffusers`.

This commit pins every `from_pretrained(...)` call shipped with the
diffusers integration (docs examples, pipeline docstrings, SlowTests) to
`revision="diffusers"`. Local-override env vars (DREAMLITE_BASE_PATH /
DREAMLITE_MOBILE_PATH) still bypass the revision pin.
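
A minimal sketch of the pinning rule described above, assuming the override works by substituting a local path for the Hub repo id. The helper `resolve_checkpoint` is hypothetical and not part of the integration; only the `revision="diffusers"` pin and the env var names come from this commit message.

```python
import os


def resolve_checkpoint(repo_id, env_var, revision="diffusers"):
    """Return (source, kwargs) for a from_pretrained-style call.

    Hypothetical helper: if the env var points at a local checkpoint,
    use it and skip the revision pin; otherwise pin the Hub branch.
    """
    local_path = os.environ.get(env_var)
    if local_path:
        # A local directory has no Hub branch, so no revision is passed.
        return local_path, {}
    return repo_id, {"revision": revision}


source, kwargs = resolve_checkpoint("carlofkl/DreamLite-base", "DREAMLITE_BASE_PATH")
# The result would then be used as:
#   DreamLitePipeline.from_pretrained(source, **kwargs)
```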

* chore(pipelines/dreamlite): sync `# Copied from` blocks + dummy objects after rebase

Mechanical changes after rebasing onto current `main`:

* `pipeline_dreamlite.py::retrieve_timesteps` — re-synced from
  `diffusers.pipelines.flux.pipeline_flux.retrieve_timesteps` (PEP 604
  type hints, expanded docstring, plus the new
  `accepts_timesteps` / `accept_sigmas` introspection guards). DreamLite's
  default code path uses `num_inference_steps` (uniform schedule) and never
  passes custom `timesteps` / `sigmas`, so the added guards are dead code
  for this pipeline; behaviour is unchanged.
* `dummy_pt_objects.py` / `dummy_torch_and_transformers_objects.py` —
  registered the dummy classes auto-generated by `make fix-copies` for
  `DreamLiteTransformer2DModel`, `DreamLiteUNetModel`, `DreamLitePipeline`,
  `DreamLiteMobilePipeline`, `DreamLitePipelineOutput`.

Generated by `make fix-copies`. No hand edits.

* docs(dreamlite): register attention processor + split combined docstring entries

- Register DreamLiteAttnProcessor2_0 in docs/source/en/api/attnprocessor.md
  (fixes check_support_list.py).
- Split combined 'height / width' and 'guidance_scale / image_guidance_scale'
  entries in the two pipeline docstrings; add a complete Args block to
  DreamLiteTransformer2DModel.forward
  (fixes check_forward_call_docstrings.py).

No behavioral change.

* refactor(dreamlite): address review feedback from huggingface#13815

- Inline the down/up block factories and define DreamLiteCrossAttn{,NoSelfAttn}{Down,Up}Block2D directly (review #1, #2)
- Rename DownBlock2DDreamLite/UpBlock2DDreamLite to DreamLiteDownBlock2D/DreamLiteUpBlock2D to match diffusers naming conventions (review #3, #4)
- Merge unet_2d_blocks_dreamlite.py into unet_dreamlite.py to mirror recent transformer model files (review #5)
- Wire max_sequence_length into the tokenizer call for generate mode (review #6)
- Replace hard-coded drop_idx values (64/34) with self.prompt_template_encode_*_start_idx attributes plus a comment explaining how the offsets are derived (review #7, #8)
- Drop the manual Image.resize call and rely on VaeImageProcessor's LANCZOS default in preprocess(image, height, width) (review #9)
- Use self.guidance_scale / self.image_guidance_scale properties in the CFG combine instead of the underscore-prefixed attributes (review #10, #11)
- Inline retrieve_latents / retrieve_timesteps / calculate_shift in the mobile pipeline with `# Copied from` markers, removing the cross-pipeline imports (review #12)
- Add `# Copied from` marker to _extract_masked_hidden in the mobile pipeline (review #13)

* refactor(dreamlite): address dg845 follow-up review

- Merge resnet_dreamlite.py (DepthwiseSeparableConv + ResnetBlock2DDreamLite)
  into unet_dreamlite.py and delete the standalone module (review #1)
- Move DreamLiteAttnProcessor2_0 from attention_processor.py into
  unet_dreamlite.py to keep all DreamLite-specific code in one place;
  update docs autodoc reference accordingly (review #2)
- Drop the PyTorch 2.0 hasattr/ImportError check in
  DreamLiteAttnProcessor2_0.__init__ (diffusers already requires
  torch>=2.0; matches Wan deprecation) (review #3)
- Drop the deprecated `scale` argument handling from
  DreamLiteAttnProcessor2_0.__call__ (new model, no legacy callers)
  (review #4)
- Switch SDPA call to dispatch_attention_fn so all diffusers attention
  backends (FlashAttention, FlashAttention-3, sageattention, etc.) are
  selectable (review #5)
- Rename block dispatch keys in _get_{down,mid,up}_block_dreamlite to
  match the Python class names (DreamLiteCrossAttn{Down,Up}Block2D /
  DreamLiteCrossAttnNoSelfAttn{Down,Up}Block2D /
  DreamLiteUNetMidBlock2DCrossAttn / DreamLite{Down,Up}Block2D);
  default down/up/mid block_types in DreamLiteUNetModel and the test
  fixtures are updated to the new keys (review #6, #7); the
  carlofkl/DreamLite-{base,mobile} (diffusers branch) Hub configs are
  being updated in lock-step
- Localize retrieve_latents inside pipeline_dreamlite.py with a
  `# Copied from` marker, removing the cross-pipeline import; mirrors
  the mobile pipeline (review #8)
- Add a check_inputs() method to both DreamLitePipeline and
  DreamLiteMobilePipeline (mobile uses `# Copied from`); call it from
  __call__; pulls the image-type validation out of prepare_image_latents
  and adds prompt-type and h/w-divisibility checks (review #9)

* fix(dreamlite): correct Q/K/V layout for dispatch_attention_fn

dispatch_attention_fn expects (batch, seq, heads, head_dim) and handles the transpose internally; the previous code passed (batch, heads, seq, head_dim), which collided with the dispatch's internal transpose and broke inference (RuntimeError: tensor size mismatch at non-singleton dimension 1).
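
The layout mix-up can be followed at the level of shapes alone. The sketch below is illustrative and does not call diffusers: `split_heads` and `swap_seq_and_heads` are hypothetical helpers working on shape tuples, and the sizes (batch 2, sequence 77, 8 heads of width 64) are made up for the example.

```python
def split_heads(shape, num_heads):
    """(batch, seq, heads * head_dim) -> (batch, seq, heads, head_dim)."""
    batch, seq, inner = shape
    if inner % num_heads:
        raise ValueError("hidden size must be divisible by num_heads")
    return (batch, seq, num_heads, inner // num_heads)


def swap_seq_and_heads(shape):
    """Swap axes 1 and 2, modelling the transpose done inside the dispatch."""
    batch, axis1, axis2, head_dim = shape
    return (batch, axis2, axis1, head_dim)


projected = (2, 77, 512)  # output shape of the Q/K/V projections

# Correct: hand over (batch, seq, heads, head_dim); one internal transpose
# yields the (batch, heads, seq, head_dim) layout the kernel works on.
expected_input = split_heads(projected, num_heads=8)
kernel_view = swap_seq_and_heads(expected_input)

# Previous behaviour: transposing before the call as well means the internal
# transpose undoes it, so the kernel sees seq and heads in swapped positions.
pre_transposed = swap_seq_and_heads(expected_input)
buggy_kernel_view = swap_seq_and_heads(pre_transposed)
```

With a second tensor of a different sequence length (for example cross-attention keys), the swapped axes no longer agree in dimension 1, which is consistent with the reported error.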

* test(dreamlite): swap MagicMock for tiny real Qwen3-VL fixture

Address dg845's review: rebuild the DreamLite fast-test fixture around a
real (tiny) Qwen3VLForConditionalGeneration + Qwen3VLProcessor so the
standard PipelineTesterMixin save/load, dtype, and offload tests run
end-to-end against the actual encode_prompt code path. Override
DreamLiteUNetModel.set_default_attn_processor to reinstall the GQA
processor so mixin utilities that round-trip through it keep working.

* Apply style fixes

* fix(dreamlite): address blocking review issues from huggingface#13815

- Override _no_split_modules / _repeated_blocks on DreamLiteUNetModel
  with the actual DreamLite class names (BasicTransformerBlockDreamLite,
  ResnetBlock2DDreamLite, DreamLiteCrossAttnUpBlock2D,
  DreamLiteUpBlock2D) so device_map="auto" and compile_repeated_blocks()
  match correctly.

- Keep attention masks as bool tensors in DreamLiteTransformer2DModel
  instead of converting them to dense additive float biases. The dense
  format hard-raises on flash / _flash_3 / _sage backends in
  dispatch_attention_fn (which requires dtype == torch.bool).

- Add explicit parentheses around each clause in check_inputs's mixed
  and/or condition (both pipelines) for readability.

- Replace nn.Module.__init__(self) with ModelMixin.__init__(self) in
  DreamLiteUNetModel.__init__ so mixin state (e.g.
  _gradient_checkpointing_func) is properly initialised. ConfigMixin /
  PushToHubMixin don't define their own __init__, so this covers the
  full chain without re-running UNet2DConditionModel.__init__.
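
For the attention-mask item above, the two mask formats can be compared in a small pure-Python sketch. It is not the DreamLite code: the function names are hypothetical, a single row of scores stands in for a full attention matrix, and `-inf` is used as the masked bias value for exactness.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bool_mask_to_additive_bias(mask):
    """Dense float form: 0.0 where attention is allowed, -inf where masked."""
    return [0.0 if keep else float("-inf") for keep in mask]


def masked_softmax_bool(scores, mask):
    """Bool form: softmax over kept positions, weight 0.0 elsewhere."""
    if not any(mask):
        raise ValueError("at least one position must be kept")
    kept = iter(softmax([s for s, keep in zip(scores, mask) if keep]))
    return [next(kept) if keep else 0.0 for keep in mask]


scores = [1.0, 2.0, 3.0]
mask = [True, False, True]  # True = attend, False = masked out

from_bool = masked_softmax_bool(scores, mask)
bias = bool_mask_to_additive_bias(mask)
from_bias = softmax([s + b for s, b in zip(scores, bias)])
```

Both forms give the same attention weights, so keeping the bool mask loses nothing numerically while staying compatible with backends that accept only `torch.bool` masks.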

* fix(dreamlite): forward all processor outputs to Qwen3VL text encoder

Recent versions of Qwen3VLProcessor add an mm_token_type_ids output, and
Qwen3VLModel.compute_3d_position_ids raises ValueError whenever
multimodal inputs are present (image_grid_thw is not None) but
mm_token_type_ids is None.

encode_prompt previously forwarded only input_ids / attention_mask /
pixel_values / image_grid_thw, dropping the new field and breaking the
fast pipeline tests against transformers main.

Switch to ``self.text_encoder(**tk_out, output_hidden_states=True)``
(matching NucleusMoEImagePipeline) so all processor outputs are
forwarded automatically and future additions don't regress this path.
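
The regression and its fix can be reproduced with a stand-in encoder. Everything below is a sketch: `encoder_forward` is a hypothetical function that only imitates the check described above, and the field values are placeholders rather than real processor outputs.

```python
def encoder_forward(input_ids, attention_mask, pixel_values=None,
                    image_grid_thw=None, mm_token_type_ids=None,
                    output_hidden_states=False):
    """Stand-in encoder: rejects multimodal input without token type ids."""
    if image_grid_thw is not None and mm_token_type_ids is None:
        raise ValueError("mm_token_type_ids is required for multimodal inputs")
    return {"hidden_states": ["h0", "h1"]} if output_hidden_states else {}


# Placeholder processor output, including the newly added field.
tk_out = {
    "input_ids": [[1, 2, 3]],
    "attention_mask": [[1, 1, 1]],
    "pixel_values": [[0.5]],
    "image_grid_thw": [[1, 2, 2]],
    "mm_token_type_ids": [[0, 1, 1]],
}


def call_with_explicit_keys(tk_out):
    """Old pattern: a fixed key list silently drops fields added later."""
    return encoder_forward(
        input_ids=tk_out["input_ids"],
        attention_mask=tk_out["attention_mask"],
        pixel_values=tk_out["pixel_values"],
        image_grid_thw=tk_out["image_grid_thw"],
        output_hidden_states=True,
    )


def call_with_all_outputs(tk_out):
    """New pattern: unpack the whole mapping so new fields pass through."""
    return encoder_forward(**tk_out, output_hidden_states=True)
```

Unpacking with `**tk_out` assumes the encoder accepts every key the processor emits; that holds for the stand-in here and is the premise of the fix described above.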

* Apply style fixes

* docs(dreamlite): address final review nits from huggingface#13815

- Replace broken cat.png URL in editing examples (both base and mobile)
  with the standard `huggingface/documentation-images` source used
  elsewhere in the diffusers docs.
- Promote the recommended guidance_scale=3.5 / image_guidance_scale=1.5
  to the default values of DreamLitePipeline.__call__, and drop the
  now-redundant explicit args from the docs examples.
- Switch the EXAMPLE_DOC_STRING examples in both pipelines from
  torch.float16 to torch.bfloat16 for consistency with the rest of the
  docs.

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

2 participants