[Pipelines] Add DreamLite text-to-image and image-edit pipelines #13815
Carlofkl wants to merge 11 commits into
Add ByteDance's DreamLite model family to diffusers. DreamLite is a
UNet-based diffusion model that supports both text-to-image generation
and reference-image editing through a shared 3-branch dual-CFG design.
Two pipelines are shipped:
* DreamLitePipeline - full 3-branch dual CFG (negative,
reference, prompt); supports T2I and
I2I editing at 1024x1024.
* DreamLiteMobilePipeline - distilled single-branch variant for
on-device inference; no CFG.
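The commit message names the three branches but not the exact combine rule. As a rough sketch, assuming an InstructPix2Pix-style dual-CFG formula (an assumption, not DreamLite's verified implementation), the three noise predictions per step could be merged as follows; `dual_cfg_combine` is a hypothetical helper name:

```python
def dual_cfg_combine(noise_neg, noise_ref, noise_prompt, guidance_scale, image_guidance_scale):
    """Merge three branch predictions with InstructPix2Pix-style dual CFG (assumed formula).

    noise_neg:    prediction with neither prompt nor reference image (negative branch)
    noise_ref:    prediction conditioned on the reference image only
    noise_prompt: prediction conditioned on the reference image and the prompt
    Works on plain floats or on tensors, since it only uses + and *.
    """
    return (
        noise_neg
        # image_guidance_scale pulls the result toward the reference image
        + image_guidance_scale * (noise_ref - noise_neg)
        # guidance_scale pulls the result toward the text prompt
        + guidance_scale * (noise_prompt - noise_ref)
    )


print(dual_cfg_combine(0.0, 1.0, 3.0, guidance_scale=2.0, image_guidance_scale=1.5))  # 5.5
```

With both scales at 1.0 the expression collapses to `noise_prompt`, which is a quick sanity check for any dual-CFG variant.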
New model code (all isolated under *_dreamlite.py / unet_dreamlite.py
to avoid touching shared upstream files):
* models/transformers/transformer_2d_dreamlite.py - DreamLite 2D
transformer block.
* models/unets/unet_dreamlite.py - DreamLiteUNetModel.
* models/unets/unet_2d_blocks_dreamlite.py - DreamLite-specific
down/up/mid blocks.
* models/resnet_dreamlite.py - DreamLite ResNet
variants.
* models/attention_processor.py - add
DreamLiteAttnProcessor2_0 (pure addition, no existing processor
modified).
Pipeline + tests + docs:
* pipelines/dreamlite/{__init__.py, pipeline_dreamlite.py,
pipeline_dreamlite_mobile.py, pipeline_output.py}.
* tests/pipelines/dreamlite/{test_pipeline_dreamlite.py,
test_pipeline_dreamlite_mobile.py} with the standard
PipelineTesterMixin suite; setUp/tearDown auto-patches encode_prompt
with a fake so MagicMock text encoders work without per-test
boilerplate.
* Skip 8 mixin tests that don't apply to DreamLite (MagicMock
serialisation, custom attention processor, encode_prompt return
shape, batch_size > 1 sweep), mirroring SD3 / Flux conventions.
* docs/source/en/api/pipelines/dreamlite.md + _toctree.yml entry
(alphabetically between DiT and EasyAnimate).
* Register exports in 6 __init__.py files.
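A minimal sketch of the setUp/tearDown patching pattern and the mixin-test skip described above, using `unittest.mock.patch.object`. `ToyPipeline`, `fake_encode_prompt`, and the test names are hypothetical stand-ins, not the PR's actual test code:

```python
import unittest
from unittest import mock


class ToyPipeline:
    """Hypothetical pipeline whose encode_prompt would need a real text encoder."""

    def encode_prompt(self, prompt):
        raise RuntimeError("needs a real text encoder")

    def __call__(self, prompt):
        embeds, mask = self.encode_prompt(prompt)
        return embeds, mask


def fake_encode_prompt(self, prompt):
    # Deterministic fake: one embedding row and one mask row per prompt.
    prompts = [prompt] if isinstance(prompt, str) else list(prompt)
    return [[0.0] * 4 for _ in prompts], [[1] * 4 for _ in prompts]


class ToyPipelineFastTests(unittest.TestCase):
    def setUp(self):
        # Patch once per test instead of repeating the patch in every test body.
        self._patcher = mock.patch.object(ToyPipeline, "encode_prompt", fake_encode_prompt)
        self._patcher.start()

    def tearDown(self):
        # Restore the real method so the patch cannot leak into other tests.
        self._patcher.stop()

    def test_call_uses_fake_encoder(self):
        embeds, _ = ToyPipeline()("a cat")
        self.assertEqual(len(embeds), 1)

    @unittest.skip("Not applicable here: example of overriding an inherited mixin test.")
    def test_encode_prompt_works_in_isolation(self):
        pass
```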
Two real bugs surfaced by the mixin test suite are fixed in this
commit:
* num_images_per_prompt > 1: prompt_embeds and text_attention_mask
are now repeated along the batch dimension in both pipelines'
T2I and I2I branches before being passed to the UNet.
* vae=None: __init__ now guards the encoder_block_out_channels
lookup so encode_prompt can be tested in isolation per
PipelineTesterMixin convention.
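A sketch of the `num_images_per_prompt` fix, assuming `prompt_embeds` is shaped (batch, seq_len, dim) and `text_attention_mask` is (batch, seq_len); `repeat_for_num_images` is a hypothetical helper, not the pipeline's actual code:

```python
import torch


def repeat_for_num_images(prompt_embeds, text_attention_mask, num_images_per_prompt):
    """Expand text conditioning to batch_size * num_images_per_prompt rows (hypothetical helper)."""
    # repeat_interleave keeps the copies of each prompt adjacent: [p0, p0, p1, p1, ...]
    prompt_embeds = prompt_embeds.repeat_interleave(num_images_per_prompt, dim=0)
    text_attention_mask = text_attention_mask.repeat_interleave(num_images_per_prompt, dim=0)
    return prompt_embeds, text_attention_mask


embeds = torch.arange(2 * 3 * 4, dtype=torch.float32).reshape(2, 3, 4)  # 2 prompts
mask = torch.ones(2, 3, dtype=torch.long)
e, m = repeat_for_num_images(embeds, mask, num_images_per_prompt=2)
print(e.shape, m.shape)  # torch.Size([4, 3, 4]) torch.Size([4, 3])
```

`repeat_interleave` produces the same row order as the common diffusers idiom `prompt_embeds.repeat(1, n, 1).view(batch * n, seq_len, -1)`.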
SlowTests real-checkpoint resolution is set to 1024x1024 (the only
size DreamLite is trained for).
Test result: 27 passed, 50 skipped, 0 failed on CPU fast suite.
make style && make quality: clean.
The `carlofkl/DreamLite-{base,mobile}` Hub repos host two flavours of the
same checkpoint:
* `main` branch - keeps `model_index.json` pointing at ByteDance's
internal package path so the original (non-diffusers)
reference code can still load these weights.
* `diffusers` branch - rewrites the `unet` entry of `model_index.json` to
`["diffusers", "DreamLiteUNetModel"]` so this
integration loads correctly from `diffusers`.
This commit pins every `from_pretrained(...)` call shipped with the
diffusers integration (docs examples, pipeline docstrings, SlowTests) to
`revision="diffusers"`. Local-override env vars (DREAMLITE_BASE_PATH /
DREAMLITE_MOBILE_PATH) still bypass the revision pin.
…ts after rebase

Mechanical changes after rebasing onto current `main`:

* `pipeline_dreamlite.py::retrieve_timesteps`: re-synced from `diffusers.pipelines.flux.pipeline_flux.retrieve_timesteps` (PEP 604 type hints, expanded docstring, plus the new `accepts_timesteps` / `accept_sigmas` introspection guards). DreamLite's default code path uses `num_inference_steps` (uniform schedule) and never passes custom `timesteps` / `sigmas`, so the added guards are dead code for this pipeline; behaviour is unchanged.
* `dummy_pt_objects.py` / `dummy_torch_and_transformers_objects.py`: registered the dummy classes auto-generated by `make fix-copies` for `DreamLiteTransformer2DModel`, `DreamLiteUNetModel`, `DreamLitePipeline`, `DreamLiteMobilePipeline`, `DreamLitePipelineOutput`. Generated by `make fix-copies`; no hand edits.
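A condensed sketch of what the `accepts_timesteps` / `accept_sigmas` guards do, modelled on the diffusers `retrieve_timesteps` pattern but omitting `device` handling; `retrieve_timesteps_sketch` and `ToyScheduler` (which accepts `sigmas` but not `timesteps`) are hypothetical:

```python
import inspect


def retrieve_timesteps_sketch(scheduler, num_inference_steps=None, timesteps=None, sigmas=None, **kwargs):
    """Condensed sketch of the introspection guards in diffusers' retrieve_timesteps."""
    if timesteps is not None and sigmas is not None:
        raise ValueError("Only one of `timesteps` or `sigmas` can be passed.")
    if timesteps is not None:
        accepts_timesteps = "timesteps" in inspect.signature(scheduler.set_timesteps).parameters
        if not accepts_timesteps:
            raise ValueError(f"{type(scheduler).__name__}.set_timesteps does not accept custom timesteps.")
        scheduler.set_timesteps(timesteps=timesteps, **kwargs)
    elif sigmas is not None:
        accept_sigmas = "sigmas" in inspect.signature(scheduler.set_timesteps).parameters
        if not accept_sigmas:
            raise ValueError(f"{type(scheduler).__name__}.set_timesteps does not accept custom sigmas.")
        scheduler.set_timesteps(sigmas=sigmas, **kwargs)
    else:
        # Default path: uniform schedule from num_inference_steps; neither guard runs.
        scheduler.set_timesteps(num_inference_steps, **kwargs)
    return scheduler.timesteps, len(scheduler.timesteps)


class ToyScheduler:
    """Hypothetical scheduler: set_timesteps accepts sigmas but not timesteps."""

    def set_timesteps(self, num_inference_steps=None, sigmas=None):
        if sigmas is not None:
            self.timesteps = [1000.0 * s for s in sigmas]
        else:
            step = 1000.0 / num_inference_steps
            self.timesteps = [1000.0 - i * step for i in range(num_inference_steps)]
```

On the `num_inference_steps` path neither guard is evaluated, which is consistent with the re-sync leaving DreamLite's behaviour unchanged.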
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…ing entries

- Register DreamLiteAttnProcessor2_0 in docs/source/en/api/attnprocessor.md (fixes check_support_list.py).
- Split combined 'height / width' and 'guidance_scale / image_guidance_scale' entries in the two pipeline docstrings; add a complete Args block to DreamLiteTransformer2DModel.forward (fixes check_forward_call_docstrings.py).

No behavioral change.
Hi @sayakpaul @yiyixuxu — pushed a small follow-up commit (
No behavioral change: docs/docstrings only. Verified both lints pass locally. Whenever convenient, could you re-approve the workflows? Thanks!
Hi @yiyixuxu @DN6 @sayakpaul, quick update: CI is now fully green
dg845 left a comment
Thanks for the PR! Left an initial design review :).
- Inline the down/up block factories and define DreamLiteCrossAttn{,NoSelfAttn}{Down,Up}Block2D directly (review #1, #2)
- Rename DownBlock2DDreamLite/UpBlock2DDreamLite to DreamLiteDownBlock2D/DreamLiteUpBlock2D to match diffusers naming conventions (review #3, #4)
- Merge unet_2d_blocks_dreamlite.py into unet_dreamlite.py to mirror recent transformer model files (review #5)
- Wire max_sequence_length into the tokenizer call for generate mode (review #6)
- Replace hard-coded drop_idx values (64/34) with self.prompt_template_encode_*_start_idx attributes plus a comment explaining how the offsets are derived (review #7, #8)
- Drop the manual Image.resize call and rely on VaeImageProcessor's LANCZOS default in preprocess(image, height, width) (review #9)
- Use self.guidance_scale / self.image_guidance_scale properties in the CFG combine instead of the underscore-prefixed attributes (review #10, #11)
- Inline retrieve_latents / retrieve_timesteps / calculate_shift in the mobile pipeline with `# Copied from` markers, removing the cross-pipeline imports (review #12)
- Add `# Copied from` marker to _extract_masked_hidden in the mobile pipeline (review #13)
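For reference, a standalone sketch of `calculate_shift`, assuming the inlined copy matches `diffusers.pipelines.flux.pipeline_flux.calculate_shift`. The defaults shown are Flux's; the DreamLite pipelines may pass different values from their scheduler config. It linearly interpolates the flow-matching shift `mu` over the image sequence length:

```python
def calculate_shift(
    image_seq_len,
    base_seq_len: int = 256,
    max_seq_len: int = 4096,
    base_shift: float = 0.5,
    max_shift: float = 1.15,
):
    # Line through (base_seq_len, base_shift) and (max_seq_len, max_shift).
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    mu = image_seq_len * m + b
    return mu
```

Longer sequences (larger images) get a larger `mu`, which the scheduler uses to shift more of the schedule toward high-noise timesteps.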
@dg845 Thanks for testing! Confirmed: the artifact only appears with
@dg845 thanks for the initial review; pushed updates in 62a5db6 addressing all the inline comments, please take another look when you have time. @yiyixuxu friendly ping: would love your eyes on this whenever you get a chance, especially the single-file unet decision in src/diffusers/models/unets/unet_dreamlite.py (per dg845's comment).
dg845 left a comment
Thanks for iterating! Left some follow up comments.
- Merge resnet_dreamlite.py (DepthwiseSeparableConv + ResnetBlock2DDreamLite) into unet_dreamlite.py and delete the standalone module (review #1)
- Move DreamLiteAttnProcessor2_0 from attention_processor.py into unet_dreamlite.py to keep all DreamLite-specific code in one place; update docs autodoc reference accordingly (review #2)
- Drop the PyTorch 2.0 hasattr/ImportError check in DreamLiteAttnProcessor2_0.__init__ (diffusers already requires torch>=2.0; matches Wan deprecation) (review #3)
- Drop the deprecated `scale` argument handling from DreamLiteAttnProcessor2_0.__call__ (new model, no legacy callers) (review #4)
- Switch the SDPA call to dispatch_attention_fn so all diffusers attention backends (FlashAttention, FlashAttention-3, sageattention, etc.) are selectable (review #5)
- Rename block dispatch keys in _get_{down,mid,up}_block_dreamlite to match the Python class names (DreamLiteCrossAttn{Down,Up}Block2D / DreamLiteCrossAttnNoSelfAttn{Down,Up}Block2D / DreamLiteUNetMidBlock2DCrossAttn / DreamLite{Down,Up}Block2D); default down/up/mid block_types in DreamLiteUNetModel and the test fixtures are updated to the new keys (review #6, #7); the carlofkl/DreamLite-{base,mobile} (diffusers branch) Hub configs are being updated in lock-step
- Localize retrieve_latents inside pipeline_dreamlite.py with a `# Copied from` marker, removing the cross-pipeline import; mirrors the mobile pipeline (review #8)
- Add a check_inputs() method to both DreamLitePipeline and DreamLiteMobilePipeline (mobile uses `# Copied from`); call it from __call__; pulls the image-type validation out of prepare_image_latents and adds prompt-type and h/w-divisibility checks (review #9)
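A hypothetical sketch of the kind of validation the `check_inputs()` item describes (prompt type, height/width divisibility, image type). The divisor of 16 and the `image_types` parameter are assumptions; a real pipeline would likely pass something like `(PIL.Image.Image, torch.Tensor, list)`:

```python
def check_inputs(prompt, height, width, image=None, *, image_types, multiple_of=16):
    """Validate user inputs before any heavy work starts (hypothetical sketch).

    image_types: accepted types for `image` (assumed parameter).
    multiple_of: assumed height/width divisor; the real value depends on the
        VAE and UNet downsampling factors.
    """
    if not isinstance(prompt, (str, list)) or (
        isinstance(prompt, list) and not all(isinstance(p, str) for p in prompt)
    ):
        raise ValueError(f"`prompt` must be a str or a list of str, got {type(prompt)}.")
    if height % multiple_of != 0 or width % multiple_of != 0:
        raise ValueError(
            f"`height` and `width` must be divisible by {multiple_of}, got {height} and {width}."
        )
    if image is not None and not isinstance(image, image_types):
        raise ValueError(f"`image` must be one of {image_types}, got {type(image)}.")


check_inputs("a red fox", 1024, 1024, image_types=(list,))  # valid inputs pass silently
```

Running these checks at the top of `__call__` means a bad argument fails fast, before any text encoding or denoising steps are spent.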
dispatch_attention_fn expects (batch, seq, heads, head_dim) and handles the transpose internally; the previous code passed (batch, heads, seq, head_dim), which collided with the dispatch's internal transpose and broke inference (RuntimeError: tensor size mismatch at non-singleton dimension 1).
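A minimal sketch of the layout bug, using a hypothetical `attention_bshd` function as a stand-in for an API that takes (batch, seq, heads, head_dim) and transposes internally; it is not `dispatch_attention_fn` itself. With cross-attention (query and key lengths differ), passing head-first tensors makes the function treat `heads` as the sequence axis and vice versa, so the shapes stop lining up and SDPA raises a size-mismatch error:

```python
import torch
import torch.nn.functional as F


def attention_bshd(query, key, value):
    """Hypothetical (batch, seq, heads, head_dim) attention that transposes internally for SDPA."""
    out = F.scaled_dot_product_attention(
        query.transpose(1, 2), key.transpose(1, 2), value.transpose(1, 2)
    )
    return out.transpose(1, 2)  # back to (batch, seq, heads, head_dim)


batch, heads, q_len, kv_len, head_dim = 2, 4, 8, 5, 16
# Head-first tensors, as a processor holds them after `.view(...).transpose(1, 2)`.
q = torch.randn(batch, heads, q_len, head_dim)
k = torch.randn(batch, heads, kv_len, head_dim)
v = torch.randn(batch, heads, kv_len, head_dim)

reference = F.scaled_dot_product_attention(q, k, v)  # (batch, heads, q_len, head_dim)

# Fix: hand the function sequence-first tensors instead of head-first ones.
out = attention_bshd(q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2))
print(out.shape)  # torch.Size([2, 8, 4, 16])
```

Transposing `out` back with `.transpose(1, 2)` recovers the head-first result, which matches `reference`.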
@dg845 follow-up review fully addressed across two commits ( -1. resnet ( Re-tested end-to-end T2I + I2I locally on A800 / bf16 against

Note on CI: the 5 failing PR test jobs (Fast PyTorch Pipeline / Models CPU tests, PyTorch Example CPU tests, Hub tests, LoRA tests with PEFT main) appear to be failing on
```python
# text_encoder must expose a real torch.dtype because pipeline does
# ``dtype = self.text_encoder.dtype``. Everything else is mocked.
text_encoder = MagicMock()
```
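The comment implies the mock needs a real dtype assigned, because attribute access on a `MagicMock` auto-creates a child mock. A minimal illustration (the choice of `torch.float32` is an assumption):

```python
from unittest.mock import MagicMock

import torch

text_encoder = MagicMock()
# Unset attributes are auto-created child mocks, so `.dtype` is not a torch.dtype yet.
assert not isinstance(text_encoder.dtype, torch.dtype)

# Pinning a real dtype lets code such as `dtype = self.text_encoder.dtype` feed torch APIs.
text_encoder.dtype = torch.float32
x = torch.zeros(2, dtype=text_encoder.dtype)
print(x.dtype)  # torch.float32
```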
Would it be possible to use a small Qwen3VLForConditionalGeneration text encoder model for testing here instead of a MagicMock instance (and similarly for the tokenizer and processor)? For example, NucleusMoE-Image does the following:
diffusers/tests/pipelines/nucleusmoe_image/test_nucleusmoe_image.py (lines 116 to 117 in 1c7f759)
This should let us run the skipped tests below normally. (I'm still reviewing the rest of the PR but wanted to leave this comment first since the change would make reviewing the tests easier.)




Context
This PR integrates DreamLite, ByteDance's text-to-image / image-edit diffusion model, into
diffusers, following an invitation from @NielsRogge to release the model on the Hub in `diffusers` format.
Related issue: ByteVisionLab/DreamLite#3 (comment)
Model cards (public, ungated):
Both repos use a `diffusers` branch (loaded via `revision="diffusers"`) to keep the original ByteDance-internal `main` branch intact for backward compatibility with existing users.
What's added
Architecture highlights
- `DreamLiteUNetModel`: UNet-based denoiser conditioned on Qwen3-VL text/vision embeddings.
- `DreamLitePipeline`: runs 3 forward passes per step (text-cond / image-cond / uncond) and combines them with a dual-CFG schedule for high-fidelity text-to-image generation and image editing.
- `DreamLiteMobilePipeline`: distilled single-pass variant; no CFG; designed for on-device inference. Pairs with `AutoencoderTiny`.
- `FlowMatchEulerDiscreteScheduler`.

Testing
- `carlofkl/DreamLite-base` with `revision="diffusers"`: all 6 sub-modules resolve to the correct `diffusers.*` namespace.
- std≈93, no NaN/Inf).
- `tests/pipelines/dreamlite/`.

Before submitting
Who can review?
cc @sayakpaul @yiyixuxu @DN6 — thanks in advance for the review!