make style #10
Merged
lawrence-cj pushed a commit that referenced this pull request on Jun 15, 2026
…gingface#13815)

* feat(pipelines): add DreamLite text-to-image and image-edit pipelines

  Add ByteDance's DreamLite model family to diffusers. DreamLite is a UNet-based diffusion model that supports both text-to-image generation and reference-image editing through a shared 3-branch dual-CFG design.

  Two pipelines are shipped:
  * DreamLitePipeline - full 3-branch dual CFG (negative, reference, prompt); supports T2I and I2I editing at 1024x1024.
  * DreamLiteMobilePipeline - distilled single-branch variant for on-device inference; no CFG.

  New model code (all isolated under *_dreamlite.py / unet_dreamlite.py to avoid touching shared upstream files):
  * models/transformers/transformer_2d_dreamlite.py - DreamLite 2D transformer block.
  * models/unets/unet_dreamlite.py - DreamLiteUNetModel.
  * models/unets/unet_2d_blocks_dreamlite.py - DreamLite-specific down/up/mid blocks.
  * models/resnet_dreamlite.py - DreamLite ResNet variants.
  * models/attention_processor.py - add DreamLiteAttnProcessor2_0 (pure addition, no existing processor modified).

  Pipeline + tests + docs:
  * pipelines/dreamlite/{__init__.py, pipeline_dreamlite.py, pipeline_dreamlite_mobile.py, pipeline_output.py}.
  * tests/pipelines/dreamlite/{test_pipeline_dreamlite.py, test_pipeline_dreamlite_mobile.py} with the standard PipelineTesterMixin suite; setUp/tearDown auto-patches encode_prompt with a fake so MagicMock text encoders work without per-test boilerplate.
  * Skip 8 mixin tests that don't apply to DreamLite (MagicMock serialisation, custom attention processor, encode_prompt return shape, batch_size > 1 sweep), mirroring SD3 / Flux conventions.
  * docs/source/en/api/pipelines/dreamlite.md + _toctree.yml entry (alphabetically between DiT and EasyAnimate).
  * Register exports in 6 __init__.py files.

  Two real bugs surfaced by the mixin test suite are fixed in this commit:
  * num_images_per_prompt > 1: prompt_embeds and text_attention_mask are now repeated along the batch dimension in both pipelines' T2I and I2I branches before being passed to the UNet.
  * vae=None: __init__ now guards the encoder_block_out_channels lookup so encode_prompt can be tested in isolation per PipelineTesterMixin convention.

  SlowTests real-checkpoint resolution is set to 1024x1024 (the only size DreamLite is trained for).

  Test result: 27 passed, 50 skipped, 0 failed on CPU fast suite.
  make style && make quality: clean.

* docs+tests(pipelines/dreamlite): pin Hub repos to `diffusers` branch

  The `carlofkl/DreamLite-{base,mobile}` Hub repos host two flavours of the same checkpoint:
  * `main` branch - keeps `model_index.json` pointing at ByteDance's internal package path so the original (non-diffusers) reference code can still load these weights.
  * `diffusers` branch - rewrites the `unet` entry of `model_index.json` to `["diffusers", "DreamLiteUNetModel"]` so this integration loads correctly from `diffusers`.

  This commit pins every `from_pretrained(...)` call shipped with the diffusers integration (docs examples, pipeline docstrings, SlowTests) to `revision="diffusers"`. Local-override env vars (DREAMLITE_BASE_PATH / DREAMLITE_MOBILE_PATH) still bypass the revision pin.

* chore(pipelines/dreamlite): sync `# Copied from` blocks + dummy objects after rebase

  Mechanical changes after rebasing onto current `main`:
  * `pipeline_dreamlite.py::retrieve_timesteps`: re-synced from `diffusers.pipelines.flux.pipeline_flux.retrieve_timesteps` (PEP 604 type hints, expanded docstring, plus the new `accepts_timesteps` / `accept_sigmas` introspection guards). DreamLite's default code path uses `num_inference_steps` (uniform schedule) and never passes custom `timesteps` / `sigmas`, so the added guards are dead code for this pipeline; behaviour is unchanged.
  * `dummy_pt_objects.py` / `dummy_torch_and_transformers_objects.py`: registered the dummy classes auto-generated by `make fix-copies` for `DreamLiteTransformer2DModel`, `DreamLiteUNetModel`, `DreamLitePipeline`, `DreamLiteMobilePipeline`, `DreamLitePipelineOutput`.

  Generated by `make fix-copies`. No hand edits.

* docs(dreamlite): register attention processor + split combined docstring entries

  - Register DreamLiteAttnProcessor2_0 in docs/source/en/api/attnprocessor.md (fixes check_support_list.py).
  - Split combined 'height / width' and 'guidance_scale / image_guidance_scale' entries in the two pipeline docstrings; add a complete Args block to DreamLiteTransformer2DModel.forward (fixes check_forward_call_docstrings.py).

  No behavioral change.

* refactor(dreamlite): address review feedback from huggingface#13815

  - Inline the down/up block factories and define DreamLiteCrossAttn{,NoSelfAttn}{Down,Up}Block2D directly (review #1, #2)
  - Rename DownBlock2DDreamLite/UpBlock2DDreamLite to DreamLiteDownBlock2D/DreamLiteUpBlock2D to match diffusers naming conventions (review #3, #4)
  - Merge unet_2d_blocks_dreamlite.py into unet_dreamlite.py to mirror recent transformer model files (review #5)
  - Wire max_sequence_length into the tokenizer call for generate mode (review #6)
  - Replace hard-coded drop_idx values (64/34) with self.prompt_template_encode_*_start_idx attributes plus a comment explaining how the offsets are derived (review #7, #8)
  - Drop the manual Image.resize call and rely on VaeImageProcessor's LANCZOS default in preprocess(image, height, width) (review #9)
  - Use self.guidance_scale / self.image_guidance_scale properties in the CFG combine instead of the underscore-prefixed attributes (review #10, #11)
  - Inline retrieve_latents / retrieve_timesteps / calculate_shift in the mobile pipeline with `# Copied from` markers, removing the cross-pipeline imports (review #12)
  - Add `# Copied from` marker to _extract_masked_hidden in the mobile pipeline (review #13)

* refactor(dreamlite): address dg845 follow-up review

  - Merge resnet_dreamlite.py (DepthwiseSeparableConv + ResnetBlock2DDreamLite) into unet_dreamlite.py and delete the standalone module (review #1)
  - Move DreamLiteAttnProcessor2_0 from attention_processor.py into unet_dreamlite.py to keep all DreamLite-specific code in one place; update docs autodoc reference accordingly (review #2)
  - Drop the PyTorch 2.0 hasattr/ImportError check in DreamLiteAttnProcessor2_0.__init__ (diffusers already requires torch>=2.0; matches Wan deprecation) (review #3)
  - Drop the deprecated `scale` argument handling from DreamLiteAttnProcessor2_0.__call__ (new model, no legacy callers) (review #4)
  - Switch SDPA call to dispatch_attention_fn so all diffusers attention backends (FlashAttention, FlashAttention-3, sageattention, etc.) are selectable (review #5)
  - Rename block dispatch keys in _get_{down,mid,up}_block_dreamlite to match the Python class names (DreamLiteCrossAttn{Down,Up}Block2D / DreamLiteCrossAttnNoSelfAttn{Down,Up}Block2D / DreamLiteUNetMidBlock2DCrossAttn / DreamLite{Down,Up}Block2D); default down/up/mid block_types in DreamLiteUNetModel and the test fixtures are updated to the new keys (review #6, #7); the carlofkl/DreamLite-{base,mobile} (diffusers branch) Hub configs are being updated in lock-step
  - Localize retrieve_latents inside pipeline_dreamlite.py with a `# Copied from` marker, removing the cross-pipeline import; mirrors the mobile pipeline (review #8)
  - Add a check_inputs() method to both DreamLitePipeline and DreamLiteMobilePipeline (mobile uses `# Copied from`); call it from __call__; pulls the image-type validation out of prepare_image_latents and adds prompt-type and h/w-divisibility checks (review #9)

* fix(dreamlite): correct Q/K/V layout for dispatch_attention_fn

  dispatch_attention_fn expects (batch, seq, heads, head_dim) and handles the transpose internally; the previous code passed (batch, heads, seq, head_dim), which collided with the dispatch's internal transpose and broke inference (RuntimeError: tensor size mismatch at non-singleton dimension 1).

* test(dreamlite): swap MagicMock for tiny real Qwen3-VL fixture

  Address dg845's review: rebuild the DreamLite fast-test fixture around a real (tiny) Qwen3VLForConditionalGeneration + Qwen3VLProcessor so the standard PipelineTesterMixin save/load, dtype, and offload tests run end-to-end against the actual encode_prompt code path. Override DreamLiteUNetModel.set_default_attn_processor to reinstall the GQA processor so mixin utilities that round-trip through it keep working.

* Apply style fixes

* fix(dreamlite): address blocking review issues from huggingface#13815

  - Override _no_split_modules / _repeated_blocks on DreamLiteUNetModel with the actual DreamLite class names (BasicTransformerBlockDreamLite, ResnetBlock2DDreamLite, DreamLiteCrossAttnUpBlock2D, DreamLiteUpBlock2D) so device_map="auto" and compile_repeated_blocks() match correctly.
  - Keep attention masks as bool tensors in DreamLiteTransformer2DModel instead of converting them to dense additive float biases. The dense format hard-raises on flash / _flash_3 / _sage backends in dispatch_attention_fn (which requires dtype == torch.bool).
  - Add explicit parentheses around each clause in check_inputs's mixed and/or condition (both pipelines) for readability.
  - Replace nn.Module.__init__(self) with ModelMixin.__init__(self) in DreamLiteUNetModel.__init__ so mixin state (e.g. _gradient_checkpointing_func) is properly initialised. ConfigMixin / PushToHubMixin don't define their own __init__, so this covers the full chain without re-running UNet2DConditionModel.__init__.

* fix(dreamlite): forward all processor outputs to Qwen3VL text encoder

  Recent versions of Qwen3VLProcessor add an mm_token_type_ids output, and Qwen3VLModel.compute_3d_position_ids raises ValueError whenever multimodal inputs are present (image_grid_thw is not None) but mm_token_type_ids is None.
  encode_prompt previously forwarded only input_ids / attention_mask / pixel_values / image_grid_thw, dropping the new field and breaking the fast pipeline tests against transformers main. Switch to ``self.text_encoder(**tk_out, output_hidden_states=True)`` (matching NucleusMoEImagePipeline) so all processor outputs are forwarded automatically and future additions don't regress this path.

* Apply style fixes

* docs(dreamlite): address final review nits from huggingface#13815

  - Replace broken cat.png URL in editing examples (both base and mobile) with the standard `huggingface/documentation-images` source used elsewhere in the diffusers docs.
  - Promote the recommended guidance_scale=3.5 / image_guidance_scale=1.5 to the default values of DreamLitePipeline.__call__, and drop the now-redundant explicit args from the docs examples.
  - Switch the EXAMPLE_DOC_STRING examples in both pipelines from torch.float16 to torch.bfloat16 for consistency with the rest of the docs.

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
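
For anyone skimming the Qwen3VL fix in the commit message above, here is a minimal sketch of why forwarding `**tk_out` is more robust than passing a hand-picked subset of fields. Everything in it is a hypothetical stand-in written for illustration (`fake_processor`, `fake_text_encoder`, `encode_explicit`, `encode_forward_all`); none of it is the real transformers or diffusers code, and the dummy values carry no meaning.

```python
# Hypothetical stand-ins only; not the real Qwen3VLProcessor / text encoder.

def fake_processor(text, image=None):
    """Mimic a processor whose output gained a new field in a later release."""
    out = {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}
    if image is not None:
        out["pixel_values"] = [[0.0]]
        out["image_grid_thw"] = [[1, 1, 1]]
        out["mm_token_type_ids"] = [0, 1, 0]  # the newly added field
    return out


def fake_text_encoder(input_ids, attention_mask, pixel_values=None,
                      image_grid_thw=None, mm_token_type_ids=None,
                      output_hidden_states=False):
    """Raise when multimodal inputs arrive without mm_token_type_ids."""
    if image_grid_thw is not None and mm_token_type_ids is None:
        raise ValueError("mm_token_type_ids is required for multimodal inputs")
    return {"hidden_states": [input_ids] if output_hidden_states else None}


def encode_explicit(tk_out):
    # Old pattern: a fixed subset of keys, so any new field is silently dropped.
    return fake_text_encoder(
        input_ids=tk_out["input_ids"],
        attention_mask=tk_out["attention_mask"],
        pixel_values=tk_out.get("pixel_values"),
        image_grid_thw=tk_out.get("image_grid_thw"),
        output_hidden_states=True,
    )


def encode_forward_all(tk_out):
    # New pattern: forward everything the processor produced.
    return fake_text_encoder(**tk_out, output_hidden_states=True)
```

With an image present, `encode_explicit(fake_processor("a cat", image=object()))` raises `ValueError` in this sketch, while `encode_forward_all` succeeds on the same input. The trade-off is that the encoder must accept every key the processor emits.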
lawrence-cj pushed a commit that referenced this pull request on Jun 15, 2026
* [.ai] add self-review skill, retire parity-testing skill, and tighten the agent guides

  - New `self-review` skill mirroring the `@claude` CI review (rubric from review-rules.md, call-path dead-code analysis), report-only, with the report flagging what to fix before submitting (blocking + dead code) vs what to leave for the actual review.
  - Remove the WIP `parity-testing` skill; preserve its pitfalls as `model-integration/pitfalls.md` (numerical-discrepancy reference).
  - model-integration: restructure around a grouped checklist, default-to-modular, an overall file-structure sketch (details deferred to the guides), a fresh-conversion `Model parity test` example (internal, not shipped), and a filled-in weight/checkpoint-conversion section.
  - Centralize the loading rule (from_pretrained / from_single_file, no custom loaders) in models.md; add per-folder File structure sections to models.md / pipelines.md; default-to-modular note in pipelines.md.
  - AGENTS.md: dedicated 'Self-review before a PR' and 'Reference guides' sections.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* [.ai] simplify pitfalls #6 and drop the model-storage / injection-test entries

  Trim pitfall #6 to the essential point (small dtype diffs compound into a large final difference), remove the `/tmp` model-storage and incomplete-injection-test pitfalls, and renumber 1-16 with cross-references updated.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* [.ai] drop parity-harness-specific pitfalls

  With the parity-testing skill gone, remove the stale-test-fixtures pitfall (saved tensors / cross-pipeline fixtures no longer apply) and de-jargon the noise-dtype detection note. Keeps the pitfalls list generic to numerical discrepancy.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* [.ai] trim pitfalls to a concise possible-causes reference

  Drop the variable-shadowing and decoder-config pitfalls and the noise-dtype 'Detection' aside, tighten the remaining entries, renumber 1-12 (cross-refs updated), and reframe the intro as a non-checklist reference list of possible causes to consult only when outputs don't match.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Apply suggestion from @yiyixuxu

* Apply suggestion from @yiyixuxu

* [docs] update contributing guide for the self-review skill

  Replace the retired parity-testing skill with self-review in the skills list, and add a 'Self-review before opening' step to the AI-assisted contributions section: run the self-review skill / review-rules, fix blocking issues + dead code, and treat the @claude CI review as a non-authoritative helper (note any intentional skips in the PR for the reviewer).

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Apply suggestions from code review

  Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* [.ai] fix dangling pitfalls ref and broaden self-review scope

  - Drop the broken 'pitfalls.md #10' reference in the conversion step (the /tmp model-storage pitfall was removed); save to a local path instead.
  - Self-review now reviews the whole diff, not just src/diffusers/ and .ai/; a contributor should review their own tests/docs/scripts too (the CI's scoping is a safety measure for untrusted PRs). Reword to 'same rubric as the CI'.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
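
The point retained in pitfall #6 above (small dtype diffs compound into a large final difference) can be illustrated with a rough, self-contained sketch. It assumes a plain running sum as a stand-in for many repeated pipeline steps and simulates half precision with the standard library's `struct` module; the helper names `to_fp16` and `accumulate` are hypothetical and are not taken from `pitfalls.md`.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def accumulate(step: float, n: int, low_precision: bool) -> float:
    """Add `step` n times, optionally rounding the running total to fp16."""
    total = 0.0
    for _ in range(n):
        total = total + step
        if low_precision:
            total = to_fp16(total)  # rounding error is injected at every step
    return total


step = to_fp16(0.1)                    # tiny one-off rounding error
single_step_error = abs(step - 0.1)

reference = accumulate(step, 5000, low_precision=False)  # float64 running sum
low = accumulate(step, 5000, low_precision=True)         # fp16 running sum
final_error = abs(reference - low)

print(f"single-step error: {single_step_error:.2e}")
print(f"final error:       {final_error:.2e}")
```

The per-step rounding error is minute, but once the half-precision total grows large enough that its spacing exceeds twice the step, further additions round away entirely and the two sums diverge by a wide margin. Real pipelines accumulate error in less extreme ways, so treat this only as an intuition aid.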