make style #10

Merged
lawrence-cj merged 1 commit into lawrence-cj:feat/sa-solver from scxue:feat/sa-solver
Jan 15, 2024

Conversation

@scxue scxue commented Jan 15, 2024

make style

@lawrence-cj lawrence-cj merged commit 6094447 into lawrence-cj:feat/sa-solver Jan 15, 2024
lawrence-cj pushed a commit that referenced this pull request Jun 15, 2026
…gingface#13815)

* feat(pipelines): add DreamLite text-to-image and image-edit pipelines

Add ByteDance's DreamLite model family to diffusers. DreamLite is a
UNet-based diffusion model that supports both text-to-image generation
and reference-image editing through a shared 3-branch dual-CFG design.
Two pipelines are shipped:

* DreamLitePipeline           - full 3-branch dual CFG (negative,
                                reference, prompt); supports T2I and
                                I2I editing at 1024x1024.
* DreamLiteMobilePipeline     - distilled single-branch variant for
                                on-device inference; no CFG.
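
The commit message does not spell out how the three branches are combined. A minimal sketch, assuming an InstructPix2Pix-style blend (an assumption, not confirmed by this PR), could look like the following. The function name `combine_dual_cfg` and the use of plain Python lists in place of tensors are illustrative only; the default scales are the ones this PR later promotes to `__call__` defaults.

```python
from typing import Sequence


def combine_dual_cfg(
    noise_negative: Sequence[float],
    noise_reference: Sequence[float],
    noise_prompt: Sequence[float],
    guidance_scale: float = 3.5,
    image_guidance_scale: float = 1.5,
) -> list:
    """Blend three branch predictions (hypothetical helper, assumed formula).

    Start from the negative branch, push towards the reference-image branch
    by image_guidance_scale, then towards the prompt branch by guidance_scale.
    """
    return [
        neg + image_guidance_scale * (ref - neg) + guidance_scale * (txt - ref)
        for neg, ref, txt in zip(noise_negative, noise_reference, noise_prompt)
    ]
```

With both scales set to 1.0 the blend collapses to the prompt branch, which is a quick sanity check on any formula of this shape.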

New model code (all isolated under *_dreamlite.py / unet_dreamlite.py
to avoid touching shared upstream files):

* models/transformers/transformer_2d_dreamlite.py - DreamLite 2D
  transformer block.
* models/unets/unet_dreamlite.py                  - DreamLiteUNetModel.
* models/unets/unet_2d_blocks_dreamlite.py        - DreamLite-specific
  down/up/mid blocks.
* models/resnet_dreamlite.py                      - DreamLite ResNet
  variants.
* models/attention_processor.py                   - add
  DreamLiteAttnProcessor2_0 (pure addition, no existing processor
  modified).

Pipeline + tests + docs:

* pipelines/dreamlite/{__init__.py, pipeline_dreamlite.py,
  pipeline_dreamlite_mobile.py, pipeline_output.py}.
* tests/pipelines/dreamlite/{test_pipeline_dreamlite.py,
  test_pipeline_dreamlite_mobile.py} with the standard
  PipelineTesterMixin suite; setUp/tearDown auto-patches encode_prompt
  with a fake so MagicMock text encoders work without per-test
  boilerplate.
* Skip 8 mixin tests that don't apply to DreamLite (MagicMock
  serialisation, custom attention processor, encode_prompt return
  shape, batch_size > 1 sweep), mirroring SD3 / Flux conventions.
* docs/source/en/api/pipelines/dreamlite.md + _toctree.yml entry
  (alphabetically between DiT and EasyAnimate).
* Register exports in 6 __init__.py files.

Two real bugs surfaced by the mixin test suite are fixed in this
commit:

* num_images_per_prompt > 1: prompt_embeds and text_attention_mask
  are now repeated along the batch dimension in both pipelines'
  T2I and I2I branches before being passed to the UNet.
* vae=None: __init__ now guards the encoder_block_out_channels
  lookup so encode_prompt can be tested in isolation per
  PipelineTesterMixin convention.
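
The `num_images_per_prompt` fix can be illustrated without tensors. The sketch below uses nested lists as stand-ins for `prompt_embeds` and `text_attention_mask`; the helper name `repeat_per_prompt` is hypothetical, and the interleaved ordering (all copies of prompt 0, then all copies of prompt 1) is an assumption, since the commit only says the tensors are repeated along the batch dimension.

```python
def repeat_per_prompt(batch, num_images_per_prompt):
    """Repeat every batch entry num_images_per_prompt times, keeping copies adjacent."""
    return [row for row in batch for _ in range(num_images_per_prompt)]


# Two prompts, two tokens each (placeholder values, not real embeddings).
prompt_embeds = [["a0", "a1"], ["b0", "b1"]]
text_attention_mask = [[1, 1], [1, 0]]

repeated_embeds = repeat_per_prompt(prompt_embeds, 2)
repeated_mask = repeat_per_prompt(text_attention_mask, 2)
```

The key point is that the embeddings and the mask must be expanded together, so that both reach the UNet with the same batch size as the latents.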

SlowTests real-checkpoint resolution is set to 1024x1024 (the only
size DreamLite is trained for).

Test result: 27 passed, 50 skipped, 0 failed on CPU fast suite.
make style && make quality: clean.

* docs+tests(pipelines/dreamlite): pin Hub repos to `diffusers` branch

The `carlofkl/DreamLite-{base,mobile}` Hub repos host two flavours of the
same checkpoint:

* `main` branch      - keeps `model_index.json` pointing at ByteDance's
                       internal package path so the original (non-diffusers)
                       reference code can still load these weights.
* `diffusers` branch - rewrites the `unet` entry of `model_index.json` to
                       `["diffusers", "DreamLiteUNetModel"]` so this
                       integration loads correctly from `diffusers`.

This commit pins every `from_pretrained(...)` call shipped with the
diffusers integration (docs examples, pipeline docstrings, SlowTests) to
`revision="diffusers"`. Local-override env vars (DREAMLITE_BASE_PATH /
DREAMLITE_MOBILE_PATH) still bypass the revision pin.
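
As a rough sketch of the override behaviour described above: when the environment variable is set, the local path wins and no revision is passed; otherwise the Hub repo is pinned to the `diffusers` branch. The helper `resolve_checkpoint` and its `environ` parameter are hypothetical and exist only for illustration; the repo id, branch name, and variable name come from this commit.

```python
import os


def resolve_checkpoint(repo_id, env_var, revision="diffusers", environ=None):
    """Return (path_or_repo, extra_kwargs) for a from_pretrained-style call.

    Hypothetical helper: a set environment variable bypasses the revision pin.
    """
    environ = os.environ if environ is None else environ
    local_path = environ.get(env_var)
    if local_path:
        # Local checkpoints have no Hub branches, so no revision is passed.
        return local_path, {}
    return repo_id, {"revision": revision}


source, load_kwargs = resolve_checkpoint(
    "carlofkl/DreamLite-base", "DREAMLITE_BASE_PATH", environ={}
)
```

The result would then be used as `DreamLitePipeline.from_pretrained(source, **load_kwargs)`.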

* chore(pipelines/dreamlite): sync `# Copied from` blocks + dummy objects after rebase

Mechanical changes after rebasing onto current `main`:

* `pipeline_dreamlite.py::retrieve_timesteps` — re-synced from
  `diffusers.pipelines.flux.pipeline_flux.retrieve_timesteps` (PEP 604
  type hints, expanded docstring, plus the new
  `accepts_timesteps` / `accept_sigmas` introspection guards). DreamLite's
  default code path uses `num_inference_steps` (uniform schedule) and never
  passes custom `timesteps` / `sigmas`, so the added guards are dead code
  for this pipeline and behaviour is unchanged.
* `dummy_pt_objects.py` / `dummy_torch_and_transformers_objects.py` —
  registered the dummy classes auto-generated by `make fix-copies` for
  `DreamLiteTransformer2DModel`, `DreamLiteUNetModel`, `DreamLitePipeline`,
  `DreamLiteMobilePipeline`, `DreamLitePipelineOutput`.

Generated by `make fix-copies`. No hand edits.

* docs(dreamlite): register attention processor + split combined docstring entries

- Register DreamLiteAttnProcessor2_0 in docs/source/en/api/attnprocessor.md
  (fixes check_support_list.py).
- Split combined 'height / width' and 'guidance_scale / image_guidance_scale'
  entries in the two pipeline docstrings; add a complete Args block to
  DreamLiteTransformer2DModel.forward
  (fixes check_forward_call_docstrings.py).

No behavioral change.

* refactor(dreamlite): address review feedback from huggingface#13815

- Inline the down/up block factories and define DreamLiteCrossAttn{,NoSelfAttn}{Down,Up}Block2D directly (review #1, #2)
- Rename DownBlock2DDreamLite/UpBlock2DDreamLite to DreamLiteDownBlock2D/DreamLiteUpBlock2D to match diffusers naming conventions (review #3, #4)
- Merge unet_2d_blocks_dreamlite.py into unet_dreamlite.py to mirror recent transformer model files (review #5)
- Wire max_sequence_length into the tokenizer call for generate mode (review #6)
- Replace hard-coded drop_idx values (64/34) with self.prompt_template_encode_*_start_idx attributes plus a comment explaining how the offsets are derived (review #7, #8)
- Drop the manual Image.resize call and rely on VaeImageProcessor's LANCZOS default in preprocess(image, height, width) (review #9)
- Use self.guidance_scale / self.image_guidance_scale properties in the CFG combine instead of the underscore-prefixed attributes (review #10, #11)
- Inline retrieve_latents / retrieve_timesteps / calculate_shift in the mobile pipeline with `# Copied from` markers, removing the cross-pipeline imports (review #12)
- Add `# Copied from` marker to _extract_masked_hidden in the mobile pipeline (review #13)

* refactor(dreamlite): address dg845 follow-up review

- Merge resnet_dreamlite.py (DepthwiseSeparableConv + ResnetBlock2DDreamLite)
  into unet_dreamlite.py and delete the standalone module (review #1)
- Move DreamLiteAttnProcessor2_0 from attention_processor.py into
  unet_dreamlite.py to keep all DreamLite-specific code in one place;
  update docs autodoc reference accordingly (review #2)
- Drop the PyTorch 2.0 hasattr/ImportError check in
  DreamLiteAttnProcessor2_0.__init__ (diffusers already requires
  torch>=2.0; matches Wan deprecation) (review #3)
- Drop the deprecated `scale` argument handling from
  DreamLiteAttnProcessor2_0.__call__ (new model, no legacy callers)
  (review #4)
- Switch SDPA call to dispatch_attention_fn so all diffusers attention
  backends (FlashAttention, FlashAttention-3, sageattention, etc.) are
  selectable (review #5)
- Rename block dispatch keys in _get_{down,mid,up}_block_dreamlite to
  match the Python class names (DreamLiteCrossAttn{Down,Up}Block2D /
  DreamLiteCrossAttnNoSelfAttn{Down,Up}Block2D /
  DreamLiteUNetMidBlock2DCrossAttn / DreamLite{Down,Up}Block2D);
  default down/up/mid block_types in DreamLiteUNetModel and the test
  fixtures are updated to the new keys (review #6, #7); the
  carlofkl/DreamLite-{base,mobile} (diffusers branch) Hub configs are
  being updated in lock-step
- Localize retrieve_latents inside pipeline_dreamlite.py with a
  `# Copied from` marker, removing the cross-pipeline import; mirrors
  the mobile pipeline (review #8)
- Add a check_inputs() method to both DreamLitePipeline and
  DreamLiteMobilePipeline (mobile uses `# Copied from`); call it from
  __call__; pulls the image-type validation out of prepare_image_latents
  and adds prompt-type and h/w-divisibility checks (review #9)
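
A minimal sketch of the kind of validation the last item describes, assuming a standalone function rather than the real pipeline method. The `multiple_of=16` divisor and the error messages are assumptions, not values taken from this PR, and the image-type check is omitted.

```python
def check_inputs(prompt, height, width, multiple_of=16):
    """Validate prompt type and height/width divisibility (illustrative only)."""
    if not isinstance(prompt, (str, list)):
        raise ValueError(f"`prompt` must be a str or a list of str, got {type(prompt)}")
    if isinstance(prompt, list) and not all(isinstance(p, str) for p in prompt):
        raise ValueError("every entry of `prompt` must be a str")
    if (height % multiple_of != 0) or (width % multiple_of != 0):
        raise ValueError(
            f"`height` and `width` must be divisible by {multiple_of}, got {height}x{width}"
        )
```

Running these checks at the top of `__call__` surfaces bad arguments before any model work starts.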

* fix(dreamlite): correct Q/K/V layout for dispatch_attention_fn

dispatch_attention_fn expects (batch, seq, heads, head_dim) and handles the transpose internally; the previous code passed (batch, heads, seq, head_dim), which collided with the dispatch's internal transpose and broke inference (RuntimeError: tensor size mismatch at non-singleton dimension 1).
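
The layout bug can be shown with shapes alone. In this sketch `dispatch_layout` is a hypothetical stand-in for the internal transpose described above, not the real `dispatch_attention_fn`; it only tracks which axis the kernel ends up treating as heads and which as sequence.

```python
def swap_seq_and_heads(shape):
    """Swap axes 1 and 2 of a 4-tuple shape, like tensor.transpose(1, 2)."""
    b, x, y, d = shape
    return (b, y, x, d)


def dispatch_layout(shape_in):
    """Stand-in: accepts (batch, seq, heads, head_dim), returns what the kernel sees."""
    return swap_seq_and_heads(shape_in)


batch, seq, heads, head_dim = 2, 77, 8, 64

# Correct: hand over (batch, seq, heads, head_dim) and let the dispatch transpose.
correct = dispatch_layout((batch, seq, heads, head_dim))

# Buggy: pre-transposing means the internal transpose undoes it, so the
# kernel treats the 77 sequence positions as heads and the 8 heads as tokens.
wrong = dispatch_layout(swap_seq_and_heads((batch, seq, heads, head_dim)))
```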

* test(dreamlite): swap MagicMock for tiny real Qwen3-VL fixture

Address dg845's review: rebuild the DreamLite fast-test fixture around a
real (tiny) Qwen3VLForConditionalGeneration + Qwen3VLProcessor so the
standard PipelineTesterMixin save/load, dtype, and offload tests run
end-to-end against the actual encode_prompt code path. Override
DreamLiteUNetModel.set_default_attn_processor to reinstall the GQA
processor so mixin utilities that round-trip through it keep working.

* Apply style fixes

* fix(dreamlite): address blocking review issues from huggingface#13815

- Override _no_split_modules / _repeated_blocks on DreamLiteUNetModel
  with the actual DreamLite class names (BasicTransformerBlockDreamLite,
  ResnetBlock2DDreamLite, DreamLiteCrossAttnUpBlock2D,
  DreamLiteUpBlock2D) so device_map="auto" and compile_repeated_blocks()
  match correctly.

- Keep attention masks as bool tensors in DreamLiteTransformer2DModel
  instead of converting them to dense additive float biases. The dense
  format hard-raises on flash / _flash_3 / _sage backends in
  dispatch_attention_fn (which requires dtype == torch.bool).

- Add explicit parentheses around each clause in check_inputs's mixed
  and/or condition (both pipelines) for readability.

- Replace nn.Module.__init__(self) with ModelMixin.__init__(self) in
  DreamLiteUNetModel.__init__ so mixin state (e.g.
  _gradient_checkpointing_func) is properly initialised. ConfigMixin /
  PushToHubMixin don't define their own __init__, so this covers the
  full chain without re-running UNet2DConditionModel.__init__.
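
The `__init__` change in the last item can be illustrated with plain classes. All four class names below are stand-ins (for `nn.Module`, `ModelMixin`, `UNet2DConditionModel`, and `DreamLiteUNetModel` respectively), not the real implementations.

```python
class Module:  # stand-in for torch.nn.Module
    def __init__(self):
        self.module_ready = True


class ModelMixin(Module):  # stand-in for diffusers' ModelMixin
    def __init__(self):
        super().__init__()
        self._gradient_checkpointing_func = None


class ParentUNet(ModelMixin):  # stand-in for UNet2DConditionModel
    def __init__(self):
        super().__init__()
        self.heavy_layers = ["built by the parent"]


class BrokenChild(ParentUNet):
    def __init__(self):
        Module.__init__(self)  # skips ModelMixin.__init__, mixin state is missing


class FixedChild(ParentUNet):
    def __init__(self):
        ModelMixin.__init__(self)  # mixin state set, parent layers still skipped
```

Calling the mixin's initialiser directly keeps the intent of the original code (do not rebuild the parent's layers) while still setting the attributes that later mixin methods expect.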

* fix(dreamlite): forward all processor outputs to Qwen3VL text encoder

Recent versions of Qwen3VLProcessor add an mm_token_type_ids output, and
Qwen3VLModel.compute_3d_position_ids raises ValueError whenever
multimodal inputs are present (image_grid_thw is not None) but
mm_token_type_ids is None.

encode_prompt previously forwarded only input_ids / attention_mask /
pixel_values / image_grid_thw, dropping the new field and breaking the
fast pipeline tests against transformers main.

Switch to ``self.text_encoder(**tk_out, output_hidden_states=True)``
(matching NucleusMoEImagePipeline) so all processor outputs are
forwarded automatically and future additions don't regress this path.
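
A small sketch of why unpacking the processor output is more robust than picking fields by hand. The `text_encoder` function here is a toy stand-in that only mimics the `mm_token_type_ids` check described above; the dictionary values are placeholders.

```python
def text_encoder(input_ids, attention_mask, pixel_values=None,
                 image_grid_thw=None, mm_token_type_ids=None,
                 output_hidden_states=False):
    """Toy encoder: rejects multimodal input that lacks mm_token_type_ids."""
    if image_grid_thw is not None and mm_token_type_ids is None:
        raise ValueError("mm_token_type_ids is required for multimodal inputs")
    return {"hidden_states": ["layer0"] if output_hidden_states else None}


# What a newer processor might return (placeholder values).
tk_out = {
    "input_ids": [1, 2, 3],
    "attention_mask": [1, 1, 1],
    "pixel_values": [0.5],
    "image_grid_thw": [1, 2, 2],
    "mm_token_type_ids": [0, 1, 0],
}


def encode_cherry_picked(tk_out):
    """Old style: only four known fields are forwarded, new ones are dropped."""
    keys = ("input_ids", "attention_mask", "pixel_values", "image_grid_thw")
    return text_encoder(**{k: tk_out[k] for k in keys}, output_hidden_states=True)


def encode_forward_all(tk_out):
    """New style: every processor output is forwarded automatically."""
    return text_encoder(**tk_out, output_hidden_states=True)
```

The trade-off is that the encoder must accept every key the processor emits, which holds here because both come from the same model family.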

* Apply style fixes

* docs(dreamlite): address final review nits from huggingface#13815

- Replace broken cat.png URL in editing examples (both base and mobile)
  with the standard `huggingface/documentation-images` source used
  elsewhere in the diffusers docs.
- Promote the recommended guidance_scale=3.5 / image_guidance_scale=1.5
  to the default values of DreamLitePipeline.__call__, and drop the
  now-redundant explicit args from the docs examples.
- Switch the EXAMPLE_DOC_STRING examples in both pipelines from
  torch.float16 to torch.bfloat16 for consistency with the rest of the
  docs.

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
lawrence-cj pushed a commit that referenced this pull request Jun 15, 2026
* [.ai] add self-review skill, retire parity-testing skill, and tighten the agent guides

- New `self-review` skill mirroring the `@claude` CI review (rubric from
  review-rules.md, call-path dead-code analysis), report-only, with the report
  flagging what to fix before submitting (blocking + dead code) vs what to leave
  for the actual review.
- Remove the WIP `parity-testing` skill; preserve its pitfalls as
  `model-integration/pitfalls.md` (numerical-discrepancy reference).
- model-integration: restructure around a grouped checklist, default-to-modular,
  an overall file-structure sketch (details deferred to the guides), a
  fresh-conversion `Model parity test` example (internal, not shipped), and a
  filled-in weight/checkpoint-conversion section.
- Centralize the loading rule (from_pretrained / from_single_file, no custom
  loaders) in models.md; add per-folder File structure sections to models.md /
  pipelines.md; default-to-modular note in pipelines.md.
- AGENTS.md: dedicated 'Self-review before a PR' and 'Reference guides' sections.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* [.ai] simplify pitfalls #6 and drop the model-storage / injection-test entries

Trim pitfall #6 to the essential point (small dtype diffs compound into a large
final difference), remove the `/tmp` model-storage and incomplete-injection-test
pitfalls, and renumber 1-16 with cross-references updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* [.ai] drop parity-harness-specific pitfalls

With the parity-testing skill gone, remove the stale-test-fixtures pitfall (saved
tensors / cross-pipeline fixtures no longer apply) and de-jargon the noise-dtype
detection note. Keeps the pitfalls list generic to numerical discrepancy.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* [.ai] trim pitfalls to a concise possible-causes reference

Drop the variable-shadowing and decoder-config pitfalls and the noise-dtype
'Detection' aside, tighten the remaining entries, renumber 1-12 (cross-refs
updated), and reframe the intro as a non-checklist reference list of possible
causes to consult only when outputs don't match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Apply suggestion from @yiyixuxu

* Apply suggestion from @yiyixuxu

* [docs] update contributing guide for the self-review skill

Replace the retired parity-testing skill with self-review in the skills list, and
add a 'Self-review before opening' step to the AI-assisted contributions section:
run the self-review skill / review-rules, fix blocking issues + dead code, and
treat the @claude CI review as a non-authoritative helper (note any intentional
skips in the PR for the reviewer).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* [.ai] fix dangling pitfalls ref and broaden self-review scope

- Drop the broken 'pitfalls.md #10' reference in the conversion step (the /tmp
  model-storage pitfall was removed); save to a local path instead.
- Self-review now reviews the whole diff, not just src/diffusers/ and .ai/ — a
  contributor should review their own tests/docs/scripts too (the CI's scoping is
  a safety measure for untrusted PRs). Reword to 'same rubric as the CI'.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2 participants