[WIP] Refactor Model Design by DN6 · Pull Request #13794 · huggingface/diffusers

DN6 · 2026-05-22T19:59:03Z

What does this PR do?

This refactor turns models into self-contained modules that declare their capabilities in one place. Per-model conversion code moves next to the model and a unified metadata() API makes feature attributes inspectable from any model class.

Motivation

Today, features are added to models through a mix of class attributes and mixins. Mixins define their own class attributes as well, so when examining a model class it isn't immediately clear which attributes and features are relevant or available.

Models are defined in a single file, so we end up using centralized utility files for things like model-specific weight and LoRA conversions. These files have grown enormous as they accumulate code to handle per-model variants and their idiosyncrasies.

The new design makes the mixins model-agnostic and has each mixin reach for the per-model metadata it needs through small handler objects attached to the model class.

Proposed Structure

Using Flux as a reference:

models/transformers/flux/
├── __init__.py
├── _ip_adapter.py        # FluxIPAdapterMixin + converters (internal)
├── _lora.py              # FLUX_LORA handler + per-format converters (internal)
├── _weight_mapping.py    # FLUX_WEIGHT_MAPPING handler + key tables (internal)
└── model.py              # FluxTransformer2DModel class declaration

Two patterns live next to model.py, picked per subsystem based on whether the behavior actually generalizes across models:

Handler + shared mixin: for features where the steps are the same across models and only the data/conversion function varies. The model opts in by inheriting the shared mixin in loaders/ and assigning its handler as a class attribute. LoRA and single-file weight mapping fit here:
```
class FluxTransformer2DModel(ModelMixin, LoRAModelMixin, ...):
    _lora = LoRAHandler("...")               # handler instance, consumed by LoRAModelMixin
```
Per-model mixin: for features that vary too much across models for a single shared mixin to be useful. Each model gets its own mixin declared right next to the model and inherited directly. IP-Adapter is the showcase:
```
class FluxTransformer2DModel(..., FluxIPAdapterMixin):
    ...
```

This should simplify developing on top of these models — modifications or enhancements stay within one folder. If a model has a very specific feature that doesn't generalize across others, it can be kept isolated there too (e.g. FreeNoise for the AnimateDiff UNet). Additionally, if a custom model is modifying an existing diffusers model (Self-Forcing Wan), the folder method of organizing the model lends itself well to custom code loading with AutoModel.

Features Introduced

Model capability introspection via `Model.metadata()`

Each model exposes a metadata() classmethod that returns a metadata object, keyed by the class attribute that controls each feature. The displayed row tells you exactly what to set or inherit to change the behavior.

>>> print(FluxTransformer2DModel.metadata())
FluxTransformer2DModel feature attributes
──────────────────────────────────────────────────────────────────────────────────
  _supports_gradient_checkpointing  True
  _supports_group_offloading        True
  _no_split_modules                 FluxTransformerBlock, FluxSingleTransformerBlock
  _skip_layerwise_casting_patterns  pos_embed, norm
  _repeated_blocks                  FluxTransformerBlock, FluxSingleTransformerBlock
  _cp_plan                          True
  _weight_mapping                   flux-depth, flux-dev, flux-fill, flux-schnell
  _lora                             bfl, kohya, kontext, xlabs
  _supports_cache                   True
  _supports_ip_adapter              True

The returned ModelMetadata exposes each feature value as an attribute (meta._supports_ip_adapter, meta._lora, ...), supports keys() / values() / items() for mapping-style iteration, and supports `in` for presence checks. meta.describe(verbose=True) adds an indented description and docs link under each row, which can be useful for agents.
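
To make the access patterns concrete, here is a minimal sketch of an object with the same surface: attribute access, mapping-style iteration, `in` checks, and a `describe()` method. Everything in it is an assumption for illustration: `ToyModelMetadata`, `ToyModelMixin`, and the `_feature_attrs` tuple are hypothetical, the collection logic is simplified, and whether the real implementation omits disabled features is not stated in this PR.

```python
class ToyModelMetadata:
    """Hypothetical snapshot of feature attributes, keyed by class attribute name."""

    def __init__(self, owner, rows):
        self._owner = owner
        self._rows = dict(rows)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so `_rows` and `_owner` resolve normally.
        try:
            return self.__dict__["_rows"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __contains__(self, name):
        return name in self._rows

    def keys(self):
        return self._rows.keys()

    def values(self):
        return self._rows.values()

    def items(self):
        return self._rows.items()

    def describe(self):
        width = max((len(name) for name in self._rows), default=0)
        lines = [f"{self._owner} feature attributes"]
        for name, value in self._rows.items():
            lines.append(f"  {name.ljust(width)}  {value}")
        return "\n".join(lines)


class ToyModelMixin:
    _feature_attrs = ("_supports_gradient_checkpointing",)
    _supports_gradient_checkpointing = False

    @classmethod
    def metadata(cls):
        rows = {}
        # Walk the MRO from base to derived and collect the attributes each class declares.
        for klass in reversed(cls.__mro__):
            for name in klass.__dict__.get("_feature_attrs", ()):
                value = getattr(cls, name, None)
                if value:  # assumption: unset or disabled features are left out
                    rows[name] = value
        return ToyModelMetadata(cls.__name__, rows)


class ToyIPAdapterMixin:
    _feature_attrs = ("_supports_ip_adapter",)
    _supports_ip_adapter = True


class ToyTransformer(ToyModelMixin, ToyIPAdapterMixin):
    _supports_gradient_checkpointing = True


meta = ToyTransformer.metadata()
print(meta.describe())
print("_supports_ip_adapter" in meta, "_lora" in meta)
```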


yiyixuxu

Thanks so much for working on this!
I really like the direction this is taking, very exciting!
It's a big PR with a lot of changes, so I'll go through it step by step - left some initial questions:)

yiyixuxu · 2026-05-29T02:30:22Z

    _default_processor_cls = None
    _available_processors = []
    _supports_qkv_fusion = True
+    _parallel_config = None


why do we need it here?

yiyixuxu · 2026-05-29T04:22:49Z

+                f"{DOCS_BASE}/optimization/memory#gradient-checkpointing",
+            )
+        if cls._supports_group_offloading:
+            rows["_supports_group_offloading"] = (


I really like the metadata + mixin direction but I'd like to understand a bit more: is there a fundamental reason why some features get their own self-contained mixin (Cache, Lora) and many others stay as methods on ModelMixin?

yiyixuxu · 2026-05-29T05:47:36Z

+        return "\n".join(lines)
+
+
+def register_metadata(metadata):


Is there a reason we have to keep two systems to attach metadata to a class? the register_metadata vs the class attribute like _cp_plan?


yiyixuxu · 2026-05-29T05:48:59Z

+
+
+@dataclass
+class TransformerBlockOutput(TransformerModuleOutput):


these are not used yet no?

sayakpaul · 2026-05-29T09:06:01Z

    _cached_parameter_indices: dict[str, int] = None

+    def _register(self, cls):
+        """Attach this metadata to ``cls`` and register it in :class:`TransformerBlockRegistry`.


Single backticks are what we use across diffusers. Also, `:class:` isn't something we do.

Suggested change

-        """Attach this metadata to ``cls`` and register it in :class:`TransformerBlockRegistry`.
+        """Attach this metadata to `cls` and register it in `TransformerBlockRegistry`.

sayakpaul

Did a pass on LoRA. Will now do a pass on modeling_utils.py.

I think it might be better to do the LoRA-related separation in another PR because it's difficult to truly assess if it's in a working state.

We are likely missing handling of Flux Control LoRA and, as such, handling of text-encoder LoRA modules. It's also not clear how this would affect lora_pipeline.py and lora_conversion_utils.py.
There are one-off utilities that are not shared across multiple functions / classes.
It's not clear how the existing https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/peft.py gets affected by this.

sayakpaul · 2026-05-29T09:06:37Z

    _cls: Type = None
    _cached_parameter_indices: dict[str, int] = None

+    def _register(self, cls):


I am a little unclear about the docstring. How is this method metadata (which is what the docstring reads)?

sayakpaul · 2026-05-29T09:07:13Z

+    def _register(self, cls):
+        """Attach this metadata to ``cls`` and register it in :class:`TransformerBlockRegistry`.
+
+        Lets ``@register_metadata(TransformerBlockMetadata(...))`` work for block classes that opt into the decorator


sayakpaul · 2026-05-29T09:08:51Z

+        self._cls = cls
+        cls._block_metadata = self
+        TransformerBlockRegistry._registry[cls] = self


Is it only applicable to classes with _repeated_blocks set?

sayakpaul · 2026-05-29T09:28:28Z



 @maybe_allow_in_graph
+@register_metadata(TransformerBlockMetadata(return_hidden_states_index=1, return_encoder_hidden_states_index=0))


What is the purpose of this?
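
For context, here is a minimal sketch of what a class decorator of this shape might do, reconstructed only from the `_register` lines quoted in the diff above. The `Toy*` names and `split_block_output` are hypothetical, and the idea that the indices let shared code pick the right element out of a block's output tuple is an assumption, not something the PR states.

```python
from dataclasses import dataclass
from typing import Optional


class ToyBlockRegistry:
    """Hypothetical registry mapping a block class to its metadata."""

    _registry = {}


@dataclass
class ToyBlockMetadata:
    return_hidden_states_index: Optional[int] = None
    return_encoder_hidden_states_index: Optional[int] = None
    _cls: Optional[type] = None

    def _register(self, cls):
        # Mirrors the three assignments shown in the diff.
        self._cls = cls
        cls._block_metadata = self
        ToyBlockRegistry._registry[cls] = self


def register_metadata(metadata):
    """Class decorator: attach `metadata` to the decorated class and return it unchanged."""

    def decorator(cls):
        metadata._register(cls)
        return cls

    return decorator


@register_metadata(ToyBlockMetadata(return_hidden_states_index=1, return_encoder_hidden_states_index=0))
class ToyBlock:
    def forward(self, hidden_states, encoder_hidden_states):
        # This toy block returns (encoder_hidden_states, hidden_states), in that order.
        return encoder_hidden_states, hidden_states


def split_block_output(block, output):
    # Assumed use: shared code reads the indices instead of hard-coding the tuple order.
    meta = type(block)._block_metadata
    return output[meta.return_hidden_states_index], output[meta.return_encoder_hidden_states_index]


block = ToyBlock()
hidden, encoder_hidden = split_block_output(block, block.forward("h", "e"))
print(hidden, encoder_hidden)
```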

sayakpaul · 2026-05-29T09:31:01Z

+            "_supports_cache": (
+                True,
+                "True",
+                "Supports caching techniques (PAB / FasterCache / FirstBlockCache) via `enable_cache`.",


Suggested change

-                "Supports caching techniques (PAB / FasterCache / FirstBlockCache) via `enable_cache`.",
+                "Supports caching techniques (e.g. FasterCache) via `enable_cache`.",

sayakpaul · 2026-05-29T10:00:54Z

+        r"""
+        Add an adapter to the underlying model.
+
+        ``source`` can be either:


There is nothing called source here.

sayakpaul · 2026-05-29T10:02:32Z

+
+        _maybe_warn_for_unhandled_keys(incompatible_keys, adapter_name)
+
+    def _inject_adapter(self, state_dict, lora_config, adapter_name, peft_kwargs):


Will prefer it inline as it's not shared.

sayakpaul · 2026-05-29T10:02:42Z

+            self._rollback_adapter(adapter_name, e)
+            raise
+
+    def _maybe_apply_deferred_hotswap_prep(self, lora_config):


sayakpaul · 2026-05-29T10:02:50Z

+        prepare_model_for_compiled_hotswap(self, config=lora_config, **self._lora_hotswap_kwargs)
+        self._lora_hotswap_kwargs = None
+
+    def _hotswap_adapter(self, state_dict, lora_config, adapter_name):


sayakpaul · 2026-05-29T10:02:56Z

+            self._rollback_adapter(adapter_name, e)
+            raise
+
+    def _rollback_adapter(self, adapter_name, error):


sayakpaul · 2026-05-29T10:07:55Z

+
+
+@dataclass
+class AttnProcessorOutput(TransformerModuleOutput):


Why does AttnProcessorOutput have to live in src/diffusers/models/transformers/utils.py? Could it not be used by VAEs or other components under src/diffusers/models/?

sayakpaul · 2026-05-29T10:12:55Z

+class ModelMetadata:
+    """Snapshot of a model class's feature attributes.
+
+    Constructed by :meth:`ModelMixin.metadata` — walks ``cls.__mro__`` collecting rows from each mixin's ``_metadata``


:meth: can be dangerous abbreviation 🤪

sayakpaul · 2026-05-29T10:14:24Z

+        return "\n".join(lines)
+
+
+def register_metadata(metadata):


I think that is because for handling stuff like _cp_plan we don't have a mixin like CPMixin. However, for others, we have dedicated mixins.

sayakpaul · 2026-05-29T10:15:54Z

+        return "\n".join(lines)
+
+
+def register_metadata(metadata):


But do we need this registration, though? All the available feature set can be queried through the main model class, no?

DN6 added 21 commits May 19, 2026 14:18

update · 1d3c5c5
update · 96c078e
update · 8b458f8
update · 0d1c885
update · a5ba743
update · 644c3e7
update · d73d985
update · eefc961
update · 99ba461
update · 12ae376
update · 68a3e9a
update · 9831039
update · dafb81c
update · 086b4cf
update · c026a68
update · ecd307d
update · d66b366
update · 1fb496a
update · b430231
update · 5eaaa3f
update · 774807b

github-actions Bot added the size/L (PR with diff > 200 LOC), models, utils, single-file, and hooks labels May 22, 2026

DN6 marked this pull request as ready for review May 27, 2026 16:35

DN6 requested review from dg845, sayakpaul and yiyixuxu May 27, 2026 16:35

DN6 requested a review from asomoza May 27, 2026 16:36

yiyixuxu reviewed May 29, 2026


sayakpaul reviewed May 29, 2026




