Support InstantStyle by JY-Joy · Pull Request #7668 · huggingface/diffusers

JY-Joy · 2024-04-14T08:57:11Z

What does this PR do?

This PR is a follow-up from #7586 with modification as suggested by this comment.

The IP-Adapter scale can now be controlled per transformer block; setting a scale to 0 skips that block. Example usage:

To use the original IP-Adapter, simply set the scale to a float.

scale_config=1.0
pipeline.activate_ip_adapter(scale_config)

To use only the style block (up_blocks.0.attentions.1):

scale_config = {
    "up": {
        "block_0": [0.0, 1.0, 0.0]
    },
}
pipeline.activate_ip_adapter(scale_config)

To use the style and layout blocks (up_blocks.0.attentions.1 and down_blocks.2.attentions.1):

scale_config = {
    "down": {
        "block_2": [0.0, 1.0]
    },
    "up": {
        "block_0": [0.0, 1.0, 0.0]
    },
}
pipeline.activate_ip_adapter(scale_config)
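As a rough illustration of the idea above, here is a minimal sketch of how a scale_config could be expanded into one scale per attention layer, with every unlisted layer falling back to 0.0 (skipped). The helper `expand_scale_config` and the `ASSUMED_LAYOUT` table are hypothetical names written for this example; they are not the diffusers implementation, and the layer counts for blocks other than `down.block_2` and `up.block_0` are assumptions.

```python
# Assumed number of attention layers per block (illustrative, not read from a model).
ASSUMED_LAYOUT = {
    "down": {"block_1": 2, "block_2": 2},
    "up": {"block_0": 3, "block_1": 3},
}


def expand_scale_config(scale_config, default_scale=0.0):
    """Return {layer_name: scale} for every attention layer in ASSUMED_LAYOUT."""
    is_number = isinstance(scale_config, (int, float))
    expanded = {}
    for part, blocks in ASSUMED_LAYOUT.items():
        for block, num_layers in blocks.items():
            if is_number:
                # A plain float applies the same scale everywhere.
                scales = [float(scale_config)] * num_layers
            else:
                # Blocks missing from the dict get the default scale.
                scales = scale_config.get(part, {}).get(
                    block, [default_scale] * num_layers
                )
            if len(scales) != num_layers:
                raise ValueError(
                    f"{part}.{block} expects {num_layers} scales, got {len(scales)}"
                )
            block_index = block.split("_")[1]
            for layer_index, value in enumerate(scales):
                name = f"{part}_blocks.{block_index}.attentions.{layer_index}"
                expanded[name] = float(value)
    return expanded


style_only = expand_scale_config({"up": {"block_0": [0.0, 1.0, 0.0]}})
active = [name for name, value in style_only.items() if value > 0]
print(active)
```

With the style-only config, the only layer left with a non-zero scale in this sketch is up_blocks.0.attentions.1.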

haofanwang · 2024-04-14T09:01:22Z

@yiyixuxu @asomoza Could you review this new PR? @DannHuang will follow it up.

yiyixuxu · 2024-04-15T16:58:46Z

thanks for your PR!
@asomoza can you give this a first review and test it out?

HuggingFaceDocBuilderDev · 2024-04-15T17:06:00Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

asomoza · 2024-04-15T18:04:34Z

I’m currently testing it and comparing the results. In the meantime, I’m curious as to why you decided to create a new function instead of modifying set_ip_adapter_scale. Do you anticipate a use case where we’ll need to use both functions simultaneously?

asomoza

I've tested it and it works, thank you for your work. I left a couple of questions.

I think the main use case for this is to use one image for style and maybe another one for the composition. Right now that's not possible, though.

IMO we should enable that in this PR.

[Image comparison table with columns: style, composition, result, expected]

JY-Joy · 2024-04-16T01:55:32Z

I’m currently testing it and comparing the results. In the meantime, I’m curious as to why you decided to create a new function instead of modifying set_ip_adapter_scale. Do you anticipate a use case where we’ll need to use both functions simultaneously?

I think you are right, the two functions essentially do the same thing and we should merge them. Sorry for the confusion.

JY-Joy · 2024-04-17T07:36:03Z

Hi all, this PR is updated. Specifically:

activate_ip_adapter() is now merged into set_ip_adapter_scale() and is fully compatible with the original usage. IP-Adapters can now be controlled by a float or a list of floats (the original usage), and also by a scale_config or a list of scale_configs. For example:

# To use style and layout from 2 reference images
scale_configs = [
            {
                "down": {
                    "block_2": [0.0, 1.0]
                }
            },
            {
                "up": {
                    "block_0": [0.0, 1.0, 0.0]
                }
            }
        ]
pipeline.set_ip_adapter_scale(scale_configs)
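To make the accepted input shapes concrete, here is a minimal sketch of normalizing a float, a scale_config, or a list of either into one entry per loaded IP-Adapter. The helper `normalize_adapter_scales` is a hypothetical name for this example, and the rule that a single entry is reused for every adapter is an assumption of the sketch rather than something stated in this thread.

```python
def normalize_adapter_scales(scale, num_adapters):
    """Return a list with one scale entry (float or dict) per IP-Adapter."""
    # Wrap a single float or dict so everything below handles a list.
    scales = list(scale) if isinstance(scale, (list, tuple)) else [scale]
    # Assumption: a single entry is shared by all adapters.
    if len(scales) == 1:
        scales = scales * num_adapters
    if len(scales) != num_adapters:
        raise ValueError(
            f"Got {len(scales)} scale entries for {num_adapters} IP-Adapters."
        )
    return scales


layout_config = {"down": {"block_2": [0.0, 1.0]}}
style_config = {"up": {"block_0": [0.0, 1.0, 0.0]}}

# One config per reference image / adapter.
per_adapter = normalize_adapter_scales([layout_config, style_config], num_adapters=2)
print(per_adapter)
```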

_maybe_expand_lora_scales() now takes an additional default_scale argument, which defaults to 1.0. I believe the behavior of existing code is unchanged, and ip_adapter_utils.py has been removed.

I've tested it and it works, thank you for you work. I left a couple of questions.

I think pretty much the use case for this is to be able to use one image for style and maybe another one for the composition, right now that's not possible though.

In the case of multiple reference images, I believe we can achieve this by loading the same IP-Adapter multiple times and setting each corresponding IP-Adapter to style mode or layout mode. The following code works for me:

...
pipe.load_ip_adapter(ip_adapter_path, subfolder="sdxl_models", weight_name=["ip-adapter_sdxl.bin", "ip-adapter_sdxl.bin"], image_encoder_path=image_encoder_path)
...
pipe.set_ip_adapter_scale(scale_configs)
images = pipe(
    prompt="a llama, masterpiece, best quality, high quality",
    ip_adapter_image=[style_img, composition_img],
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    scale=1.0,
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
).images

Please let me know if there is any other issue, thanks all :)

asomoza

Great work, just one comment and the rest looks good to me. The expected result also works. You can mark my earlier comments as resolved.

@yiyixuxu this PR is ready for your final review

asomoza · 2024-04-17T12:18:33Z

For this:

In case of multiple reference images, I believe we can achieve this by load the same IP-Adapter for multiple times, and set the correspond IP-Adapter to style-mode/layout-mode.

This time I really don't see a solution to prevent loading the same weights multiple times without changing how IP Adapters are loaded in the pipelines.

JY-Joy · 2024-04-18T00:53:18Z

All the mentioned issues are solved. Please let me know if there are any others.

This time I really don't see a solution to prevent loading the same weights multiple times without changing how IP Adapters are loaded in the pipelines.

IMO we want to load two IP-Adapters, one for style control and the other for layout. They happen to have the same pre-trained weights in this specific case, but they do not actually need to be the same. We believe this implementation makes minimal modifications to the original IP-Adapter pipeline, but we have no idea if it is the best solution for @asomoza's case.

haofanwang · 2024-04-18T03:13:16Z

@yiyixuxu Should be ready to merge.

yiyixuxu

looking great! thanks!

yiyixuxu · 2024-04-18T08:39:18Z

cc @sayakpaul can you give a final review?

also, it seems like quality changed files that are not supposed to be changed - I've seen this issue in multiple PRs now, what's going on?

sayakpaul · 2024-04-19T03:02:38Z

also, it seems like quality changed files that are not supposed to be changed - I've seen this issue in multiple PRs now, what's going on?

Here's my hypothesis.

#7314 made changes to the dependencies included in "quality". Refer to the setup.py:

diffusers/setup.py

Line 209 in b5c8b55

extras["quality"] = deps_list("urllib3", "isort", "ruff", "hf-doc-builder")

So with this dependency included, whenever you run make style it will re-format the docstring and documentation pages if needed.

Contributors may not have updated the quality dependencies before running make quality and make style which is why there are likely changes in unexpected files. LMK if this is unclear.

sayakpaul · 2024-04-19T03:08:07Z

Yup, can confirm my hypothesis with c45b1c7. See the number of file changes dropped to 4 from 17.

sayakpaul

Looking great! Could we have some documentation with test code on how to use this feature as well?

When testing, prefer non-human objects.

@fabiorigano would be great to have your reviews on this too :-)

JY-Joy · 2024-04-19T05:08:52Z

Yup, can confirm my hypothesis with c45b1c7. See the number of file changes dropped to 4 from 17.

Thanks for your solution! This really confused us, as we passed the quality check locally. We will test with the latest dependencies and fix the quality check ASAP.

JY-Joy · 2024-04-19T05:09:46Z

Looking great! Could we have some documentation with test code on how to use this feature as well?

When testing, prefer non-human objects.

@fabiorigano would be great to have your reviews on this too :-)

Absolutely, will update it soon.

JY-Joy · 2024-04-19T09:31:08Z

Most of the issues are resolved except those related to the use case of default_scale, I believe the current version is as expected. @sayakpaul Can I mark those comments as resolved? BTW, should I add documentation and tests to docs/source/en/using-diffusers/ip_adapter.md and tests/pipelines/ip_adapters?

JY-Joy · 2024-04-20T14:21:19Z

Hi all, I've just pushed some updates to this PR. Specifically:

1) Multiple masked IP inputs are now supported. For the case @yiyixuxu provided, InstantStyle injects ip_female_style and ip_male_style into the style layers only.
2) Different scales can now be set for each masked IP in set_ip_adapter_scale. For example, to set scales for 3 masked IPs in the up.blocks.0 layer, use this scale config:

scale_0 = { "up": { "block_0": [[0.75, 0.75, 0.3]]}}

This sets the scales [0.75, 0.75, 0.3] for the corresponding 3 masked IPs in all 3 transformer blocks of up.blocks.0. My solution requires a length-1 list wrapping the list of scales, to avoid ambiguity with specifying different scales for the transformer blocks. For details please refer to https://github.com/DannHuang/diffusers/blob/6fc9a3af2df947d99e80148cc7c40d4abb0ac86d/src/diffusers/loaders/unet_loader_utils.py#L143-L145
3) A test case from @yiyixuxu is included.
4) Conflicts resolved.
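The length-1 wrapping rule can be sketched as follows. This is an illustration of the disambiguation described above, not the code at the linked lines; `resolve_block_scales` is a hypothetical helper name.

```python
def resolve_block_scales(value, num_layers):
    """Expand one block entry into one entry per transformer block.

    - [a, b, c] with len == num_layers: one scale per transformer block.
    - [[m1, m2, ...]] (length-1 outer list): the same per-mask scales
      are reused for every transformer block.
    """
    if len(value) == 1 and isinstance(value[0], (list, tuple)):
        # Per-mask scales, shared by all transformer blocks.
        return [list(value[0]) for _ in range(num_layers)]
    if len(value) != num_layers:
        raise ValueError(f"Expected {num_layers} scales, got {len(value)}")
    # Plain per-block scales.
    return list(value)


# Three masked IPs, same scales in each of the 3 transformer blocks.
print(resolve_block_scales([[0.75, 0.75, 0.3]], num_layers=3))
# One scale per transformer block.
print(resolve_block_scales([0.0, 1.0, 0.0], num_layers=3))
```

Without the extra wrapping list, [0.75, 0.75, 0.3] would be read as three per-block scales, which is the ambiguity the comment above mentions.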

JY-Joy · 2024-04-20T14:24:14Z

it seems like I still have some problems with my quality dependencies, can u help me with it again @sayakpaul, really thanks a lot!

fabiorigano · 2024-04-20T15:02:52Z

it seems like I still have some problems with my quality dependencies, can u help me with it again @sayakpaul, really thanks a lot!

You have to run make quality (see "how to open a PR").

haofanwang · 2024-04-20T16:09:57Z

@fabiorigano @sayakpaul Formatted.

yiyixuxu · 2024-04-22T23:20:37Z

very nice work! thank you all!!

wodsoe · 2024-04-24T14:06:46Z

run the example:
attention_processor.py", line 1157, in __call__
hidden_states.shape if encoder_hidden_states is None else encoder_hidden_states.shape
AttributeError: 'tuple' object has no attribute 'shape'
@DannHuang

JY-Joy · 2024-04-24T15:38:33Z

run the example: attention_processor.py", line 1157, in call hidden_states.shape if encoder_hidden_states is None else encoder_hidden_states.shape AttributeError: 'tuple' object has no attribute 'shape' @DannHuang

Hi @wodsoe,
It seems there is a mistake in the example, really sorry for the confusion.
The following code should work:

from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")

Please let me know if there is any further issue. Thanks a lot!

emberMd · 2024-04-25T09:38:00Z

I am not able to get the InstantStyle implementation working with IP-Adapter. My problem is not directly related to @wodsoe's, but it also comes from attention_processor.py.

I'm using the exact example from the Diffusers IP-Adapter documentation.

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForImage2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")

generator = torch.Generator(device="cpu").manual_seed(42)
image = pipeline(
    prompt="a cat, masterpiece, best quality, high quality",
    image=style_image,
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
).images[0]
image

Using AutoPipelineForImage2Image, I got an error because the ip_adapter_image parameter is not passed in the code.

ValueError: <class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'> has the config param `encoder_hid_dim_type` set to 'ip_image_proj' which requires the keyword argument `image_embeds` to be passed in  `added_conditions`

Then, if I use AutoPipelineForText2Image and pass 'style_image' as 'ip_adapter_image', I get the following error. The same happens with AutoPipelineForImage2Image when using 'style_image' for both 'image' and 'ip_adapter_image'.

File ~/miniconda3/envs/if/lib/python3.10/site-packages/diffusers/models/attention_processor.py:2417, in IPAdapterAttnProcessor2_0.__call__(self, attn, hidden_states, encoder_hidden_states, attention_mask, temb, scale, ip_adapter_masks)
   2413         mask_downsample = mask_downsample.to(dtype=query.dtype, device=query.device)
   2415         current_ip_hidden_states = current_ip_hidden_states * mask_downsample
-> 2417     hidden_states = hidden_states + scale * current_ip_hidden_states
   2419 # linear proj
   2420 hidden_states = attn.to_out[0](hidden_states)

TypeError: unsupported operand type(s) for *: 'dict' and 'Tensor'

Maybe I'm doing something wrong. IP-Adapter works fine when 'scale' is a float, as in the traditional implementation, but it doesn't seem to work for this specific case. Thanks and sorry for the long reply :)

sayakpaul · 2024-04-25T09:59:41Z

I can reproduce this problem. The code snippet is the same as what's provided in https://huggingface.co/docs/diffusers/main/en/using-diffusers/ip_adapter#style--layout-control.

JY-Joy · 2024-04-25T11:59:20Z

Hi @emberMd, thanks for trying InstantStyle! It's a little bit weird, because replacing AutoPipelineForImage2Image with AutoPipelineForText2Image and passing style_image as ip_adapter_image to the pipeline works fine for me. Here is the complete code for your reference:

from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")
generator = torch.Generator(device="cpu").manual_seed(42)
image = pipeline(
    prompt="a cat, masterpiece, best quality, high quality",
    ip_adapter_image=style_image,
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
).images[0]
image

For the second error in your reply, I guess it's a version problem: the dictionary-type scale was set directly on each attn_processor. Could you please try cloning the latest diffusers repo and installing from source?

emberMd · 2024-04-25T12:07:16Z

Hi @DannHuang! thanks for the fast reply.

I guess it's weird, the code you passed still doesn't work for me. I imagined that it could be due to a version problem in Diffusers, but since I am already on 0.27.2 I wanted to confirm first that I was not doing anything strange in my code.

I will install from the repo and see if that solves it. Thanks so much for the help!

sayakpaul · 2024-04-25T12:14:27Z

image=style_image should be ip_adapter_image=style_image

JY-Joy · 2024-04-25T12:17:47Z

@sayakpaul my bad, thanks a lot! @emberMd the code snippet is updated.

from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")
generator = torch.Generator(device="cpu").manual_seed(42)
image = pipeline(
    prompt="a cat, masterpiece, best quality, high quality",
    ip_adapter_image=style_image,
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
).images[0]
image

emberMd · 2024-04-25T12:24:15Z

Thanks @sayakpaul @DannHuang! Really appreciate it

I was aware of the 'image=style_image' error, but I was using Diffusers 0.27.2. Now with 0.28.0.dev0 it works fine. My bad
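For readers hitting the same TypeError, a minimal sketch of a guard based only on the versions reported in this thread (0.27.2 failed, 0.28.0.dev0 worked). The helpers `release_tuple` and `supports_dict_scale` are hypothetical names, and the 0.28.0 threshold is an assumption drawn from this conversation, not from release notes. The comparison deliberately ignores the `.dev0` suffix.

```python
import re

# Assumed minimum release, inferred from the versions reported in this thread.
MIN_RELEASE = (0, 28, 0)


def release_tuple(version):
    """Parse the leading 'X.Y.Z' of a version string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports_dict_scale(version):
    """Return True if this version is expected to accept a dict scale."""
    return release_tuple(version) >= MIN_RELEASE


for candidate in ("0.27.2", "0.28.0.dev0"):
    print(candidate, supports_dict_scale(candidate))
```

In practice the string to check would be `diffusers.__version__`.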

JY-Joy · 2024-04-25T13:05:02Z

@emberMd Good to know the errors were solved. Hope you can enjoy InstantStyle :)

kadirnar · 2024-04-25T17:59:33Z

@DannHuang , @sayakpaul
I get an error when I add xformers and flash-attention.

AttributeError: 'tuple' object has no attribute 'shape'

yiyixuxu · 2024-04-26T00:10:01Z

@DannHuang can we update the doc with the correct example?

JY-Joy · 2024-04-26T01:38:52Z

@yiyixuxu Yeah sure, we will update the doc soon.

JY-Joy · 2024-04-28T13:04:59Z

@yiyixuxu Hi, the doc is updated in #7806. Please have a check!

yiyixuxu · 2024-04-28T20:35:25Z

@DannHuang
merged! thank you:)

* enable control ip-adapter per-transformer block on-the-fly --------- Co-authored-by: sayakpaul <spsayakpaul@gmail.com> Co-authored-by: ResearcherXman <xhs.research@gmail.com> Co-authored-by: YiYi Xu <yixu310@gmail.com>

enable control ip-adapter per-transformer block on-the-fly

1f6b84a

asomoza reviewed Apr 15, 2024

Comment thread src/diffusers/loaders/ip_adapter_utils.py Outdated

Comment thread src/diffusers/loaders/ip_adapter_utils.py Outdated

Comment thread src/diffusers/loaders/ip_adapter_utils.py Outdated

haofanwang mentioned this pull request Apr 16, 2024

Is this compatible with IP- Adapter faceID, can you explain how to use this code with pre trained IP-Adapter FaceID instantX-research/InstantStyle#27

Open

merge duplicate functions, enable multi IPA control

ef9694c

JY-Joy requested a review from UmerHA April 17, 2024 11:56

asomoza approved these changes Apr 17, 2024

Comment thread src/diffusers/loaders/ip_adapter.py Outdated

adapt to the repo's user warning convention

24224a0

make quality

9d09a56

yiyixuxu approved these changes Apr 18, 2024

Comment thread src/diffusers/loaders/ip_adapter.py Outdated

yiyixuxu requested review from sayakpaul and removed request for UmerHA April 18, 2024 08:37

fixed arg name

07a18fd

actual style.

c45b1c7

sayakpaul reviewed Apr 19, 2024

format doc-string, add ValueError

cb0ade6

JY-Joy added 3 commits April 20, 2024 12:52

ready for merge to remote main

5d0bdfa

Merge remote-tracking branch 'hf/main' into main

a98f498

support multiple masked IP inputs

6fc9a3a

format

1e550be

haofanwang mentioned this pull request Apr 21, 2024

About the face stylization feature instantX-research/InstantStyle#31

Open

yiyixuxu reviewed Apr 22, 2024

Comment thread utils/update_metadata.py Outdated

Update utils/update_metadata.py

8540c1c

yiyixuxu merged commit 21c747f into huggingface:main Apr 22, 2024

JY-Joy mentioned this pull request Apr 28, 2024

Update InstantStyle usage in IP-Adapter documentation #7806

Merged
