
Support InstantStyle#7668

Merged
yiyixuxu merged 14 commits into
huggingface:mainfrom
JY-Joy:main
Apr 22, 2024


@JY-Joy

@JY-Joy JY-Joy commented Apr 14, 2024

Contributor

What does this PR do?

This PR is a follow-up from #7586 with modification as suggested by this comment.

IP-Adapter scales can now be controlled per transformer block; setting a scale to 0 skips that block. Example usage:

  1. To use the original IP-Adapter, simply set the scale to a float.
scale_config = 1.0
pipeline.activate_ip_adapter(scale_config)
  2. To use only the style block (up_blocks.0.attentions.1).
scale_config = {
    "up": {
        "block_0": [0.0, 1.0, 0.0]
    },
}
pipeline.activate_ip_adapter(scale_config)
  3. To use the style and layout blocks (up_blocks.0.attentions.1 and down_blocks.2.attentions.1).
scale_config = {
    "down": {
        "block_2": [0.0, 1.0]
    },
    "up": {
        "block_0": [0.0, 1.0, 0.0]
    },
}
pipeline.activate_ip_adapter(scale_config)
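
As a rough illustration of what such a config expresses, the sketch below expands a float or a nested dict into one scale per attention layer. It is not the code from this PR: `expand_scale_config` and `ASSUMED_LAYOUT` are hypothetical names, the layer counts are taken from the list lengths in the examples above (with guesses for the blocks not shown), and treating unlisted blocks as 0.0 is an assumption inferred from the style-only example.

```python
# Hypothetical sketch; names and layout are assumptions, not the PR's code.
# Attention layers per block, assumed from the list lengths in the examples
# (down block_2 has 2 entries, up block_0 has 3); the other counts are guesses.
ASSUMED_LAYOUT = {
    "down": {"block_1": 2, "block_2": 2},
    "up": {"block_0": 3, "block_1": 3},
}


def expand_scale_config(scale_config, layout=ASSUMED_LAYOUT, default_scale=0.0):
    """Return {layer_name: scale} for a float or a nested scale_config dict."""
    expanded = {}
    for part, blocks in layout.items():
        for block_name, n_layers in blocks.items():
            if isinstance(scale_config, dict):
                # Blocks missing from the dict fall back to default_scale.
                block_cfg = scale_config.get(part, {}).get(block_name, default_scale)
            else:
                # A plain number applies to every layer.
                block_cfg = scale_config
            if not isinstance(block_cfg, (list, tuple)):
                block_cfg = [block_cfg] * n_layers
            if len(block_cfg) != n_layers:
                raise ValueError(
                    f"{part}.{block_name} expects {n_layers} scales, got {len(block_cfg)}"
                )
            block_idx = block_name.split("_")[1]
            for layer_idx, value in enumerate(block_cfg):
                expanded[f"{part}_blocks.{block_idx}.attentions.{layer_idx}"] = float(value)
    return expanded


style_only = expand_scale_config({"up": {"block_0": [0.0, 1.0, 0.0]}})
active_layers = [name for name, value in style_only.items() if value > 0]
print(active_layers)
```

With the style-only config, the only layer left with a non-zero scale is the middle attention layer of the first up block, which matches the layer named in item 2.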

@haofanwang

Contributor

@yiyixuxu @asomoza Could you review this new PR? @DannHuang will follow it up.

@yiyixuxu

Collaborator

thanks for your PR!
@asomoza can you give this a first review and test it out?

@HuggingFaceDocBuilderDev


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@asomoza

asomoza commented Apr 15, 2024

Member

I’m currently testing it and comparing the results. In the meantime, I’m curious as to why you decided to create a new function instead of modifying set_ip_adapter_scale. Do you anticipate a use case where we’ll need to use both functions simultaneously?

@asomoza asomoza left a comment

Member


I've tested it and it works, thank you for your work. I left a couple of questions.

I think the main use case for this is using one image for style and another for composition. Right now that's not possible, though.

IMO we should enable that in this PR.

[Images: style | composition | result | expected]

Comment thread src/diffusers/loaders/ip_adapter_utils.py Outdated
Comment thread src/diffusers/loaders/ip_adapter_utils.py Outdated
Comment thread src/diffusers/loaders/ip_adapter_utils.py Outdated
@JY-Joy

JY-Joy commented Apr 16, 2024

Contributor Author

I’m currently testing it and comparing the results. In the meantime, I’m curious as to why you decided to create a new function instead of modifying set_ip_adapter_scale. Do you anticipate a use case where we’ll need to use both functions simultaneously?

I think you are right: the two functions essentially do the same thing, so we will merge them. Sorry for the confusion.

@JY-Joy

JY-Joy commented Apr 17, 2024

Contributor Author

Hi all, this PR is updated. Specifically:

  1. activate_ip_adapter() is now merged into set_ip_adapter_scale() and is fully compatible with the original usage. IP-Adapters can now be controlled by a float or a list of floats (the original usage), and also by a scale_config or a list of scale_configs. For example:
# To use style and layout from 2 reference images
scale_configs = [
    {
        "down": {
            "block_2": [0.0, 1.0]
        }
    },
    {
        "up": {
            "block_0": [0.0, 1.0, 0.0]
        }
    }
]
pipeline.set_ip_adapter_scale(scale_configs)
  2. _maybe_expand_lora_scales() now takes an additional default_scale argument, which defaults to 1.0. I believe the behavior of existing code is unchanged. ip_adapter_utils.py has been removed.
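
The accepted input shapes can be pictured with a small normaliser that turns any of them into one entry per loaded IP-Adapter. This is only a sketch of the idea: `normalize_adapter_scales` is a hypothetical helper, and broadcasting a single value to every adapter is an assumption rather than something stated in this thread.

```python
def normalize_adapter_scales(scale, num_adapters):
    """Return a list with one scale entry (float or dict) per IP-Adapter."""
    # A bare float or a single scale_config dict is wrapped into a list.
    scales = scale if isinstance(scale, list) else [scale]
    # Assumption: a single entry is shared by all adapters.
    if len(scales) == 1:
        scales = scales * num_adapters
    if len(scales) != num_adapters:
        raise ValueError(
            f"Got {len(scales)} scale entries for {num_adapters} IP-Adapters"
        )
    return scales


layout_cfg = {"down": {"block_2": [0.0, 1.0]}}
style_cfg = {"up": {"block_0": [0.0, 1.0, 0.0]}}

print(normalize_adapter_scales(0.6, 2))
print(normalize_adapter_scales([layout_cfg, style_cfg], 2))
```

Raising on a length mismatch, rather than silently truncating, keeps a wrong number of configs from being applied to the wrong adapter.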

I've tested it and it works, thank you for you work. I left a couple of questions.

I think pretty much the use case for this is to be able to use one image for style and maybe another one for the composition, right now that's not possible though.

For multiple reference images, I believe we can achieve this by loading the same IP-Adapter multiple times and setting each corresponding IP-Adapter to style mode or layout mode. The following code works for me:

...
pipe.load_ip_adapter(ip_adapter_path, subfolder="sdxl_models", weight_name=["ip-adapter_sdxl.bin", "ip-adapter_sdxl.bin"], image_encoder_path=image_encoder_path)
...
pipe.set_ip_adapter_scale(scale_configs)
images = pipe(
    prompt="a llama, masterpiece, best quality, high quality",
    ip_adapter_image=[style_img, composition_img],
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    scale=1.0,
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
).images

Please let me know if there is any other issue, thanks all :)

@JY-Joy JY-Joy requested a review from UmerHA April 17, 2024 11:56

@asomoza asomoza left a comment

Member


Great work, just one comment and the rest looks good to me. The expected result also works. You can mark as resolved my comments from before.

@yiyixuxu this PR is ready for your final review

Comment thread src/diffusers/loaders/ip_adapter.py Outdated
@asomoza

asomoza commented Apr 17, 2024

Member

For this:

In case of multiple reference images, I believe we can achieve this by load the same IP-Adapter for multiple times, and set the correspond IP-Adapter to style-mode/layout-mode.

This time I really don't see a solution to prevent loading the same weights multiple times without changing how IP Adapters are loaded in the pipelines.

@JY-Joy

JY-Joy commented Apr 18, 2024

Contributor Author

All the mentioned issues are resolved. Please let me know if there are any others.

This time I really don't see a solution to prevent loading the same weights multiple times without changing how IP Adapters are loaded in the pipelines.

IMO we want to load two IP-Adapters, one for style control and the other for layout. They happen to share the same pre-trained weights in this specific case, but they do not need to be the same. We believe this implementation requires minimal modification to the original IP-Adapter pipeline, but we don't know whether it is the best solution for @asomoza's case.

@haofanwang

Contributor

@yiyixuxu Should be ready to merge.

@yiyixuxu yiyixuxu left a comment

Collaborator


looking great! thanks!

Comment thread src/diffusers/loaders/ip_adapter.py Outdated
@yiyixuxu yiyixuxu requested review from sayakpaul and removed request for UmerHA April 18, 2024 08:37
@yiyixuxu

Collaborator

cc @sayakpaul can you give a final review?

also, it seems like quality changed files that are not supposed to be changed - I've seen this issue in multiple PRs now, what's going on?

@sayakpaul

Member

also, it seems like quality changed files that are not supposed to be changed - I've seen this issue in multiple PRs now, what's going on?

Here's my hypothesis.

#7314 made changes to the dependencies included in "quality". Refer to the setup.py:

extras["quality"] = deps_list("urllib3", "isort", "ruff", "hf-doc-builder")

So with this dependency included, whenever you run make style it will re-format the docstring and documentation pages if needed.

Contributors may not have updated the quality dependencies before running make quality and make style which is why there are likely changes in unexpected files. LMK if this is unclear.

@sayakpaul

Member

Yup, can confirm my hypothesis with c45b1c7. See the number of file changes dropped to 4 from 17.

@sayakpaul sayakpaul left a comment

Member


Looking great! Could we have some documentation with test code on how to use this feature as well?

When testing, prefer non-human objects.

@fabiorigano would be great to have your reviews on this too :-)

Comment thread src/diffusers/loaders/ip_adapter.py Outdated
Comment thread src/diffusers/loaders/ip_adapter.py
Comment thread src/diffusers/loaders/unet_loader_utils.py
Comment thread src/diffusers/loaders/unet_loader_utils.py
Comment thread src/diffusers/loaders/unet_loader_utils.py
Comment thread src/diffusers/loaders/unet_loader_utils.py Outdated
@JY-Joy

JY-Joy commented Apr 19, 2024

Contributor Author

Yup, can confirm my hypothesis with c45b1c7. See the number of file changes dropped to 4 from 17.

Thanks for your solution! This really confused us, as we passed the quality check locally. We will test with the latest dependencies and fix the quality check ASAP.

@JY-Joy

JY-Joy commented Apr 19, 2024

Contributor Author

Looking great! Could we have some documentation with test code on how to use this feature as well?

When testing, prefer non-human objects.

@fabiorigano would be great to have your reviews on this too :-)

Absolutely, will update it soon.

@JY-Joy

JY-Joy commented Apr 19, 2024

Contributor Author

Most of the issues are resolved except those related to the use case of default_scale; I believe the current version behaves as expected. @sayakpaul Can I mark those comments as resolved? BTW, should I add documentation and tests to docs/source/en/using-diffusers/ip_adapter.md and tests/pipelines/ip_adapters?

@JY-Joy

JY-Joy commented Apr 20, 2024

Contributor Author

Hi all, I've just pushed some updates to this PR. Specifically:

  1. Multiple masked IP inputs are now supported. For the case @yiyixuxu provided:
    [image: multi_masks_org_out]
    With InstantStyle, injecting ip_female_style and ip_male_style into only the style layers:
    [image: multi_mask]
  2. Different scales can now be set for each masked IP in set_ip_adapter_scale. For example, to set scales for 3 masked IPs in the up.blocks.0 layer, use this scale config:
scale_0 = { "up": { "block_0": [[0.75, 0.75, 0.3]]}}

This sets the scales [0.75, 0.75, 0.3] for the corresponding 3 masked IPs in all 3 transformer blocks of up.blocks.0. My solution requires a length-1 list wrapping the list of scales, to avoid ambiguity with specifying different scales for the transformer blocks. For details, please refer to https://github.com/DannHuang/diffusers/blob/6fc9a3af2df947d99e80148cc7c40d4abb0ac86d/src/diffusers/loaders/unet_loader_utils.py#L143-L145
  3. A test case from @yiyixuxu is included.
  4. Conflicts resolved.
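
The length-1 convention described in item 2 can be sketched as follows. This illustrates the rule as worded in this comment, not the code at the linked lines; `interpret_block_scales` is a hypothetical name.

```python
def interpret_block_scales(value, n_transformer_blocks):
    """Return one entry per transformer block of an attention block.

    Each entry is either a float or a list of per-mask scales.
    """
    if isinstance(value, (int, float)):
        # A plain number applies to every transformer block.
        return [float(value)] * n_transformer_blocks
    if len(value) == 1:
        # Length-1 list: its single element (a float, or a list of per-mask
        # scales) is repeated for every transformer block.
        return [value[0] for _ in range(n_transformer_blocks)]
    if len(value) != n_transformer_blocks:
        raise ValueError(
            f"Expected 1 or {n_transformer_blocks} entries, got {len(value)}"
        )
    # Otherwise each element belongs to one transformer block.
    return list(value)


per_block = interpret_block_scales([0.0, 1.0, 0.0], 3)
per_mask = interpret_block_scales([[0.75, 0.75, 0.3]], 3)
print(per_block)
print(per_mask)
```

Without the extra wrapping list, `[0.75, 0.75, 0.3]` would be read as three per-block scales, which is the ambiguity the comment describes.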

@JY-Joy

JY-Joy commented Apr 20, 2024

Contributor Author

it seems like I still have some problems with my quality dependencies, can u help me with it again @sayakpaul, really thanks a lot!

@fabiorigano

Contributor

it seems like I still have some problems with my quality dependencies, can u help me with it again @sayakpaul, really thanks a lot!

you have to run make quality (see "how to open a PR")

@haofanwang

haofanwang commented Apr 20, 2024

Contributor

@fabiorigano @sayakpaul Formatted.

Comment thread utils/update_metadata.py Outdated
@yiyixuxu yiyixuxu merged commit 21c747f into huggingface:main Apr 22, 2024
@yiyixuxu

Collaborator

very nice work! thank you all!!

@wodsoe

wodsoe commented Apr 24, 2024


run the example:
attention_processor.py", line 1157, in __call__
hidden_states.shape if encoder_hidden_states is None else encoder_hidden_states.shape
AttributeError: 'tuple' object has no attribute 'shape'

@DannHuang

@JY-Joy

JY-Joy commented Apr 24, 2024

Contributor Author

run the example:
attention_processor.py", line 1157, in __call__
hidden_states.shape if encoder_hidden_states is None else encoder_hidden_states.shape
AttributeError: 'tuple' object has no attribute 'shape'
@DannHuang

Hi @wodsoe,
It seems there is a mistake in the example; sorry for the confusion.
The following code should work:

from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")

Please let me know if there is any further issue. Thanks a lot!

@emberMd

emberMd commented Apr 25, 2024


I am not able to get the InstantStyle implementation working with IP-Adapter. This is not directly related to @wodsoe's issue, but the error also comes from attention_processor.py.

I'm using the exact example from the Diffusers IP-Adapter documentation.

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForImage2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")

generator = torch.Generator(device="cpu").manual_seed(42)
image = pipeline(
    prompt="a cat, masterpiece, best quality, high quality",
    image=style_image,
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
).images[0]
image

Using AutoPipelineForImage2Image, I got an error because the ip_adapter_image parameter is not passed in the code.

ValueError: <class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'> has the config param `encoder_hid_dim_type` set to 'ip_image_proj' which requires the keyword argument `image_embeds` to be passed in  `added_conditions`

Then, if I use AutoPipelineForText2Image and pass 'style_image' as 'ip_adapter_image', I get the following error. The same happens with AutoPipelineForImage2Image when using 'style_image' for both 'image' and 'ip_adapter_image'.

File ~/miniconda3/envs/if/lib/python3.10/site-packages/diffusers/models/attention_processor.py:2417, in IPAdapterAttnProcessor2_0.__call__(self, attn, hidden_states, encoder_hidden_states, attention_mask, temb, scale, ip_adapter_masks)
   2413         mask_downsample = mask_downsample.to(dtype=query.dtype, device=query.device)
   2415         current_ip_hidden_states = current_ip_hidden_states * mask_downsample
-> 2417     hidden_states = hidden_states + scale * current_ip_hidden_states
   2419 # linear proj
   2420 hidden_states = attn.to_out[0](hidden_states)

TypeError: unsupported operand type(s) for *: 'dict' and 'Tensor'

Maybe I'm doing something wrong. IP-Adapter works fine in the traditional implementation when 'scale' is a float, but it doesn't seem to work for this specific case. Thanks, and sorry for the long reply :)

@sayakpaul

Member

I can reproduce this problem. The code snippet is the same as what's provided in https://huggingface.co/docs/diffusers/main/en/using-diffusers/ip_adapter#style--layout-control.

@JY-Joy

JY-Joy commented Apr 25, 2024

Contributor Author

Hi @emberMd, thanks for trying InstantStyle! It's a little weird, because replacing AutoPipelineForImage2Image with AutoPipelineForText2Image and passing style_image as ip_adapter_image to the pipeline works fine for me. Here is the complete code for your reference:

from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")
generator = torch.Generator(device="cpu").manual_seed(42)
image = pipeline(
    prompt="a cat, masterpiece, best quality, high quality",
    ip_adapter_image=style_image,
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
).images[0]
image

For the second error in your reply, I guess it is a version problem, since the dictionary-type scale was being set on each attn_processor. Could you please try cloning the latest diffusers repo and installing directly from source?

@emberMd

emberMd commented Apr 25, 2024


Hi @DannHuang! thanks for the fast reply.

I guess it's weird, the code you passed still doesn't work for me. I imagined that it could be due to a version problem in Diffusers, but since I am already on 0.27.2 I wanted to confirm first that I was not doing anything strange in my code.

I will install from the repo and see if that solves it. Thanks so much for the help!

@sayakpaul

Member

image=style_image should be ip_adapter_image=style_image

@JY-Joy

JY-Joy commented Apr 25, 2024

Contributor Author

@sayakpaul my bad, thanks a lot! @emberMd the code snippet is updated.

from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")
generator = torch.Generator(device="cpu").manual_seed(42)
image = pipeline(
    prompt="a cat, masterpiece, best quality, high quality",
    ip_adapter_image=style_image,
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
).images[0]
image

@emberMd

emberMd commented Apr 25, 2024


Thanks @sayakpaul @DannHuang! Really appreciate it

I was aware of the 'image=style_image' error, but I was using Diffusers 0.27.2; now with 0.28.0.dev0 it works fine. My bad
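
Since the dict-style scale failed on 0.27.2 and worked on 0.28.0.dev0 in this thread, a guard like the one below can make the failure easier to diagnose. It is a minimal sketch: `release_tuple` and `supports_dict_scale` are hypothetical helpers, and the 0.28.0 threshold is inferred only from the two versions reported here.

```python
import re


def release_tuple(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            # Stop at suffixes such as "dev0".
            break
        parts.append(int(match.group()))
    return tuple(parts[:3])


def supports_dict_scale(version, minimum=(0, 28, 0)):
    """Assumed threshold: dict scales need at least the 0.28.0 development line."""
    return release_tuple(version) >= minimum


for candidate in ("0.27.2", "0.28.0.dev0"):
    print(candidate, supports_dict_scale(candidate))
```

In practice the version string would come from `diffusers.__version__`, checked before calling set_ip_adapter_scale with a dict.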

@JY-Joy

JY-Joy commented Apr 25, 2024

Contributor Author

@emberMd Good to know the errors were solved. Hope you can enjoy InstantStyle :)

@kadirnar

Contributor

@DannHuang , @sayakpaul
I get an error when I add xformers and flash-attention.

AttributeError: 'tuple' object has no attribute 'shape'

@yiyixuxu

Collaborator

@DannHuang can we update the doc with the correct example?

@JY-Joy

JY-Joy commented Apr 26, 2024

Contributor Author

@yiyixuxu Yeah sure, we will update the doc soon.

@JY-Joy

JY-Joy commented Apr 28, 2024

Contributor Author

@yiyixuxu Hi, the doc is updated in #7806. Please have a check!

@yiyixuxu

Collaborator

@DannHuang
merged! thank you:)

sayakpaul added a commit that referenced this pull request Dec 23, 2024
* enable control ip-adapter per-transformer block on-the-fly

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: ResearcherXman <xhs.research@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>