
Enable BitsAndBytes quantization in MPS #13915

Merged
sayakpaul merged 3 commits into huggingface:main from LucasSte:fix-mps-quant on Jun 23, 2026



@LucasSte LucasSte commented Jun 10, 2026

Contributor

What does this PR do?

bitsandbytes now has basic support for the Apple MPS backend, as far as I can tell from bitsandbytes-foundation/bitsandbytes#1818 and bitsandbytes-foundation/bitsandbytes#1875.

The issue is that diffusers does not let me use quantization on Apple hardware: it raises the error "No GPU found. A GPU is needed for quantization." from here:

if not (torch.cuda.is_available() or torch.xpu.is_available()):
    raise RuntimeError("No GPU found. A GPU is needed for quantization.")

Adding MPS to that check, in addition to the other suggested change in #13361 (comment), allows us to enable BitsAndBytes for MPS.
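A minimal sketch of the extended device check described above, assuming the change simply adds `torch.backends.mps.is_available()` to the existing condition. This is not the actual diff merged in this PR, and `has_quantization_device` / `check_quantization_device` are hypothetical names. The pure helper is split out so the logic can be exercised without torch installed.

```python
def has_quantization_device(cuda: bool, xpu: bool, mps: bool) -> bool:
    """Hypothetical helper: True if any backend usable by bitsandbytes is present."""
    return cuda or xpu or mps


def check_quantization_device() -> None:
    """Hypothetical wrapper mirroring the original check, with MPS added."""
    # Imported lazily so the pure helper above works without torch installed.
    import torch

    # Defensive guard (an assumption): older torch builds may lack torch.xpu.
    xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
    if not has_quantization_device(
        cuda=torch.cuda.is_available(),
        xpu=xpu,
        mps=torch.backends.mps.is_available(),
    ):
        raise RuntimeError("No GPU found. A GPU is needed for quantization.")
```

With this shape, an Apple Silicon machine that reports MPS as available passes the check instead of hitting the RuntimeError.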

I tested the change with the quantized version of FLUX.2-dev as described in https://github.com/black-forest-labs/flux2/blob/main/docs/flux2_dev_hf.md#4-bit-transformer-and-4-bit-text-encoder-20g-of-vram, and everything worked fine.

This PR fixes #13361.

Before submitting

  • [X] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
    => Not the case.
  • [X] Did you read the contributor guideline?
  • [X] Did you read our philosophy doc (important for complex PRs)?
  • [X] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
    => Enable MPS backend for bitsandbytes quantization #13361
  • [X] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
    => I haven't found any documentation in the diffusers repository that mentions the supported BitsAndBytes backends, so I didn't modify anything in the docs folder.
  • [X] Did you write any new necessary tests?
    => I didn't write any tests because the changes were trivial, but I can write them if reviewers deem it necessary.

Who can review?

@sayakpaul
@asomoza

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul sayakpaul left a comment

Member


Thanks! Could we also see some results from MPS? @asomoza would you be able to quickly do a test?

@HuggingFaceDocBuilderDev


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


bghira commented Jun 18, 2026

Contributor

jfyi these BNB kernels perform worse than just using torchao or quanto int8, because they lack a lot of optimisation

@iwr-redmond


There was an alternative offered that could have been more performant (bitsandbytes#1853), but in either case this PR is still needed to enable general use and future optimization.


bghira commented Jun 18, 2026

Contributor

for UX, imo the exception should be converted into a warning that the kernels are slow and experimental
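A sketch of the warning-based alternative suggested above, assuming MPS should be allowed with a `UserWarning` while having no supported accelerator at all still raises. `validate_quantization_backend` and the warning text are hypothetical and not diffusers code.

```python
import warnings


def validate_quantization_backend(cuda: bool, xpu: bool, mps: bool) -> str:
    """Hypothetical helper: pick a backend, warning (not raising) when only MPS is available."""
    if cuda:
        return "cuda"
    if xpu:
        return "xpu"
    if mps:
        # Allow MPS, but tell the user the kernels are experimental and may be slow.
        warnings.warn(
            "bitsandbytes support for MPS is experimental and its kernels may be slow.",
            UserWarning,
            stacklevel=2,
        )
        return "mps"
    # No supported accelerator at all: keep the original hard failure.
    raise RuntimeError("No GPU found. A GPU is needed for quantization.")
```

CUDA and XPU take precedence here, so the warning only fires when MPS is the sole option.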

@matthewdouglas

Member

FWIW, there will be some improvements to the MPS backend in our next bitsandbytes release. We've merged these so far since the last release:

With kernels available, on macOS 26+ we'll pull down some kernels hosted on the HF Hub. These are mostly focused on 4-bit.

Separately, we'll continue to build on PR 1960 with a few more improvements to the torch.compile fallback implementations, which will be used on macOS 14/15 and whenever Hub kernels cannot be loaded.

So, I think it is reasonable to enable it. If anything, maybe guard on bitsandbytes >= 0.49.0 as the absolute floor. It's not clear to me whether it's the responsibility of diffusers to warn you that it is going to be slow.

@huggingface huggingface deleted a comment from Vargol Jun 22, 2026
@sayakpaul

Member

@Vargol thanks for your post, but I had to delete it because it looks NSFW to me. Just letting you know for transparency.

@sayakpaul

Member

@asomoza a gentle reminder.


bghira commented Jun 22, 2026

Contributor

damn, requiring macOS 26? but why? that locks out async SIMD, as Apple has not provided any replacement.


asomoza commented Jun 23, 2026

Member

@sayakpaul sorry I missed this; sadly, I currently have no way to test on MPS.

@sayakpaul

Member

You mean any device? Anyway, the changes in the PR look good enough to me to unblock MPS, so I will merge.

@sayakpaul sayakpaul merged commit e97a1ad into huggingface:main Jun 23, 2026
15 checks passed


Development

Successfully merging this pull request may close these issues.

Enable MPS backend for bitsandbytes quantization

7 participants