Enable BitsAndBytes quantization in MPS by LucasSte · Pull Request #13915 · huggingface/diffusers

LucasSte · 2026-06-10T22:19:48Z

What does this PR do?

Bitsandbytes now has basic support for the Apple MPS backend, as far as I can tell from bitsandbytes-foundation/bitsandbytes#1818 and
bitsandbytes-foundation/bitsandbytes#1875.

The issue is that diffusers does not allow me to use this quantization on Apple hardware: it fails with the error "No GPU found. A GPU is needed for quantization.", which is raised here:

diffusers/src/diffusers/quantizers/bitsandbytes/bnb_quantizer.py

Lines 64 to 65 in f2be8bd

    if not (torch.cuda.is_available() or torch.xpu.is_available()):
        raise RuntimeError("No GPU found. A GPU is needed for quantization.")

Adding MPS to that check, in addition to the other suggested change in #13361 (comment), allows us to enable BitsAndBytes for MPS.
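The diff itself is not reproduced on this page, so the following is only a sketch of what the extended check could look like, assuming the MPS probe is `torch.backends.mps.is_available()`. The helper names `accelerator_available`, `check_device`, and `fake_torch` are hypothetical and not taken from the PR; the torch module is passed in as an argument so that the sketch runs without any GPU or torch installation.

```python
from types import SimpleNamespace


def accelerator_available(torch_like) -> bool:
    """Return True if CUDA, XPU, or MPS reports itself as available.

    `torch_like` is expected to expose `cuda.is_available()`,
    `xpu.is_available()`, and `backends.mps.is_available()`, as the
    real `torch` module does in recent releases.
    """
    return (
        torch_like.cuda.is_available()
        or torch_like.xpu.is_available()
        or torch_like.backends.mps.is_available()
    )


def check_device(torch_like) -> None:
    # Same error message as the existing check, but MPS now counts as a GPU.
    if not accelerator_available(torch_like):
        raise RuntimeError("No GPU found. A GPU is needed for quantization.")


def fake_torch(cuda=False, xpu=False, mps=False):
    """Hypothetical stand-in for `torch`, used only to exercise the check."""
    return SimpleNamespace(
        cuda=SimpleNamespace(is_available=lambda: cuda),
        xpu=SimpleNamespace(is_available=lambda: xpu),
        backends=SimpleNamespace(mps=SimpleNamespace(is_available=lambda: mps)),
    )
```

With the real library, the call would presumably be `check_device(torch)` after `import torch`.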

I tested the change with the quantized version of FLUX.2-dev as described in https://github.com/black-forest-labs/flux2/blob/main/docs/flux2_dev_hf.md#4-bit-transformer-and-4-bit-text-encoder-20g-of-vram, and everything worked fine.

This PR fixes #13361.

Before submitting

[x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
=> Not the case.
[x] Did you read the contributor guideline?
[x] Did you read our philosophy doc (important for complex PRs)?
[x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
=> Enable MPS backend for bitsandbytes quantization #13361
[x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
=> I haven't found any documentation that mentions supported backends for BitsAndBytes in the diffusers repository, so I didn't modify anything in the docs folder.
[x] Did you write any new necessary tests?
=> I didn't write any tests, because the changes were trivial, but I can write them if reviewers deem it necessary.

Who can review?

@sayakpaul
@asomoza

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

sayakpaul

Thanks! Could we also see some results from MPS? @asomoza would you be able to quickly do a test?

HuggingFaceDocBuilderDev · 2026-06-11T02:26:56Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

bghira · 2026-06-18T18:30:30Z

jfyi these BNB kernels are worse than just using torchao or quanto int8 because they're very lacking in optimisation

iwr-redmond · 2026-06-18T19:24:21Z

There was an alternative offered that could have been more performant (bitsandbytes#1853), but in either case this PR is still needed to enable general use and future optimization.

bghira · 2026-06-18T19:49:27Z

for UX it should imo be converted to a warning instead of an exception, that the kernels are slow and experimental
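One possible reading of this suggestion, sketched with the standard `warnings` module: the backend is allowed, but the user is told that the kernels are slow and experimental. The function name `warn_experimental_backend` and the message wording are hypothetical; nothing like this appears in the PR.

```python
import warnings


def warn_experimental_backend(backend: str) -> None:
    """Emit a warning instead of raising, so that loading can continue."""
    warnings.warn(
        f"bitsandbytes kernels on {backend} are slow and experimental.",
        UserWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
```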

matthewdouglas · 2026-06-22T23:41:13Z

FWIW, there will be some improvement soon to MPS backend in our next bitsandbytes release. We've merged these so far since the last release:

With kernels available, on macOS 26+ we'll pull down some kernels hosted on the HF Hub. These are mostly focused on 4bit.

Separately, we'll continue to build on PR 1960 with a few more improvements for the torch.compile fallback implementations, which will be used on macOS 14/15 and when Hub kernels cannot be loaded.

So, I think it is reasonable to enable it. If anything, maybe guard on bitsandbytes >= 0.49.0 as the absolute floor. It's not clear to me if it's the responsibility of diffusers to warn you if it is going to be slow or not.
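A version floor like the one suggested here could be sketched as below. This is an illustration only: the names `BNB_MPS_FLOOR`, `parse_release`, and `meets_mps_floor` are hypothetical, and the parser deliberately looks only at the leading numeric release segment, so a pre-release such as `0.49.0.dev0` is treated the same as `0.49.0`.

```python
import re

BNB_MPS_FLOOR = (0, 49, 0)  # floor suggested in the comment above


def parse_release(version: str) -> tuple:
    """Extract the leading numeric release segment of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_mps_floor(installed: str, floor: tuple = BNB_MPS_FLOOR) -> bool:
    """Return True if `installed` is at least `floor`."""
    release = parse_release(installed)
    # Pad short versions with zeros so that "0.49" compares equal to (0, 49, 0).
    padded = release + (0,) * (len(floor) - len(release))
    return padded >= floor
```

In practice the installed version string could be read with `importlib.metadata.version("bitsandbytes")` and passed to `meets_mps_floor`.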

sayakpaul · 2026-06-22T23:47:09Z

@Vargol thanks for your post, but I had to delete it because it looks NSFW to my eyes. Just letting you know for transparency.

sayakpaul · 2026-06-22T23:47:30Z

@asomoza a gentle reminder.

bghira · 2026-06-22T23:56:02Z

damn, requiring macos 26? but why? that causes a lockout of async simd as Apple has not provided any replacement.

asomoza · 2026-06-23T01:11:22Z

@sayakpaul sorry I missed this, sadly currently I don't have any method to test MPS.

sayakpaul · 2026-06-23T04:22:13Z

You mean any device? But anyway, the changes in the PR look good enough to me to unblock MPS. So, will merge.

github-actions bot added the quantization, size/S (PR with diff < 50 LOC), and fixes-issue labels Jun 10, 2026

LucasSte mentioned this pull request Jun 10, 2026

Enable MPS backend for bitsandbytes quantization #13361

Closed

Fix BitsAndBytes quantization in MPS

fa0ad0b

LucasSte force-pushed the fix-mps-quant branch from e2b75b0 to fa0ad0b Compare June 10, 2026 22:21

sayakpaul approved these changes Jun 11, 2026

huggingface deleted a comment from Vargol Jun 22, 2026

Merge branch 'main' into fix-mps-quant

39c2c1b

Merge branch 'main' into fix-mps-quant

87d15e0

sayakpaul approved these changes Jun 23, 2026

sayakpaul merged commit e97a1ad into huggingface:main Jun 23, 2026
15 checks passed

sayakpaul merged 3 commits into huggingface:main from LucasSte:fix-mps-quant

7 participants
