Enable BitsAndBytes quantization in MPS #13915
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
jfyi these BNB kernels are worse than just using torchao or quanto int8 because they're very lacking in optimisation |
|
There was an alternative offered that could have been more performant (bitsandbytes#1853), but in either case this PR is still needed to enable general use and future optimization. |
|
for UX it should imo be converted to a warning instead of an exception, that the kernels are slow and experimental |
|
FWIW, there will be some improvement soon to MPS backend in our next bitsandbytes release. We've merged these so far since the last release:
With Separately, we'll continue to build on PR 1960 with a few more improvements, for the So, I think it is reasonable to enable it. If anything, maybe guard on |
|
@Vargol thanks for your post but I had to delete because it looks NSFW to my eyes. Just letting you know for transparency. |
|
@asomoza a gentle reminder. |
|
damn, requiring macos 26? but why? that causes a lockout of async simd as Apple has not provided any replacement. |
|
@sayakpaul sorry I missed this, sadly currently I don't have any method to test MPS. |
|
You mean any device? But anyway the changes in the PR look good enough to me to unblock MPS. So, will merge.
What does this PR do?
Bitsandbytes now has basic support for the Apple MPS backend, as far as I can tell from bitsandbytes-foundation/bitsandbytes#1818 and
bitsandbytes-foundation/bitsandbytes#1875.
The issue is that diffusers does not allow me to use the quantization on Apple hardware, because of the error
"No GPU found. A GPU is needed for quantization." from here:
diffusers/src/diffusers/quantizers/bitsandbytes/bnb_quantizer.py
Lines 64 to 65 in f2be8bd
Adding MPS to that check, in addition to the other suggested change in #13361 (comment), allows us to enable BitsAndBytes for MPS.
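For readers who want to see the shape of the change, here is a minimal sketch of a device check that also accepts MPS. It is not the actual diffusers code: the function names `detect_accelerators` and `validate_quantization_device` are hypothetical, the `RuntimeError` type is an assumption, and the warning on MPS only illustrates the suggestion made in the conversation (it is not part of this PR as described).

```python
import warnings


def detect_accelerators() -> dict:
    """Report which accelerator backends PyTorch can see (all False if torch is missing)."""
    found = {"cuda": False, "xpu": False, "mps": False}
    try:
        import torch
    except ImportError:
        return found
    found["cuda"] = torch.cuda.is_available()
    # torch.xpu only exists in newer PyTorch builds, so probe defensively.
    found["xpu"] = hasattr(torch, "xpu") and torch.xpu.is_available()
    mps = getattr(torch.backends, "mps", None)
    found["mps"] = mps is not None and mps.is_available()
    return found


def validate_quantization_device(found: dict) -> str:
    """Hypothetical stand-in for the quantizer's environment check."""
    for name in ("cuda", "xpu"):
        if found.get(name):
            return name
    if found.get("mps"):
        # Assumption: warn instead of raising, as suggested in the thread.
        warnings.warn(
            "bitsandbytes kernels on MPS are experimental and may be slow.",
            UserWarning,
        )
        return "mps"
    # Same message as the error quoted above; the exception type is assumed.
    raise RuntimeError("No GPU found. A GPU is needed for quantization.")
```

Calling `validate_quantization_device(detect_accelerators())` on an Apple Silicon machine with an MPS-enabled PyTorch build would return `"mps"` after emitting the warning, while a machine with no accelerator still gets the original error.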
I tested the change with the quantized version of FLUX.2-dev as described in https://github.com/black-forest-labs/flux2/blob/main/docs/flux2_dev_hf.md#4-bit-transformer-and-4-bit-text-encoder-20g-of-vram, and all worked fine.
This PR fixes #13361.
Before submitting
=> Not the case.
=> Enable MPS backend for bitsandbytes quantization #13361
documentation guidelines, and
here are tips on formatting docstrings.
=> I haven't found any documentation that mentions supported backends on BitsAndBytes in the diffusers repository, so I didn't modify anything in the docs folder.
=> I didn't write any tests, because the changes were trivial, but I can write them if reviewers deem necessary.
Who can review?
@sayakpaul
@asomoza
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.