fix: prevent crash in case of a mem alloc error and graceful exit#1566
fix: prevent crash in case of a mem alloc error and graceful exit#1566akleine wants to merge 1 commit into
Conversation
...and ensure a graceful exit
| continue; | ||
| } | ||
|
|
||
| if (dst_tensor->data == nullptr) { |
There was a problem hiding this comment.
I'm not sure this is the best place to check for this, because the allocation can't fail for individual tensors; it's all-or-nothing.
Take a look at StableDiffusionGGML::init, in stable-diffusion.cpp, a bit before load_tensors: some components do report allocation failure, but not all. We should probably propagate the underlying alloc_params_buffer error in all cases.
There was a problem hiding this comment.
I have seen that code in stable-diffusion.cpp and wondered about the reasoning behind it. I am just a newbie, and I guess there is a reason for writing the code this way.
So I decided to keep that as is - and better add this little quick fix.
And you’re right, the allocation problem is an all-or-nothing issue. So the message refers to the first tensor and then the program exits.
| continue; | ||
| } | ||
|
|
||
| if (dst_tensor->data == nullptr) { |
There was a problem hiding this comment.
I don't think load_tensors() is the right primary place to handle this.
dst_tensor->data == nullptr is only a downstream symptom. The actual allocation failure happens earlier in StableDiffusionGGML::init(): some components already check alloc_params_buffer() and return false on failure, but cond_stage_model, diffusion_model, and high_noise_diffusion_model currently ignore the return value.
Could we instead propagate those failures there, e.g. check:
cond_stage_model->alloc_params_buffer()diffusion_model->alloc_params_buffer()high_noise_diffusion_model->alloc_params_buffer()
and return false with a component-level error message?
The null-data check in load_tensors() can still be kept as a defensive guard, but the current message memory allocation failed '<tensor>' is misleading because params buffer allocation is all-or-nothing, not per tensor.
Summary
Added a check against nullptr to avoid crash + core dump in model.cpp
Related Issue / Discussion
This test script
./sd-cli --diffusion-model ~/SD_models/flux/LongCat-Image-Q4_K_M.gguf --llm ~/SD_models/flux/Qwen2.5-VL-7B-Instruct.Q3_K_M.gguf -p 'a lovely cat' --mmapends with Segmentation fault (core dumped) because there is not enough VRAM.
Here follows some output for
version: stable-diffusion.cpp version master-647-72e512a-1-ga397e03, commit a397e03Additional Information
After this patch the user gets some information and we run into a clean exit:
Checklist