Qualcomm AI Engine Direct - [LLM Quantization] Support dataloader-based prefill#20273
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20273
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEVs
There are 1 currently active SEVs. If your PR is affected, please view them below:
❌ 2 New Failures, 3 Unrelated Failures, 2 Unclassified Failures
As of commit 73263c2 with merge base 05b977d:
NEW FAILURES - The following jobs have failed:
UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following jobs failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@psiddh Hi, this PR adds support for dataloader-based calibration in MLLMs. With this PR, LLMs can be calibrated using the full input sequence at once, eliminating the need for iterative autoregressive (AR) processing over long sequences. For example, instead of performing hundreds of iterations for a sequence length of 1024, calibration can now be completed in a single forward pass. Below is a comparison between AR iterative calibration and dataloader-based calibration across different models:
[Figure: MLLMs metrics]
cc: @shewu-quic, @haowhsu-quic
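The difference between the two calibration modes can be illustrated with a toy example. This is a minimal sketch, not the code in this PR: it assumes a single causal self-attention layer in NumPy, a hypothetical `MinMaxObserver` class, and random inputs standing in for token embeddings. It shows that, for a fixed input sequence, one causally masked forward pass exposes the observer to the same activations as stepping through the sequence token by token with a KV cache.

```python
import numpy as np

rng = np.random.default_rng(0)
d, seq_len = 8, 16
# Toy projection weights and inputs (assumptions for illustration only).
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
x = rng.standard_normal((seq_len, d))


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class MinMaxObserver:
    """Hypothetical observer that tracks the activation range."""

    def __init__(self):
        self.lo, self.hi = np.inf, -np.inf

    def update(self, t):
        self.lo = min(self.lo, float(t.min()))
        self.hi = max(self.hi, float(t.max()))


def calibrate_autoregressive(x):
    """One forward step per token, reusing cached keys and values."""
    obs, k_cache, v_cache = MinMaxObserver(), [], []
    for t in range(len(x)):
        q = x[t] @ Wq
        k_cache.append(x[t] @ Wk)
        v_cache.append(x[t] @ Wv)
        K, V = np.stack(k_cache), np.stack(v_cache)
        obs.update(softmax(q @ K.T / np.sqrt(d)) @ V)
    return obs, len(x)  # number of forward steps taken


def calibrate_prefill(x):
    """One forward pass over the whole sequence with a causal mask."""
    obs = MinMaxObserver()
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones((len(x), len(x)), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)  # hide future positions
    obs.update(softmax(scores) @ v)
    return obs, 1


ar_obs, ar_steps = calibrate_autoregressive(x)
pf_obs, pf_steps = calibrate_prefill(x)
print(ar_steps, pf_steps)
print(np.isclose(ar_obs.lo, pf_obs.lo), np.isclose(ar_obs.hi, pf_obs.hi))
```

In this toy setting the observed ranges agree up to floating-point rounding, while the number of forward calls drops from `seq_len` to one.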
d80b723 to 01574e1
@pytorchbot label "release notes: qualcomm"
Calibration dataset:
- Replace HF AutoModel token generation with direct tokenization of
curated corpus (llm eval tasks or JSON samples)
- Add default calibration samples: assets/samples/{text,vision,audio}.json
Architecture:
- Introduce PTQStrategy + DecoderInference as unified calibration
forward-pass primitives; remove decoder_utils.graph_module_inference
- Refactor dataset.py into dataset/ package:
builders, collators, config, datasets, loaders, preprocessors, schema
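As a rough illustration of "direct tokenization of a curated corpus", the sketch below turns raw text into fixed-length calibration samples. Everything in it is an assumption made for the example: the whitespace tokenizer, the `build_vocab` and `make_calibration_samples` helpers, and the `input_ids` / `attention_mask` field names are hypothetical and are not the API of the `dataset/` package in this PR.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical padding id reserved for this example


def build_vocab(corpus: List[str]) -> Dict[str, int]:
    """Assign ids starting at 1 so that 0 stays free for padding."""
    vocab: Dict[str, int] = {}
    for text in corpus:
        for word in text.split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Toy whitespace tokenizer standing in for a real one."""
    return [vocab[word] for word in text.split()]


def make_calibration_samples(
    corpus: List[str], vocab: Dict[str, int], seq_len: int
) -> List[Dict[str, List[int]]]:
    """Split each tokenized text into chunks of seq_len, padding the last one."""
    samples = []
    for text in corpus:
        ids = tokenize(text, vocab)
        for start in range(0, len(ids), seq_len):
            chunk = ids[start : start + seq_len]
            n_pad = seq_len - len(chunk)
            samples.append(
                {
                    "input_ids": chunk + [PAD_ID] * n_pad,
                    "attention_mask": [1] * len(chunk) + [0] * n_pad,
                }
            )
    return samples


corpus = ["the quick brown fox jumps", "hello world"]
vocab = build_vocab(corpus)
samples = make_calibration_samples(corpus, vocab, seq_len=4)
for sample in samples:
    print(sample)
```

Each resulting sample can be fed to the model in a single forward pass, with the attention mask marking which positions hold real tokens.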
01574e1 to 73263c2
@claude review
Claude finished @psiddh's task in 6m 50s

Claude Code Review

Reviewed the refactor end-to-end (dataset/, inference/, quantize/, evaluator/, wrappers, tokenizer, masking, tests). This is a clean, well-structured rework, the

Bugs / correctness

1. Dead, self-contradicting warning in

    if TASKS_EVAL in args.eval_methods and args.eval_tasks is None:
        if args.calib_tasks is None:
            logging.warning(
                "--eval_tasks is set but --calib_tasks is not; ..."
            )
        raise RuntimeError("Please provide --eval_tasks to eval perplexity")

This block only runs when

2. Falsy-token-id bug in modality merging: self.audio_token_id or self.image_token_id appears in both the

3. Nits / polish
Tests
Overall: solid refactor, no blocking concerns beyond the dead warning (#1). Nice speedups.
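The falsy-token-id concern in item 2 comes from how Python's `or` treats `0`. The sketch below is a standalone illustration and not the code under review: the two helper functions are hypothetical, and only the expression `audio_token_id or image_token_id` is taken from the review text. It assumes a token id of `0` is valid and that `None` means "not configured".

```python
from typing import Optional


def pick_token_id_buggy(
    audio_token_id: Optional[int], image_token_id: Optional[int]
) -> Optional[int]:
    # `or` falls through on any falsy value, so a valid id of 0 is skipped.
    return audio_token_id or image_token_id


def pick_token_id_safe(
    audio_token_id: Optional[int], image_token_id: Optional[int]
) -> Optional[int]:
    # Only fall back when the audio id is genuinely missing.
    return audio_token_id if audio_token_id is not None else image_token_id


print(pick_token_id_buggy(0, 7))    # picks the image id although audio id 0 is set
print(pick_token_id_safe(0, 7))     # keeps the audio id
print(pick_token_id_safe(None, 7))  # falls back to the image id
```

An explicit `is not None` check keeps the fallback behaviour for missing ids without discarding a legitimate id of zero.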
Nice refactor, and the calibration speedups look great!
Will update those. Thanks for catching that!
Summary
Calibration dataset:
Architecture:
Test plan
Test CI: