Disable cuDNN 9.23.0/9.23.1 for MXFP8 attention #3173

Open
cyanguwa wants to merge 2 commits into NVIDIA:main from cyanguwa:disable_9.23

@cyanguwa cyanguwa commented Jul 2, 2026

Collaborator

Description

MXFP8 attention produces Inf/NaN values when running with cuDNN 9.23.0 and 9.23.1. These issues are fixed in cuDNN 9.23.2.

https://docs.nvidia.com/deeplearning/cudnn/backend/latest/release-notes.html#cudnn-9-23-2

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

See Description.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
@cyanguwa cyanguwa added the 2.17 label Jul 2, 2026
@cyanguwa cyanguwa requested a review from KshitijLakhani July 2, 2026 22:50
@cyanguwa cyanguwa changed the title disable 9.23.0/.1 for mxfp8 attention Disable 9.23.0/.1 for MXFP8 attention Jul 2, 2026
@cyanguwa cyanguwa changed the title Disable 9.23.0/.1 for MXFP8 attention Disable 9.23.0/9.23.1 for MXFP8 attention Jul 2, 2026
@cyanguwa cyanguwa changed the title Disable 9.23.0/9.23.1 for MXFP8 attention Disable cuDNN 9.23.0/9.23.1 for MXFP8 attention Jul 2, 2026

greptile-apps Bot commented Jul 2, 2026

Contributor

Greptile Summary

This PR adds a targeted version guard to disable MXFP8 FusedAttention when the detected cuDNN version is 9.23.0 or 9.23.1, both of which have known bugs causing Inf/NaN results in SDPA. The fix is minimal and scoped to the MXFP8 branch of the version-gating logic in get_attention_backend.

  • The cudnn_version in ((9, 23, 0), (9, 23, 1)) check is inserted between the existing < (9, 21, 0) lower-bound guard and the qkv_format == "thd" check, correctly allowing 9.21.x–9.22.x and 9.23.2+ through while blocking the two buggy point releases.
  • Affected users silently fall back to an alternate attention backend with no warning-level log message, making it harder to diagnose if the fallback causes unexpected performance or accuracy differences.
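
The version guard described in the bullets above can be sketched in isolation. This is a minimal illustration, not the code in the PR: `mxfp8_fused_attention_allowed`, `MIN_CUDNN_VERSION`, and `BUGGY_CUDNN_VERSIONS` are hypothetical names, and the version is assumed to be a `(major, minor, patch)` tuple, as the `cudnn_version in ((9, 23, 0), (9, 23, 1))` expression suggests.

```python
# Hypothetical sketch of the cuDNN version guard; all names are illustrative.
from typing import Tuple

Version = Tuple[int, int, int]

# Lower bound mentioned in the summary: versions below 9.21.0 are rejected.
MIN_CUDNN_VERSION: Version = (9, 21, 0)
# The two point releases reported to produce Inf/NaN with MXFP8 attention.
BUGGY_CUDNN_VERSIONS = ((9, 23, 0), (9, 23, 1))


def mxfp8_fused_attention_allowed(cudnn_version: Version) -> bool:
    """Return True if the cuDNN version passes both version checks."""
    # Tuples compare element by element, so (9, 20, 5) < (9, 21, 0).
    if cudnn_version < MIN_CUDNN_VERSION:
        return False
    # Exact membership blocks only the listed releases, so 9.23.2 passes.
    if cudnn_version in BUGGY_CUDNN_VERSIONS:
        return False
    return True
```

Because the second check is an exact membership test rather than a range comparison, no later 9.23.x or 9.24 release is excluded by it.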

Confidence Score: 4/5

Safe to merge — the guard correctly blocks the two known-buggy cuDNN point releases and does not affect any other version.

The version tuple check and its placement in the conditional chain are correct. The only noteworthy gap is that the disable is logged at debug level, so users running a buggy cuDNN build get no visible indication they have fallen back to a slower or different attention backend.

transformer_engine/pytorch/attention/dot_product_attention/utils.py — specifically the log level used for the new disable message at line 634.
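
One way to address the log-level concern, sketched under assumptions: the logger name, function name, and message text below are hypothetical and not taken from the PR, and the project's own logger configuration may differ. The sketch only shows that emitting the disable message with `logger.warning` produces a WARNING-level record, which Python's default logging setup shows while dropping DEBUG-level records.

```python
# Hypothetical sketch: report the fallback at warning level instead of debug.
import logging

logger = logging.getLogger("mxfp8_attention_example")  # illustrative logger name

BUGGY_CUDNN_VERSIONS = ((9, 23, 0), (9, 23, 1))


def check_and_report(cudnn_version, use_fused_attention=True):
    """Return the updated use_fused_attention flag, warning on a buggy cuDNN."""
    if use_fused_attention and cudnn_version in BUGGY_CUDNN_VERSIONS:
        # A WARNING record is emitted by default; a DEBUG record would not be.
        logger.warning(
            "Disabling FusedAttention for MXFP8: cuDNN %s has known Inf/NaN issues",
            ".".join(str(part) for part in cudnn_version),
        )
        return False
    return use_fused_attention
```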

Important Files Changed

| Filename | Overview |
| --- | --- |
| transformer_engine/pytorch/attention/dot_product_attention/utils.py | Adds a cuDNN version guard to disable MXFP8 FusedAttention on 9.23.0 and 9.23.1, which have known Inf/NaN bugs. Logic and placement are correct; the disable message uses debug-level logging like the rest of the block, but this is a silent correctness issue rather than a capability gap. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[fp8_recipe.mxfp8 is True] --> B{device_compute_capability < sm100?}
    B -- Yes --> Z1[Disable FusedAttention]
    B -- No --> C{fp8_recipe.fp8_mha?}
    C -- Yes --> Z2[Disable FusedAttention]
    C -- No --> D{cudnn_version < 9.21.0?}
    D -- Yes --> Z3[Disable FusedAttention]
    D -- No --> E{cudnn_version == 9.23.0 or 9.23.1?}
    E -- Yes --> Z4["Disable FusedAttention (NEW — known Inf/NaN bug)"]
    E -- No --> F{qkv_format == 'thd'?}
    F -- Yes --> Z5[Disable FusedAttention]
    F -- No --> G[FusedAttention Enabled]
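
The decision chain in the flowchart can also be read as a sequence of early returns. The following is a rough sketch under stated assumptions, not the body of `get_attention_backend`: the function name and reason strings are hypothetical, the compute capability is assumed to be a `(major, minor)` tuple with sm100 represented as `(10, 0)`, and the MXFP8 recipe is assumed to be already selected.

```python
# Hypothetical sketch of the flowchart's checks, in the order they appear.
from typing import Optional, Tuple


def mxfp8_fused_attention_decision(
    device_compute_capability: Tuple[int, int],
    fp8_mha: bool,
    cudnn_version: Tuple[int, int, int],
    qkv_format: str,
) -> Tuple[bool, Optional[str]]:
    """Return (enabled, reason); reason is None when FusedAttention stays enabled."""
    if device_compute_capability < (10, 0):  # assumption: sm100 == (10, 0)
        return False, "compute capability below sm100"
    if fp8_mha:
        return False, "fp8_mha is set"
    if cudnn_version < (9, 21, 0):
        return False, "cuDNN older than 9.21.0"
    if cudnn_version in ((9, 23, 0), (9, 23, 1)):  # the check added by this PR
        return False, "cuDNN 9.23.0/9.23.1 Inf/NaN bug"
    if qkv_format == "thd":
        return False, "thd format"
    return True, None
```

Returning a reason alongside the flag is a design choice of this sketch only; it makes it easy to see which branch disabled the backend for a given configuration.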

Reviews (1): Last reviewed commit: "[pre-commit.ci] auto fixes from pre-comm..."

Comment thread transformer_engine/pytorch/attention/dot_product_attention/utils.py

cyanguwa commented Jul 2, 2026

Collaborator Author

CI pipelines: 56671690 for cuDNN 9.23.0, 56671787 for 9.23.1, and 56672233 for 9.23.2. Nightly CI uses cuDNN 9.24, which confirms that 9.24 works.

Local testing confirms the fix in 9.23.2.
