Pull requests: NVIDIA/TransformerEngine
Pull requests list
Disable cuDNN 9.23.0/9.23.1 for MXFP8 attention
2.17
#3173
opened Jul 2, 2026 by
cyanguwa
Collaborator
8 of 13 tasks
[Pytorch][Bug] Requires Grad doesnt flow through Autograd boundaries for QuantizedTensor
#3172
opened Jul 2, 2026 by
vthumbe1503
Collaborator
13 tasks
Reverse MXFP8 quantization row raster
community-contribution
PRs from external contributor outside the core maintainers, representing community-driven work.
#3170
opened Jul 2, 2026 by
sraman-rgb
Contributor
13 tasks
Remove cuDNN frontend submodule
2.18
#3169
opened Jul 2, 2026 by
vcherepanov-nv
Collaborator
3 of 13 tasks
Add fused multi-tensor kernel for 1D blockwise FP8 quantization
community-contribution
#3168
opened Jul 2, 2026 by
shangxiaokang
Draft
13 tasks
Bump transformers from 4.57.0 to 5.3.0 in /docs/examples/te_llama
community-contribution
dependencies
Pull requests that update a dependency file
python
Pull requests that update python code
#3167
opened Jul 2, 2026 by
dependabot
Bot
Bump transformers from 4.55.0 to 5.3.0 in /docs/examples/te_gemma
community-contribution
dependencies
python
#3166
opened Jul 2, 2026 by
dependabot
Bot
Improve readability of dgrad overlap variable
community-contribution
#3165
opened Jul 2, 2026 by
Prachi-kushwaha
4 of 13 tasks
[JAX] Add attention tutorials
2.18
documentation
Improvements or additions to documentation
#3162
opened Jul 1, 2026 by
KshitijLakhani
Collaborator
5 of 13 tasks
[PyTorch] Add optional caller-provided output/grad-input buffers to GroupedLinear and fused grouped MLP
#3161
opened Jul 1, 2026 by
phu0ngng
Collaborator
7 of 13 tasks
[Common][PyTorch] Add strided batched GEMM in BF16/MXFP8
org-contribution
#3160
opened Jul 1, 2026 by
yaox12
Member
8 of 13 tasks
Migrate norms and softmax kernels to NVRTC
community-contribution
#3156
opened Jun 30, 2026 by
CarlosGomes98
Contributor
2 of 13 tasks
[PyTorch][torch.compile] Add TensorProto mechanism
#3153
opened Jun 29, 2026 by
pggPL
Collaborator
4 of 13 tasks
[PyTorch][torch.compile] Make quantizers opaque value objects
#3152
opened Jun 29, 2026 by
pggPL
Collaborator
8 of 13 tasks
Enable FA4 for context-parallel attention
#3149
opened Jun 26, 2026 by
sudhakarsingh27
Member
Draft
7 of 13 tasks
[Draft] Use vendored cuDNN frontend for Python
#3148
opened Jun 26, 2026 by
vcherepanov-nv
Collaborator
1 of 13 tasks
Add MXFP8 support with cuBLASMp
community-contribution
#3145
opened Jun 25, 2026 by
almogsegal
Contributor
13 tasks
Add multi_tensor_raw_moments kernel
community-contribution
#3144
opened Jun 25, 2026 by
philipcmonk
Draft
6 of 13 tasks
Single-launch CUTLASS grouped GEMM for per-tensor NVFP4
community-contribution
#3134
opened Jun 17, 2026 by
cael-ling
Contributor
9 of 13 tasks
Enable NVFP4 RHT amax for grouped SReLU MLP
community-contribution
#3133
opened Jun 16, 2026 by
sraman-rgb
Contributor
13 tasks
[Common] Support scaled & clamped swiglu, srelu for BF16
community-contribution
#3132
opened Jun 16, 2026 by
zhongbozhu
Collaborator
13 tasks
[JAX] Remove shard_map from MoEBlock to support quant before FSDP AG using Grouped quant+GEMM custom partitioning rules
#3131
opened Jun 15, 2026 by
jberchtold-nvidia
Collaborator
Draft
13 tasks