[Common] EP C API: version config structs and extend `nvte_ep_prepare` with `total_recv_tokens_per_rank` placeholder by phu0ngng · Pull Request #3154 · NVIDIA/TransformerEngine

phu0ngng · 2026-06-29T12:08:18Z

Description

Versions the EP config structs (NVTEEpGroupConfig, NVTEEpLayerConfig) with a leading struct_size field and passes them by pointer, so fields can be added without breaking ABI.
Adds a total_recv_tokens_per_rank placeholder output to nvte_ep_prepare for future use (accepted, may be null, ignored for now).
Renames the nvte_ep_prepare output token_counts to recv_tokens_per_expert for clarity.
Updates all call sites (JAX bindings, C++ distributed tests) and docs accordingly.
PENDING: Update the PyTorch callers after PR #3035 ([PyTorch] Expert Parallelism: PyTorch wrapper + autograd ops with symm-mem zero-copy) is merged.
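As a rough caller-side illustration of the points above, the sketch below shows a config struct with a leading struct_size field, an initializer macro, and a prepare-style call that takes the config by pointer and accepts a nullable placeholder output. Every name in it (DemoEpLayerConfig, DEMO_EP_LAYER_CONFIG_INIT, demo_ep_prepare, num_local_experts) is hypothetical, and plain int32_t buffers stand in for whatever tensor types the real API uses; it is not the actual ep.h declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a size-versioned config struct. The field list is
// illustrative only and does not mirror the real NVTEEpLayerConfig layout.
struct DemoEpLayerConfig {
  size_t struct_size;     // Leading field: sizeof() of the struct the caller compiled against.
  int num_local_experts;  // Illustrative field.
};

// Hypothetical initializer macro: records the caller's struct size and zeroes
// the remaining fields.
#define DEMO_EP_LAYER_CONFIG_INIT {sizeof(DemoEpLayerConfig), 0}

// Hypothetical prepare-style entry point. The config is passed by pointer so
// that fields can be appended later without changing the function signature.
// total_recv_tokens_per_rank is a reserved placeholder: it may be null and is
// never read or written in this sketch.
int demo_ep_prepare(const DemoEpLayerConfig* config,
                    int32_t* recv_tokens_per_expert,
                    int32_t* total_recv_tokens_per_rank) {
  (void)total_recv_tokens_per_rank;  // Accepted but ignored for now.
  if (config == nullptr || recv_tokens_per_expert == nullptr) return -1;
  assert(config->num_local_experts >= 0);
  // A real implementation would first normalize the config using struct_size.
  // Here the per-expert counts are simply zeroed as a placeholder result.
  for (int e = 0; e < config->num_local_experts; ++e) {
    recv_tokens_per_expert[e] = 0;
  }
  return 0;
}
```

A caller would write `DemoEpLayerConfig cfg = DEMO_EP_LAYER_CONFIG_INIT;`, set the fields it knows about, and pass `&cfg` together with `nullptr` for the placeholder output.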

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

greptile-apps · 2026-06-29T12:14:35Z

Greptile Summary

This PR versions the EP C API config structs (NVTEEpGroupConfig, NVTEEpLayerConfig) by prepending a struct_size field and changing all API functions to accept them by pointer, enabling future ABI-compatible field additions. It also renames token_counts → recv_tokens_per_expert for clarity and adds a total_recv_tokens_per_rank null-accepted placeholder to nvte_ep_prepare.

normalize_ep_config() in ep_api.cpp handles the versioning contract cleanly: struct_size == 0 is treated as the base layout, values below min_size are rejected with a clear diagnostic, and a partial memcpy lets older callers omit unknown trailing fields which then default to zero.
kGroupConfigMinSize and kLayerConfigMinSize are marked "frozen" and cover all current fields, so new fields appended in future versions will transparently default to zero for old callers — the design is correct for the first versioned release.
The NVTE_EP_*_CONFIG_INIT macros and designated-initialiser updates at all call sites (JAX, PyTorch, C++ tests) are consistent; PyTorch's ep_prepare Python-parameter rename is intentionally deferred pending PR [PyTorch] Expert Parallelism: PyTorch wrapper + autograd ops with symm-mem zero-copy #3035.
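The normalization contract summarized above can be sketched roughly as follows. This is a minimal illustration and not the code in ep_api.cpp: DemoConfig, its fields, kDemoConfigMinSize, and normalize_config are hypothetical names, errors are reported with exceptions as an assumption, and clamping an oversized struct_size to the library's own struct size is also an assumption, since the summary only mentions a range check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>

// Hypothetical config. The base layout ends at num_comm_sms; new_field stands
// for a field appended in a later version.
struct DemoConfig {
  size_t struct_size;
  int num_experts;
  int num_comm_sms;
  int new_field;  // Hypothetical later addition, unknown to old callers.
};

// Frozen size of the base layout: all bytes before the first appended field.
constexpr size_t kDemoConfigMinSize = offsetof(DemoConfig, new_field);

// Returns a full-size copy of the caller's config in which any trailing fields
// the caller did not know about are zero.
template <typename T>
T normalize_config(const T* user, size_t min_size) {
  static_assert(std::is_trivially_copyable<T>::value, "config must be memcpy-safe");
  assert(min_size <= sizeof(T));
  if (user == nullptr) throw std::invalid_argument("config must not be null");
  // struct_size == 0 is treated as the base layout.
  const size_t size = (user->struct_size == 0) ? min_size : user->struct_size;
  if (size < min_size) {
    throw std::invalid_argument("struct_size is smaller than the base layout");
  }
  T out{};  // Value-initialized, so fields that are not copied stay zero.
  // Partial copy: only the bytes the caller declared. Clamping to sizeof(T)
  // for callers built against a larger struct is an assumption of this sketch.
  std::memcpy(&out, user, std::min(size, sizeof(T)));
  out.struct_size = sizeof(T);
  return out;
}
```

With this shape, a caller built against the base layout passes struct_size equal to kDemoConfigMinSize, and the library sees new_field as zero regardless of what bytes follow the caller's struct.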

Confidence Score: 5/5

Safe to merge; the ABI versioning design is sound, all call sites have been updated, and the ignored placeholder is clearly annotated.

The normalize_ep_config() logic correctly handles all struct_size cases, the frozen min_size constants preserve backward compatibility for future field additions, and the rename/pointer-conversion is consistently applied across JAX, PyTorch, and C++ test call sites. The only finding is a style-level duplication in test code that carries no correctness risk.

No files require special attention. The PyTorch ep_prepare parameter name (token_counts) is intentionally left as-is pending a dependent PR.

Important Files Changed

| Filename | Overview |
| --- | --- |
| transformer_engine/common/include/transformer_engine/ep.h | Adds struct_size versioning field to NVTEEpGroupConfig and NVTEEpLayerConfig; renames max_num_sms→num_comm_sms; adds NVTE_EP_*_CONFIG_INIT macros; extends nvte_ep_prepare with nullable total_recv_tokens_per_rank placeholder and switches config args to pointers. |
| transformer_engine/common/ep/ep_api.cpp | Introduces normalize_ep_config() template that handles struct_size versioning (0→min_size, range check, partial memcpy); rewires all public entry points to use pointer-typed config args; stubs correctly updated in the !NVTE_WITH_NCCL_EP branch. |
| transformer_engine/common/ep/ep_backend.cpp | Renames max_num_sms→num_comm_sms in validate_config and init(); adds total_recv_tokens_per_rank as an explicitly ignored parameter to prepare() with a clear "reserved placeholder" comment; max_token_dtype range check retained. |
| transformer_engine/common/ep/ep_backend.h | Updates prepare() signature to accept total_recv_tokens_per_rank alongside the renamed recv_tokens_per_expert; no other changes. |
| transformer_engine/jax/csrc/extensions/ep.cpp | Updates all config construction to use designated-initialiser syntax with struct_size, switches nvte_ep_* calls to pointer args, renames token_counts→recv_tokens_per_expert locally; passes nullptr for total_recv_tokens_per_rank. |
| transformer_engine/pytorch/csrc/extensions/ep.cpp | Migrates config construction to designated initialisers with struct_size, switches nvte_ep_* calls to pointer args; ep_prepare() Python-facing parameter name intentionally kept as token_counts pending PR #3035. |
| tests/cpp_distributed/test_ep.cu | Renames token_counts→recv_tokens_per_expert throughout; adds NVTEEpLayerConfig layer_cfg_ to both EPBuffers and EPTensors and initializes it with NVTE_EP_LAYER_CONFIG_INIT; all nvte_ep_prepare call sites updated to new pointer-based signature. |
| tests/cpp_distributed/test_ep_common.h | Replaces NVTEEpGroupConfig{} zero-init with NVTE_EP_GROUP_CONFIG_INIT in ep_bootstrap() and ep_reinitialize(); updates nvte_ep_initialize() calls to pass by pointer. |

Reviews (3): Last reviewed commit: "Merge branch 'main' into ep-c-api"

jberchtold-nvidia

LGTM pending CI. I think the struct_size field is sufficient for versioning for now. We have a lot of structs that have needed versioning, so in the future it would be great for someone to align the C API around shared versioning functionality, such as a VERSIONED_STRUCT macro. That is out of scope for this PR, though.

phu0ngng · 2026-06-29T19:58:16Z

/te-ci L1

phu0ngng · 2026-06-30T07:56:59Z

/te-ci L1

…` with `total_recv_tokens_per_rank` placeholder (#3154)

* versioning EP C configs
  Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
* Rename EP prepare token_counts to recv_tokens_per_expert
  Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
* Add total_recv_tokens_per_rank placeholder to nvte_ep_prepare
  Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
* Adapt PyTorch EP binding to versioned nvte_ep C config API
  Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
* Rename EP group config max_num_sms to num_comm_sms
  Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

phu0ngng requested review from jberchtold-nvidia and ptrendx as code owners June 29, 2026 12:08

greptile-apps Bot reviewed Jun 29, 2026

Comment thread transformer_engine/common/ep/ep_backend.cpp

Comment thread transformer_engine/common/ep/ep_api.cpp

jberchtold-nvidia previously approved these changes Jun 29, 2026

phu0ngng added the 2.17 label Jun 29, 2026

phu0ngng added 6 commits June 29, 2026 09:45

versioning EP C configs

4cabf03

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

Rename EP prepare token_counts to recv_tokens_per_expert

ed3d740

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

Add total_recv_tokens_per_rank placeholder to nvte_ep_prepare

1dbbddb

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

Adapt PyTorch EP binding to versioned nvte_ep C config API

afa0656

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

Rename EP group config max_num_sms to num_comm_sms

c92997e

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

Detect active NVLink via nvlink --status link bandwidth in PyTorch EP…

70aee42

… test Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

phu0ngng dismissed jberchtold-nvidia’s stale review via 70aee42 June 29, 2026 19:54

phu0ngng force-pushed the ep-c-api branch from c7da769 to 70aee42 Compare June 29, 2026 19:54

phu0ngng requested a review from ksivaman as a code owner June 29, 2026 19:54

phu0ngng requested a review from jberchtold-nvidia June 29, 2026 19:54

[pre-commit.ci] auto fixes from pre-commit.com hooks

1db0c80

for more information, see https://pre-commit.ci

jberchtold-nvidia previously approved these changes Jun 29, 2026

Add max_token_dtype range check to nvte_ep_init for clearer error

a2b8fd3

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

phu0ngng dismissed jberchtold-nvidia’s stale review via a2b8fd3 June 29, 2026 19:56

phu0ngng requested a review from jberchtold-nvidia June 29, 2026 19:57

Merge branch 'main' into ep-c-api

2287d47

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

jberchtold-nvidia approved these changes Jun 30, 2026

phu0ngng merged commit 3df5e19 into NVIDIA:main Jun 30, 2026
44 of 54 checks passed

phu0ngng deleted the ep-c-api branch June 30, 2026 17:45

phu0ngng merged 9 commits into NVIDIA:main from phu0ngng:ep-c-api

2 participants
