fix: INHERITS edges missing for Java extends+implements #279

Merged
DeusData merged 1 commit into DeusData:main from loaychlih:fix/extract-base-classes-inherits-edges on May 9, 2026

Conversation

@loaychlih
Contributor

Problem

extract_base_classes() in internal/cbm/extract_defs.c had two bugs
that caused most INHERITS edges to be missing for Java (and likely other
languages using similar AST field names).

Bug 1 — Early return loses multiple inheritance targets

The field loop returned immediately on the first match:

for (const char **f = fields; *f; f++) {
    TSNode super = ts_node_child_by_field_name(node, *f, ...);
    if (!ts_node_is_null(super)) {
        return make_single_base(...);  // ← returns here, never checks interfaces
    }
}

For class Foo extends Bar implements Baz, it found superclass → returned
immediately → never processed super_interfaces. Result: 1 edge instead of 2.
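
The difference between the two loop shapes can be sketched without tree-sitter. The following is a minimal illustration, not the project's code: MockField, mock_field, bases_first_match and bases_all_matches are hypothetical names, and each field is assumed to hold one already-extracted base name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed class declaration: each entry maps a
 * field name to one base name that was already extracted from that field. */
typedef struct {
    const char *field_name;
    const char *base_name;
} MockField;

/* Look up a field by name; returns NULL when the class has no such field. */
const char *mock_field(const MockField *fields, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(fields[i].field_name, name) == 0) return fields[i].base_name;
    }
    return NULL;
}

/* Buggy shape: stops at the first candidate field that is present. */
size_t bases_first_match(const MockField *fields, size_t n,
                         const char **wanted, const char **out, size_t out_cap) {
    assert(out_cap > 0);
    for (const char **f = wanted; *f; f++) {
        const char *base = mock_field(fields, n, *f);
        if (base) {
            out[0] = base;
            return 1; /* early return: later fields are never inspected */
        }
    }
    return 0;
}

/* Fixed shape: visits every candidate field and accumulates the results. */
size_t bases_all_matches(const MockField *fields, size_t n,
                         const char **wanted, const char **out, size_t out_cap) {
    size_t count = 0;
    for (const char **f = wanted; *f && count < out_cap; f++) {
        const char *base = mock_field(fields, n, *f);
        if (base) out[count++] = base;
    }
    return count;
}
```

For a mock class with both a superclass and an interfaces entry, the first function reports one base and the second reports two, which mirrors the 1-edge versus 2-edge outcome described above.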

Bug 2 — cbm_node_text returns full node text including keywords

Calling cbm_node_text on the superclass field node returned
"extends Bar" instead of "Bar". On super_interfaces it returned
"implements Baz, Qux" instead of the individual names.

Fix

Added collect_bases_from_field() which:

  • Walks into child AST nodes to extract type_identifier / generic_type /
    qualified_name text directly (skips keyword nodes like extends/implements)
  • Handles type_list / interface_type_list children for multiple interfaces
  • Strips generic args at < (e.g. List<String> → List)
  • Falls back to raw cbm_node_text for languages where the field IS the type name

The field loop now collects from all matching fields before returning.
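
The steps listed above can be sketched on a mock AST. This is an illustration under stated assumptions, not the code from extract_defs.c: MockNode, BASE_NAME_LEN, copy_prefix and collect_bases_sketch are hypothetical, the real function works on TSNode values, and only the node type names are taken from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BASE_NAME_LEN 64 /* hypothetical per-name buffer size */

/* Hypothetical stand-in for an AST node. */
typedef struct MockNode {
    const char *type;                /* e.g. "type_identifier", "type_list" */
    const char *text;                /* source text covered by the node */
    const struct MockNode *children;
    size_t child_count;
} MockNode;

/* Node types that carry a type name directly. */
static int is_type_node(const char *type) {
    return strcmp(type, "type_identifier") == 0 ||
           strcmp(type, "generic_type") == 0 ||
           strcmp(type, "qualified_name") == 0;
}

/* Node types that wrap several interface names. */
static int is_list_node(const char *type) {
    return strcmp(type, "type_list") == 0 ||
           strcmp(type, "interface_type_list") == 0;
}

/* Copy at most n bytes of text into dst, always NUL-terminating.
 * Callers pass n <= strlen(text). */
static void copy_prefix(char *dst, size_t dst_len, const char *text, size_t n) {
    assert(dst_len > 0);
    if (n >= dst_len) n = dst_len - 1;
    memcpy(dst, text, n);
    dst[n] = '\0';
}

/* Collect base names found under `field`; returns how many were written. */
size_t collect_bases_sketch(const MockNode *field,
                            char out[][BASE_NAME_LEN], size_t out_cap) {
    size_t count = 0;
    for (size_t i = 0; i < field->child_count && count < out_cap; i++) {
        const MockNode *c = &field->children[i];
        if (is_type_node(c->type)) {
            /* strcspn stops at '<', so "List<String>" is stored as "List" */
            copy_prefix(out[count++], BASE_NAME_LEN, c->text, strcspn(c->text, "<"));
        } else if (is_list_node(c->type)) {
            for (size_t j = 0; j < c->child_count && count < out_cap; j++) {
                const MockNode *g = &c->children[j];
                if (is_type_node(g->type)) {
                    copy_prefix(out[count++], BASE_NAME_LEN, g->text,
                                strcspn(g->text, "<"));
                }
            }
        }
        /* keyword and punctuation children are skipped */
    }
    if (count == 0 && out_cap > 0) {
        /* fallback: the field node itself is the type name, copied raw */
        copy_prefix(out[count++], BASE_NAME_LEN, field->text, strlen(field->text));
    }
    return count;
}
```

Because keyword children such as implements never match is_type_node, their text cannot leak into a stored name. The fallback only runs when nothing was collected, so a field whose own text is the type name still yields one entry.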

Result

Tested on Modelio (Java codebase, 253 files):

Metric           Before   After
INHERITS edges   3        116

Example edges now correctly emitted:

  • DefaultLinkTool → DefaultDiagramTool (extends)
  • DefaultLinkTool → ILinkTool (implements)

Two bugs in extract_base_classes() caused most INHERITS edges to be
missing for Java (and maybe other languages also):

Bug 1 - Early return on first match: the field loop returned immediately
on the first matching field (e.g. superclass), never processing subsequent
fields (e.g. super_interfaces). Classes with both extends and implements
only produced one INHERITS edge instead of multiple.

Bug 2 - cbm_node_text returned full node text including keywords: calling
cbm_node_text on the superclass field node returned 'extends Bar' instead
of 'Bar', and on super_interfaces returned 'implements Baz, Qux' instead
of the individual names.

Fix: add collect_bases_from_field() which walks into child AST nodes to
extract type_identifier/generic_type/qualified_name text directly, handles
type_list children for multiple interfaces, and strips generic args at '<'.
The field loop now collects from all matching fields before returning.

Result on modelio codebase (Java): 3 -> 116 INHERITS edges.
@DeusData added the bug (Something isn't working) and parsing/quality (Graph extraction bugs, false positives, missing edges) labels on May 4, 2026
@DeusData
Owner

DeusData commented May 9, 2026

Thanks @loaychlih — clean diagnosis (two distinct bugs in the same function, with a nice repro on Modelio showing 3 → 116 INHERITS edges) and a bounds-safe fix that drops in cleanly alongside the existing arena-pool pattern in this file. Verified: max writes ≤ MAX_BASES_MINUS_1 with the per-iteration count < out_cap checks, no new system/network/file calls, the count == 0 fallback correctly preserves behavior for non-Java grammars where the field IS the type name. Merging now and pushing a small follow-up that adds a regression test for the extends+implements case so this doesn't silently regress.
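
The bounds argument in this review can be illustrated with a small sketch. It assumes MAX_BASES is 16 and MAX_BASES_MINUS_1 is 15, the values given in the squashed commit message; append_capped and its signature are hypothetical and are not taken from extract_defs.c.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BASES 16                      /* includes the NULL terminator slot */
#define MAX_BASES_MINUS_1 (MAX_BASES - 1) /* most names that can be stored */

/* Append names from src to out, starting at index *count and never storing
 * more than out_cap names in total. `out` must have room for out_cap + 1
 * pointers. Returns the number of names appended by this call. */
size_t append_capped(const char **out, size_t *count, size_t out_cap,
                     const char *const *src, size_t src_len) {
    assert(out_cap <= MAX_BASES_MINUS_1);
    assert(*count <= out_cap);
    size_t added = 0;
    /* the per-iteration check keeps every write inside the cap */
    for (size_t i = 0; i < src_len && *count < out_cap; i++) {
        out[(*count)++] = src[i];
        added++;
    }
    /* *count <= out_cap <= MAX_BASES - 1, so the terminator always fits */
    out[*count] = NULL;
    return added;
}
```

Calling it once per matching field with the same out array and counter means that, however many names the fields contain, at most 15 are stored and index 15 is the last slot that can receive the NULL terminator.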

@DeusData DeusData merged commit 323c68e into DeusData:main May 9, 2026
DeusData added a commit that referenced this pull request May 9, 2026
…es (#279)

Pins the bug-fix from #279 with a Java class declaring both extends and
implements. Asserts that base_classes contains:
  - the superclass name (DefaultDiagramTool)
  - every implements interface (ILinkTool, Closeable)
  - bare type names only — no 'extends' / 'implements' keyword text
    leaking into any entry

Without the fix, the field loop returned on the first match (only the
extends parent emitted) and cbm_node_text on the field returned the full
literal 'extends Bar' / 'implements Baz, Qux' string.
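
The three assertions in this commit message could be expressed with helpers like the ones below. This is a sketch only: has_base and all_bare_names are hypothetical names, base_classes is assumed to be a NULL-terminated array of strings, and the project's actual test code is not shown on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 when the NULL-terminated array contains `name` exactly. */
int has_base(const char *const *bases, const char *name) {
    assert(bases != NULL);
    for (; *bases; bases++) {
        if (strcmp(*bases, name) == 0) return 1;
    }
    return 0;
}

/* Returns 1 when no entry starts with leaked keyword text such as
 * "extends " or "implements ". */
int all_bare_names(const char *const *bases) {
    assert(bases != NULL);
    for (; *bases; bases++) {
        if (strncmp(*bases, "extends ", 8) == 0) return 0;
        if (strncmp(*bases, "implements ", 11) == 0) return 0;
    }
    return 1;
}
```

A test for the extends+implements case would then check has_base for DefaultDiagramTool, ILinkTool and Closeable, plus all_bare_names on the whole array. Exact-match lookup also catches the keyword leak, because "extends DefaultDiagramTool" does not equal "DefaultDiagramTool".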
DeusData added a commit that referenced this pull request May 9, 2026
Resolved conflict in Makefile.cbm: keep both TEST_STACK_OVERFLOW_SRCS
(from main, #217) and the new py_lsp test variables (TEST_SCOPE_SRCS,
TEST_TYPE_REP_SRCS, TEST_PY_LSP_SRCS, TEST_PY_LSP_BENCH_SRCS,
TEST_PY_LSP_STRESS_SRCS, TEST_PY_LSP_SCALE_SRCS) in ALL_TEST_SRCS.

Other auto-merged files: internal/cbm/extract_defs.c (PR #279),
tests/test_main.c (multiple suite registrations on each side).

Brings in 28 commits from main since the branch was forked at 8fbdb0f
(#207 thread safety): #208 decorator USAGE, #209 memory helpers, #210
refactor, #217 traversal stacks, #224 Svelte/Vue imports, #231
search_graph default limit, #243 path aliases, #249 GH Actions shell
injection, #251 incremental destructive overwrite, #257 temporal
properties, #265 Nix flake, #267-270/#289 dependabot, #273 Pine Script,
#278 AUR docs, #279 INHERITS edges, #281 get_architecture wiring +
follow-up, codeql revert.
isc-tdyar pushed a commit to isc-tdyar/codebase-memory-mcp that referenced this pull request May 20, 2026
DeusData#279)

Two bugs in extract_base_classes() caused most INHERITS edges to be missing for Java (and any other language using similar AST field names):

Bug 1 — Early return: the field loop returned on the first matching field, so class Foo extends Bar implements Baz produced only one INHERITS edge instead of two — the loop never processed 'interfaces' after finding 'superclass'.

Bug 2 — Keyword text: cbm_node_text on the field node returned 'extends Bar' / 'implements Baz, Qux' literally instead of the bare type names. So even the one edge that did get emitted had a malformed target.

Fix: new collect_bases_from_field() walks into child AST nodes and extracts type_identifier / generic_type / qualified_name / scoped_type_identifier / user_type text directly. Handles type_list / interface_type_list children for the multi-interface case. Strips generics at '<'. Falls back to raw node text for languages where the field IS the type name. The outer field loop now collects from all matching fields before returning, and the result is arena-allocated with a NULL terminator like the surrounding extract_csharp_base_list pattern.

Bounds-safe: writes capped at MAX_BASES_MINUS_1 (15) with per-iteration count < out_cap checks; result fits in MAX_BASES (16) including NULL terminator.

Real-world impact: 3 → 116 INHERITS edges on the Modelio Java codebase (253 files).