docs(#1429): add Cascade Specification#182

Open
dimitri-yatsenko wants to merge 2 commits into main from fix/1429-cascade-spec

@dimitri-yatsenko

Member

Summary

Adds a detailed normative spec at `src/reference/specs/cascade.md` covering how DataJoint propagates restrictions across the foreign-key graph for cascade delete and cascade preview. Pairs with datajoint-python #1468, which fixes the Part-to-Master upward propagation under `part_integrity="cascade"` (#1429).

Closes the docs side of datajoint-python #1429. Slated for DataJoint 2.3.

Contents

| Section | What it covers |
|---|---|
| Overview | Cascade entry points (`Table.delete`, `Diagram.cascade`), terminology |
| Dependency graph | Node, edge, and alias-node structure (what the engine sees) |
| Restriction propagation rules | The 3 forward rules (F1 copy, F2 aliased rename, F3 projection) and 3 symmetric upward rules (U1, U2, U3) |
| Cascade flow | Diagram + multi-pass termination |
| `part_integrity` modes | `enforce` / `ignore` / `cascade` |
| Part-to-Master upward propagation | Master identification, FK-path walk, alias-node transparency, intermediate-Part restrictions, materialization at the Master and why it's needed (MySQL 1093 avoidance) |
| Seed-is-Part case | Why the engine triggers upward propagation explicitly for a leaf Part seed |
| Worked examples | Two examples mirroring #1429 Case 1 (renamed FK via `.proj()`) and Case 2 (Part-of-Part chain) |
| Algorithmic complexity | O(N · E) per pass; materialization fetch is the dominant cost |
| Out of scope | `Diagram.trace()` (#1423), cross-schema cascade, user-defined rules |

Examples use core DataJoint types (`int32`) per project convention.

Nav placement

New entry under Reference → Specifications → Data Operations, between Data Manipulation and AutoPopulate.

Sequencing

Reviewable now. Should land alongside or after datajoint-python #1468 so the spec doesn't describe code that isn't shipped.

Test plan

  • `mkdocs serve` renders the new spec under the right nav group
  • Cross-links resolve (`master-part.md`, `diagram.md`, `data-manipulation.md`, `delete-data.md`)
  • Wording matches the implementation in datajoint-python #1468
  • Examples reflect the actual API as used in the test fixtures

New normative spec at src/reference/specs/cascade.md (~200 lines)
covering how DataJoint propagates restrictions across the FK graph for
cascade delete and cascade preview. Pairs with datajoint-python #1468.

Contents:

- Overview of cascade entry points (Table.delete, Diagram.cascade)
- Dependency graph structure: nodes, edges, alias nodes for aliased FKs
- The three forward propagation rules (F1 copy, F2 aliased rename,
  F3 projection) and the three symmetric upward rules (U1, U2, U3)
- Cascade flow diagram and multi-pass termination
- part_integrity modes (enforce / ignore / cascade)
- Part-to-Master upward propagation: Master identification by naming
  convention, FK-path walk via nx.shortest_path, alias-node transparency,
  intermediate-Part restrictions, materialization at the Master
- Seed-is-Part case
- Two worked examples mirroring #1429 Case 1 (renamed FK) and Case 2
  (Part-of-Part chain)
- Algorithmic complexity
- Out-of-scope cross-references (Diagram.trace #1423, cross-schema
  cascade, custom rules)

Examples use core DataJoint types (int32) per project convention.
Nav entry added under Reference > Specifications > Data Operations.
MilagrosMarin previously approved these changes Jun 10, 2026

@MilagrosMarin MilagrosMarin left a comment

Collaborator


Cross-checked the spec against the implementation in #1468 line-by-line:

- Forward rules F1/F2/F3 match `_apply_propagation_rule` (`diagram.py:549-605`) exactly
- Upward rules U1/U2/U3 match `_apply_propagation_rule_upward` (`diagram.py:607-661`) exactly
- Alias-node convention (integer-named nodes, both half-edges carry the same `attr_map`/`aliased`) matches `_find_real_edge_props`
- `extract_master` naming convention verified in `dependencies.py:20`
- `nx.shortest_path` for FK path walking matches `_propagate_part_to_master`
- Materialization rationale (MySQL error 1093, "You can't specify target table for update in FROM clause") precisely stated; the spec's `(master_ft & restrictions).proj().to_arrays()` is equivalent to the impl's `self._restricted_table(master_name).proj().to_arrays()`
- Seed-is-Part pre-loop trigger explanation matches `diagram.py:467-474`
- ✅ All cross-links resolve (`master-part.md`, `diagram.md`, `data-manipulation.md`, `delete-data.md`)
- `load_all_downstream` exists in `dependencies.py:229`
- ✅ Nav placement under Reference → Specifications → Data Operations is sensible
- ✅ Both worked examples trace correctly through the rules

A few minor wording observations, none blocking:

1. Worked Example 1, step 3b. Line 148 says:

> Apply U3: `Subject` is restricted by `Subject.Session.proj()` (projected to `subject_id`).

The "(projected to `subject_id`)" parenthetical is slightly misleading. `Subject.Session.proj()` projects to Session's PK (`{subject_id, session_id}`), not to `subject_id` alone. Only `subject_id` matters in the resulting restriction because it is the column shared with Subject's PK in the natural join. Tighter wording: "…projected to Session's PK; the natural join with Subject filters on the shared `subject_id`".
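To make the point concrete, here is a minimal pure-Python sketch of restricting by a relation on shared attributes. It is a toy model of the semantics, not DataJoint code: the `restrict` helper and all row values are hypothetical, and only the attribute names `subject_id` and `session_id` come from the spec.

```python
def restrict(rows, restriction_rows):
    """Keep rows that agree with at least one restriction row on every shared attribute."""
    if not rows or not restriction_rows:
        return []
    # Shared attributes are found by name, as in a natural join.
    shared = sorted(set(rows[0]) & set(restriction_rows[0]))
    keys = {tuple(r[a] for a in shared) for r in restriction_rows}
    return [row for row in rows if tuple(row[a] for a in shared) in keys]


# Hypothetical data: Subject's PK is {subject_id}; Session's PK is {subject_id, session_id}.
subject = [{"subject_id": 1}, {"subject_id": 2}, {"subject_id": 3}]
session_pk = [
    {"subject_id": 1, "session_id": 10},
    {"subject_id": 1, "session_id": 11},
    {"subject_id": 3, "session_id": 12},
]

# session_id is carried in the restriction but never compared,
# because Subject has no attribute of that name.
restricted_subject = restrict(subject, session_pk)
```

The restriction still contains both PK columns; `session_id` simply has nothing to match against on the Subject side.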

2. Empty-result case in materialization not mentioned. The implementation handles `len(master_pk_values) == 0` by setting `restrictions[master_name] = [False]`, so the master appears with zero affected rows. The spec's Materialization section doesn't mention this branch. One sentence would round it out.
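A rough sketch of the branch I mean, as a toy model rather than the actual implementation. Only `master_pk_values`, `restrictions`, and the `[False]` value come from #1468; the `materialize` and `count_affected` helpers and the sample rows are hypothetical.

```python
def materialize(master_pk_values):
    """Turn fetched master PK rows into a literal restriction (a list of OR-ed conditions)."""
    if len(master_pk_values) == 0:
        # An explicit False keeps the master in the mapping while matching no rows.
        return [False]
    return [dict(row) for row in master_pk_values]


def count_affected(rows, restriction):
    """Count rows matching any condition; False matches nothing (toy evaluator)."""
    def matches(row, cond):
        if cond is False:
            return False
        return all(row.get(k) == v for k, v in cond.items())

    return sum(any(matches(row, c) for c in restriction) for row in rows)


master_rows = [{"subject_id": 1}, {"subject_id": 2}]  # hypothetical master contents

restrictions = {}
restrictions["Subject"] = materialize([])  # nothing fetched upstream
empty_count = count_affected(master_rows, restrictions["Subject"])

restrictions["Subject"] = materialize([{"subject_id": 2}])
one_count = count_affected(master_rows, restrictions["Subject"])
```

In this model the master still shows up in `restrictions` in the empty case, just with zero affected rows, which is the behavior worth one sentence in the spec.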

3. Multiple FK paths from Master to Part. Same observation I raised on #1468. The spec uses `nx.shortest_path` (line 90) but doesn't note the limitation that multiple FK chains could exist between the same Master/Part pair. Worth one sentence in "What is not part of this specification" or a brief caveat where `shortest_path` is mentioned.
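A small illustration of the limitation, written in plain Python so it runs without networkx. The graph, the table names, and both helper functions are hypothetical stand-ins; the BFS below only approximates what a shortest-path lookup returns, namely a single path even when several exist.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search: returns one shortest path, or None if unreachable."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def all_simple_paths(graph, source, target, prefix=None):
    """Depth-first enumeration of every cycle-free path from source to target."""
    path = (prefix or []) + [source]
    if source == target:
        return [path]
    found = []
    for nxt in graph.get(source, []):
        if nxt not in path:
            found.extend(all_simple_paths(graph, nxt, target, path))
    return found


# Hypothetical FK graph: two distinct chains lead from the Master to the same Part.
fk_graph = {
    "Master": ["Master.A", "Master.B"],
    "Master.A": ["Master.C"],
    "Master.B": ["Master.C"],
}

chosen = shortest_path(fk_graph, "Master", "Master.C")
every = all_simple_paths(fk_graph, "Master", "Master.C")
```

Here two equally short chains exist, yet only one is walked. Which one is picked depends on traversal order, so restrictions carried by the other chain would be ignored.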

4. Naming-convention fragility. Line 86: "The Master is identified by naming convention via `dependencies.extract_master`". Worth noting this is fragile: a Part whose `__` convention is broken, or a Part referenced from a different schema, wouldn't be matched. The current fallback ("otherwise the upward walk is skipped") is correct behavior, but the failure mode could be surprising.
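For readers unfamiliar with the convention, a simplified stand-in for this kind of lookup is sketched below. It is not the real `extract_master`: the `guess_master` name, the regex, and the table names are my own assumptions, chosen to show how a name that breaks the `__` convention silently falls through to `None`.

```python
import re

# Assumed shape of a Part's full table name: `schema`.`master__part`
_PART_NAME = re.compile(r"^(?P<master>`\w+`\.`#?\w+)__\w+`$")


def guess_master(full_table_name):
    """Return the presumed Master's full table name, or None if the name doesn't look like a Part."""
    match = _PART_NAME.match(full_table_name)
    return match["master"] + "`" if match else None


conventional = guess_master("`lab`.`subject__session`")  # follows the __ convention
broken = guess_master("`lab`.`subject_session`")         # single underscore: not recognized
not_a_part = guess_master("`lab`.`subject`")             # ordinary table
```

With a `None` result the upward walk would be skipped without any error, which is the surprising failure mode mentioned above.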

5. Optional: brief notation section. Terms `child_attrs`, `parent_attrs`, `parent_pk`, `child_pk`, `attr_map`, `aliased` are used without an explicit glossary. A short "Notation" subsection (4–5 definitions) would lower the bar for newer contributors reading the spec.

6. Algorithmic complexity bound. Line 190's "O(N · E) per pass, with at most one pass per master pulled in" is fine but academically loose. Cleaner: O(P · N · E) where P is the number of distinct masters pulled in by upward propagation. Cosmetic.
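Spelled out, assuming each pass costs at most $c \cdot N \cdot E$ for some constant $c$ (my assumption, not stated in the spec) and that at most $P$ passes run, one per master pulled in:

```math
T_{\text{total}} \;\le\; \sum_{p=1}^{P} c \, N \, E \;=\; c \, P \, N \, E \;=\; O(P \cdot N \cdot E)
```

This counts graph traversal only; the materialization fetch, which the spec calls the dominant cost, is separate.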

Approving — the spec is accurate, well-organized, and the worked examples are exactly the right grounding. The minor items are optional polish.
