direct: ignore remote-only changes on name-based ID fields#5599
Conversation
de92903 to
380841e
Compare
Integration test reportCommit: ffb887f
23 interesting tests: 15 SKIP, 7 RECOVERED, 1 flaky
Top 25 slowest tests (at least 2 minutes):
|
4f7b7ab to
b99237b
Compare
f99e1b4 to
042a58e
Compare
| } | ||
| } | ||
| for _, p := range a.resourceConfig.ProvidedIDFields { | ||
| if path.HasPatternPrefix(p.Field) { |
There was a problem hiding this comment.
Q: why prefix and not exact match?
There was a problem hiding this comment.
We always use prefix for declarative fields config. Does not matter for top level scalar fields of course.
| func (a *Adapter) IsFieldInRecreateOnChanges(path *structpath.PathNode) bool { | ||
| // FieldTriggersRecreate reports whether a local change to the field forces a | ||
| // delete + create. Both recreate_on_changes and provided_id_fields do this, so a | ||
| // caller that knows the ID is preserved can conclude the field is unchanged. |
There was a problem hiding this comment.
Sentence is hard to parse. How would a caller know the ID is preserved?
There was a problem hiding this comment.
Caller is bundle_plan.go which knows action
bundle/direct/bundle_plan.go: if targetAction.KeepsID() {
bundle/direct/bundle_plan.go: canReadRemoteCache := targetAction == deployplan.Skip || (targetAction.KeepsID() && adapter.FieldTriggersRecreate(fieldPath))
| // A change is skipped when local and remote differ only by case. | ||
| NormalizeCase []FieldRule `yaml:"normalize_case,omitempty"` | ||
| // ProvidedIDFields: field patterns that compose the resource's ID — a name the | ||
| // user provides (not a server-generated id), which DoRead fetches by. Local |
There was a problem hiding this comment.
Note: DoRead never fetches by these fields, only by the ID.
There was a problem hiding this comment.
Right, it's not directly used right now. We typically use ID from the response even when we have it in request. I think we should change it and actually use ID from the config. But for now I'll fix the comment.
| schemas: | ||
| recreate_on_changes: | ||
| provided_id_fields: | ||
| # UC lowercases identifier names; remote returns "myschema" for config "MySchema". |
There was a problem hiding this comment.
Inline comment still relevant?
There was a problem hiding this comment.
yes, it gives specific example why this matters.
Resources fetched by a name-based ID (schemas, volumes, registered models, apps, secret scopes, postgres resources, etc.) cannot exhibit real remote drift on the fields composing that ID: a successful get-by-ID means a differing remote value can only be backend normalization (e.g. UC lowercasing), since an out-of-band rename would 404 and is already handled as resource-gone. Reacting with recreate or rename is therefore never correct for remote-only diffs on these fields. Encode this declaratively: - named_id_fields: ID components; local changes recreate, remote-only diffs are skipped. Replaces their recreate_on_changes entries. - update_id_on_local_changes (renamed from update_id_on_changes): same skip for remote-only diffs; local changes still rename via DoUpdateWithID. normalize_case is unchanged: it covers local case-only edits, which the get-by-ID argument does not, and which would otherwise recreate a resource UC considers unchanged. Co-authored-by: Isaac
Revert the cosmetic update_id_on_changes -> update_id_on_local_changes rename to minimize the diff (catalogs, external_locations, volumes, secret_scopes.permissions keep their existing key). The semantics shift is explained in comments instead: update_id_on_changes now only governs local changes, since remote-only diffs on these ID fields are skipped by shouldSkipIDField. Co-authored-by: Isaac
The fields hold a user-provided name that forms the resource ID (as opposed to a server-generated id); "provided" makes clear why a remote-only difference can only be backend normalization. Source-only rename; reason strings (id_field) are unchanged. Also regenerate volumes/uppercase-name/out.deploy.direct.txt, whose id_field reasons were reverted to the stale uc_case by the rebase auto-merge. Co-authored-by: Isaac
provided_id_fields already skips the remote-only diffs that normalize_case was added for (the no-op-redeploy bug), so normalize_case only guarded a local case-only edit — an untested corner, gated by the destructive-action prompt for schemas/volumes, and inconsistent with the other provided_id_fields resources that never had it. Remove the field, its shouldSkipNormalized branch, and the schemas/volumes yaml blocks. normalize_slash is unaffected. Also regenerate model_serving_endpoints/basic/out.second-plan.direct.json (immutable -> id_field), a cloud-only golden carrying the reason change from moving name to provided_id_fields; local task test skips it, so CI integration caught it. Co-authored-by: Isaac
The method now also covers provided_id_fields (both categories recreate on a local change), so the recreate_on_changes-specific name was inaccurate. Co-authored-by: Isaac
The provided_id_fields / update_id_on_changes logic was split across shouldSkipIDField (remote-only -> skip) and shouldUpdateOrRecreate (local -> recreate/rename), so correctness depended on the first running before the second in the ladder. Fold both into classifyIDField, which decides skip vs recreate/UpdateWithID in one place from Old==New. The leftover shouldUpdateOrRecreate then only matched recreate_on_changes, so inline it as a direct findMatchingRule call (cfg is never nil here). Pure refactor; no behavior or golden changes. Co-authored-by: Isaac
Co-authored-by: Isaac
recreate_on_changes, provided_id_fields, and update_id_on_changes each decide a field's action, and classifyIDField runs before recreate_on_changes in the ladder, so a field in more than one would have dead entries and the categories disagree on remote-only diffs. Add a test that a field appears in at most one. Co-authored-by: Isaac
Pairs with provided_id_fields: both compose the resource's user-provided ID and are handled by classifyIDField (remote-only diffs skipped), differing only in the local-change action — provided_id_fields recreates, updatable_id_fields renames via UpdateWithID. The *_id_fields naming makes the family explicit and is more accurate than "update_id_on_changes", which post-refactor also governs the remote-only skip. Source-only; the per-field reason (id_changes) is unchanged so no golden churn. Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
The comment said DoRead fetches by these config fields. It actually fetches by the ID recorded in state (the value the backend returned, e.g. a lowercased full_name). Clarify that the get-by-ID invariant rests on the stored ID, not the config fields: the resource is found at its ID, so a remote-only difference is backend normalization, and a real rename changes the ID and 404s. Co-authored-by: Isaac
66684c6 to
ffb887f
Compare
Integration test reportCommit: a4a2560
232 interesting tests: 152 MISS, 35 FAIL, 32 RECOVERED, 8 flaky, 2 KNOWN, 2 SKIP, 1 PANIC
Top 50 slowest tests (at least 2 minutes):
|
Only NEXT_CHANGELOG.md conflicted (kept the postgres_databases entry alongside main's #5599 entry). main's #5599 added provided_id_fields/updatable_id_fields for name-based IDs to schemas/volumes only — postgres resources keep recreate_on_changes, so no realignment was needed. Generated schema artifacts (#5611 codegen change) are consistent; regeneration produced no drift. Co-authored-by: Isaac
## Release v1.4.0 ### CLI * Improved error messages for `ssh connect`: when an SSH connection attempt fails, the client now fetches and prints the server's recent error logs ([#5555](#5555)). * Increase the SSH server startup timeout from 10 to 45 minutes when a GPU accelerator is requested via `databricks ssh connect --accelerator` ([#5569](#5569)). * Fix authentication falling back to the default profile in `.databrickscfg` when a host is already configured via the environment (e.g. `DATABRICKS_HOST` with `DATABRICKS_TOKEN`) ([#5616](#5616)). * ssh: fix opening remote environment in Cursor, which previously hung on default-extension install and never opened the editor ([#5619](#5619)). * Improve the error shown when `databricks labs install` cannot find a project's `labs.yml`: the message now explains that either the requested version does not exist or the project is not installable with the CLI, and links to the repository ([#5559](#5559)). ### Bundles * Remove API enum values and types that are still in development from the `databricks-bundles` Python package; these were never accepted by the backend ([#5484](#5484)). * direct: Fix resolving a resource reference that is used more than once within the same field ([#5558](#5558)). * Bundle variable references now accept Unicode letters in path segments (e.g. `${var.变量}`). ([#5532](#5532)) * Ignore remote changes for vector search direct_access_index_spec.schema_json to prevent drift when the backend normalizes the schema ([#5481](#5481)). * Remove hidden, never-functional `--existing-dashboard-id`, `--existing-dashboard-path`, `--existing-alert-id`, and `--existing-genie-space-id` alias flags from `bundle generate`; use the documented `--existing-id` / `--existing-path` flags instead ([#5591](#5591)). * engine/direct: Fix WAL corruption after two consecutive failed deploys ([#5606](#5606)). * engine/direct: Don't open the deployment state WAL when a deploy's plan fails ([#5607](#5607)). * Ignore unity catalog managed schema property defaults to avoid unnecessary drift ([#5195](#5195)). * Add `postgres_roles` and `postgres_databases` resources to create Postgres roles and databases on a Lakebase branch ([#5467](#5467), [#5627](#5627)). * direct: Stop spurious recreate/rename on redeploy when the backend normalizes a resource's name-based ID (e.g. Unity Catalog lowercasing a schema or volume name) ([#5599](#5599)). * Fix the generated pipeline README to suggest `databricks bundle run <pipeline> --refresh <table>` for running a single transformation; the previously documented `--select` flag is not supported by `bundle run` ([#5252](#5252)).
In #5531 we fixed spurious recreation of schemas/volumes due to backend lowercasing fields that are part of the id/name. This was fixed by making comparison for those fields case-insensitive.
This is more general fix for all resources that have ID fields configured in bundle config based on the following: if we can fetch the resource by ID, then even if fields come back changed, we should not recreate, because change can only be attributed to normalization, not replacement.