Move data-repo reconciliation from entrypoint shell into the API process #66

Description

@themightychris

What

Move the fetch / fast-forward / rebase / conflict-escape-hatch logic that currently lives in deploy/docker/entrypoint.sh into the API process, and shrink the entrypoint to the minimum boot-prep step.

Why

Today's split:

  • Entrypoint (shell): ensures /app/data is a valid git repo, fetches origin, reconciles local vs origin (in-sync / behind / ahead / diverged-clean / diverged-conflicts → escape-hatch branch), then execs the API.
  • API (Node): starts the gitsheets push daemon for outgoing commits. Has no knowledge of pulling.

Asymmetry: push in Node, pull in shell. That asymmetry just bit us twice in one session:

  1. Single-branch clone can't switch branches. A single-branch clone (git clone --branch X combined with --single-branch, or with --depth, which implies it) writes a narrow remote refspec covering only X. A later git fetch origin Y (entrypoint) therefore updates only FETCH_HEAD and doesn't populate refs/remotes/origin/Y, so git checkout -b Y origin/Y fails. Hit when flipping CFP_DATA_BRANCH between fixture and published. A Node-side path using the gitsheets/hologit ref-management API doesn't make this class of mistake: refspecs are explicit, not implicit-from-clone-history.

  2. Pipe to sed swallows the rebase exit code. In if git rebase origin/$BRANCH 2>&1 | sed 's/^/ /'; then …, the if tests the exit status of the last command in the pipeline, which is sed's (effectively always 0), not rebase's. The entrypoint logged "rebase clean — pushing" on a rebase that had actually aborted with "index contains uncommitted changes," then tried to push the stale local commit and was rejected as non-fast-forward. The escape hatch never fired. A Node implementation either uses the library's structured return value or awaits a Promise that rejects, so this shell-pipe-eating-exit-code class of bug cannot occur.
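To make the second point concrete, here is a minimal sketch of how a failing exit code surfaces on the Node side. The helper names `run` and `tryRebase` are hypothetical and not taken from the codebase. The only real API relied on is Node's `child_process.execFile`, whose promisified form rejects when the child exits non-zero.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical helper: runs a command directly (no shell, no pipeline), so
// there is no later pipeline stage whose exit status could mask a failure.
async function run(cmd: string, args: string[], cwd?: string): Promise<string> {
  // Rejects if the process exits non-zero; the error carries `code` and `stderr`.
  const { stdout } = await execFileAsync(cmd, args, { cwd });
  return stdout;
}

// Hypothetical usage inside a reconcile step: a failed rebase becomes `false`
// instead of being logged as success.
async function tryRebase(repoPath: string, branch: string): Promise<boolean> {
  try {
    await run("git", ["rebase", `origin/${branch}`], repoPath);
    return true;
  } catch {
    // Best-effort cleanup; ignore the error if no rebase was in progress.
    await run("git", ["rebase", "--abort"], repoPath).catch(() => undefined);
    return false;
  }
}
```

Because the failure arrives as a rejected Promise, the "rebase clean" log line can only be reached when git itself reported success.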

And separately from the bugs: the reconciliation logic needs to be callable from the hot-reload webhook (#65) anyway. Doing it once, in Node, is the obvious shape.

Sketch

Minimal entrypoint

#!/bin/sh
set -eu

: "${CFP_DATA_REPO_PATH:?required}"
git config --global --add safe.directory "$CFP_DATA_REPO_PATH"

# If there's no clone yet, do the initial one. The API needs a valid
# .git dir before openRepo() can run. Everything else (reconcile against
# origin on subsequent boots) is the API's job.
if [ ! -d "$CFP_DATA_REPO_PATH/.git" ]; then
  if [ -z "${CFP_DATA_REMOTE:-}" ]; then
    echo "[entrypoint] ERROR: no working tree and no CFP_DATA_REMOTE" >&2
    exit 1
  fi
  # Non-shallow so the API-side reconcile can rebase against any reasonable
  # divergence on subsequent boots.
  git clone --branch "${CFP_DATA_BRANCH:-main}" "$CFP_DATA_REMOTE" "$CFP_DATA_REPO_PATH"
fi

exec "$@"

Drops from ~190 lines to ~15.

API-side reconciliation

New plugin (or service) registered early in buildApp(), before servicesPlugin (which builds InMemoryState from the on-disk tree):

// apps/api/src/store/reconcile.ts (or similar)
export interface ReconcileResult {
  outcome: 'in-sync' | 'fast-forwarded' | 'pushed-ahead' | 'rebased' | 'conflict-escaped';
  oldCommit: string;
  newCommit: string;
  conflictBranch?: string; // when outcome === 'conflict-escaped'
}

export async function reconcileDataRepo(
  repo: Repository,
  opts: { branch: string; remote: string },
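): Promise<ReconcileResult> {
  // Body intentionally left out of this sketch.
  throw new Error('reconcileDataRepo: not implemented');
}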
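One way the first step of such a function could decide between the states listed under "Today's split" is sketched below, assuming the local and remote tips are compared with `git rev-list --left-right --count <local>...<remote>`. That command prints two tab-separated counts: commits only on the left side, then commits only on the right. The names `RepoState`, `parseLeftRightCount`, and `classify` are illustrative, not part of the proposed interface.

```typescript
type RepoState = "in-sync" | "behind" | "ahead" | "diverged";

interface Divergence {
  ahead: number;  // commits only on the local branch
  behind: number; // commits only on the remote-tracking branch
}

// Parses "<ahead>\t<behind>\n" as printed by
// `git rev-list --left-right --count <local>...<remote>`.
function parseLeftRightCount(output: string): Divergence {
  const parts = output.trim().split(/\s+/);
  const ahead = Number(parts[0] ?? NaN);
  const behind = Number(parts[1] ?? NaN);
  if (parts.length !== 2 || !Number.isInteger(ahead) || !Number.isInteger(behind)) {
    throw new Error(`unexpected rev-list output: ${JSON.stringify(output)}`);
  }
  return { ahead, behind };
}

function classify({ ahead, behind }: Divergence): RepoState {
  if (ahead === 0 && behind === 0) return "in-sync";
  if (ahead === 0) return "behind";  // a fast-forward is enough
  if (behind === 0) return "ahead";  // a push is enough
  return "diverged";                 // needs a rebase, possibly the escape hatch
}
```

Keeping the parsing and the classification as pure functions means the branching logic can be unit-tested without a git repository on disk.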

reconcileDataRepo is wired into two call sites:

  1. Boot path: registered in buildApp immediately after storePlugin (which calls openRepo) and before servicesPlugin (which builds the in-memory state). Plays the role the entrypoint plays today.
  2. Webhook (#65) handler: invoked from the POST /api/_internal/reload-data route under the write mutex, before rebuilding InMemoryState.

Same code path. Same logging. Same metrics (if we add any). Same conflict-escape-hatch semantics.
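A rough sketch of how both call sites could share one serialized path is shown below. The `Mutex` class is a minimal promise-chain stand-in for whatever write mutex the API already has, and `makeReload`, `reconcile`, and `rebuild` are hypothetical placeholders for `reconcileDataRepo(...)` and the InMemoryState rebuild.

```typescript
// Minimal promise-chain mutex: each task starts only after the previous one settles.
class Mutex {
  private tail: Promise<unknown> = Promise.resolve();

  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(() => task());
    // Swallow rejections on the chain so one failed task does not block later ones.
    this.tail = result.catch(() => undefined);
    return result;
  }
}

type Reconcile = () => Promise<string>; // stand-in for reconcileDataRepo(repo, opts)
type Rebuild = () => Promise<void>;     // stand-in for rebuilding InMemoryState

// Webhook-style handler body: reconcile first, then rebuild, all under the mutex.
function makeReload(mutex: Mutex, reconcile: Reconcile, rebuild: Rebuild) {
  return (): Promise<string> =>
    mutex.runExclusive(async () => {
      const outcome = await reconcile();
      await rebuild();
      return outcome;
    });
}
```

On the boot path the same `reconcile` would be awaited before servicesPlugin runs, so a rejection there propagates out of buildApp.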

Implementation notes

  • Use gitsheets/hologit primitives where they exist (Repository.resolveRef, Repository.transact for any commit creation). Fall back to shelling out to git only where the library doesn't reach (e.g., git fetch, git push, git rebase). Either way, pass refspecs explicitly instead of relying on whatever the original clone configured, as today's entrypoint does.
  • Boot-time failures should crash the API process (let k8s restart). That preserves the entrypoint's current "loud failure on bad data-repo state" behavior — just expressed as a thrown error from await reconcileDataRepo(...) in buildApp.
  • Mutex sharing: the same write mutex that serializes API mutations must serialize reconciliation. A reconcile mid-transact would race against the gitsheets write.
  • Escape-hatch branch: still pushed to origin (named conflicts/<UTC>), still resets local to origin/<branch>. Reuses the existing entrypoint semantics. Operator gets a Slack/email if we wire that up later.
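Two of the notes above lend themselves to small pure helpers, sketched here under stated assumptions. The function names are hypothetical, and the exact timestamp format for conflicts/<UTC> is a guess. A compact form is used because git ref names cannot contain colons.

```typescript
// Explicit refspec so refs/remotes/<remote>/<branch> is populated no matter
// how the repository was originally cloned. The leading "+" lets the
// remote-tracking ref be updated even on a non-fast-forward change.
function trackingRefspec(remote: string, branch: string): string {
  return `+refs/heads/${branch}:refs/remotes/${remote}/${branch}`;
}

// Argument list for `git fetch`, suitable for execFile-style invocation.
function fetchArgs(remote: string, branch: string): string[] {
  return ["fetch", remote, trackingRefspec(remote, branch)];
}

// Escape-hatch branch name. Assumed format: conflicts/YYYYMMDDTHHMMSSZ (UTC).
function conflictBranchName(now: Date = new Date()): string {
  const stamp = now
    .toISOString()              // e.g. 2024-01-02T03:04:05.678Z
    .replace(/[-:]/g, "")       // drop separators that are awkward or illegal in refs
    .replace(/\.\d{3}Z$/, "Z"); // drop milliseconds
  return `conflicts/${stamp}`;
}
```

With the refspec spelled out on every fetch, switching CFP_DATA_BRANCH no longer depends on the refspec written at clone time.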

Out of scope

  • Auto-pull-on-interval (cron-style). Webhook + boot covers the cases we care about; cron would only be a fallback for when both are unavailable.
  • Removing the push daemon. It stays — outgoing direction is symmetric to incoming and there's no reason to consolidate further.
  • Multi-pod coordination. We're single-replica by architectural constraint.

Related

  • #65 — webhook endpoint that consumes this. The webhook handler is just a thin route wrapping reconcileDataRepo() plus the in-memory-state rebuild.
  • The two entrypoint bugs are tracked here by description, not in separate issues — moving the logic obsoletes them.
