feat(copilot): add seq ordinal to copilot_messages for order-preserving reads#4791
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
PR SummaryMedium Risk Overview Dual-write now writes Unit tests are extended for Reviewed by Cursor Bugbot for commit 38b9948. Configure here. |
Greptile SummaryThis PR adds a nullable
Confidence Score: 5/5Safe to merge — purely additive shadow write with no read-path changes and a well-validated backfill. The nullable column protects rolling deploys on old pods. The backfill CTE is correct (deduped by first occurrence, 0-based via ROW_NUMBER), the dual-write logic handles all edge cases (MAX in JS to avoid multi-row SQL collision, COALESCE-preserve on append, overwrite on replace, dedup before insert), and 15 new unit tests exercise every seq assignment path. No files require special attention. Important Files Changed
Reviews (2): Last reviewed commit: "feat(copilot): add seq ordinal to copilo..." | Re-trigger Greptile |
eb9bfe6 to
8f78f2c
Compare
…ng reads copilot_messages had no column preserving message order: created_at (set from each message's timestamp) ties at millisecond granularity in 58% of chats, and some chats have out-of-order timestamps within their array. The only other tiebreaker, id, is a random UUID — so ORDER BY created_at, id renders same-timestamp user/assistant pairs swapped. This blocks the R+1 read cutover. Add an integer seq = the message's 0-based index within the chat's JSONB array (ground-truth order), backfilled inline in migration 0219 (no script for self-hosters or us). Reads will use ORDER BY seq NULLS LAST, created_at, id at cutover; reads still come from JSONB after this PR. Design: - seq is a tiebreaker, not the sole sort key (concurrent-append/NULL safety). - Nullable now; defer NOT NULL so rolling-deploy old pods don't fail inserts. - replace (update-messages snapshot) overwrites seq = array index (re-densifies after a mid-conversation delete); append preserves existing seq via COALESCE and assigns base+idx from a single MAX(seq) read (never MAX+i in SQL — multi-row batches would collide). The non-atomic read-then-insert window is documented and bounded by the read tiebreak + snapshot re-densify. - Dedupe message ids before insert (87 prod chats carry dup ids; a repeated id in one INSERT...ON CONFLICT would otherwise throw). - Backfill picks first-occurrence per (chat,id), gap-free via ROW_NUMBER; validated on staging data (0-based, contiguous, 0 bad ranges). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
8f78f2c to
38b9948
Compare
|
@greptile |
|
@cursor review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 38b9948. Configure here.
What
Adds an integer
seqcolumn tocopilot_messages= the message's 0-based index within the chat's JSONB array. This is the ground-truth order and unblocks the eventual read cutover (R+1) off the legacycopilot_chats.messagesJSONB.Reads still come from JSONB after this PR — this is purely the prerequisite that makes the new table safe to read from later.
Why
copilot_messageshad no column that preserves message order:created_at(set from each message'stimestamp) ties at millisecond granularity in 58% of chats (30,796 / 53,484 in prod).id, is a random UUID v4.So
ORDER BY created_at, idrenders same-timestamp user/assistant pairs swapped (assistant answer before the user question) — a proven production bug.How
ADD COLUMN seq→ backfill fromjsonb_array_elements ... WITH ORDINALITY(first-occurrence per(chat,id), gap-free viaROW_NUMBER) →CREATE INDEX (chat_id, seq) WHERE deleted_at IS NULL.messages-dual-write.ts):replace(update-messages snapshot, authoritative on order) overwritesseq = array index— re-densifies positions after a mid-conversation delete.appendpreserves existing seq viaCOALESCEand assignsbase + idxfrom a singleMAX(seq)read (neverMAX+iin SQL — multi-row batches would collide).INSERT ... ON CONFLICTwould otherwise throw).Design decisions
seqis a tiebreaker, not the sole sort key — read order at cutover isORDER BY seq NULLS LAST, created_at, id, so a concurrent-append seq tie or an un-backfilled NULL row can never re-introduce the swap.NOT NULL— during rolling deploys, old pods insert rows withoutseq; aNOT NULLcolumn would abort their best-effort inserts and silently drop dual-write coverage.Performance
Neutral-to-better steady-state. The
(chat_id, seq)index (int4) is smaller than the existing(chat_id, created_at, id); reads are the same O(log n + k). Appends add one cheapMAX(seq)probe; the snapshot-replace path assigns seq while already iterating. The only cost is the one-time backfillUPDATE, run inline withstatement_timeoutdisabled — same shape/cost as the 0217 backfill that already ran cleanly in prod.Verification
bun run check:api-validationpasses.Follow-ups (not this PR)
copilot_messages ORDER BY seq NULLS LAST, created_at, idbehind a % rollout flag, gated on a parity query returning 0.SET NOT NULLonseq(after a NULL sweep) + dropcopilot_messages_chat_created_at_idx.🤖 Generated with Claude Code