perf(copilot): read chat transcripts from copilot_messages (R+1 cutover) #4808
Flip user-facing chat reads from the legacy copilot_chats.messages JSONB array (5.7GB, 99% TOAST) to the normalized copilot_messages table via a new loadCopilotChatMessages helper ordered by seq NULLS LAST, created_at, id — the verified canonical order. Both chat-detail getters (getAccessibleCopilotChat, getAccessibleCopilotChatWithMessages) now drop the messages column from their metadata select (no more whole-array detoast on every load) and assemble the transcript from the table after authorization. This cascades to the copilot + mothership GET endpoints and to resolveOrCreateChat's conversationHistory (the LLM payload). The normalize/effective-transcript pipeline is source-agnostic (copilot_messages.content == a JSONB array element), so transcripts are byte-identical. Dual-write and the JSONB column stay in place as the internal-logic source and fallback; removing JSONB writes is a later step. Prod integrity verified before cutover: 0 messages missing, 0 NULL-seq, 0 dup keys/seq, 0 orphans, order-parity vs JSONB = 0 mismatches. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
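As an illustration of the canonical order named above (`seq NULLS LAST, created_at, id`), here is a minimal in-memory sketch. It is not the helper from this PR, which orders in SQL; the `MessageRow` shape and the `compareCanonical` and `toTranscript` names are hypothetical, and the `id` tiebreak uses plain JavaScript string comparison, which only approximates the database's comparison.

```typescript
// Hypothetical row shape: the columns the ORDER BY touches, plus content.
interface MessageRow {
  id: string
  seq: number | null
  createdAt: Date
  content: Record<string, unknown>
}

// Mirrors ORDER BY seq ASC NULLS LAST, created_at ASC, id ASC.
function compareCanonical(a: MessageRow, b: MessageRow): number {
  if (a.seq !== b.seq) {
    if (a.seq === null) return 1 // a NULL seq sorts after any numbered row
    if (b.seq === null) return -1
    return a.seq - b.seq
  }
  // Equal seq (or both NULL): fall back to creation time, then id.
  const byTime = a.createdAt.getTime() - b.createdAt.getTime()
  if (byTime !== 0) return byTime
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0
}

// Sorts a copy of the rows and keeps only the content payloads,
// which is the shape the transcript pipeline consumes.
function toTranscript(rows: MessageRow[]): Record<string, unknown>[] {
  return [...rows].sort(compareCanonical).map((row) => row.content)
}
```

A comparator like this can be handy in tests or parity spot-checks, where rows are already in memory and the expected order needs to be computed without a database round trip.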
PR Summary: Medium Risk. Reviewed by Cursor Bugbot for commit 2e7f4ec.
Greptile Summary
This PR cuts over copilot chat transcript reads from the legacy `copilot_chats.messages` JSONB array to the normalized `copilot_messages` table.
Confidence Score: 5/5
Safe to merge: reads from the normalized table only after authorization succeeds, dual-write and JSONB column are untouched, and pre-cutover integrity was verified on prod.
The change is a well-scoped read-path swap: the JSONB column remains written and available as a fallback, authorization always gates the new messages query, and the normalized table was fully validated against the JSONB source before cutover. The implementation is clean, the tests exercise the critical invariants (including the auth-denied no-query contract added in the head SHA), and the type-check and full suite pass.
No files require special attention.
Sequence Diagram
sequenceDiagram
participant Caller
participant getAccessibleCopilotChatWithMessages
participant copilot_chats DB
participant authorizeCopilotChatRow
participant loadCopilotChatMessages
participant copilot_messages DB
Caller->>getAccessibleCopilotChatWithMessages: (chatId, userId)
getAccessibleCopilotChatWithMessages->>copilot_chats DB: SELECT metadata columns WHERE id=chatId AND userId=userId LIMIT 1
copilot_chats DB-->>getAccessibleCopilotChatWithMessages: chat row (no messages JSONB)
getAccessibleCopilotChatWithMessages->>authorizeCopilotChatRow: (chat, chatId, userId)
alt not found or auth denied
authorizeCopilotChatRow-->>getAccessibleCopilotChatWithMessages: null
getAccessibleCopilotChatWithMessages-->>Caller: null (no messages query)
else authorized
authorizeCopilotChatRow-->>getAccessibleCopilotChatWithMessages: authorized row
getAccessibleCopilotChatWithMessages->>loadCopilotChatMessages: (chatId)
loadCopilotChatMessages->>copilot_messages DB: SELECT content WHERE chat_id=chatId AND deleted_at IS NULL ORDER BY seq ASC NULLS LAST, created_at ASC, id ASC
copilot_messages DB-->>loadCopilotChatMessages: [{content}, ...]
loadCopilotChatMessages-->>getAccessibleCopilotChatWithMessages: "Record<string,unknown>[]"
getAccessibleCopilotChatWithMessages-->>Caller: "{...authorizedRow, messages}"
end
Reviews (2): Last reviewed commit: "test(copilot): cover auth-deny on a foun..."
Address PR review: exercise the `if (!authorized) return null` contract — when the chat row exists but authorization fails, the getter returns null and never issues the copilot_messages read. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
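The contract described here can be sketched as follows. This is a hedged illustration under stated assumptions, not the code in `lifecycle.ts`: the `ChatMeta` and `ChatDeps` types and the `getChatWithMessages` name are hypothetical, and the database and authorization calls are injected as plain functions so the example stays self-contained.

```typescript
type TranscriptMessage = Record<string, unknown>

// Hypothetical metadata row: chat columns without the messages JSONB.
interface ChatMeta {
  id: string
  userId: string
  title: string | null
}

// Injected stand-ins for the real metadata select, auth check, and table read.
interface ChatDeps {
  fetchChatMeta(chatId: string, userId: string): Promise<ChatMeta | null>
  authorize(chat: ChatMeta | null, chatId: string, userId: string): Promise<ChatMeta | null>
  loadMessages(chatId: string): Promise<TranscriptMessage[]>
}

async function getChatWithMessages(
  deps: ChatDeps,
  chatId: string,
  userId: string
): Promise<(ChatMeta & { messages: TranscriptMessage[] }) | null> {
  const chat = await deps.fetchChatMeta(chatId, userId)
  const authorized = await deps.authorize(chat, chatId, userId)
  // Not found or denied: return before any copilot_messages read is issued.
  if (!authorized) return null
  const messages = await deps.loadMessages(chatId)
  return { ...authorized, messages }
}
```

Injecting the three calls makes the invariant easy to assert in a unit test: a counter on `loadMessages` should stay at zero whenever `authorize` resolves to `null`.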
@greptile
@cursor review
✅ Bugbot reviewed your changes and found no new issues!
What
Cuts over user-facing copilot chat reads from the legacy `copilot_chats.messages` JSONB array to the normalized `copilot_messages` table. This is the R+1 read cutover: the payoff for the table + `seq` ordinal work that already shipped.
Why
`copilot_chats` is 5.7 GB, 99% of it the messages JSONB in TOAST. Every chat load detoasted + decompressed the whole array. Reading from `copilot_messages` via the `(chat_id, seq)` index avoids that entirely, with the biggest win on large/tail chats and on keeping the base table lean.
How
- `loadCopilotChatMessages(chatId)` in `lifecycle.ts` reads `content` from `copilot_messages` ordered by `seq ASC NULLS LAST, created_at ASC, id ASC` (the verified canonical order; raw `sql` fragment because Drizzle's `asc()` omits `NULLS LAST`).
- Both chat-detail getters (`getAccessibleCopilotChat`, `getAccessibleCopilotChatWithMessages`) drop `messages` from the metadata select (no more detoast) and assemble the transcript from the table after authorization (no wasted query on denied access).
- This cascades to the copilot GET (`/api/copilot/chat`), mothership GET (`/api/mothership/chats/[chatId]`), and `resolveOrCreateChat`'s `conversationHistory` (the LLM payload), all via the two getters.
- … `messages: []` without a second query.
The normalize → effective-transcript pipeline is unchanged and source-agnostic (`copilot_messages.content` is the same shape as a JSONB array element), so transcripts are byte-identical.
Scope / safety
Integrity verified on prod before cutover
0 messages missing from the table · 0 NULL-seq · 0 duplicate keys · 0 duplicate seq within a chat · 0 orphans · order-parity vs JSONB = 0 mismatches.
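The actual integrity queries are not shown in this PR. As a rough sketch of what a per-chat check of this kind might compute, the function below compares message ids from the JSONB array against table rows that are assumed to be already sorted in canonical order. The `TableRow` and `ParityReport` shapes, the `checkParity` name, and the definitions of each counter are all assumptions made for illustration.

```typescript
// Hypothetical shape: each message is assumed to carry a stable id in both stores.
interface TableRow {
  messageId: string
  seq: number | null
}

interface ParityReport {
  missing: number // ids present in the JSONB array but absent from the table
  nullSeq: number // table rows without a seq ordinal
  duplicateSeq: number // table rows repeating a seq already seen in this chat
  orderMismatches: number // positions where the two orderings disagree
}

// tableRows must already be in canonical order (seq NULLS LAST, created_at, id).
function checkParity(jsonbIds: string[], tableRows: TableRow[]): ParityReport {
  const tableIds = tableRows.map((row) => row.messageId)
  const inTable = new Set(tableIds)
  const missing = jsonbIds.filter((id) => !inTable.has(id)).length
  const nullSeq = tableRows.filter((row) => row.seq === null).length

  const seen = new Set<number>()
  let duplicateSeq = 0
  for (const { seq } of tableRows) {
    if (seq === null) continue
    if (seen.has(seq)) duplicateSeq += 1
    seen.add(seq)
  }

  // Positional comparison over the longer of the two sequences.
  const length = Math.max(jsonbIds.length, tableIds.length)
  let orderMismatches = 0
  for (let i = 0; i < length; i++) {
    if (jsonbIds[i] !== tableIds[i]) orderMismatches += 1
  }
  return { missing, nullSeq, duplicateSeq, orderMismatches }
}
```

A clean chat yields a report of all zeros. The positional comparison is deliberately strict: a single missing row shifts every later position, so it tends to over-count rather than hide a discrepancy.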
Tests
- `lifecycle.test.ts`: getters source messages from the table in order; empty chat → `[]`; auth-deny → `null` with no messages query; legacy getter; `resolveOrCreateChat` existing (table-sourced history) vs new (empty, no read).
- `check:api-validation` passes.
Post-deploy verification
Staging smoke: load a large chat via both GETs, confirm identical transcript; `EXPLAIN` shows `copilot_messages_chat_seq_idx` and no detoast of `copilot_chats.messages`. Re-run the low-load TABLESAMPLE parity spot-check (currently 0).