diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index 12edd194..538d88ed 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -178,9 +178,10 @@ crawler/src/
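- **`getSummary`**: Site-wide statistics (internal/external page counts and content counts, status distribution, Content-Type distribution, metadata fill rate)
- **`getPageDetail`**: Details for a single page (metadata, outbound/inbound links, redirect sources)
- **`getPageHtml`**: Retrieves an HTML snapshot (with truncation support)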
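-- **`listLinks`**: Link analysis (`type: 'broken' | 'external'`, per anchor = one row per `<a>` tag, no deduplication). The dest is resolved to the canonical destination via `pages.redirectDestId` before the broken/external check (`includeRedirectSources: true` disables the resolution and inspects the literal destination). The function itself is unchanged, so CLI/MCP can still fetch raw per-anchor data with `type: 'external'` as before, but **only the viewer's `/api/links?type=external` has been switched to `listExternalLinks`** (see below), because the "External Links" view needs a list aggregated per destination
-- **`listExternalLinks`**: Legacy path for the viewer's "External Links" view (fallback for archives whose read model is missing or outdated). Deduplicates external link destinations with `GROUP BY` per canonical destination (the same `COALESCE(canonical.*, dest.*)` resolution pattern as `listLinks`) and attaches `referrerCount` (`COUNT(DISTINCT source.id)`; multiple anchors from the same page count as one). The pagination `total` is the number of distinct destinations (not the anchor count), computed by wrapping the GROUP BY in a subquery; the `paginateQuery` helper uses a naive `count(idColumn)`, which is incompatible with an already-grouped query and cannot be used here. Destination details (the list of referring pages) reuse `inboundLinks` from the existing `getPageDetail` (which has no `isExternal`/`scraped` constraint) rather than adding a new view. **When the `viewer_external_links` read model is current, the route switches to `listViewerExternalLinks`** (see "Design note (external links read model)" below); this function itself remains unchanged as the fallback
-- **`listViewerExternalLinks`**: Fast path dedicated to the `viewer_external_links` read model. Same options and response shape as `listExternalLinks`, but the aggregation (JOIN + GROUP BY + COUNT DISTINCT) has already run once at read model build time, so at runtime it is a simple indexed SELECT plus `paginateQuery` (with GROUP BY no longer needed, the naive helper works as is)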
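+- **`listLinks`**: Link analysis (`type: 'broken' | 'external'`, per anchor = one row per `<a>` tag, no deduplication). The dest is resolved to the canonical destination via `pages.redirectDestId` before the broken/external check (`includeRedirectSources: true` disables the resolution and inspects the literal destination). The function itself is unchanged, so CLI/MCP can still fetch raw per-anchor data with `type: 'broken' | 'external'` as before. On the viewer side, `type: 'external'` switches to `listExternalLinks`/`listViewerExternalLinks`, and `type: 'broken'` switches to `listViewerBrokenLinks` when a current read model is available (see below); this function remains only as the legacy fallback for both
+- **`listExternalLinks`**: Legacy path for the viewer's "External Links" view (fallback for archives whose read model is missing or outdated). Deduplicates external link destinations with `GROUP BY` per canonical destination (the same `COALESCE(canonical.*, dest.*)` resolution pattern as `listLinks`) and attaches `referrerCount` (`COUNT(DISTINCT source.id)`; multiple anchors from the same page count as one). The pagination `total` is the number of distinct destinations (not the anchor count), computed by wrapping the GROUP BY in a subquery; the `paginateQuery` helper uses a naive `count(idColumn)`, which is incompatible with an already-grouped query and cannot be used here. Destination details (the list of referring pages) reuse `inboundLinks` from the existing `getPageDetail` (which has no `isExternal`/`scraped` constraint) rather than adding a new view. **When the `viewer_external_links` read model is current, the route switches to `listViewerExternalLinks`** (see "Design note (viewer_anchor_facts read model, issue #114)" below); this function itself remains unchanged as the fallback
+- **`listViewerExternalLinks`**: Fast path dedicated to the `viewer_external_links` read model. Same options and response shape as `listExternalLinks`. The aggregation is already derived **in memory** at read model build time from the same single `anchors` scan that assembles `viewer_anchor_facts` (`deriveExternalLinkSummaryRows`, which replaced the separate scan in `computeExternalLinkRows` in issue #114), so at runtime it is a simple indexed SELECT plus `paginateQuery` (with GROUP BY no longer needed, the naive helper works as is)
+- **`listViewerBrokenLinks`**: Fast path dedicated to the `viewer_anchor_facts` read model (issue #114). Implements the same four cursor pagination modes as `viewer_pages` (initial page / forward keyset / backward keyset / direct offset read) and fulfills the `nextCursor`/`prevCursor` contract of `/api/links?type=broken`. When `urlPattern` (a LIKE spanning the two source/dest columns) or `includeRedirectSources` (the read model holds only normalized destinations) is specified, the fast path cannot be used and the call falls back to `listLinks`, following the same reasoning as the `urlPattern`/`directory` exclusion in `/api/pages`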
- **`listIsolatedPages`** / **`listIsolatedClusters`** / **`getIsolatedCluster`**: The **fully isolated** pages (singletons) and **isolated sets** (connected components, size ≥ 2) of the inventory subgraph. Because of the crawled-wins downgrade invariant, crawled rows are by definition excluded from the isolation check. Clusters are the weakly connected components obtained by treating redirect-resolved anchors as undirected edges (computed by the shared helper `compute-isolated-clusters.ts` using `resolve-redirect-chain` + union-find)
- **`listResources`**: Subresource list (CSS, JS, images, fonts)
- **`listImages`**: Image list (detects missing alt, missing dimensions, and oversized images)
@@ -255,7 +256,7 @@ nitpicker viewer
→ SIGINT/SIGTERM: manager.closeAll() → server.close() → resolve (the CLI exits)
```
-**REST API (the archive is fixed at startup, so no archiveId is needed):** `GET /api/summary`, `/api/pages` (includes the four columns `hasCSP`/`hasXFrameOptions`/`hasXContentTypeOptions`/`hasHSTS`; the old `/api/headers` and the "Headers" view have been merged into the "Pages" view, while `checkHeaders` itself remains for CLI/MCP), `/api/pages/detail?url=`, `/api/pages/html?url=`, `/api/links?type=` (`broken` goes through `listLinks` and stays per anchor, limited to canonical destinations with HTTP 404; 403/5xx/unfetched (NULL) are not treated as broken. `external` is deduplicated per canonical destination and returns `referrerCount`: `listViewerExternalLinks` when the read model is current, otherwise a fallback to `listExternalLinks` (the same two-tier structure as `/api/pages`). The list of referrers for a destination reuses inboundLinks from the existing `/api/pages/detail` rather than adding a new endpoint), `/api/resources`, `/api/resources/referrers?resourceUrl=`, `/api/images`, `/api/violations`, `/api/duplicates`, `/api/mismatches`, `/api/graph` (link graph of internal pages, `getLinkGraph`), `/api/directory-tree` (initial 3-depth tree for all roots, `getDirectoryTree`), `/api/directory-tree/children?nodeId=` (child directories directly under one node, `listDirectoryChildren`), `/api/directory-tree/pages?nodeId=&cursor=&limit=` (cursor-paginated list of pages directly under one directory, `listDirectoryPages`), `/api/info` (absolute path of the open archive, for the footer). Query parameter → query option conversion lives in `query-params/to-number.ts` / `to-boolean.ts`; errors are returned as JSON with absolute paths masked by `sanitize-error-message.ts` (same policy as mcp-server). The old `/api/page-links` (`listPageLinks`) was removed when the "Page Links" view was retired; per-page status/referrers/redirect-from are now checked one page at a time through inbound/outbound/redirectFrom in the Page Detail view (`/api/pages/detail`). `getPageDetail` now also returns `isSkipped`/`skipReason` (the exclusion reason from robots.txt / `excludeUrls`), so the exclusion reason can still be checked when the URL is known. **Accepted gap**: `listPages` / `listPagesByTag` / `listPagesByJsonLdType` all assume `scraped = 1`, so there is no longer a way to enumerate in bulk the URLs that were excluded and never fetched (only the old `listPageLinks` had no `scraped` constraint). If the URL is known, `getPageDetail` can confirm it; if a bulk overview is needed, use `nitpicker query error-kinds` or query the archive's `pages` table directly.
+**REST API (the archive is fixed at startup, so no archiveId is needed):** `GET /api/summary`, `/api/pages` (includes the four columns `hasCSP`/`hasXFrameOptions`/`hasXContentTypeOptions`/`hasHSTS`; the old `/api/headers` and the "Headers" view have been merged into the "Pages" view, while `checkHeaders` itself remains for CLI/MCP), `/api/pages/detail?url=`, `/api/pages/html?url=`, `/api/links?type=` (`broken` returns only anchors whose canonical destination is HTTP 404 (403/5xx/unfetched (NULL) are not treated as broken), together with `nextCursor`/`prevCursor`: `listViewerBrokenLinks` (`viewer_anchor_facts` fast path, keyset cursor) when the read model is current and neither `urlPattern` nor `includeRedirectSources` is specified, otherwise a fallback to `listLinks` (legacy, per anchor, pseudo-cursor made by stringifying the offset). `external` is deduplicated per canonical destination and returns `referrerCount`: `listViewerExternalLinks` when the read model is current, otherwise a fallback to `listExternalLinks` (the same two-tier structure, but with no exclusion conditions). The list of referrers for a destination reuses inboundLinks from the existing `/api/pages/detail` rather than adding a new endpoint), `/api/resources`, `/api/resources/referrers?resourceUrl=`, `/api/images`, `/api/violations`, `/api/duplicates`, `/api/mismatches`, `/api/graph` (link graph of internal pages, `getLinkGraph`), `/api/directory-tree` (initial 3-depth tree for all roots, `getDirectoryTree`), `/api/directory-tree/children?nodeId=` (child directories directly under one node, `listDirectoryChildren`), `/api/directory-tree/pages?nodeId=&cursor=&limit=` (cursor-paginated list of pages directly under one directory, `listDirectoryPages`), `/api/info` (absolute path of the open archive, for the footer). Query parameter → query option conversion lives in `query-params/to-number.ts` / `to-boolean.ts`; errors are returned as JSON with absolute paths masked by `sanitize-error-message.ts` (same policy as mcp-server). The old `/api/page-links` (`listPageLinks`) was removed when the "Page Links" view was retired; per-page status/referrers/redirect-from are now checked one page at a time through inbound/outbound/redirectFrom in the Page Detail view (`/api/pages/detail`). `getPageDetail` now also returns `isSkipped`/`skipReason` (the exclusion reason from robots.txt / `excludeUrls`), so the exclusion reason can still be checked when the URL is known. **Accepted gap**: `listPages` / `listPagesByTag` / `listPagesByJsonLdType` all assume `scraped = 1`, so there is no longer a way to enumerate in bulk the URLs that were excluded and never fetched (only the old `listPageLinks` had no `scraped` constraint). If the URL is known, `getPageDetail` can confirm it; if a bulk overview is needed, use `nitpicker query error-kinds` or query the archive's `pages` table directly.
**Binary:** none (launched via the CLI's `viewer` subcommand)
@@ -341,13 +342,21 @@ nitpicker viewer
>
> **The ORDER BY of `getDirectoryTree` is `path_sort_key` alone and does not include `root_key`**: Because the design returns all roots in a single query, there is no equality filter on `root_key`, so an index led by `root_key` such as `vdn_root_depth_path (root_key, depth, path_sort_key, node_id)` cannot be used at all in combination with the range condition `depth <= 3`, and `EXPLAIN QUERY PLAN` measurements show `USE TEMP B-TREE FOR LAST TERM OF ORDER BY` (the same lesson as the `idx_pages_listfilter` column-order mistake in PR #96). Switching to `vdn_path_depth (path_sort_key, depth, node_id)`, which leads with `path_sort_key`, and changing to `ORDER BY path_sort_key` only was confirmed to yield `SCAN ... USING INDEX vdn_path_depth` (no sort; `depth` is a residual filter). Even with root_key removed from the ORDER BY, grouping only distributes rows into a `Map` on the JS side, so the relative order within each root (ascending `path_sort_key`) is preserved. **Search keywords**: "directory-tree", "directory tree", "has_children", "vdn_path_depth", "USE TEMP B-TREE".
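-> **Design note (external links read model):** `listExternalLinks` (PR #153) applies `GROUP BY` on a computed `COALESCE` column over `anchors JOIN pages(source) JOIN pages(dest) LEFT JOIN pages(canonical)` to obtain `COUNT(DISTINCT source.id)`, and ran this JOIN + aggregation twice per request (once in the subquery for `total`, once for the data). SQLite is known to build a separate b-tree each time for `COUNT(DISTINCT ...)` instead of using an existing index (measurements reported on the SQLite forum: 6.4 seconds for `count(distinct id)` alone, degrading to 55.2 seconds when mixed with other aggregates in the same query), and an expression index cannot reliably fix the `GROUP BY` either (it is used only when the expression in `CREATE INDEX` and the expression in `WHERE`/`GROUP BY` match exactly at the syntax level). The workaround recommended on the same forum is to write the aggregate out to a temporary table in advance, which is exactly the "build a read model, measure, then optimize" policy of this repository's `viewer_pages`/`viewer_directory_nodes` (issues #106 to #112).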
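+> **Design note (viewer_anchor_facts read model, issue #114):** `listExternalLinks` (PR #153) applies `GROUP BY` on a computed `COALESCE` column over `anchors JOIN pages(source) JOIN pages(dest) LEFT JOIN pages(canonical)` to obtain `COUNT(DISTINCT source.id)`, and ran this JOIN + aggregation twice per request (once in the subquery for `total`, once for the data). SQLite is known to build a separate b-tree each time for `COUNT(DISTINCT ...)` instead of using an existing index (measurements reported on the SQLite forum: 6.4 seconds for `count(distinct id)` alone, degrading to 55.2 seconds when mixed with other aggregates in the same query), and an expression index cannot reliably fix the `GROUP BY` either (it is used only when the expression in `CREATE INDEX` and the expression in `WHERE`/`GROUP BY` match exactly at the syntax level). In addition, `/api/links?type=broken` (`listLinks`) had no fast path and remained an anchor scan in the 13-16 second range, and its pagination was offset-based, contrary to `#103`'s own "Do not introduce large OFFSET based pagination for virtualized lists". Issue #114 proposed a design that puts both broken and external on `viewer_anchor_facts`, but at implementation time the following three points were re-examined against the criterion of **compromising on none of read, write, or storage**.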
>
-> `viewer_external_links` (`dest_page_id` PK / `dest_url` / `status` / `referrer_count`) is built by `computeExternalLinkRows` (`viewer-read-model/compute-external-link-rows.ts`) inside the same transaction as `buildViewerReadModel`. The aggregation logic (`COALESCE` resolution, `COUNT(DISTINCT source.id)`) is ported from `listExternalLinks` with no changes at all, because `referrerCount` is contractually required to keep the same counting rule as `getPageDetail.inboundLinks` (#71), where duplicate anchors count as one referrer. Unlike `viewer_pages` and the directory tree, it cannot reuse `sourceRows` (`pages` only) and needs a dedicated query against `anchors` (link information exists only in `anchors`).
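+> 1. **Do not use `url_refs`/`content_items` (the ref-table approach of issue #139)**: The schema in the document that `#114` references assumes a normalized URL dictionary table, but `#139` has not been started yet, and `#103`'s own execution order places `#139` after `#114` (16th vs 7th). The prerequisites for moving to a ref table right now are not in place. Instead, following the same idea as `viewer_pages.url_sort_key`, `source_url_sort_key`/`dest_url_sort_key` are text columns copied at build time; an indexed `ORDER BY` requires a pre-join sort key, which is an unavoidable cost. However, rather than duplicating the full URL across multiple columns and places, each column holds only "the actual value used for display", and `url` and `url_sort_key` are not duplicated separately as they are in `viewer_pages`. Measured: 152 MiB of additional DB size at 50,000 pages and 400,000 anchors (see the benchmark below). The "+5GB" that `#114` warns about at the 13-million-row scale is more than an order of magnitude (roughly 30x) away from the real benchmark scale (on the order of 400,000 rows, which is the scale of every existing benchmark in CLAUDE.md/ARCHITECTURE.md), so the normalization cost is not paid at this scale now. The scope is cut so that the `#139` ref table is considered if the data really reaches the 13-million-row scale in the future.
+> 2. **Keep `viewer_external_links` as a separate table and derive it from `viewer_anchor_facts` in a single scan**: `viewer_anchor_facts` is an edge table already deduplicated per `(source_page_id, dest_page_id)` pair, so the referrerCount for each destination is obtained simply by counting "the number of edge rows with that dest_page_id" (because the rows are already deduplicated per edge, this is mathematically equivalent to `COUNT(DISTINCT source)`); there is no need to reintroduce a runtime `GROUP BY`. `compute-anchor-fact-rows.ts` scans `anchors` exactly once, and from that result (an in-memory array) `derive-external-link-summary-rows.ts` (a pure function with no DB access) derives the `viewer_external_links` rows. The old `compute-external-link-rows.ts` (its own second scan of `anchors`) is retired. Merging the tables into a single edge-level table would lose the External Links view's "referrer count per destination" UX (breaking the UX decision of PR #153), so they are kept as two independent tables.
+> 3. **Broken Links adopts edge dedup (a `count` column)**: Duplicate anchors for the same `(source_page_id, dest_page_id)` pair (for example, the same link appearing several times in the header and footer) are collapsed into one row, with `count` holding the number of observations. This beats one row per anchor on read (fewer rows scanned), write (a single aggregation at build time), and storage (fewer rows for duplicate edges). The total count can differ from `listLinks` (legacy), but this is accepted as a justified fast path/legacy divergence of the same kind as plain sort vs natural sort in `/api/pages`.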
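The edge dedup and the in-memory derivation described in points 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `compute-anchor-fact-rows.ts` or `derive-external-link-summary-rows.ts`: the row shapes and the function names `dedupeAnchorsIntoEdges` / `deriveExternalLinkSummary` are hypothetical.

```typescript
// Hypothetical row shapes for illustration; the real column set is listed in the schema note.
interface AnchorRow {
  sourcePageId: number;
  destPageId: number; // assumed to be already resolved to the canonical destination
  destIsExternal: boolean;
}

interface EdgeRow {
  sourcePageId: number;
  destPageId: number;
  count: number; // number of anchors collapsed into this edge
  isExternalLink: boolean;
}

interface ExternalLinkSummaryRow {
  destPageId: number;
  referrerCount: number;
}

/** Collapses anchors into one row per (source, dest) pair, keeping the observation count. */
function dedupeAnchorsIntoEdges(anchors: readonly AnchorRow[]): EdgeRow[] {
  const edges = new Map<string, EdgeRow>();
  for (const anchor of anchors) {
    const key = `${anchor.sourcePageId}:${anchor.destPageId}`;
    const existing = edges.get(key);
    if (existing) {
      existing.count += 1;
    } else {
      edges.set(key, {
        sourcePageId: anchor.sourcePageId,
        destPageId: anchor.destPageId,
        count: 1,
        isExternalLink: anchor.destIsExternal,
      });
    }
  }
  return Array.from(edges.values());
}

/**
 * Counts edge rows per external destination. Because edges are unique per
 * (source, dest), this equals COUNT(DISTINCT source) per destination.
 */
function deriveExternalLinkSummary(edges: readonly EdgeRow[]): ExternalLinkSummaryRow[] {
  const counts = new Map<number, number>();
  for (const edge of edges) {
    if (!edge.isExternalLink) continue;
    counts.set(edge.destPageId, (counts.get(edge.destPageId) ?? 0) + 1);
  }
  return Array.from(counts, ([destPageId, referrerCount]) => ({ destPageId, referrerCount }));
}
```

With two anchors from page 1 and one from page 2 pointing at the same external destination, the sketch yields two edges and a `referrerCount` of 2, which matches the "duplicate anchors count as one referrer" rule stated above.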
>
-> **Use `paginateQuery` (offset-based) rather than a keyset cursor**: `viewer_pages` has the `status_sort_key`/`status_desc_key`/`NULL_STATUS_SENTINEL` machinery because of requirements specific to keyset cursors (`NULL` comparisons break under SQL's three-valued logic, and `DESC` must always be turned into an `ASC` scan). The REST contract of `/api/links?type=external` has stayed offset-based all along, so this complexity is unnecessary. The three indexes on `viewer_external_links` (`vel_url` / `vel_status` / `vel_referrer_count`) are all simple single-direction indexes, and `DESC` is served by scanning the same index in reverse.
+> **Schema**: `viewer_anchor_facts(edge_id PK, source_page_id, dest_page_id, source_url_sort_key, dest_url_sort_key, status, status_sort_key, status_desc_key, count, is_broken, is_external_link)`. `is_external_link` is persisted (an INTEGER 0/1 in SQLite costs practically nothing) but is not indexed, because no read-time query filters on this column; it is used only in the in-memory pass of `deriveExternalLinkSummaryRows` at build time. Why `status_desc_key` (the same negated key as in `viewer_pages`) is needed: the Stable Ordering rule in `docs/viewer-sql-query-plan.md` keeps the `source_url_sort_key`/`edge_id` tie-breakers ASC even for `status desc`, but a row-value keyset tuple comparison cannot mix directions per column, so the leading sort key is negated to make the scan always ASC (for `sourceUrl`/`destUrl` the leading key and the tie-breakers run in the same direction, so this device is unnecessary).
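The reason a negated leading key lets one ascending tuple comparison serve `status desc` can be illustrated in memory. The sketch below is an assumption-laden analogue of the SQL row-value comparison `(status_desc_key, source_url_sort_key, edge_id) > (?, ?, ?)`, not the query the repository runs; `BrokenEdge`, `compareTuples`, `statusDescTuple`, and `pageAfter` are hypothetical names, and the sample statuses are illustrative only (real broken-link rows are always 404).

```typescript
// Hypothetical edge shape; field names mirror the columns described above.
interface BrokenEdge {
  edgeId: number;
  sourceUrlSortKey: string;
  status: number;
}

// (leading key, source_url_sort_key, edge_id), always compared ascending.
type KeysetTuple = readonly [number, string, number];

/** Lexicographic comparison, the in-memory analogue of SQL `(a, b, c) > (?, ?, ?)`. */
function compareTuples(a: KeysetTuple, b: KeysetTuple): number {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  if (a[1] !== b[1]) return a[1] < b[1] ? -1 : 1;
  return a[2] - b[2];
}

/** For `status desc`, negate the status so an ascending scan visits larger statuses first. */
function statusDescTuple(edge: BrokenEdge): KeysetTuple {
  return [-edge.status, edge.sourceUrlSortKey, edge.edgeId];
}

/** Returns up to `limit` edges strictly after `cursor` in `status desc` order with ASC tie-breakers. */
function pageAfter(
  edges: readonly BrokenEdge[],
  cursor: KeysetTuple | null,
  limit: number,
): BrokenEdge[] {
  return edges
    .map((edge) => ({ edge, key: statusDescTuple(edge) }))
    .filter(({ key }) => cursor === null || compareTuples(key, cursor) > 0)
    .sort((x, y) => compareTuples(x.key, y.key))
    .slice(0, limit)
    .map(({ edge }) => edge);
}
```

Passing `statusDescTuple` of the last row of one page as the cursor for the next page continues the walk without duplicates or gaps, because every column of the tuple is compared in the same direction.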
>
-> **Two-tier fast path / legacy structure**: `register-links-route.ts` follows the same pattern as `/api/pages`, checking `isViewerReadModelCurrent` to switch between `listViewerExternalLinks` (fast path) and `listExternalLinks` (legacy, left unchanged). `urlPattern`/`status` map to the same columns on both paths, so there is no exclusion condition of the "force legacy when a specific filter is given" kind, such as `hasCSP` in `/api/pages`. Because this involves a schema change, `VIEWER_READ_MODEL_SCHEMA_VERSION` was bumped 4→5, making older read models subject to automatic rebuild. **Search keywords**: "external links", "COUNT DISTINCT", "viewer_external_links", "slow GROUP BY".
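+> **Cursor pagination**: `viewer-anchor-facts-cursor/` is added as a dedicated module modeled on `viewer-pages-cursor/` (neither of the two existing cursor implementations, `viewer-pages-cursor` and `directory-pages-cursor`, is generalized with generics, so this follows that convention and does not create a shared module). `listViewerBrokenLinks` implements the same four modes as `listViewerPages` (initial page / forward keyset / backward keyset / direct offset read), but because `source_url_sort_key`/`dest_url_sort_key`/`status` are already the displayable values themselves, with no re-join to the write model, the `listViewerPages`-style step of "resolve ids, then join to the wide table after the limit" is unnecessary: a SELECT against `viewer_anchor_facts` alone is the final result.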
+>
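+> **Pagination contract change**: `/api/links?type=broken` used to return an offset-based response with only `{items, total}`, but to satisfy `#103`'s "Do not introduce large OFFSET based pagination for virtualized lists" the contract now carries `nextCursor`/`prevCursor` (the same kind of migration `/api/pages` made when it was promoted from `listPages` to `listViewerPages`). The legacy (`listLinks`) path satisfies the same contract through `buildLegacyPagesCursors` (a pseudo-cursor made by stringifying the offset), and the frontend's `useLinksInfinite` (`use-links-infinite.ts`) works on either path by looking only at `nextCursor`. **The visible UI (columns, sorting, filters) is unchanged**: `broken-links-view.tsx` shows only the three columns sourceUrl/destUrl/status, and the only change is the hook's internal implementation (offset → cursor). MPA pagination (via `usePagedQuery`) is unaffected by this change (`/api/links` still accepts plain offset/limit).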
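The "offset stringified into a pseudo-cursor" idea used by the legacy path can be sketched like this. The signature of the real `buildLegacyPagesCursors` is not shown in this document, so `buildOffsetCursors` and `parseOffsetCursor` below are hypothetical stand-ins, and the exact cursor encoding is an assumption.

```typescript
interface PseudoCursors {
  nextCursor: string | null;
  prevCursor: string | null;
}

/**
 * Builds opaque-looking cursors that are really stringified offsets
 * (an assumed encoding, chosen for illustration).
 */
function buildOffsetCursors(offset: number, limit: number, total: number): PseudoCursors {
  const nextOffset = offset + limit;
  return {
    // No next page once the window reaches the end of the result set.
    nextCursor: nextOffset < total ? String(nextOffset) : null,
    // No previous page on the first window; otherwise step back one window, clamped at 0.
    prevCursor: offset > 0 ? String(Math.max(0, offset - limit)) : null,
  };
}

/** Decodes a pseudo-cursor back into an offset, rejecting anything that is not a non-negative integer. */
function parseOffsetCursor(cursor: string | null): number {
  if (cursor === null) return 0;
  const offset = Number(cursor);
  if (!Number.isInteger(offset) || offset < 0) {
    throw new Error(`Invalid cursor: ${cursor}`);
  }
  return offset;
}
```

A client that only follows `nextCursor` until it becomes `null` behaves the same against this pseudo-cursor and against a real keyset cursor, which is the property the contract relies on.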
+>
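+> **Two-tier fast path / legacy structure**: `register-links-route.ts` checks `isViewerReadModelCurrent` and switches paths for both `external` and `broken`. `external` has no exclusion conditions (`urlPattern`/`status` map to the same columns on both paths). `broken` is forced to legacy when `urlPattern` (a LIKE spanning the two source/dest columns, which no single index can satisfy) or `includeRedirectSources` (the read model holds only normalized destinations) is specified, following the same reasoning as the `urlPattern`/`directory` exclusion in `/api/pages`. Because this involves a schema change, `VIEWER_READ_MODEL_SCHEMA_VERSION` was bumped 5→6.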
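The dispatch rule above reduces to a small decision function. The sketch below is illustrative only and does not reproduce `register-links-route.ts`; `LinksRequest`, `chooseBrokenLinksPath`, and `chooseExternalLinksPath` are hypothetical names, and treating any defined `urlPattern` as "specified" is an assumption.

```typescript
type LinksPath = 'fast' | 'legacy';

// Hypothetical subset of the query options relevant to the dispatch decision.
interface LinksRequest {
  urlPattern?: string;
  includeRedirectSources?: boolean;
}

/** `broken`: fast path only when the read model is current and no excluded option is set. */
function chooseBrokenLinksPath(readModelIsCurrent: boolean, request: LinksRequest): LinksPath {
  if (!readModelIsCurrent) return 'legacy';
  // A LIKE across source and dest columns cannot be served by a single index.
  if (request.urlPattern !== undefined) return 'legacy';
  // The read model stores only normalized (post-redirect) destinations.
  if (request.includeRedirectSources === true) return 'legacy';
  return 'fast';
}

/** `external`: no exclusion conditions, so only read model freshness matters. */
function chooseExternalLinksPath(readModelIsCurrent: boolean): LinksPath {
  return readModelIsCurrent ? 'fast' : 'legacy';
}
```

Keeping the decision in a pure function like this makes the forced-legacy conditions easy to unit-test separately from the route handler.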
+>
+> **Benchmark measurements** (`scripts/bench-viewer-anchor-facts.mjs`, synthetic archive, no real customer data): at 50,000 pages and 400,000 anchors, read model build time was 5.9 seconds and additional DB size was 152 MiB (`viewer_anchor_facts` has 350,000 rows after edge dedup). For `/api/links?type=broken`, across all 5 ascending/descending patterns of `sourceUrl`/`destUrl`/`status`, `EXPLAIN QUERY PLAN` consistently shows `SEARCH viewer_anchor_facts USING COVERING INDEX vaf_broken_*` (no `TEMP B-TREE`), with **warm p50 1.2ms, p95 1.2-1.8ms** and cold (first run) 1.2-9.6ms, comfortably beating the target in `docs/viewer-sql-query-plan.md` (20-80ms). Compared with the 13-16 seconds of the old `listLinks`, this is roughly a 10,000x improvement at warm p50. **Search keywords**: "broken links", "external links", "COUNT DISTINCT", "viewer_anchor_facts", "viewer_external_links", "slow GROUP BY", "issue #114".
### @nitpicker/cli
diff --git a/CLAUDE.md b/CLAUDE.md
index c70a8e90..2ec491b1 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -82,7 +82,7 @@ packages/
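> **Note (directory tree read model, issue #107)**: `viewer_directory_nodes` / `viewer_directory_pages` reuse the `sourceRows` used for `viewer_pages`, and `buildDirectoryTreeRows` builds them in memory as a pure function. **root_key is per host, but hosts with no internal pages at all are excluded** (this prevents meaningless one-page trees for external link destination domains). **The directory/page boundary is determined by the trailing slash** (`/blog/2024/post-1` and `/blog/2024/` both land on the same `/blog/2024/` node). **`has_children` is `direct_child_dir_count > 0` only** (if `direct_page_count` were included, the construction logic would make it impossible for the value ever to be `false`, so only the presence of child directories is checked to keep the UI's expand arrow meaningful). This feature has no legacy fallback, so all three functions (`getDirectoryTree`/`listDirectoryChildren`/`listDirectoryPages`) use `isViewerReadModelCurrent` rather than `hasViewerReadModel` as their guard. For details, the section "Design note (directory tree read model...)" under `@nitpicker/viewer` in ARCHITECTURE.md is authoritative.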
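-> **Note (external links read model)**: `listExternalLinks` (PR #153) ran a JOIN on `anchors` + `GROUP BY` on a computed `COALESCE` column + `COUNT(DISTINCT source.id)` twice per request (once for `total`, once for the data). SQLite's `COUNT(DISTINCT ...)` has a known performance pathology of building a separate b-tree each time instead of using an existing index (measurements reported on the SQLite forum: 6.4 seconds alone, degrading to 55.2 seconds when mixed with other aggregates), so it was moved onto the same read model pattern as `viewer_pages`/`viewer_directory_nodes`. `viewer_external_links` (`dest_page_id` PK / `dest_url` / `status` / `referrer_count`) is built inside `buildViewerReadModel` by `computeExternalLinkRows`, which aggregates once using a dedicated query against `anchors` (`sourceRows` cannot be reused, since link information is not in `pages`). The aggregation logic itself (`COALESCE` resolution, referrer deduplication) is ported unchanged from `listExternalLinks` so as not to break the counting-granularity contract with `getPageDetail.inboundLinks` (#71). Pagination uses `paginateQuery` rather than a keyset cursor (offset-based; the REST contract remains offset-based, so no unnecessary complexity is introduced). `register-links-route.ts` uses the same two-tier structure as `/api/pages`, checking `isViewerReadModelCurrent` to switch between `listViewerExternalLinks` (fast path) and `listExternalLinks` (legacy, left unchanged). Because of the schema change, `VIEWER_READ_MODEL_SCHEMA_VERSION` was bumped 4→5. For details, the section "Design note (external links read model)" under `@nitpicker/viewer` in ARCHITECTURE.md is authoritative.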
+> **Note (viewer_anchor_facts read model, issue #114)**: `listExternalLinks` (PR #153) carried a JOIN on `anchors` + `GROUP BY` on a computed `COALESCE` column + `COUNT(DISTINCT source.id)`, and `listLinks(type:'broken')` carried a 13-16 second anchor scan with no fast path plus offset pagination. Issue #114 proposed a design that puts both broken and external on `viewer_anchor_facts`, but the implementation was re-examined against the criterion of "compromising on none of read, write, or storage" and did not adopt the ref-table approach from the document (`url_refs`/`content_items`, issue #139), since #139 has not been started and also comes after `#114` in `#103`'s execution order. Instead, it only duplicates `source_url_sort_key`/`dest_url_sort_key` inline, following the same idea as `viewer_pages.url_sort_key`. `viewer_anchor_facts` (`edge_id` PK, deduplicated per `(source_page_id, dest_page_id)` pair with `count` absorbing duplicate anchors, `is_broken`/`is_external_link` flags, `status_sort_key`/`status_desc_key`) is built by `compute-anchor-fact-rows.ts` with exactly one scan of `anchors`. `viewer_external_links` is now derived from that single scan's result by `derive-external-link-summary-rows.ts` (a pure function with no DB access); the old `compute-external-link-rows.ts` (its own second scan of `anchors`) is retired. `listViewerBrokenLinks` implements the same four cursor pagination modes as `listViewerPages`, and the response contract of `/api/links?type=broken` changed from offset-only to carrying `nextCursor`/`prevCursor` (addressing `#103`'s "do not use large OFFSET"; the frontend's visible UI is unchanged). `register-links-route.ts` has two-tier dispatch based on `isViewerReadModelCurrent` for both `external` and `broken` (`broken` is forced to legacy when `urlPattern`/`includeRedirectSources` is specified). Because of the schema change, `VIEWER_READ_MODEL_SCHEMA_VERSION` was bumped 5→6. Measured at 50,000 pages and 400,000 anchors, `viewer_anchor_facts` reaches warm p50 1.2ms (roughly a 10,000x improvement over the previous 13-16 seconds). For details, the section "Design note (viewer_anchor_facts read model, issue #114)" under `@nitpicker/viewer` in ARCHITECTURE.md is authoritative.
## CLI commands
diff --git a/README.md b/README.md
index fd6559e0..db3e9222 100644
--- a/README.md
+++ b/README.md
@@ -285,7 +285,7 @@ npx @nitpicker/cli viewer-build [--force]
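- **MPA**: Prev / Next + page numbers + jump input. The current page and the page size both live in the URL query (`?page=N` / `?pageSize=N`, both 1-indexed), so deep links, sharing, and browser back/forward all work fully (if the page size were not in the URL, sharing `?page=5` could show different rows depending on the recipient's page size). Page size options are 50 / 100 / 200. Changing a filter automatically clears `?page=`, and changing the page size also resets `?page=` to 1 (the old offset is meaningless in the new window). Default values (page=1, pageSize=100) are omitted from the URL
- **Virtual scroll**: TanStack Query infinite query + TanStack Virtual. An opt-in for when you want to give up deep links in favor of explorability of huge data sets, since it **displays on the order of 100,000 rows with constant memory, without loading everything into the client**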
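-The mode itself is stored in localStorage (`nitpicker-pagination-mode`). The page size is also saved in localStorage (`nitpicker-page-size`), but that is only a hint for new tabs and direct URL visits; the URL's `?pageSize=` always takes precedence. Both modes use the same backend `limit`/`offset` API (unmodified).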
+The mode itself is stored in localStorage (`nitpicker-pagination-mode`). The page size is also saved in localStorage (`nitpicker-page-size`), but that is only a hint for new tabs and direct URL visits; the URL's `?pageSize=` always takes precedence. Both modes hit the same REST endpoints, but how they continue depends on the view: MPA always builds `limit`/`offset` from `?page=`/`?pageSize=`, whereas virtual scroll uses the read model's keyset `nextCursor` in Pages / Broken Links and `limit`/`offset` in the other views.
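How MPA mode could turn `?page=`/`?pageSize=` into `limit`/`offset` is sketched below. This is a guess at the shape of the conversion, not the viewer's actual code: `toLimitOffset` is a hypothetical name, and falling back to the defaults (page=1, pageSize=100) on invalid input is an assumption.

```typescript
// Page sizes offered by the MPA mode, as listed above.
const PAGE_SIZES: readonly number[] = [50, 100, 200];
const DEFAULT_PAGE = 1;
const DEFAULT_PAGE_SIZE = 100;

interface LimitOffset {
  limit: number;
  offset: number;
}

/** Converts a URL query string such as "?page=5&pageSize=50" into limit/offset. */
function toLimitOffset(search: string): LimitOffset {
  const params = new URLSearchParams(search);
  const page = Number(params.get('page') ?? String(DEFAULT_PAGE));
  const pageSize = Number(params.get('pageSize') ?? String(DEFAULT_PAGE_SIZE));

  // Assumed fallback: invalid values silently revert to the defaults.
  const safePage = Number.isInteger(page) && page >= 1 ? page : DEFAULT_PAGE;
  const limit = PAGE_SIZES.some((size) => size === pageSize) ? pageSize : DEFAULT_PAGE_SIZE;

  // Pages are 1-indexed, so page 1 starts at offset 0.
  return { limit, offset: (safePage - 1) * limit };
}
```

Because both values come from the URL, the same link resolves to the same window of rows for every recipient, which is the deep-link property the MPA mode is meant to preserve.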
### Errors view
diff --git a/packages/@nitpicker/query/src/list-viewer-broken-links.spec.ts b/packages/@nitpicker/query/src/list-viewer-broken-links.spec.ts
new file mode 100644
index 00000000..15583c43
--- /dev/null
+++ b/packages/@nitpicker/query/src/list-viewer-broken-links.spec.ts
@@ -0,0 +1,610 @@
+import path from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+import { tryParseUrl as parseUrl } from '@d-zero/shared/parse-url';
+import { Archive } from '@nitpicker/crawler';
+import { afterAll, beforeAll, describe, expect, it } from 'vitest';
+
+import { listViewerBrokenLinks } from './list-viewer-broken-links.js';
+import { buildViewerReadModel } from './viewer-read-model/build-viewer-read-model.js';
+
+// fileURLToPath decodes percent-escapes and handles Windows drive letters,
+// which `new URL(import.meta.url).pathname` does not.
+const __filename = fileURLToPath(import.meta.url);
+const __dirname = path.dirname(__filename);
+const workingDir = path.resolve(__dirname, '__test_fixtures_list_viewer_broken_links__');
+
+const META = {
+ lang: null,
+ title: null,
+ description: null,
+ keywords: null,
+ noindex: false,
+ nofollow: false,
+ noarchive: false,
+ canonical: null,
+ alternate: null,
+ 'og:type': null,
+ 'og:title': null,
+ 'og:site_name': null,
+ 'og:description': null,
+ 'og:url': null,
+ 'og:image': null,
+ 'twitter:card': null,
+};
+
+describe('listViewerBrokenLinks', () => {
+ let archive: InstanceType<typeof Archive>;
+ const archiveFilePath = path.resolve(
+ workingDir,
+ 'list-viewer-broken-links-test.nitpicker',
+ );
+
+ beforeAll(async () => {
+ const { mkdirSync } = await import('node:fs');
+ mkdirSync(workingDir, { recursive: true });
+
+ archive = await Archive.create({ filePath: archiveFilePath, cwd: workingDir });
+ await archive.setConfig({
+ baseUrl: 'https://example.com',
+ name: 'test',
+ version: '0.10.0',
+ recursive: true,
+ interval: 0,
+ image: true,
+ fetchExternal: false,
+ parallels: 1,
+ roots: ['https://example.com'],
+ excludes: [],
+ excludeKeywords: [],
+ excludeUrls: [],
+ maxExcludedDepth: 0,
+ retry: 3,
+ fromList: false,
+ disableQueries: false,
+ userAgent: 'test',
+ ignoreRobots: false,
+ });
+
+ await archive.setPage({
+ url: parseUrl('https://example.com/page-a')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: { ...META, title: 'Page A' },
+ anchorList: [
+ {
+ href: parseUrl('https://example.com/broken-a')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Broken A',
+ },
+ {
+ href: parseUrl('https://example.com/forbidden')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Forbidden',
+ },
+ {
+ href: parseUrl('https://example.com/server-error')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Server error',
+ },
+ {
+ href: parseUrl('https://example.com/never-fetched')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Never fetched',
+ },
+ ],
+ imageList: [],
+ isSkipped: false,
+ });
+ await archive.setPage({
+ url: parseUrl('https://example.com/page-b')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: { ...META, title: 'Page B' },
+ anchorList: [
+ {
+ href: parseUrl('https://example.com/broken-b')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Broken B',
+ },
+ ],
+ imageList: [],
+ isSkipped: false,
+ });
+
+ await archive.setPage({
+ url: parseUrl('https://example.com/broken-a')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 404,
+ statusText: 'Not Found',
+ contentType: 'text/html',
+ contentLength: 0,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+ await archive.setPage({
+ url: parseUrl('https://example.com/broken-b')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 404,
+ statusText: 'Not Found',
+ contentType: 'text/html',
+ contentLength: 0,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+ await archive.setPage({
+ url: parseUrl('https://example.com/forbidden')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 403,
+ statusText: 'Forbidden',
+ contentType: 'text/html',
+ contentLength: 0,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+ await archive.setPage({
+ url: parseUrl('https://example.com/server-error')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 500,
+ statusText: 'Internal Server Error',
+ contentType: 'text/html',
+ contentLength: 0,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+ // No `setPage` call for https://example.com/never-fetched: the anchor
+ // on page-a above already caused the crawler to insert a discovery
+ // placeholder row for it (scraped=0, status=NULL) — matching
+ // list-links.ts's scope note that such rows must never satisfy
+ // `status = 404`.
+
+ await buildViewerReadModel(archive);
+ });
+
+ afterAll(async () => {
+ if (archive) {
+ await archive.releaseHandle();
+ }
+ const { rmSync } = await import('node:fs');
+ rmSync(workingDir, { recursive: true, force: true });
+ });
+
+ it('returns only 404 destinations, excluding 403/5xx/never-fetched', async () => {
+ const result = await listViewerBrokenLinks(archive);
+ expect(result.items.map((item) => item.destUrl).toSorted()).toEqual([
+ 'https://example.com/broken-a',
+ 'https://example.com/broken-b',
+ ]);
+ expect(result.total).toBe(2);
+ });
+
+ it('reports source, dest, and status but always null textContent (not stored in the fast path)', async () => {
+ const result = await listViewerBrokenLinks(archive, { sortBy: 'destUrl' });
+ expect(result.items[0]).toMatchObject({
+ sourceUrl: 'https://example.com/page-a',
+ destUrl: 'https://example.com/broken-a',
+ status: 404,
+ isExternal: false,
+ textContent: null,
+ });
+ });
+
+ it('filters by status (broken links are always 404, so a non-404 filter matches nothing)', async () => {
+ const matching = await listViewerBrokenLinks(archive, { status: 404 });
+ expect(matching.total).toBe(2);
+ const nonMatching = await listViewerBrokenLinks(archive, { status: 500 });
+ expect(nonMatching.total).toBe(0);
+ });
+
+ it('sorts by destUrl ascending', async () => {
+ const result = await listViewerBrokenLinks(archive, {
+ sortBy: 'destUrl',
+ sortOrder: 'asc',
+ });
+ expect(result.items.map((item) => item.destUrl)).toEqual([
+ 'https://example.com/broken-a',
+ 'https://example.com/broken-b',
+ ]);
+ });
+
+ it('status ties (every broken link is 404) still paginate without duplicates or gaps, in both directions', async () => {
+ // Every row here has the exact same status_sort_key/status_desc_key —
+ // this is what the source_url_sort_key tie-breaker in the keyset
+ // tuple exists to disambiguate.
+ const [pageAsc0, pageAsc1] = await Promise.all([
+ listViewerBrokenLinks(archive, {
+ sortBy: 'status',
+ sortOrder: 'asc',
+ limit: 1,
+ offset: 0,
+ }),
+ listViewerBrokenLinks(archive, {
+ sortBy: 'status',
+ sortOrder: 'asc',
+ limit: 1,
+ offset: 1,
+ }),
+ ]);
+ expect([pageAsc0.items[0]!.destUrl, pageAsc1.items[0]!.destUrl].toSorted()).toEqual([
+ 'https://example.com/broken-a',
+ 'https://example.com/broken-b',
+ ]);
+
+ const [pageDesc0, pageDesc1] = await Promise.all([
+ listViewerBrokenLinks(archive, {
+ sortBy: 'status',
+ sortOrder: 'desc',
+ limit: 1,
+ offset: 0,
+ }),
+ listViewerBrokenLinks(archive, {
+ sortBy: 'status',
+ sortOrder: 'desc',
+ limit: 1,
+ offset: 1,
+ }),
+ ]);
+ expect([pageDesc0.items[0]!.destUrl, pageDesc1.items[0]!.destUrl].toSorted()).toEqual(
+ ['https://example.com/broken-a', 'https://example.com/broken-b'],
+ );
+ });
+
+ it('paginates forward via nextCursor with no duplicates or gaps', async () => {
+ const page1 = await listViewerBrokenLinks(archive, { sortBy: 'destUrl', limit: 1 });
+ expect(page1.items).toHaveLength(1);
+ expect(page1.nextCursor).not.toBeNull();
+ expect(page1.prevCursor).toBeNull();
+
+ const page2 = await listViewerBrokenLinks(archive, {
+ sortBy: 'destUrl',
+ limit: 1,
+ cursor: page1.nextCursor!,
+ });
+ expect(page2.items).toHaveLength(1);
+ expect(page2.nextCursor).toBeNull();
+ expect(page2.prevCursor).not.toBeNull();
+
+ expect([...page1.items, ...page2.items].map((item) => item.destUrl)).toEqual([
+ 'https://example.com/broken-a',
+ 'https://example.com/broken-b',
+ ]);
+ });
+
+ it('walks backward from a forward cursor via direction: "prev" and restores the same page', async () => {
+ const page1 = await listViewerBrokenLinks(archive, { sortBy: 'destUrl', limit: 1 });
+ const page2 = await listViewerBrokenLinks(archive, {
+ sortBy: 'destUrl',
+ limit: 1,
+ cursor: page1.nextCursor!,
+ });
+ const back = await listViewerBrokenLinks(archive, {
+ sortBy: 'destUrl',
+ limit: 1,
+ cursor: page2.prevCursor!,
+ direction: 'prev',
+ });
+ expect(back.items).toEqual(page1.items);
+ });
+
+ it('supports a direct offset read for MPA page-number jumps', async () => {
+ const result = await listViewerBrokenLinks(archive, {
+ sortBy: 'destUrl',
+ limit: 1,
+ offset: 1,
+ });
+ expect(result.items).toHaveLength(1);
+ expect(result.items[0]!.destUrl).toBe('https://example.com/broken-b');
+ });
+
+ it('throws on a cursor minted under a different sort/filter combination', async () => {
+ const page1 = await listViewerBrokenLinks(archive, { sortBy: 'destUrl', limit: 1 });
+ await expect(
+ listViewerBrokenLinks(archive, {
+ sortBy: 'sourceUrl',
+ limit: 1,
+ cursor: page1.nextCursor!,
+ }),
+ ).rejects.toThrow(/does not match/);
+ });
+});
+
+/**
+ * Mirrors `list-links.spec.ts`'s redirect-resolution coverage: a broken
+ * destination reached both directly and via an internal redirect source must
+ * produce separate edge rows (one per distinct referring page), and both
+ * rows must report the canonical (post-redirect) destination and status.
+ */
+describe('listViewerBrokenLinks — redirect resolution', () => {
+ const redirectWorkingDir = path.resolve(
+ __dirname,
+ '__test_fixtures_list_viewer_broken_links_redirect__',
+ );
+ let archive: InstanceType<typeof Archive>;
+ const archiveFilePath = path.resolve(
+ redirectWorkingDir,
+ 'list-viewer-broken-links-redirect-test.nitpicker',
+ );
+
+ beforeAll(async () => {
+ const { mkdirSync } = await import('node:fs');
+ mkdirSync(redirectWorkingDir, { recursive: true });
+ archive = await Archive.create({
+ filePath: archiveFilePath,
+ cwd: redirectWorkingDir,
+ });
+ await archive.setConfig({
+ baseUrl: 'https://example.com',
+ name: 'test',
+ version: '0.10.0',
+ recursive: true,
+ interval: 0,
+ image: true,
+ fetchExternal: false,
+ parallels: 1,
+ roots: ['https://example.com'],
+ excludes: [],
+ excludeKeywords: [],
+ excludeUrls: [],
+ maxExcludedDepth: 0,
+ retry: 3,
+ fromList: false,
+ disableQueries: false,
+ userAgent: 'test',
+ ignoreRobots: false,
+ });
+
+ await archive.setPage({
+ url: parseUrl('https://example.com/direct')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: { ...META, title: 'Direct' },
+ anchorList: [
+ {
+ href: parseUrl('https://example.com/canonical-target')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Direct link',
+ },
+ ],
+ imageList: [],
+ isSkipped: false,
+ });
+ await archive.setPage({
+ url: parseUrl('https://example.com/via-redirect')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: { ...META, title: 'Via redirect' },
+ anchorList: [
+ {
+ href: parseUrl('https://example.com/old')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Old link',
+ hash: null,
+ },
+ ],
+ imageList: [],
+ isSkipped: false,
+ });
+ await archive.setPage({
+ url: parseUrl('https://example.com/canonical-target')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 404,
+ statusText: 'Not Found',
+ contentType: 'text/html',
+ contentLength: 0,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+ await archive.setRedirect({
+ url: parseUrl('https://example.com/old')!,
+ redirectPaths: ['https://example.com/canonical-target'],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+
+ await buildViewerReadModel(archive);
+ });
+
+ afterAll(async () => {
+ if (archive) {
+ await archive.releaseHandle();
+ }
+ const { rmSync } = await import('node:fs');
+ rmSync(redirectWorkingDir, { recursive: true, force: true });
+ });
+
+ it('reports the canonical destination for both the direct and redirect-source-routed anchors', async () => {
+ const result = await listViewerBrokenLinks(archive, { sortBy: 'sourceUrl' });
+ expect(result.items).toHaveLength(2);
+ for (const item of result.items) {
+ expect(item).toMatchObject({
+ destUrl: 'https://example.com/canonical-target',
+ status: 404,
+ });
+ }
+ expect(result.items.map((item) => item.sourceUrl).toSorted()).toEqual([
+ 'https://example.com/direct',
+ 'https://example.com/via-redirect',
+ ]);
+ });
+});
+
+/**
+ * A broken link and an external link are independent judgments on the same
+ * `viewer_anchor_facts` row (`is_broken`/`is_external_link` are separate
+ * flags) — a destination can be both. Isolated into its own archive so it
+ * doesn't perturb the main describe block's exact item/pagination counts.
+ */
+describe('listViewerBrokenLinks — a destination that is both broken and external', () => {
+ const brokenExternalWorkingDir = path.resolve(
+ __dirname,
+ '__test_fixtures_list_viewer_broken_links_broken_external__',
+ );
+ let archive: InstanceType<typeof Archive>;
+ const archiveFilePath = path.resolve(
+ brokenExternalWorkingDir,
+ 'list-viewer-broken-links-broken-external-test.nitpicker',
+ );
+
+ beforeAll(async () => {
+ const { mkdirSync } = await import('node:fs');
+ mkdirSync(brokenExternalWorkingDir, { recursive: true });
+ archive = await Archive.create({
+ filePath: archiveFilePath,
+ cwd: brokenExternalWorkingDir,
+ });
+ await archive.setConfig({
+ baseUrl: 'https://example.com',
+ name: 'test',
+ version: '0.10.0',
+ recursive: true,
+ interval: 0,
+ image: true,
+ fetchExternal: false,
+ parallels: 1,
+ roots: ['https://example.com'],
+ excludes: [],
+ excludeKeywords: [],
+ excludeUrls: [],
+ maxExcludedDepth: 0,
+ retry: 3,
+ fromList: false,
+ disableQueries: false,
+ userAgent: 'test',
+ ignoreRobots: false,
+ });
+
+ await archive.setPage({
+ url: parseUrl('https://example.com/page-a')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: { ...META, title: 'Page A' },
+ anchorList: [
+ {
+ href: parseUrl('https://external.example.com/broken-ext')!,
+ isExternal: true,
+ title: null,
+ textContent: 'Broken external',
+ },
+ ],
+ imageList: [],
+ isSkipped: false,
+ });
+ await archive.setPage({
+ url: parseUrl('https://external.example.com/broken-ext')!,
+ redirectPaths: [],
+ isExternal: true,
+ isTarget: false,
+ status: 404,
+ statusText: 'Not Found',
+ contentType: 'text/html',
+ contentLength: 0,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+
+ await buildViewerReadModel(archive);
+ });
+
+ afterAll(async () => {
+ if (archive) {
+ await archive.releaseHandle();
+ }
+ const { rmSync } = await import('node:fs');
+ rmSync(brokenExternalWorkingDir, { recursive: true, force: true });
+ });
+
+ it('reports isExternal: true for a broken destination that is also external', async () => {
+ const result = await listViewerBrokenLinks(archive);
+ expect(result.items).toEqual([
+ expect.objectContaining({
+ sourceUrl: 'https://example.com/page-a',
+ destUrl: 'https://external.example.com/broken-ext',
+ status: 404,
+ isExternal: true,
+ }),
+ ]);
+ });
+});
diff --git a/packages/@nitpicker/query/src/list-viewer-broken-links.ts b/packages/@nitpicker/query/src/list-viewer-broken-links.ts
new file mode 100644
index 00000000..89d40e6c
--- /dev/null
+++ b/packages/@nitpicker/query/src/list-viewer-broken-links.ts
@@ -0,0 +1,307 @@
+import type {
+ CursorPaginatedLinkList,
+ LinkEntry,
+ ListViewerBrokenLinksOptions,
+} from './types.js';
+import type {
+ AnchorFactsKeysetRow,
+ AnchorFactsSortSpec,
+} from './viewer-anchor-facts-cursor/types.js';
+import type { ArchiveAccessor } from '@nitpicker/crawler';
+import type { Knex } from 'knex';
+
+import { buildAnchorFactsFilterKey } from './viewer-anchor-facts-cursor/build-anchor-facts-filter-key.js';
+import { decodeAnchorFactsCursor } from './viewer-anchor-facts-cursor/decode-anchor-facts-cursor.js';
+import { encodeAnchorFactsCursor } from './viewer-anchor-facts-cursor/encode-anchor-facts-cursor.js';
+import { extractAnchorFactsSortValues } from './viewer-anchor-facts-cursor/extract-anchor-facts-sort-values.js';
+import { getAnchorFactsSortSpec } from './viewer-anchor-facts-cursor/get-anchor-facts-sort-spec.js';
+import { VIEWER_READ_MODEL_SCHEMA_VERSION } from './viewer-read-model/viewer-read-model-schema-version.js';
+
+/**
+ * Adds a keyset comparison tuple as a `WHERE` predicate — `(col1, col2, …)
+ * {>|<} (?, ?, …)` — using SQLite's row-value comparison. Column names come
+ * from the fixed {@link AnchorFactsSortSpec} column set, never from request
+ * input, so interpolating them into the SQL text (rather than parameter
+ * binding, which only covers values) carries no injection risk. Mirrors
+ * `list-viewer-pages.ts`'s identical helper — not shared as a common module
+ * since the two existing keyset-cursor implementations in this package have
+ * never been generalised into one, matching `list-directory-pages.ts`'s
+ * independent, table-specific cursor scheme.
+ * @param qb - The query builder to constrain.
+ * @param columns - The keyset tuple columns, in comparison order.
+ * @param operator - `'>'` for a forward (ascending-tuple) seek, `'<'` for a
+ * backward one.
+ * @param values - The boundary row's tuple values, in `columns` order.
+ */
+function applyKeysetPredicate(
+ qb: Knex.QueryBuilder,
+ columns: readonly string[],
+ operator: '>' | '<',
+ values: readonly (string | number)[],
+): void {
+ const columnList = columns.join(', ');
+ const placeholders = columns.map(() => '?').join(', ');
+ qb.whereRaw(`(${columnList}) ${operator} (${placeholders})`, [...values]);
+}
+
+/**
+ * Applies the (currently sole) filter — `status` — on top of the fixed
+ * `is_broken = 1` predicate every read shares.
+ * @param qb - The query builder to constrain.
+ * @param options - The caller's filter options.
+ */
+function applyBrokenLinksFilters(
+ qb: Knex.QueryBuilder,
+ options: ListViewerBrokenLinksOptions,
+): void {
+ qb.where('is_broken', 1);
+ if (options.status != null) {
+ qb.where('status', options.status);
+ }
+}
+
+/**
+ * Counts the total `is_broken = 1` rows matching the caller's filters.
+ * @param knex - The archive's Knex instance.
+ * @param options - The caller's filter options.
+ * @returns The total matching row count.
+ */
+async function countAnchorFactsTotal(
+ knex: Knex,
+ options: ListViewerBrokenLinksOptions,
+): Promise<number> {
+ const qb = knex('viewer_anchor_facts');
+ applyBrokenLinksFilters(qb, options);
+ const result = await qb.count<{ count: string }[]>({ count: '*' });
+ return Number(result[0]?.count ?? 0);
+}
+
+/**
+ * Runs one `viewer_anchor_facts` read: applies filters, an optional keyset
+ * predicate, an `ORDER BY` in `orderDirection`, and `limit + 1` rows (the
+ * `+1` lets the caller detect "is there another row past this page"
+ * without a second query). Unlike `list-viewer-pages.ts`'s equivalent, no
+ * id-then-join step follows: `source_url_sort_key`/`dest_url_sort_key`/
+ * `status` are already the exact display values, so this window read IS
+ * the final row set.
+ * @param knex - The archive's Knex instance.
+ * @param options - The caller's filter options.
+ * @param spec - The resolved sort spec (columns to select/order by).
+ * @param orderDirection - The physical scan direction for this read.
+ * @param limit - The page size (the read fetches `limit + 1` rows).
+ * @param keyset - The keyset predicate to apply, or `undefined` for an
+ * unconstrained (initial / offset) read.
+ * @param keyset.operator - `'>'` or `'<'`, per {@link applyKeysetPredicate}.
+ * @param keyset.values - The boundary row's tuple values.
+ * @param offset - Row offset for a direct `OFFSET` read (page-number jumps).
+ * Ignored when `keyset` is supplied.
+ * @returns Up to `limit + 1` rows.
+ */
+async function readAnchorFactsWindow(
+ knex: Knex,
+ options: ListViewerBrokenLinksOptions,
+ spec: AnchorFactsSortSpec,
+ orderDirection: 'asc' | 'desc',
+ limit: number,
+ keyset: { operator: '>' | '<'; values: readonly (string | number)[] } | undefined,
+ offset: number,
+): Promise<
+ (AnchorFactsKeysetRow & {
+ source_url_sort_key: string;
+ dest_url_sort_key: string;
+ status: number | null;
+ is_external_link: number;
+ })[]
+> {
+ const qb = knex('viewer_anchor_facts');
+ applyBrokenLinksFilters(qb, options);
+ if (keyset) {
+ applyKeysetPredicate(qb, spec.columns, keyset.operator, keyset.values);
+ }
+ const selectColumns = [
+ ...new Set([
+ 'edge_id',
+ 'source_url_sort_key',
+ 'dest_url_sort_key',
+ 'status',
+ 'status_sort_key',
+ 'status_desc_key',
+ 'is_external_link',
+ ...spec.columns,
+ ]),
+ ];
+ let query = qb
+ .select(selectColumns)
+ .orderBy(spec.columns.map((column) => ({ column, order: orderDirection })))
+ .limit(limit + 1);
+ if (!keyset && offset > 0) {
+ query = query.offset(offset);
+ }
+ return query;
+}
+
+/**
+ * Maps one raw window row to the public {@link LinkEntry} shape.
+ * `textContent` is always `null`: `viewer_anchor_facts` doesn't store per-
+ * anchor text (broken-links-view.tsx never renders it, and storing it would
+ * duplicate potentially large strings across every edge row — see
+ * ARCHITECTURE.md「設計注意(viewer_anchor_facts read model、issue
+ * #114)」). `isExternal` reflects the edge's `is_external_link` flag —
+ * broken and external are independent judgments, so a broken link CAN also
+ * be external.
+ * @param row - One row from {@link readAnchorFactsWindow}.
+ * @param row.source_url_sort_key
+ * @param row.dest_url_sort_key
+ * @param row.status
+ * @param row.is_external_link
+ * @returns The corresponding {@link LinkEntry}.
+ */
+function toLinkEntry(row: {
+ source_url_sort_key: string;
+ dest_url_sort_key: string;
+ status: number | null;
+ is_external_link?: number;
+}): LinkEntry {
+ return {
+ sourceUrl: row.source_url_sort_key,
+ destUrl: row.dest_url_sort_key,
+ status: row.status,
+ isExternal: !!row.is_external_link,
+ textContent: null,
+ };
+}
+
+/**
+ * Lists broken links from `viewer_anchor_facts` — the read-model-backed,
+ * cursor-paginated counterpart of `listLinks(accessor, { type: 'broken' })`
+ * that powers `/api/links?type=broken`'s fast path.
+ *
+ * Filter/sort resolution runs entirely against `viewer_anchor_facts`; there
+ * is no id-then-join step (unlike `listViewerPages`) because
+ * `source_url_sort_key`/`dest_url_sort_key`/`status` are already the exact
+ * display values — see that table's `create-viewer-read-model-tables.ts`
+ * docs for why this doesn't reintroduce the URL-duplication cost issue
+ * #114 warns about at 13M-edge scale (negligible at this package's actual
+ * benchmark scale; see ARCHITECTURE.md).
+ *
+ * The initial read (no `cursor`), the forward keyset read, the backward
+ * keyset read, and the direct-`offset` read are four separate code paths —
+ * no `(:cursor IS NULL OR …)`-style nullable predicate ties them together,
+ * mirroring `listViewerPages`.
+ * @param accessor - The archive accessor to query. Callers are responsible
+ * for confirming the read model is built and current (see
+ * `isViewerReadModelCurrent`) AND that `urlPattern` is not set (see
+ * `ListViewerBrokenLinksOptions`'s docs) before calling this.
+ * @param options - Filter, sort, and pagination options.
+ * @returns A cursor-paginated list of broken-link entries.
+ * @throws {Error} If `options.cursor` is malformed, stale, or was minted
+ * under a different filter/sort combination.
+ * @example
+ * // Virtual-scroll continuation — the caller only ever inspects nextCursor:
+ * const page1 = await listViewerBrokenLinks(accessor, { limit: 100 });
+ * const page2 = page1.nextCursor
+ * ? await listViewerBrokenLinks(accessor, { limit: 100, cursor: page1.nextCursor })
+ * : null;
+ */
+export async function listViewerBrokenLinks(
+ accessor: ArchiveAccessor,
+ options: ListViewerBrokenLinksOptions = {},
+): Promise<CursorPaginatedLinkList> {
+ const knex = accessor.getKnex();
+ const limit = options.limit ?? 100;
+ const sortBy = options.sortBy ?? 'sourceUrl';
+ const sortOrder = options.sortOrder ?? 'asc';
+ const spec = getAnchorFactsSortSpec(sortBy, sortOrder);
+ const filterKey = buildAnchorFactsFilterKey(options);
+
+ const total = await countAnchorFactsTotal(knex, options);
+
+ /**
+ * Builds the final result from a `limit`-or-fewer window, already in
+ * final display order.
+ * @param window - The trimmed row window.
+ * @param hasMoreAfter - Whether a subsequent page exists.
+ * @param hasMoreBefore - Whether a preceding page exists.
+ * @returns The full paginated result.
+ */
+ function buildResult(
+ window: Awaited<ReturnType<typeof readAnchorFactsWindow>>,
+ hasMoreAfter: boolean,
+ hasMoreBefore: boolean,
+ ): CursorPaginatedLinkList {
+ const items = window.map((row) => toLinkEntry(row));
+ const lastRow = window.at(-1);
+ const firstRow = window[0];
+ const nextCursor =
+ hasMoreAfter && lastRow
+ ? encodeAnchorFactsCursor({
+ v: VIEWER_READ_MODEL_SCHEMA_VERSION,
+ filterKey,
+ sortBy,
+ sortOrder,
+ values: extractAnchorFactsSortValues(spec, lastRow),
+ })
+ : null;
+ const prevCursor =
+ hasMoreBefore && firstRow
+ ? encodeAnchorFactsCursor({
+ v: VIEWER_READ_MODEL_SCHEMA_VERSION,
+ filterKey,
+ sortBy,
+ sortOrder,
+ values: extractAnchorFactsSortValues(spec, firstRow),
+ })
+ : null;
+ return { items, total, nextCursor, prevCursor };
+ }
+
+ if (options.cursor) {
+ const decoded = decodeAnchorFactsCursor(options.cursor, {
+ filterKey,
+ sortBy,
+ sortOrder,
+ columns: spec.columns,
+ });
+ if (options.direction === 'prev') {
+ const oppositeDirection = spec.scanDirection === 'asc' ? 'desc' : 'asc';
+ const fetched = await readAnchorFactsWindow(
+ knex,
+ options,
+ spec,
+ oppositeDirection,
+ limit,
+ { operator: spec.scanDirection === 'asc' ? '<' : '>', values: decoded.values },
+ 0,
+ );
+ const hasMoreBefore = fetched.length > limit;
+ const window = fetched.slice(0, limit).toReversed();
+ return buildResult(window, true, hasMoreBefore);
+ }
+ const fetched = await readAnchorFactsWindow(
+ knex,
+ options,
+ spec,
+ spec.scanDirection,
+ limit,
+ { operator: spec.scanDirection === 'asc' ? '>' : '<', values: decoded.values },
+ 0,
+ );
+ const hasMoreAfter = fetched.length > limit;
+ const window = fetched.slice(0, limit);
+ return buildResult(window, hasMoreAfter, true);
+ }
+
+ const offset = options.offset ?? 0;
+ const fetched = await readAnchorFactsWindow(
+ knex,
+ options,
+ spec,
+ spec.scanDirection,
+ limit,
+ undefined,
+ offset,
+ );
+ const hasMoreAfter = fetched.length > limit;
+ const window = fetched.slice(0, limit);
+ return buildResult(window, hasMoreAfter, offset > 0);
+}
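Reviewer sketch (not part of the patch): the keyset flow in `listViewerBrokenLinks` can be illustrated without SQLite. The block below is a minimal in-memory approximation under stated assumptions: `EdgeRow`, `Key`, `Page`, `readPage` and the other helper names are hypothetical, the `(sourceUrl, edgeId)` tuple stands in for `spec.columns`, and JavaScript string ordering only approximates SQLite's comparison for ASCII URLs. It shows the same three ideas as the patch: tuple comparison against a boundary row, a `limit + 1` over-fetch to detect a further page, and a reversed scan for `direction: 'prev'`.

```typescript
// Hypothetical in-memory stand-in for a `viewer_anchor_facts` row; only the
// keyset columns matter for this sketch.
interface EdgeRow {
  sourceUrl: string;
  edgeId: number;
}

type Key = readonly [string, number];

interface Page {
  items: EdgeRow[];
  nextKey: Key | null;
  prevKey: Key | null;
}

const keyOf = (row: EdgeRow): Key => [row.sourceUrl, row.edgeId];

// Lexicographic tuple comparison, standing in for SQL `(a, b) < (?, ?)`.
function compareKeys(a: Key, b: Key): number {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  return a[1] - b[1];
}

// Builds a page from a window that is already in display order.
function toPage(items: EdgeRow[], hasMoreAfter: boolean, hasMoreBefore: boolean): Page {
  const first = items[0];
  const last = items[items.length - 1];
  return {
    items,
    nextKey: hasMoreAfter && last ? keyOf(last) : null,
    prevKey: hasMoreBefore && first ? keyOf(first) : null,
  };
}

function readPage(
  rows: readonly EdgeRow[],
  limit: number,
  cursor?: { key: Key; direction: 'next' | 'prev' },
): Page {
  const sorted = [...rows].sort((a, b) => compareKeys(keyOf(a), keyOf(b)));
  if (cursor?.direction === 'prev') {
    const boundary = cursor.key;
    // Scan backwards from the boundary, over-fetch by one, then flip the
    // trimmed window back into display order.
    const fetched = sorted
      .filter((row) => compareKeys(keyOf(row), boundary) < 0)
      .reverse()
      .slice(0, limit + 1);
    return toPage(fetched.slice(0, limit).reverse(), true, fetched.length > limit);
  }
  const boundary = cursor?.key;
  const candidates = boundary
    ? sorted.filter((row) => compareKeys(keyOf(row), boundary) > 0)
    : sorted;
  // The extra row only answers "is there another page"; it is never returned.
  const fetched = candidates.slice(0, limit + 1);
  return toPage(fetched.slice(0, limit), fetched.length > limit, cursor !== undefined);
}
```

Walking forward with `nextKey` and then back with `prevKey` should land on the same rows as the first page, which is the property the spec above asserts for the real function.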
diff --git a/packages/@nitpicker/query/src/list-viewer-external-links.ts b/packages/@nitpicker/query/src/list-viewer-external-links.ts
index 815d6c18..22980c6b 100644
--- a/packages/@nitpicker/query/src/list-viewer-external-links.ts
+++ b/packages/@nitpicker/query/src/list-viewer-external-links.ts
@@ -9,7 +9,7 @@ import { paginateQuery } from './paginate-query.js';
* model — the fast-path counterpart of {@link listExternalLinks}, backed by
* a table pre-aggregated once at read-model build time instead of a live
* `anchors` JOIN + `GROUP BY` per request (see
- * ARCHITECTURE.md「設計注意(外部リンク read model)」for why the live
+ * ARCHITECTURE.md「設計注意(viewer_anchor_facts read model、issue #114)」for why the live
* version's `GROUP BY` + `COUNT(DISTINCT ...)` combination is a known
* SQLite performance pitfall).
*
diff --git a/packages/@nitpicker/query/src/query.ts b/packages/@nitpicker/query/src/query.ts
index 75cc61d2..59b84fa5 100644
--- a/packages/@nitpicker/query/src/query.ts
+++ b/packages/@nitpicker/query/src/query.ts
@@ -50,6 +50,7 @@ export { listPagesByJsonLdType } from './list-pages-by-jsonld-type.js';
export { listPagesByTag } from './list-pages-by-tag.js';
export { listResources } from './list-resources.js';
export { listUnusedResources } from './list-unused-resources.js';
+export { listViewerBrokenLinks } from './list-viewer-broken-links.js';
export { listViewerExternalLinks } from './list-viewer-external-links.js';
export { listViewerPages } from './list-viewer-pages.js';
export { prepareUrlSortTempTable } from './url-sort-temp-table.js';
diff --git a/packages/@nitpicker/query/src/types.ts b/packages/@nitpicker/query/src/types.ts
index a0089e95..9ed7b38b 100644
--- a/packages/@nitpicker/query/src/types.ts
+++ b/packages/@nitpicker/query/src/types.ts
@@ -1129,6 +1129,63 @@ export interface LinkAnalysisResult {
total: number;
}
+/**
+ * Filter/sort/pagination options for {@link listViewerBrokenLinks} — the
+ * `viewer_anchor_facts` read-model fast path for broken-link listing.
+ *
+ * `urlPattern` and `includeRedirectSources` are deliberately absent:
+ * `urlPattern` matches source OR destination across two columns
+ * (`ListLinksOptions`'s semantics), which no single index on
+ * `viewer_anchor_facts` can satisfy, so callers with a `urlPattern` set
+ * must use `listLinks` instead (see `register-links-route.ts`).
+ * `includeRedirectSources` has no equivalent here: `viewer_anchor_facts`
+ * only ever stores the canonical (redirect-resolved) destination.
+ */
+export interface ListViewerBrokenLinksOptions {
+ /** Filter by destination HTTP status. Broken links are always `404`, so this is effectively a no-op unless set to a non-`404` value (which then matches nothing). */
+ status?: number;
+ /** Field to sort results by. Defaults to `'sourceUrl'`. */
+ sortBy?: 'sourceUrl' | 'destUrl' | 'status';
+ /** Sort direction. Defaults to `'asc'`. */
+ sortOrder?: SortOrder;
+ /** Maximum number of results to return. Defaults to 100. */
+ limit?: number;
+ /**
+ * Opaque keyset cursor from a previous {@link CursorPaginatedLinkList}'s
+ * `nextCursor`/`prevCursor`. Mutually exclusive with `offset` — when both
+ * are supplied, `cursor` wins. Omit for the first page.
+ */
+ cursor?: string;
+ /**
+ * Direction to walk from `cursor`: `'next'` (forward, default) or
+ * `'prev'` (backward). Ignored when `cursor` is omitted.
+ */
+ direction?: 'next' | 'prev';
+ /**
+ * Row offset for page-number jumps (MPA pagination). Mutually exclusive
+ * with `cursor`.
+ */
+ offset?: number;
+}
+
+/**
+ * Paginated result wrapper for {@link listViewerBrokenLinks} —
+ * {@link LinkAnalysisResult} plus keyset cursors for virtual-scroll
+ * continuation.
+ */
+export interface CursorPaginatedLinkList extends LinkAnalysisResult {
+ /**
+ * Opaque cursor to fetch the next page in the current sort order, or
+ * `null` when this is the last page.
+ */
+ nextCursor: string | null;
+ /**
+ * Opaque cursor to fetch the previous page in the current sort order, or
+ * `null` when this is already the first page.
+ */
+ prevCursor: string | null;
+}
+
/**
* Filter/sort/pagination options for {@link listExternalLinks}.
*
diff --git a/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/build-anchor-facts-filter-key.spec.ts b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/build-anchor-facts-filter-key.spec.ts
new file mode 100644
index 00000000..4675423b
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/build-anchor-facts-filter-key.spec.ts
@@ -0,0 +1,23 @@
+import { describe, expect, it } from 'vitest';
+
+import { buildAnchorFactsFilterKey } from './build-anchor-facts-filter-key.js';
+
+describe('buildAnchorFactsFilterKey', () => {
+ it('produces the same key for an empty options object and an explicit status: undefined', () => {
+ expect(buildAnchorFactsFilterKey({})).toBe(
+ buildAnchorFactsFilterKey({ status: undefined }),
+ );
+ });
+
+ it('produces a different key for different status values', () => {
+ expect(buildAnchorFactsFilterKey({ status: 404 })).not.toBe(
+ buildAnchorFactsFilterKey({ status: 500 }),
+ );
+ });
+
+ it('produces a different key when status is set vs unset', () => {
+ expect(buildAnchorFactsFilterKey({})).not.toBe(
+ buildAnchorFactsFilterKey({ status: 404 }),
+ );
+ });
+});
diff --git a/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/build-anchor-facts-filter-key.ts b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/build-anchor-facts-filter-key.ts
new file mode 100644
index 00000000..50b435dd
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/build-anchor-facts-filter-key.ts
@@ -0,0 +1,16 @@
+import type { AnchorFactsCursorFilterKeyInput } from './types.js';
+
+/**
+ * Builds the normalized `filterKey` embedded in a cursor. Two calls with the
+ * same effective filters (whether a filter is `undefined` or omitted, and
+ * whatever the caller's key order) always produce the same string.
+ * @param filters - The filter-affecting subset of the caller's options.
+ * @returns A canonical JSON string uniquely identifying the filter set.
+ */
+export function buildAnchorFactsFilterKey(
+ filters: AnchorFactsCursorFilterKeyInput,
+): string {
+ return JSON.stringify({
+ status: filters.status ?? null,
+ });
+}
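Reviewer sketch (not part of the patch): the canonicalisation idea behind `buildAnchorFactsFilterKey` is easier to see with more than one filter. The block below is an illustration only; `FilterInput`, `buildFilterKey`, `sameFilters` and in particular the `isExternal` filter are hypothetical and do not exist in the patch, which currently has `status` as its sole filter.

```typescript
// Hypothetical filter shape: `status` mirrors the real option; `isExternal`
// is invented here purely to show key-order independence.
interface FilterInput {
  status?: number | undefined;
  isExternal?: boolean | undefined;
}

function buildFilterKey(filters: FilterInput): string {
  // Listing every key explicitly, in a fixed order, with `null` for "unset"
  // makes the output independent of how the caller built the object.
  return JSON.stringify({
    status: filters.status ?? null,
    isExternal: filters.isExternal ?? null,
  });
}

// Two filter sets are interchangeable for cursor purposes when their keys match.
function sameFilters(a: FilterInput, b: FilterInput): boolean {
  return buildFilterKey(a) === buildFilterKey(b);
}
```

Because the key is rebuilt from named properties rather than by serialising the caller's object directly, adding a filter later only requires adding one line to the literal.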
diff --git a/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/decode-anchor-facts-cursor.spec.ts b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/decode-anchor-facts-cursor.spec.ts
new file mode 100644
index 00000000..67ac7cc0
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/decode-anchor-facts-cursor.spec.ts
@@ -0,0 +1,101 @@
+import { describe, expect, it } from 'vitest';
+
+import { VIEWER_READ_MODEL_SCHEMA_VERSION } from '../viewer-read-model/viewer-read-model-schema-version.js';
+
+import { decodeAnchorFactsCursor } from './decode-anchor-facts-cursor.js';
+import { encodeAnchorFactsCursor } from './encode-anchor-facts-cursor.js';
+
+const PAYLOAD_BASE = {
+ filterKey: '{"status":null}',
+ sortBy: 'sourceUrl' as const,
+ sortOrder: 'asc' as const,
+};
+
+const EXPECTED = {
+ ...PAYLOAD_BASE,
+ columns: ['source_url_sort_key', 'edge_id'] as const,
+};
+
+describe('decodeAnchorFactsCursor', () => {
+ it('decodes a cursor that matches the expected filter/sort', () => {
+ const cursor = encodeAnchorFactsCursor({
+ v: VIEWER_READ_MODEL_SCHEMA_VERSION,
+ ...PAYLOAD_BASE,
+ values: ['https://example.com/a', 1],
+ });
+ expect(decodeAnchorFactsCursor(cursor, EXPECTED)).toEqual({
+ v: VIEWER_READ_MODEL_SCHEMA_VERSION,
+ ...PAYLOAD_BASE,
+ values: ['https://example.com/a', 1],
+ });
+ });
+
+ it('throws on an undecodable string', () => {
+ expect(() => decodeAnchorFactsCursor('%%%not-base64%%%', EXPECTED)).toThrow(
+ /not decodable/,
+ );
+ });
+
+ it('throws on a decodable but malformed payload', () => {
+ const cursor = Buffer.from(JSON.stringify({ foo: 'bar' }), 'utf8').toString(
+ 'base64url',
+ );
+ expect(() => decodeAnchorFactsCursor(cursor, EXPECTED)).toThrow(/malformed/);
+ });
+
+ it('throws on a cursor minted under a stale schema version', () => {
+ const cursor = encodeAnchorFactsCursor({
+ v: VIEWER_READ_MODEL_SCHEMA_VERSION - 1,
+ ...PAYLOAD_BASE,
+ values: ['https://example.com/a', 1],
+ });
+ expect(() => decodeAnchorFactsCursor(cursor, EXPECTED)).toThrow(/[Ss]tale/);
+ });
+
+ it('throws on a cursor minted under a different filter', () => {
+ const cursor = encodeAnchorFactsCursor({
+ v: VIEWER_READ_MODEL_SCHEMA_VERSION,
+ ...PAYLOAD_BASE,
+ filterKey: '{"status":404}',
+ values: ['https://example.com/a', 1],
+ });
+ expect(() => decodeAnchorFactsCursor(cursor, EXPECTED)).toThrow(/does not match/);
+ });
+
+ it('throws on a cursor minted under a different sort', () => {
+ const cursor = encodeAnchorFactsCursor({
+ v: VIEWER_READ_MODEL_SCHEMA_VERSION,
+ ...PAYLOAD_BASE,
+ sortBy: 'status',
+ values: [404, 'https://example.com/a', 1],
+ });
+ expect(() => decodeAnchorFactsCursor(cursor, EXPECTED)).toThrow(/does not match/);
+ });
+
+ it('throws on a values array whose length does not match the expected column count', () => {
+ const cursor = encodeAnchorFactsCursor({
+ v: VIEWER_READ_MODEL_SCHEMA_VERSION,
+ ...PAYLOAD_BASE,
+ values: ['https://example.com/a'],
+ });
+ expect(() => decodeAnchorFactsCursor(cursor, EXPECTED)).toThrow(/keyset value count/);
+ });
+
+ it('throws on a numeric-column position holding a string value', () => {
+ const cursor = encodeAnchorFactsCursor({
+ v: VIEWER_READ_MODEL_SCHEMA_VERSION,
+ ...PAYLOAD_BASE,
+ values: ['https://example.com/a', 'not-a-number'],
+ });
+ expect(() => decodeAnchorFactsCursor(cursor, EXPECTED)).toThrow(/must be a number/);
+ });
+
+ it('throws on a text-column position holding a numeric value', () => {
+ const cursor = encodeAnchorFactsCursor({
+ v: VIEWER_READ_MODEL_SCHEMA_VERSION,
+ ...PAYLOAD_BASE,
+ values: [123, 1],
+ });
+ expect(() => decodeAnchorFactsCursor(cursor, EXPECTED)).toThrow(/must be a string/);
+ });
+});
diff --git a/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/decode-anchor-facts-cursor.ts b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/decode-anchor-facts-cursor.ts
new file mode 100644
index 00000000..9735f995
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/decode-anchor-facts-cursor.ts
@@ -0,0 +1,89 @@
+import type { AnchorFactsCursorPayload, AnchorFactsSortColumn } from './types.js';
+
+import { VIEWER_READ_MODEL_SCHEMA_VERSION } from '../viewer-read-model/viewer-read-model-schema-version.js';
+
+import { isNumericAnchorFactsSortColumn } from './types.js';
+
+/**
+ * The current request's identity to validate a decoded cursor against.
+ */
+export interface ExpectedAnchorFactsCursor {
+ /** See `buildAnchorFactsFilterKey`. */
+ filterKey: string;
+ /** The current request's sort field. */
+ sortBy: 'sourceUrl' | 'destUrl' | 'status';
+ /** The current request's sort direction. */
+ sortOrder: 'asc' | 'desc';
+ /**
+ * `getAnchorFactsSortSpec(sortBy, sortOrder).columns` — both the exact
+ * tuple length `payload.values` must carry, and the per-position type
+ * (`isNumericAnchorFactsSortColumn`) each value is checked against.
+ * Without the length check, a `values` array of the wrong length would
+ * reach the keyset predicate's positional column/value zip and build a
+ * malformed SQL comparison; without the per-position type check, a
+ * same-length but wrong-typed `values` array (e.g. a string standing in
+ * for `edge_id`) would silently seek to the wrong keyset boundary via
+ * SQLite's type-affinity comparison rules instead of erroring.
+ */
+ columns: readonly AnchorFactsSortColumn[];
+}
+
+/**
+ * Decodes and validates an opaque cursor against the caller's current
+ * filters/sort. Rejects cursors minted under a different schema version or a
+ * different effective filter/sort combination — replaying a cursor across a
+ * changed query would silently seek to a nonsensical position.
+ * @param cursor - The opaque cursor string from the request.
+ * @param expected - The current request's filter key + sort, to validate against.
+ * @returns The decoded, validated payload.
+ * @throws {Error} If the cursor is malformed, stale, or was minted under a
+ * different filter/sort combination (`TypeError` for a mistyped keyset value).
+ */
+export function decodeAnchorFactsCursor(
+ cursor: string,
+ expected: ExpectedAnchorFactsCursor,
+): AnchorFactsCursorPayload {
+ let payload: AnchorFactsCursorPayload;
+ try {
+ payload = JSON.parse(Buffer.from(cursor, 'base64url').toString('utf8'));
+ } catch {
+ throw new Error('Invalid /api/links?type=broken cursor: not decodable');
+ }
+ if (
+ typeof payload !== 'object' ||
+ payload === null ||
+ !Array.isArray(payload.values) ||
+ typeof payload.filterKey !== 'string' ||
+ typeof payload.v !== 'number'
+ ) {
+ throw new Error('Invalid /api/links?type=broken cursor: malformed payload');
+ }
+ if (payload.v !== VIEWER_READ_MODEL_SCHEMA_VERSION) {
+ throw new Error(
+ 'Stale /api/links?type=broken cursor: read-model schema has changed since it was issued',
+ );
+ }
+ if (
+ payload.filterKey !== expected.filterKey ||
+ payload.sortBy !== expected.sortBy ||
+ payload.sortOrder !== expected.sortOrder
+ ) {
+ throw new Error(
+ 'Invalid /api/links?type=broken cursor: does not match the current filter/sort combination',
+ );
+ }
+ if (payload.values.length !== expected.columns.length) {
+ throw new Error(
+ 'Invalid /api/links?type=broken cursor: unexpected keyset value count',
+ );
+ }
+ for (const [i, column] of expected.columns.entries()) {
+ const expectedType = isNumericAnchorFactsSortColumn(column) ? 'number' : 'string';
+ if (typeof payload.values[i] !== expectedType) {
+ throw new TypeError(
+ `Invalid /api/links?type=broken cursor: value at position ${i} must be a ${expectedType}`,
+ );
+ }
+ }
+ return payload;
+}
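Reviewer sketch (not part of the patch): the decode-then-validate order used by `decodeAnchorFactsCursor` can be tried in isolation. The block below is a trimmed approximation under stated assumptions: `CursorPayload`, `ExpectedCursor`, `SCHEMA_VERSION`, `encodeCursor`, `isCursorPayload` and `decodeCursor` are hypothetical names, the payload omits `sortBy`/`sortOrder`, and the version constant is a placeholder. Only Node's `Buffer` with the `'base64url'` encoding is taken from the patch itself.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical, trimmed-down payload; the real one also carries sortBy/sortOrder.
interface CursorPayload {
  v: number;
  filterKey: string;
  values: (string | number)[];
}

interface ExpectedCursor {
  filterKey: string;
  // Expected `typeof` of each keyset position, in column order.
  valueTypes: readonly ('string' | 'number')[];
}

const SCHEMA_VERSION = 1; // placeholder, not the real schema version

function encodeCursor(payload: CursorPayload): string {
  return Buffer.from(JSON.stringify(payload), 'utf8').toString('base64url');
}

function isCursorPayload(value: unknown): value is CursorPayload {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate['v'] === 'number' &&
    typeof candidate['filterKey'] === 'string' &&
    Array.isArray(candidate['values'])
  );
}

function decodeCursor(cursor: string, expected: ExpectedCursor): CursorPayload {
  let parsed: unknown;
  try {
    parsed = JSON.parse(Buffer.from(cursor, 'base64url').toString('utf8'));
  } catch {
    throw new Error('cursor: not decodable');
  }
  // Shape first, then version, then request identity, then tuple length/types.
  if (!isCursorPayload(parsed)) throw new Error('cursor: malformed payload');
  if (parsed.v !== SCHEMA_VERSION) throw new Error('cursor: stale schema version');
  if (parsed.filterKey !== expected.filterKey) {
    throw new Error('cursor: does not match the current filter');
  }
  if (parsed.values.length !== expected.valueTypes.length) {
    throw new Error('cursor: unexpected keyset value count');
  }
  for (const [i, type] of expected.valueTypes.entries()) {
    if (typeof parsed.values[i] !== type) {
      throw new TypeError(`cursor: value at position ${i} must be a ${type}`);
    }
  }
  return parsed;
}
```

Checking length and per-position types before the values reach a keyset predicate is what keeps a tampered cursor from seeking to an arbitrary boundary, which is the concern the `columns` doc comment in the patch describes.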
diff --git a/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/encode-anchor-facts-cursor.spec.ts b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/encode-anchor-facts-cursor.spec.ts
new file mode 100644
index 00000000..786c73e2
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/encode-anchor-facts-cursor.spec.ts
@@ -0,0 +1,19 @@
+import { describe, expect, it } from 'vitest';
+
+import { encodeAnchorFactsCursor } from './encode-anchor-facts-cursor.js';
+
+describe('encodeAnchorFactsCursor', () => {
+ it('round-trips through base64url without loss', () => {
+ const payload = {
+ v: 6,
+ filterKey: '{"status":null}',
+ sortBy: 'sourceUrl' as const,
+ sortOrder: 'asc' as const,
+ values: ['https://example.com/a', 1],
+ };
+ const cursor = encodeAnchorFactsCursor(payload);
+ expect(JSON.parse(Buffer.from(cursor, 'base64url').toString('utf8'))).toEqual(
+ payload,
+ );
+ });
+});
diff --git a/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/encode-anchor-facts-cursor.ts b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/encode-anchor-facts-cursor.ts
new file mode 100644
index 00000000..1c5170d1
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/encode-anchor-facts-cursor.ts
@@ -0,0 +1,10 @@
+import type { AnchorFactsCursorPayload } from './types.js';
+
+/**
+ * Encodes a cursor payload as an opaque, URL-safe string.
+ * @param payload - The cursor payload to encode.
+ * @returns The base64url-encoded cursor.
+ */
+export function encodeAnchorFactsCursor(payload: AnchorFactsCursorPayload): string {
+ return Buffer.from(JSON.stringify(payload), 'utf8').toString('base64url');
+}
diff --git a/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/extract-anchor-facts-sort-values.spec.ts b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/extract-anchor-facts-sort-values.spec.ts
new file mode 100644
index 00000000..dacdfa03
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/extract-anchor-facts-sort-values.spec.ts
@@ -0,0 +1,25 @@
+import type { AnchorFactsKeysetRow } from './types.js';
+
+import { describe, expect, it } from 'vitest';
+
+import { extractAnchorFactsSortValues } from './extract-anchor-facts-sort-values.js';
+
+describe('extractAnchorFactsSortValues', () => {
+ it('extracts values in spec.columns order, ignoring columns not in the spec', () => {
+ const row: AnchorFactsKeysetRow = {
+ source_url_sort_key: 'https://example.com/a',
+ dest_url_sort_key: 'https://example.com/b',
+ status_sort_key: 404,
+ status_desc_key: -404,
+ edge_id: 7,
+ };
+ const values = extractAnchorFactsSortValues(
+ {
+ columns: ['status_sort_key', 'source_url_sort_key', 'edge_id'],
+ scanDirection: 'asc',
+ },
+ row,
+ );
+ expect(values).toEqual([404, 'https://example.com/a', 7]);
+ });
+});
diff --git a/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/extract-anchor-facts-sort-values.ts b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/extract-anchor-facts-sort-values.ts
new file mode 100644
index 00000000..6df7b435
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/extract-anchor-facts-sort-values.ts
@@ -0,0 +1,15 @@
+import type { AnchorFactsKeysetRow, AnchorFactsSortSpec } from './types.js';
+
+/**
+ * Extracts a row's keyset tuple values in `spec.columns` order — the values
+ * bound into a cursor's comparison tuple.
+ * @param spec - The sort spec whose columns to read.
+ * @param row - The source row (must carry every column in `spec.columns`).
+ * @returns The tuple values, in `spec.columns` order.
+ */
+export function extractAnchorFactsSortValues(
+ spec: AnchorFactsSortSpec,
+ row: AnchorFactsKeysetRow,
+): (string | number)[] {
+ return spec.columns.map((column) => row[column]);
+}
diff --git a/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/get-anchor-facts-sort-spec.spec.ts b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/get-anchor-facts-sort-spec.spec.ts
new file mode 100644
index 00000000..e3dfdd30
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/get-anchor-facts-sort-spec.spec.ts
@@ -0,0 +1,47 @@
+import { describe, expect, it } from 'vitest';
+
+import { getAnchorFactsSortSpec } from './get-anchor-facts-sort-spec.js';
+
+describe('getAnchorFactsSortSpec', () => {
+ it('sorts by sourceUrl ascending using source_url_sort_key/edge_id, scanned ascending', () => {
+ expect(getAnchorFactsSortSpec('sourceUrl', 'asc')).toEqual({
+ columns: ['source_url_sort_key', 'edge_id'],
+ scanDirection: 'asc',
+ });
+ });
+
+ it('sorts by sourceUrl descending by flipping the scan direction, no negated key needed', () => {
+ expect(getAnchorFactsSortSpec('sourceUrl', 'desc')).toEqual({
+ columns: ['source_url_sort_key', 'edge_id'],
+ scanDirection: 'desc',
+ });
+ });
+
+ it('sorts by destUrl ascending using dest_url_sort_key/edge_id, scanned ascending', () => {
+ expect(getAnchorFactsSortSpec('destUrl', 'asc')).toEqual({
+ columns: ['dest_url_sort_key', 'edge_id'],
+ scanDirection: 'asc',
+ });
+ });
+
+ it('sorts by destUrl descending by flipping the scan direction', () => {
+ expect(getAnchorFactsSortSpec('destUrl', 'desc')).toEqual({
+ columns: ['dest_url_sort_key', 'edge_id'],
+ scanDirection: 'desc',
+ });
+ });
+
+ it('sorts by status ascending using status_sort_key with a source_url_sort_key tie-breaker, scanned ascending', () => {
+ expect(getAnchorFactsSortSpec('status', 'asc')).toEqual({
+ columns: ['status_sort_key', 'source_url_sort_key', 'edge_id'],
+ scanDirection: 'asc',
+ });
+ });
+
+ it('sorts by status descending using the negated status_desc_key, ALWAYS scanned ascending', () => {
+ expect(getAnchorFactsSortSpec('status', 'desc')).toEqual({
+ columns: ['status_desc_key', 'source_url_sort_key', 'edge_id'],
+ scanDirection: 'asc',
+ });
+ });
+});
diff --git a/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/get-anchor-facts-sort-spec.ts b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/get-anchor-facts-sort-spec.ts
new file mode 100644
index 00000000..7a7f7783
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/get-anchor-facts-sort-spec.ts
@@ -0,0 +1,31 @@
+import type { AnchorFactsSortSpec } from './types.js';
+
+/**
+ * Resolves the keyset sort plan for a `sortBy`/`sortOrder` pair.
+ * @param sortBy - The field to sort by.
+ * @param sortOrder - The sort direction.
+ * @returns The resolved {@link AnchorFactsSortSpec}.
+ */
+export function getAnchorFactsSortSpec(
+ sortBy: 'sourceUrl' | 'destUrl' | 'status',
+ sortOrder: 'asc' | 'desc',
+): AnchorFactsSortSpec {
+ switch (sortBy) {
+ case 'status': {
+ return {
+ columns: [
+ sortOrder === 'desc' ? 'status_desc_key' : 'status_sort_key',
+ 'source_url_sort_key',
+ 'edge_id',
+ ],
+ scanDirection: 'asc',
+ };
+ }
+ case 'destUrl': {
+ return { columns: ['dest_url_sort_key', 'edge_id'], scanDirection: sortOrder };
+ }
+ default: {
+ return { columns: ['source_url_sort_key', 'edge_id'], scanDirection: sortOrder };
+ }
+ }
+}
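How a resolved sort spec might turn into SQL is not shown in this part of the diff. The sketch below is an assumption about that mapping, using hypothetical `buildOrderBy` and `buildSeekPredicate` helpers: every column shares the single scan direction, and the "rows after the cursor" filter is a SQLite row-value comparison over the same columns.

```typescript
// Hypothetical helpers; only the shape of the spec mirrors AnchorFactsSortSpec.
interface SortSpec {
  readonly columns: readonly string[];
  readonly scanDirection: 'asc' | 'desc';
}

/** ORDER BY clause: all keyset columns are read in the one scan direction. */
function buildOrderBy(spec: SortSpec): string {
  return spec.columns.map((column) => `${column} ${spec.scanDirection}`).join(', ');
}

/** Seek predicate: strictly after the cursor tuple in scan order, one `?` per column. */
function buildSeekPredicate(spec: SortSpec): string {
  const operator = spec.scanDirection === 'asc' ? '>' : '<';
  const placeholders = spec.columns.map(() => '?').join(', ');
  return `(${spec.columns.join(', ')}) ${operator} (${placeholders})`;
}
```

Because `status` descending is expressed through `status_desc_key` scanned ascending, a single comparison operator covers the whole tuple and no per-column direction mixing is needed.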
diff --git a/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/types.ts b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/types.ts
new file mode 100644
index 00000000..86a2f4c4
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-anchor-facts-cursor/types.ts
@@ -0,0 +1,101 @@
+/**
+ * The columns (in tuple order) that make up a given sort's keyset — both the
+ * `ORDER BY` clause and the cursor comparison tuple. Always ends in
+ * `edge_id`, the stable tie-breaker.
+ */
+export type AnchorFactsSortColumn =
+ | 'source_url_sort_key'
+ | 'dest_url_sort_key'
+ | 'status_sort_key'
+ | 'status_desc_key'
+ | 'edge_id';
+
+/**
+ * Resolved sort plan for one `sortBy`/`sortOrder` pair: which
+ * `viewer_anchor_facts` columns form the keyset tuple, and which physical
+ * scan direction (`asc`/`desc`) reads them in display order.
+ *
+ * `status` desc uses `status_desc_key` (`= -status_sort_key`) walked
+ * ascending, so the `source_url_sort_key`/`edge_id` tie-breakers stay
+ * ascending too — ties always display in source-URL order regardless of the
+ * primary sort direction, mirroring `viewer_pages`'s identical
+ * `ViewerPagesSortSpec` rationale (`docs/viewer-sql-query-plan.md`'s "Stable
+ * Ordering" section). `sourceUrl`/`destUrl` don't need this negation trick
+ * (text has no numeric negation, and their tie-breaker — `edge_id` alone —
+ * flips direction together with the primary column, so no per-column
+ * direction mixing occurs).
+ */
+export interface AnchorFactsSortSpec {
+ /** Keyset tuple columns, in comparison/`ORDER BY` order. */
+ readonly columns: readonly AnchorFactsSortColumn[];
+ /** Physical scan direction that yields display order for `columns`. */
+ readonly scanDirection: 'asc' | 'desc';
+}
+
+/** One `viewer_anchor_facts` row's worth of keyset column values, keyed by column name. */
+export type AnchorFactsKeysetRow = Record<AnchorFactsSortColumn, string | number> & {
+ edge_id: number;
+};
+
+/**
+ * The `viewer_anchor_facts` columns whose keyset value is a SQLite INTEGER
+ * (bound as a JS `number`) rather than TEXT (`string`). Used by
+ * `decodeAnchorFactsCursor` to reject a cursor whose `values` array has the
+ * right length but a value of the wrong type at some position (e.g. a
+ * string where `edge_id` belongs) — SQLite's type-affinity comparison rules
+ * would otherwise silently seek to the wrong keyset boundary instead of
+ * erroring.
+ */
+const NUMERIC_ANCHOR_FACTS_SORT_COLUMNS: ReadonlySet<AnchorFactsSortColumn> = new Set<AnchorFactsSortColumn>([
+ 'status_sort_key',
+ 'status_desc_key',
+ 'edge_id',
+]);
+
+/**
+ * Whether `column`'s keyset value is a SQLite INTEGER (`number`) rather
+ * than TEXT (`string`).
+ * @param column - The sort-spec column to check.
+ * @returns `true` for `status_sort_key`/`status_desc_key`/`edge_id`, `false` for the URL sort-key columns.
+ */
+export function isNumericAnchorFactsSortColumn(column: AnchorFactsSortColumn): boolean {
+ return NUMERIC_ANCHOR_FACTS_SORT_COLUMNS.has(column);
+}
+
+/**
+ * The subset of `ListViewerBrokenLinksOptions` that affects which rows
+ * match — used to build a cursor's `filterKey` so a cursor minted under one
+ * filter/sort combination can't silently be replayed under another. Unlike
+ * `viewer_pages`, `is_broken` itself is never variable here (this cursor
+ * family only ever backs the broken-link listing), and `urlPattern` is
+ * excluded entirely: it matches source OR destination across two columns
+ * (`list-links.ts`'s semantics), which no single index here can satisfy, so
+ * the caller (`register-links-route.ts`) forces the legacy fallback instead
+ * of ever reaching this cursor machinery with a `urlPattern` set — the same
+ * precedent `register-pages-route.ts` already established for `/api/pages`.
+ */
+export interface AnchorFactsCursorFilterKeyInput {
+ /** See `ListViewerBrokenLinksOptions.status`. */
+ status?: number;
+}
+
+/**
+ * Decoded shape of an opaque `/api/links?type=broken` viewer cursor.
+ */
+export interface AnchorFactsCursorPayload {
+ /**
+ * The read-model schema version the cursor was minted under (see
+ * `VIEWER_READ_MODEL_SCHEMA_VERSION`). A schema bump changes column
+ * meanings (or removes them), so a cursor from a stale schema must never
+ * be replayed.
+ */
+ v: number;
+ /** See `buildAnchorFactsFilterKey`. */
+ filterKey: string;
+ /** The sort field the cursor was minted under. */
+ sortBy: 'sourceUrl' | 'destUrl' | 'status';
+ /** The sort direction the cursor was minted under. */
+ sortOrder: 'asc' | 'desc';
+ /** The boundary row's keyset tuple values, in sort-spec column order. */
+ values: (string | number)[];
+}
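The `status_desc_key` rationale in the `AnchorFactsSortSpec` notes above can be checked with a small in-memory model. This is a sketch under assumptions: `BrokenLinkRow` and `sortStatusDesc` are hypothetical names, and the negated key is computed inline rather than read from a stored column.

```typescript
// Hypothetical in-memory model of the (status_desc_key, source_url_sort_key, edge_id) order.
interface BrokenLinkRow {
  status: number;
  sourceUrl: string;
  edgeId: number;
}

/** Sorts ascending on (-status, sourceUrl, edgeId): status descends, ties stay in source-URL order. */
function sortStatusDesc(rows: readonly BrokenLinkRow[]): BrokenLinkRow[] {
  return rows.slice().sort((a, b) => {
    // Ascending on the negated key is the same as descending on the status.
    const byNegatedStatus = -a.status - -b.status;
    if (byNegatedStatus !== 0) return byNegatedStatus;
    if (a.sourceUrl !== b.sourceUrl) return a.sourceUrl < b.sourceUrl ? -1 : 1;
    return a.edgeId - b.edgeId;
  });
}
```

Scanning the plain status column in descending order instead would also reverse the tie-breakers, so rows sharing a status would appear in reverse source-URL order.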
diff --git a/packages/@nitpicker/query/src/viewer-read-model/build-viewer-read-model.spec.ts b/packages/@nitpicker/query/src/viewer-read-model/build-viewer-read-model.spec.ts
index 5de18cc9..385247bf 100644
--- a/packages/@nitpicker/query/src/viewer-read-model/build-viewer-read-model.spec.ts
+++ b/packages/@nitpicker/query/src/viewer-read-model/build-viewer-read-model.spec.ts
@@ -1102,7 +1102,9 @@ describe('buildViewerReadModel', () => {
isSkipped: false,
});
- // A second, distinct referring page to the same destination.
+ // A second, distinct referring page to the same destination, plus two
+ // duplicate anchors to a broken destination — must collapse to one
+ // viewer_anchor_facts row with count=2, not two rows.
await archive.setPage({
url: parseUrl('https://example.com/page-b')!,
redirectPaths: [],
@@ -1122,6 +1124,18 @@ describe('buildViewerReadModel', () => {
title: null,
textContent: 'Ad sidebar',
},
+ {
+ href: parseUrl('https://example.com/broken')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Broken link 1',
+ },
+ {
+ href: parseUrl('https://example.com/broken')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Broken link 2',
+ },
],
imageList: [],
isSkipped: false,
@@ -1143,6 +1157,22 @@ describe('buildViewerReadModel', () => {
imageList: [],
isSkipped: false,
});
+ await archive.setPage({
+ url: parseUrl('https://example.com/broken')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 404,
+ statusText: 'Not Found',
+ contentType: 'text/html',
+ contentLength: 0,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
await buildViewerReadModel(archive);
});
@@ -1171,5 +1201,32 @@ describe('buildViewerReadModel', () => {
expect(rows).toHaveLength(1);
expect(rows[0]).toMatchObject({ referrer_count: 2 });
});
+
+ it('populates viewer_anchor_facts with one row per unique (source,dest) pair, collapsing duplicate anchors via count', async () => {
+ const rows = await archive
+ .getKnex()('viewer_anchor_facts')
+ .where('dest_url_sort_key', 'https://example.com/broken')
+ .select('*');
+ expect(rows).toHaveLength(1);
+ expect(rows[0]).toMatchObject({ count: 2, is_broken: 1, is_external_link: 0 });
+ });
+
+ it('flags the external-destination edges as is_external_link without indexing them for read (no vaf_external_* index exists)', async () => {
+ const rows = await archive
+ .getKnex()('viewer_anchor_facts')
+ .where('dest_url_sort_key', 'https://ads.example.com')
+ .select('*');
+ expect(rows).toHaveLength(2);
+ for (const row of rows) {
+ expect(row).toMatchObject({ is_broken: 0, is_external_link: 1 });
+ }
+ });
+
+ it('rebuilds viewer_anchor_facts idempotently — a second build leaves the same row count', async () => {
+ await buildViewerReadModel(archive);
+ const rows = await archive.getKnex()('viewer_anchor_facts').select('*');
+ // 2 edges to ads.example.com (page-a, page-b) + 1 edge to /broken (page-b).
+ expect(rows).toHaveLength(3);
+ });
});
});
diff --git a/packages/@nitpicker/query/src/viewer-read-model/build-viewer-read-model.ts b/packages/@nitpicker/query/src/viewer-read-model/build-viewer-read-model.ts
index 664ea08d..07963cc7 100644
--- a/packages/@nitpicker/query/src/viewer-read-model/build-viewer-read-model.ts
+++ b/packages/@nitpicker/query/src/viewer-read-model/build-viewer-read-model.ts
@@ -5,36 +5,17 @@ import { classifyContentType } from '../classify-content-type.js';
import { excludeSkippedPages } from '../exclude-skipped-pages.js';
import { buildDirectoryTreeRows } from './build-directory-tree-rows.js';
-import { computeExternalLinkRows } from './compute-external-link-rows.js';
+import { computeAnchorFactRows } from './compute-anchor-fact-rows.js';
import { computePageFacetBuckets } from './compute-page-facet-buckets.js';
import { createViewerReadModelTables } from './create-viewer-read-model-tables.js';
+import { deriveExternalLinkSummaryRows } from './derive-external-link-summary-rows.js';
import { dropViewerReadModelTables } from './drop-viewer-read-model-tables.js';
+import { NULL_STATUS_SENTINEL } from './null-status-sentinel.js';
import { VIEWER_READ_MODEL_SCHEMA_VERSION } from './viewer-read-model-schema-version.js';
/** Number of rows written per `INSERT` statement while populating `viewer_pages`. */
const INSERT_CHUNK_SIZE = 500;
-/**
- * Sentinel `status_sort_key` value substituted for `null` status (errored /
- * not-yet-classified rows). Chosen smaller than any real HTTP status code
- * (100-599) so unknown-status rows keep sorting first in ascending order —
- * matching `listPages`'s prior behavior of ordering directly on the nullable
- * `status` column, where SQLite treats `NULL` as smaller than any value.
- *
- * Deliberately distinct from `-1`, which `Database.resetFailedPages` already
- * uses as the "hard failure" HTTP status sentinel (see that function's docs)
- * — reusing `-1` here would conflate two different populations of rows in
- * `status_sort_key` ordering and in any future `status = -1` equality filter.
- *
- * Keyset cursor comparisons need a NEVER-`null` sort-key column: SQL's
- * three-valued logic makes `NULL > x` / `NULL < x` always evaluate to
- * `NULL` (never true), which would silently break tuple comparisons like
- * `(status_sort_key, url_sort_key, page_id) > (?, ?, ?)` for rows whose
- * status is unknown. Substituting a sentinel keeps every row on this column
- * strictly orderable.
- */
-const NULL_STATUS_SENTINEL = -32_768;
-
/**
* Row shape read from the write-model `pages` table while populating
* `viewer_pages`. Column names match `pages` verbatim (see
@@ -201,14 +182,16 @@ function toViewerPageInsertRow(row: PagesSourceRow): ViewerPageInsertRow {
}
/**
- * Performs a full rebuild of the viewer read model: drops all 8 tables if
+ * Performs a full rebuild of the viewer read model: drops all 9 tables if
* present, recreates them, populates `viewer_pages` from the current
* `pages` write-model table, populates `viewer_directory_nodes`/
* `viewer_directory_pages` from that same page set (see
* `buildDirectoryTreeRows` for the tree-building rules), populates
- * `viewer_external_links` from a dedicated `anchors` aggregation query (see
- * `computeExternalLinkRows` — unlike the directory tree, this cannot reuse
- * `sourceRows`, since link data lives on `anchors`, not `pages`), seeds one
+ * `viewer_anchor_facts` from a single `anchors` aggregation query (see
+ * `computeAnchorFactRows` — unlike the directory tree, this cannot reuse
+ * `sourceRows`, since link data lives on `anchors`, not `pages`) and derives
+ * `viewer_external_links` from those same in-memory rows with no second
+ * `anchors` scan (see `deriveExternalLinkSummaryRows`), seeds one
* smoke-test row into `viewer_query_profiles`, writes the
* `viewer_count_buckets` totals row plus one row per distinct Pages-list
* facet value (see `computePageFacetBuckets`), and writes the
@@ -325,10 +308,19 @@ export async function buildViewerReadModel(
// Unlike `viewer_pages`/the directory tree, this needs its own `anchors`
// query — `sourceRows` (loaded from `pages` only) has no anchor/link
- // data. Runs once, here, instead of on every `/api/links?type=external`
- // request — see `computeExternalLinkRows`'s docs for the SQLite
+ // data. Runs once, here, instead of on every `/api/links?type=broken`
+ // request — see `computeAnchorFactRows`'s docs for the SQLite
// performance rationale.
- const externalLinkRows = await computeExternalLinkRows(trx);
+ const anchorFactRows = await computeAnchorFactRows(trx);
+ for (let start = 0; start < anchorFactRows.length; start += INSERT_CHUNK_SIZE) {
+ await trx('viewer_anchor_facts').insert(
+ anchorFactRows.slice(start, start + INSERT_CHUNK_SIZE),
+ );
+ }
+
+ // Derived in memory from the anchor-fact rows already computed above —
+ // no second `anchors` scan (see `deriveExternalLinkSummaryRows`'s docs).
+ const externalLinkRows = deriveExternalLinkSummaryRows(anchorFactRows);
for (let start = 0; start < externalLinkRows.length; start += INSERT_CHUNK_SIZE) {
await trx('viewer_external_links').insert(
externalLinkRows.slice(start, start + INSERT_CHUNK_SIZE),
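The body of `deriveExternalLinkSummaryRows` is outside this section, so the following is only a sketch of what an in-memory derivation could look like. The function name `summarizeExternalLinks`, the row interfaces, and the reduced column set are assumptions; the `referrer_count` meaning (distinct referring pages per destination) follows the `listExternalLinks` description in ARCHITECTURE.md.

```typescript
// Hypothetical, reduced row shapes: the real insert rows carry more columns.
interface AnchorFactRow {
  source_page_id: number;
  dest_page_id: number;
  is_external_link: 0 | 1;
  count: number;
}

interface ExternalLinkSummary {
  dest_page_id: number;
  referrer_count: number;
}

/** Groups external edges by destination and counts distinct referring pages. */
function summarizeExternalLinks(rows: readonly AnchorFactRow[]): ExternalLinkSummary[] {
  const referrersByDest = new Map<number, Set<number>>();
  rows.forEach((row) => {
    if (row.is_external_link !== 1) return;
    let sources = referrersByDest.get(row.dest_page_id);
    if (!sources) {
      sources = new Set<number>();
      referrersByDest.set(row.dest_page_id, sources);
    }
    // A Set keeps the count distinct even if the input were not already one row per pair.
    sources.add(row.source_page_id);
  });
  const summaries: ExternalLinkSummary[] = [];
  referrersByDest.forEach((sources, destPageId) => {
    summaries.push({ dest_page_id: destPageId, referrer_count: sources.size });
  });
  return summaries;
}
```

Since the anchor-fact rows are already unique per `(source, dest)` pair, the per-anchor `count` does not contribute to `referrer_count`: two anchors from one page still count as one referrer.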
diff --git a/packages/@nitpicker/query/src/viewer-read-model/compute-anchor-fact-rows.spec.ts b/packages/@nitpicker/query/src/viewer-read-model/compute-anchor-fact-rows.spec.ts
new file mode 100644
index 00000000..55117c35
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-read-model/compute-anchor-fact-rows.spec.ts
@@ -0,0 +1,524 @@
+import path from 'node:path';
+
+import { tryParseUrl as parseUrl } from '@d-zero/shared/parse-url';
+import { Archive } from '@nitpicker/crawler';
+import { afterAll, beforeAll, describe, expect, it } from 'vitest';
+
+import { computeAnchorFactRows } from './compute-anchor-fact-rows.js';
+
+const __filename = new URL(import.meta.url).pathname;
+const __dirname = path.dirname(__filename);
+
+const BASE_CONFIG = {
+ baseUrl: 'https://example.com',
+ name: 'test',
+ version: '0.10.0',
+ recursive: true,
+ interval: 0,
+ image: true,
+ fetchExternal: false,
+ parallels: 1,
+ roots: ['https://example.com'],
+ excludes: [],
+ excludeKeywords: [],
+ excludeUrls: [],
+ maxExcludedDepth: 0,
+ retry: 3,
+ fromList: false,
+ disableQueries: false,
+ userAgent: 'test',
+ ignoreRobots: false,
+};
+
+const META = {
+ lang: null,
+ title: null,
+ description: null,
+ keywords: null,
+ noindex: false,
+ nofollow: false,
+ noarchive: false,
+ canonical: null,
+ alternate: null,
+ 'og:type': null,
+ 'og:title': null,
+ 'og:site_name': null,
+ 'og:description': null,
+ 'og:url': null,
+ 'og:image': null,
+ 'twitter:card': null,
+};
+
+describe('computeAnchorFactRows', () => {
+ const workingDir = path.resolve(
+ __dirname,
+ '__test_fixtures_compute_anchor_fact_rows__',
+ );
+ let archive: InstanceType<typeof Archive>;
+ const archiveFilePath = path.resolve(
+ workingDir,
+ 'compute-anchor-fact-rows-test.nitpicker',
+ );
+
+ beforeAll(async () => {
+ const { mkdirSync } = await import('node:fs');
+ mkdirSync(workingDir, { recursive: true });
+ archive = await Archive.create({ filePath: archiveFilePath, cwd: workingDir });
+ await archive.setConfig(BASE_CONFIG);
+
+ // Page A: two anchors to /broken (same pair, must collapse to one
+ // row with count=2), one anchor to ads.example.com (external).
+ await archive.setPage({
+ url: parseUrl('https://example.com/page-a')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: { ...META, title: 'Page A' },
+ anchorList: [
+ {
+ href: parseUrl('https://example.com/broken')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Broken link 1',
+ },
+ {
+ href: parseUrl('https://example.com/broken')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Broken link 2',
+ },
+ {
+ href: parseUrl('https://ads.example.com/')!,
+ isExternal: true,
+ title: null,
+ textContent: 'Ad',
+ },
+ ],
+ imageList: [],
+ isSkipped: false,
+ });
+
+ // Page B: one anchor to a 403 destination and one to a 500 destination;
+ // neither must be flagged broken.
+ await archive.setPage({
+ url: parseUrl('https://example.com/page-b')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: { ...META, title: 'Page B' },
+ anchorList: [
+ {
+ href: parseUrl('https://example.com/forbidden')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Forbidden',
+ },
+ {
+ href: parseUrl('https://example.com/server-error')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Server error',
+ },
+ ],
+ imageList: [],
+ isSkipped: false,
+ });
+
+ await archive.setPage({
+ url: parseUrl('https://example.com/broken')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 404,
+ statusText: 'Not Found',
+ contentType: 'text/html',
+ contentLength: 0,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+ await archive.setPage({
+ url: parseUrl('https://example.com/forbidden')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 403,
+ statusText: 'Forbidden',
+ contentType: 'text/html',
+ contentLength: 0,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+ await archive.setPage({
+ url: parseUrl('https://example.com/server-error')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 500,
+ statusText: 'Internal Server Error',
+ contentType: 'text/html',
+ contentLength: 0,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+ await archive.setPage({
+ url: parseUrl('https://ads.example.com/')!,
+ redirectPaths: [],
+ isExternal: true,
+ isTarget: false,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+ });
+
+ afterAll(async () => {
+ if (archive) {
+ await archive.releaseHandle();
+ }
+ const { rmSync } = await import('node:fs');
+ rmSync(workingDir, { recursive: true, force: true });
+ });
+
+ it('collapses duplicate anchors between the same (source,dest) pair into one row with count', async () => {
+ const knex = archive.getKnex();
+ const rows = await knex.transaction((trx) => computeAnchorFactRows(trx));
+ const broken = rows.find(
+ (row) => row.dest_url_sort_key === 'https://example.com/broken',
+ );
+ expect(broken).toMatchObject({ count: 2, is_broken: 1 });
+ });
+
+ it('flags only 404 destinations as broken, not 403 or 5xx', async () => {
+ const knex = archive.getKnex();
+ const rows = await knex.transaction((trx) => computeAnchorFactRows(trx));
+ const forbidden = rows.find(
+ (row) => row.dest_url_sort_key === 'https://example.com/forbidden',
+ );
+ const serverError = rows.find(
+ (row) => row.dest_url_sort_key === 'https://example.com/server-error',
+ );
+ expect(forbidden).toMatchObject({ status: 403, is_broken: 0 });
+ expect(serverError).toMatchObject({ status: 500, is_broken: 0 });
+ });
+
+ it('flags external destinations via is_external_link, not is_broken', async () => {
+ const knex = archive.getKnex();
+ const rows = await knex.transaction((trx) => computeAnchorFactRows(trx));
+ const ads = rows.find((row) => row.dest_url_sort_key === 'https://ads.example.com');
+ expect(ads).toMatchObject({ count: 1, is_broken: 0, is_external_link: 1 });
+ });
+
+ it('substitutes NULL_STATUS_SENTINEL only when status is null, never a real status', async () => {
+ const knex = archive.getKnex();
+ const rows = await knex.transaction((trx) => computeAnchorFactRows(trx));
+ const broken = rows.find(
+ (row) => row.dest_url_sort_key === 'https://example.com/broken',
+ )!;
+ expect(broken.status_sort_key).toBe(404);
+ });
+
+ it('sets status_desc_key to the negation of status_sort_key', async () => {
+ const knex = archive.getKnex();
+ const rows = await knex.transaction((trx) => computeAnchorFactRows(trx));
+ const broken = rows.find(
+ (row) => row.dest_url_sort_key === 'https://example.com/broken',
+ )!;
+ expect(broken.status_desc_key).toBe(-404);
+ });
+});
+
+/**
+ * Mirrors `list-links.spec.ts`'s redirect-resolution describe block: an
+ * anchor to an internal redirect-source page and an anchor directly to the
+ * same canonical destination must resolve to one shared `dest_page_id`
+ * (one row per source page), and the broken/external judgment must use the
+ * canonical destination, not the literal redirect-source.
+ */
+describe('computeAnchorFactRows — redirect resolution', () => {
+ const workingDir = path.resolve(
+ __dirname,
+ '__test_fixtures_compute_anchor_fact_rows_redirect__',
+ );
+ let archive: InstanceType<typeof Archive>;
+ const archiveFilePath = path.resolve(
+ workingDir,
+ 'compute-anchor-fact-rows-redirect-test.nitpicker',
+ );
+
+ beforeAll(async () => {
+ const { mkdirSync } = await import('node:fs');
+ mkdirSync(workingDir, { recursive: true });
+ archive = await Archive.create({ filePath: archiveFilePath, cwd: workingDir });
+ await archive.setConfig(BASE_CONFIG);
+
+ await archive.setPage({
+ url: parseUrl('https://example.com/direct')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: { ...META, title: 'Direct' },
+ anchorList: [
+ {
+ href: parseUrl('https://example.com/canonical-target')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Direct link',
+ },
+ ],
+ imageList: [],
+ isSkipped: false,
+ });
+
+ await archive.setPage({
+ url: parseUrl('https://example.com/via-redirect')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: { ...META, title: 'Via redirect' },
+ anchorList: [
+ {
+ href: parseUrl('https://example.com/old')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Old link',
+ hash: null,
+ },
+ ],
+ imageList: [],
+ isSkipped: false,
+ });
+
+ await archive.setPage({
+ url: parseUrl('https://example.com/canonical-target')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 404,
+ statusText: 'Not Found',
+ contentType: 'text/html',
+ contentLength: 0,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+
+ await archive.setRedirect({
+ url: parseUrl('https://example.com/old')!,
+ redirectPaths: ['https://example.com/canonical-target'],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+ });
+
+ afterAll(async () => {
+ if (archive) {
+ await archive.releaseHandle();
+ }
+ const { rmSync } = await import('node:fs');
+ rmSync(workingDir, { recursive: true, force: true });
+ });
+
+ it('collapses a redirect-source anchor and a direct anchor onto the same canonical dest_page_id, judged broken via the canonical status', async () => {
+ const knex = archive.getKnex();
+ const rows = await knex.transaction((trx) => computeAnchorFactRows(trx));
+ const targetRows = rows.filter(
+ (row) => row.dest_url_sort_key === 'https://example.com/canonical-target',
+ );
+ expect(targetRows).toHaveLength(2);
+ expect(new Set(targetRows.map((row) => row.dest_page_id)).size).toBe(1);
+ for (const row of targetRows) {
+ expect(row).toMatchObject({ status: 404, is_broken: 1, count: 1 });
+ }
+ });
+});
+
+/**
+ * Mirrors the internal-destination redirect-resolution block above, but for
+ * a canonical destination that is itself external — `is_external_link` must
+ * come from the canonical page's `isExternal`, not the (always-internal)
+ * redirect-source page's, and the two anchors (one direct, one via an
+ * internal redirect-source) must still collapse onto one `dest_page_id`.
+ */
+describe('computeAnchorFactRows — redirect resolution to an external destination', () => {
+ const workingDir = path.resolve(
+ __dirname,
+ '__test_fixtures_compute_anchor_fact_rows_redirect_external__',
+ );
+ let archive: InstanceType<typeof Archive>;
+ const archiveFilePath = path.resolve(
+ workingDir,
+ 'compute-anchor-fact-rows-redirect-external-test.nitpicker',
+ );
+
+ beforeAll(async () => {
+ const { mkdirSync } = await import('node:fs');
+ mkdirSync(workingDir, { recursive: true });
+ archive = await Archive.create({ filePath: archiveFilePath, cwd: workingDir });
+ await archive.setConfig(BASE_CONFIG);
+
+ await archive.setPage({
+ url: parseUrl('https://example.com/direct-ext')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: { ...META, title: 'Direct ext' },
+ anchorList: [
+ {
+ href: parseUrl('https://external.example.com/target')!,
+ isExternal: true,
+ title: null,
+ textContent: 'Direct external link',
+ },
+ ],
+ imageList: [],
+ isSkipped: false,
+ });
+
+ await archive.setPage({
+ url: parseUrl('https://example.com/via-redirect-ext')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: { ...META, title: 'Via redirect ext' },
+ anchorList: [
+ {
+ href: parseUrl('https://example.com/old-ext')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Old external link',
+ hash: null,
+ },
+ ],
+ imageList: [],
+ isSkipped: false,
+ });
+
+ await archive.setPage({
+ url: parseUrl('https://external.example.com/target')!,
+ redirectPaths: [],
+ isExternal: true,
+ isTarget: false,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+
+ await archive.setRedirect({
+ url: parseUrl('https://example.com/old-ext')!,
+ redirectPaths: ['https://external.example.com/target'],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+ });
+
+ afterAll(async () => {
+ if (archive) {
+ await archive.releaseHandle();
+ }
+ const { rmSync } = await import('node:fs');
+ rmSync(workingDir, { recursive: true, force: true });
+ });
+
+ it('collapses a redirect-source anchor and a direct anchor onto the same external canonical dest_page_id, flagged external via the canonical page', async () => {
+ const knex = archive.getKnex();
+ const rows = await knex.transaction((trx) => computeAnchorFactRows(trx));
+ const targetRows = rows.filter(
+ (row) => row.dest_url_sort_key === 'https://external.example.com/target',
+ );
+ expect(targetRows).toHaveLength(2);
+ expect(new Set(targetRows.map((row) => row.dest_page_id)).size).toBe(1);
+ for (const row of targetRows) {
+ expect(row).toMatchObject({
+ status: 200,
+ is_broken: 0,
+ is_external_link: 1,
+ count: 1,
+ });
+ }
+ });
+});
diff --git a/packages/@nitpicker/query/src/viewer-read-model/compute-anchor-fact-rows.ts b/packages/@nitpicker/query/src/viewer-read-model/compute-anchor-fact-rows.ts
new file mode 100644
index 00000000..dba94774
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-read-model/compute-anchor-fact-rows.ts
@@ -0,0 +1,70 @@
+import type { AnchorFactInsertRow } from './types.js';
+import type { Knex } from 'knex';
+
+import { NULL_STATUS_SENTINEL } from './null-status-sentinel.js';
+
+/**
+ * Computes one row per unique `(source_page_id, dest_page_id)` pair for
+ * bulk insert into `viewer_anchor_facts`, for `viewer_external_links` to
+ * derive its summary from afterward (`deriveExternalLinkSummaryRows`) —
+ * this is the only `anchors` scan the read-model build performs for either
+ * table.
+ *
+ * Redirect resolution (`COALESCE(canonical.*, dest.*)`) and the broken-link
+ * definition (`status = 404` strictly — see `list-links.ts`'s scope note:
+ * 403/5xx/unknown never count as broken) are lifted verbatim from
+ * `list-links.ts`/`list-external-links.ts`'s live queries. Duplicate
+ * anchors between the same pair (e.g. a nav link repeated in header and
+ * footer) collapse into one row via `count` — see
+ * ARCHITECTURE.md「設計注意(viewer_anchor_facts read model、issue
+ * #114)」for why this is a genuine read/write/storage improvement, not
+ * just a shortcut.
+ * @param trx - An open Knex transaction (a plain `Knex` instance also
+ * works, e.g. in tests).
+ * @returns One row per unique `(source_page_id, dest_page_id)` pair.
+ */
+export async function computeAnchorFactRows(trx: Knex): Promise<AnchorFactInsertRow[]> {
+ const destIdExpression = 'COALESCE("canonical"."id", "dest"."id")';
+ const destUrlExpression = 'COALESCE("canonical"."url", "dest"."url")';
+ const statusExpression = 'COALESCE("canonical"."status", "dest"."status")';
+ const isExternalExpression = 'COALESCE("canonical"."isExternal", "dest"."isExternal")';
+
+ const rows: {
+ sourcePageId: number;
+ destPageId: number;
+ sourceUrl: string;
+ destUrl: string;
+ status: number | null;
+ isExternal: 0 | 1;
+ count: number;
+ }[] = await trx('anchors')
+ .join('pages as source', 'anchors.pageId', '=', 'source.id')
+ .join('pages as dest', 'anchors.hrefId', '=', 'dest.id')
+ .leftJoin('pages as canonical', 'dest.redirectDestId', '=', 'canonical.id')
+ .groupBy('source.id', trx.raw(destIdExpression))
+ .select(
+ 'source.id as sourcePageId',
+ trx.raw(`${destIdExpression} as "destPageId"`),
+ 'source.url as sourceUrl',
+ trx.raw(`${destUrlExpression} as "destUrl"`),
+ trx.raw(`${statusExpression} as "status"`),
+ trx.raw(`${isExternalExpression} as "isExternal"`),
+ trx.raw('count(*) as "count"'),
+ );
+
+ return rows.map((row) => {
+ const statusSortKey = row.status ?? NULL_STATUS_SENTINEL;
+ return {
+ source_page_id: row.sourcePageId,
+ dest_page_id: row.destPageId,
+ source_url_sort_key: row.sourceUrl,
+ dest_url_sort_key: row.destUrl,
+ status: row.status,
+ status_sort_key: statusSortKey,
+ status_desc_key: -statusSortKey,
+ count: Number(row.count),
+ is_broken: row.status === 404 ? 1 : 0,
+ is_external_link: row.isExternal ? 1 : 0,
+ };
+ });
+}
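
Reviewer note: the `GROUP BY (source.id, COALESCE(canonical.id, dest.id))` plus `count(*)` step above can be pictured as the in-memory collapse below. This is only a sketch under assumptions: `ResolvedAnchor`, `EdgeFact`, and `collapseToEdgeFacts` are hypothetical names that do not exist in the PR, and the input is assumed to be already redirect-resolved.

```typescript
// Hypothetical, simplified shape of one joined `anchors` row after redirect
// resolution. Field names are illustrative, not the real schema.
interface ResolvedAnchor {
	sourcePageId: number;
	destPageId: number;
	status: number | null;
}

// One row per unique (sourcePageId, destPageId) pair.
interface EdgeFact extends ResolvedAnchor {
	count: number;
	isBroken: 0 | 1;
}

function collapseToEdgeFacts(anchors: readonly ResolvedAnchor[]): EdgeFact[] {
	const edges = new Map<string, EdgeFact>();
	for (const anchor of anchors) {
		const key = `${anchor.sourcePageId}:${anchor.destPageId}`;
		const existing = edges.get(key);
		if (existing) {
			// A repeated anchor between the same pair only bumps `count`.
			existing.count += 1;
			continue;
		}
		edges.set(key, {
			...anchor,
			count: 1,
			// Broken means exactly 404; 403/5xx/null are not broken.
			isBroken: anchor.status === 404 ? 1 : 0,
		});
	}
	return Array.from(edges.values());
}
```

A header link and a footer link from the same page to the same destination therefore yield one edge with `count: 2` rather than two rows.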
diff --git a/packages/@nitpicker/query/src/viewer-read-model/compute-external-link-rows.spec.ts b/packages/@nitpicker/query/src/viewer-read-model/compute-external-link-rows.spec.ts
deleted file mode 100644
index bde2b0d6..00000000
--- a/packages/@nitpicker/query/src/viewer-read-model/compute-external-link-rows.spec.ts
+++ /dev/null
@@ -1,321 +0,0 @@
-import path from 'node:path';
-
-import { tryParseUrl as parseUrl } from '@d-zero/shared/parse-url';
-import { Archive } from '@nitpicker/crawler';
-import { afterAll, beforeAll, describe, expect, it } from 'vitest';
-
-import { computeExternalLinkRows } from './compute-external-link-rows.js';
-
-const __filename = new URL(import.meta.url).pathname;
-const __dirname = path.dirname(__filename);
-
-const BASE_CONFIG = {
- baseUrl: 'https://example.com',
- name: 'test',
- version: '0.10.0',
- recursive: true,
- interval: 0,
- image: true,
- fetchExternal: false,
- parallels: 1,
- roots: ['https://example.com'],
- excludes: [],
- excludeKeywords: [],
- excludeUrls: [],
- maxExcludedDepth: 0,
- retry: 3,
- fromList: false,
- disableQueries: false,
- userAgent: 'test',
- ignoreRobots: false,
-};
-
-const META = {
- lang: null,
- title: null,
- description: null,
- keywords: null,
- noindex: false,
- nofollow: false,
- noarchive: false,
- canonical: null,
- alternate: null,
- 'og:type': null,
- 'og:title': null,
- 'og:site_name': null,
- 'og:description': null,
- 'og:url': null,
- 'og:image': null,
- 'twitter:card': null,
-};
-
-describe('computeExternalLinkRows', () => {
- const workingDir = path.resolve(
- __dirname,
- '__test_fixtures_compute_external_link_rows__',
- );
- let archive: InstanceType<typeof Archive>;
- const archiveFilePath = path.resolve(
- workingDir,
- 'compute-external-link-rows-test.nitpicker',
- );
-
- beforeAll(async () => {
- const { mkdirSync } = await import('node:fs');
- mkdirSync(workingDir, { recursive: true });
- archive = await Archive.create({ filePath: archiveFilePath, cwd: workingDir });
- await archive.setConfig(BASE_CONFIG);
-
- // Page A: two anchors to ads.example.com (same page, must count as one
- // referrer, not two), plus one to tracking.
- await archive.setPage({
- url: parseUrl('https://example.com/page-a')!,
- redirectPaths: [],
- isExternal: false,
- isTarget: true,
- status: 200,
- statusText: 'OK',
- contentType: 'text/html',
- contentLength: 100,
- responseHeaders: {},
- html: '',
- meta: { ...META, title: 'Page A' },
- anchorList: [
- {
- href: parseUrl('https://ads.example.com/')!,
- isExternal: true,
- title: null,
- textContent: 'Ad banner',
- },
- {
- href: parseUrl('https://ads.example.com/')!,
- isExternal: true,
- title: null,
- textContent: 'Ad footer',
- },
- {
- href: parseUrl('https://tracking.example.com/')!,
- isExternal: true,
- title: null,
- textContent: 'Tracking',
- },
- ],
- imageList: [],
- isSkipped: false,
- });
-
- // Page B: a second, distinct referrer to ads.example.com.
- await archive.setPage({
- url: parseUrl('https://example.com/page-b')!,
- redirectPaths: [],
- isExternal: false,
- isTarget: true,
- status: 200,
- statusText: 'OK',
- contentType: 'text/html',
- contentLength: 100,
- responseHeaders: {},
- html: '',
- meta: { ...META, title: 'Page B' },
- anchorList: [
- {
- href: parseUrl('https://ads.example.com/')!,
- isExternal: true,
- title: null,
- textContent: 'Ad sidebar',
- },
- ],
- imageList: [],
- isSkipped: false,
- });
-
- await archive.setPage({
- url: parseUrl('https://ads.example.com/')!,
- redirectPaths: [],
- isExternal: true,
- isTarget: false,
- status: 200,
- statusText: 'OK',
- contentType: 'text/html',
- contentLength: 100,
- responseHeaders: {},
- html: '',
- meta: META,
- anchorList: [],
- imageList: [],
- isSkipped: false,
- });
- await archive.setPage({
- url: parseUrl('https://tracking.example.com/')!,
- redirectPaths: [],
- isExternal: true,
- isTarget: false,
- status: 404,
- statusText: 'Not Found',
- contentType: 'text/html',
- contentLength: 0,
- responseHeaders: {},
- html: '',
- meta: META,
- anchorList: [],
- imageList: [],
- isSkipped: false,
- });
- });
-
- afterAll(async () => {
- if (archive) {
- await archive.releaseHandle();
- }
- const { rmSync } = await import('node:fs');
- rmSync(workingDir, { recursive: true, force: true });
- });
-
- it('groups anchors by canonical destination, one row per unique destination', async () => {
- const knex = archive.getKnex();
- const rows = await knex.transaction((trx) => computeExternalLinkRows(trx));
- expect(rows).toHaveLength(2);
- });
-
- it('counts referrers by distinct page id, not anchor count', async () => {
- // Page A has two <a> tags to ads.example.com; combined with page B
- // that's 2 distinct referring pages, not 3 anchors.
- const knex = archive.getKnex();
- const rows = await knex.transaction((trx) => computeExternalLinkRows(trx));
- const ads = rows.find((row) => row.dest_url === 'https://ads.example.com');
- expect(ads).toMatchObject({ status: 200, referrer_count: 2 });
- });
-
- it('carries the canonical destination status through', async () => {
- const knex = archive.getKnex();
- const rows = await knex.transaction((trx) => computeExternalLinkRows(trx));
- const tracking = rows.find((row) => row.dest_url === 'https://tracking.example.com');
- expect(tracking).toMatchObject({ status: 404, referrer_count: 1 });
- });
-});
-
-/**
- * Mirrors `list-external-links.spec.ts`'s redirect-resolution describe
- * block: an anchor to an internal redirect-source page and an anchor
- * directly to the same external canonical destination must collapse into a
- * single `viewer_external_links` row, not two.
- */
-describe('computeExternalLinkRows — redirect resolution', () => {
- const workingDir = path.resolve(
- __dirname,
- '__test_fixtures_compute_external_link_rows_redirect__',
- );
- let archive: InstanceType<typeof Archive>;
- const archiveFilePath = path.resolve(
- workingDir,
- 'compute-external-link-rows-redirect-test.nitpicker',
- );
-
- beforeAll(async () => {
- const { mkdirSync } = await import('node:fs');
- mkdirSync(workingDir, { recursive: true });
- archive = await Archive.create({ filePath: archiveFilePath, cwd: workingDir });
- await archive.setConfig(BASE_CONFIG);
-
- await archive.setPage({
- url: parseUrl('https://example.com/direct')!,
- redirectPaths: [],
- isExternal: false,
- isTarget: true,
- status: 200,
- statusText: 'OK',
- contentType: 'text/html',
- contentLength: 100,
- responseHeaders: {},
- html: '',
- meta: { ...META, title: 'Direct' },
- anchorList: [
- {
- href: parseUrl('https://redirect-target.example.com/')!,
- isExternal: true,
- title: null,
- textContent: 'Direct link',
- },
- ],
- imageList: [],
- isSkipped: false,
- });
-
- await archive.setPage({
- url: parseUrl('https://example.com/via-redirect')!,
- redirectPaths: [],
- isExternal: false,
- isTarget: true,
- status: 200,
- statusText: 'OK',
- contentType: 'text/html',
- contentLength: 100,
- responseHeaders: {},
- html: '',
- meta: { ...META, title: 'Via redirect' },
- anchorList: [
- {
- href: parseUrl('https://example.com/old')!,
- isExternal: false,
- title: null,
- textContent: 'Old link',
- hash: null,
- },
- ],
- imageList: [],
- isSkipped: false,
- });
-
- await archive.setPage({
- url: parseUrl('https://redirect-target.example.com/')!,
- redirectPaths: [],
- isExternal: true,
- isTarget: false,
- status: 200,
- statusText: 'OK',
- contentType: 'text/html',
- contentLength: 100,
- responseHeaders: {},
- html: '',
- meta: META,
- anchorList: [],
- imageList: [],
- isSkipped: false,
- });
-
- await archive.setRedirect({
- url: parseUrl('https://example.com/old')!,
- redirectPaths: ['https://redirect-target.example.com/'],
- isExternal: false,
- isTarget: true,
- status: 200,
- statusText: 'OK',
- contentType: 'text/html',
- contentLength: 100,
- responseHeaders: {},
- html: '',
- meta: META,
- anchorList: [],
- imageList: [],
- isSkipped: false,
- });
- });
-
- afterAll(async () => {
- if (archive) {
- await archive.releaseHandle();
- }
- const { rmSync } = await import('node:fs');
- rmSync(workingDir, { recursive: true, force: true });
- });
-
- it('collapses a redirect-source anchor and a direct anchor onto the same canonical destination row', async () => {
- const knex = archive.getKnex();
- const rows = await knex.transaction((trx) => computeExternalLinkRows(trx));
- expect(rows).toHaveLength(1);
- expect(rows[0]).toMatchObject({
- dest_url: 'https://redirect-target.example.com',
- referrer_count: 2,
- });
- });
-});
diff --git a/packages/@nitpicker/query/src/viewer-read-model/compute-external-link-rows.ts b/packages/@nitpicker/query/src/viewer-read-model/compute-external-link-rows.ts
deleted file mode 100644
index a09dc2af..00000000
--- a/packages/@nitpicker/query/src/viewer-read-model/compute-external-link-rows.ts
+++ /dev/null
@@ -1,54 +0,0 @@
-import type { ExternalLinkInsertRow } from './types.js';
-import type { Knex } from 'knex';
-
-/**
- * Computes every unique external destination reached from the site, for
- * bulk insert into `viewer_external_links`.
- *
- * The aggregation itself (`COALESCE(canonical.*, dest.*)` redirect
- * resolution, `GROUP BY` on the canonical destination id, `COUNT(DISTINCT
- * source.id)` for the referrer count) is lifted verbatim from
- * `list-external-links.ts`'s live query — see that file's docs for why the
- * counting grain must stay in lockstep with `getPageDetail.inboundLinks`
- * (#71). The only difference here is that this runs once, at read-model
- * build time, against the full `anchors` table with no `LIMIT`/`OFFSET` —
- * see ARCHITECTURE.md「設計注意(外部リンク read model)」for why running
- * this JOIN + `GROUP BY` + `COUNT(DISTINCT ...)` combination on every
- * `/api/links?type=external` request is a known SQLite performance
- * pitfall, and why materialising it once avoids it.
- * @param trx - An open Knex transaction (a plain `Knex` instance also
- * works, e.g. in tests).
- * @returns One row per unique canonical external destination.
- */
-export async function computeExternalLinkRows(
- trx: Knex,
-): Promise<ExternalLinkInsertRow[]> {
- const destIdExpression = 'COALESCE("canonical"."id", "dest"."id")';
- const destUrlExpression = 'COALESCE("canonical"."url", "dest"."url")';
- const statusExpression = 'COALESCE("canonical"."status", "dest"."status")';
-
- const rows: {
- destPageId: number;
- destUrl: string;
- status: number | null;
- referrerCount: number;
- }[] = await trx('anchors')
- .join('pages as source', 'anchors.pageId', '=', 'source.id')
- .join('pages as dest', 'anchors.hrefId', '=', 'dest.id')
- .leftJoin('pages as canonical', 'dest.redirectDestId', '=', 'canonical.id')
- .whereRaw(`COALESCE("canonical"."isExternal", "dest"."isExternal") = 1`)
- .groupBy(trx.raw(destIdExpression))
- .select(
- trx.raw(`${destIdExpression} as "destPageId"`),
- trx.raw(`${destUrlExpression} as "destUrl"`),
- trx.raw(`${statusExpression} as "status"`),
- trx.raw('count(distinct "source"."id") as "referrerCount"'),
- );
-
- return rows.map((row) => ({
- dest_page_id: row.destPageId,
- dest_url: row.destUrl,
- status: row.status,
- referrer_count: Number(row.referrerCount),
- }));
-}
diff --git a/packages/@nitpicker/query/src/viewer-read-model/create-viewer-read-model-tables.spec.ts b/packages/@nitpicker/query/src/viewer-read-model/create-viewer-read-model-tables.spec.ts
index 570ef951..dc9fdff6 100644
--- a/packages/@nitpicker/query/src/viewer-read-model/create-viewer-read-model-tables.spec.ts
+++ b/packages/@nitpicker/query/src/viewer-read-model/create-viewer-read-model-tables.spec.ts
@@ -30,7 +30,7 @@ describe('createViewerReadModelTables', () => {
rmSync(workingDir, { recursive: true, force: true });
});
- it('creates all 8 tables and the named viewer_pages indexes', async () => {
+ it('creates all 9 tables and the named viewer_pages indexes', async () => {
const knex = archive.getKnex();
await knex.transaction((trx) => createViewerReadModelTables(trx));
@@ -43,6 +43,7 @@ describe('createViewerReadModelTables', () => {
'viewer_directory_nodes',
'viewer_directory_pages',
'viewer_external_links',
+ 'viewer_anchor_facts',
]) {
expect(await knex.schema.hasTable(table)).toBe(true);
}
@@ -71,6 +72,21 @@ describe('createViewerReadModelTables', () => {
for (const indexName of ['vel_url', 'vel_status', 'vel_referrer_count']) {
expect(externalLinkIndexNames.has(indexName)).toBe(true);
}
+
+ const anchorFactIndexRows: Array<{ name: string }> = await knex('sqlite_master')
+ .where({ type: 'index', tbl_name: 'viewer_anchor_facts' })
+ .select('name');
+ const anchorFactIndexNames = new Set(anchorFactIndexRows.map((r) => r.name));
+ for (const indexName of [
+ 'vaf_broken_source',
+ 'vaf_broken_dest',
+ 'vaf_broken_status',
+ 'vaf_broken_status_desc',
+ 'vaf_source',
+ 'vaf_dest',
+ ]) {
+ expect(anchorFactIndexNames.has(indexName)).toBe(true);
+ }
});
it('viewer_query_profiles enforces a composite (scope, profile_key) key, not a single-column rowid', async () => {
diff --git a/packages/@nitpicker/query/src/viewer-read-model/create-viewer-read-model-tables.ts b/packages/@nitpicker/query/src/viewer-read-model/create-viewer-read-model-tables.ts
index ecfed6de..0644ae20 100644
--- a/packages/@nitpicker/query/src/viewer-read-model/create-viewer-read-model-tables.ts
+++ b/packages/@nitpicker/query/src/viewer-read-model/create-viewer-read-model-tables.ts
@@ -1,14 +1,14 @@
import type { Knex } from 'knex';
/**
- * Creates all 8 viewer-read-model tables (and `viewer_pages`'s named
+ * Creates all 9 viewer-read-model tables (and `viewer_pages`'s named
* indexes) against the given connection. Assumes none of the tables
* currently exist — callers (`buildViewerReadModel`) are responsible for
* dropping any prior version first, inside the same transaction, so this
* function is not itself idempotent.
*
* Every statement runs via `raw()` rather than knex's chainable schema
- * builder: 5 of the 8 tables need `WITHOUT ROWID` / a composite primary key
+ * builder: 5 of the 9 tables need `WITHOUT ROWID` / a composite primary key
* / a `CHECK` constraint / a table-level `UNIQUE` constraint, none of which
* the chainable builder can express (the same reason `page_html_blobs` /
* `page_html_ref` drop to `raw()` in `@nitpicker/crawler`'s
@@ -159,14 +159,16 @@ export async function createViewerReadModelTables(trx: Knex): Promise<void> {
);
// Pre-aggregated, deduplicated-by-canonical-destination external link
- // list — see `computeExternalLinkRows`'s docs for why this needs its own
- // `anchors` query rather than reusing `viewer_pages`'s `sourceRows` (the
- // aggregation joins `anchors` at build time instead of on every read,
- // see ARCHITECTURE.md「設計注意(外部リンク read model)」for the
- // SQLite COUNT(DISTINCT)/GROUP BY performance rationale). No
+ // summary — derived in memory from `viewer_anchor_facts` rows (see
+ // `deriveExternalLinkSummaryRows`'s docs) rather than its own `anchors`
+ // scan, so building this table costs no extra JOIN over the one
+ // `computeAnchorFactRows` already does. See ARCHITECTURE.md「設計注意
+ // (viewer_anchor_facts read model、issue #114)」for the SQLite
+ // COUNT(DISTINCT)/GROUP BY performance rationale this sidesteps. No
// `_desc_key` columns like `viewer_pages` needs: pagination here is
- // plain offset-based (via `paginateQuery`), not keyset-cursor, so a
- // single ascending index scanned backward is enough for DESC.
+ // plain offset-based (via
+ // `paginateQuery`), not keyset-cursor, so a single ascending index
+ // scanned backward is enough for DESC.
await trx.raw(`
CREATE TABLE viewer_external_links (
dest_page_id integer primary key,
@@ -182,4 +184,58 @@ export async function createViewerReadModelTables(trx: Knex): Promise<void> {
await trx.raw(
'CREATE INDEX vel_referrer_count ON viewer_external_links(referrer_count, dest_url, dest_page_id)',
);
+
+ // Edge-level (one row per unique (source_page_id, dest_page_id) pair,
+ // with `count` absorbing duplicate anchor observations between the same
+ // pair) fact table backing broken-link listing. Deliberately has no
+ // `url_refs`/`content_items` ref-table indirection (issue #139 — not
+ // landed, and #103's own execution order places it after this table):
+ // `source_url_sort_key`/`dest_url_sort_key` are inline text, copied at
+ // build time exactly like `viewer_pages.url_sort_key`, so indexed
+ // `ORDER BY` works without a pre-join. Full URL text for the OTHER
+ // (non-sort-key) display columns is resolved by joining back to `pages`
+ // only after the id set is limit-bounded (same limit-before-join
+ // pattern as `joinViewerPageIdsToListItems`). `is_external_link` is
+ // stored (SQLite INTEGER 0/1 costs ~0 bytes) but intentionally has no
+ // index: nothing reads this table filtered by it — it exists only for
+ // `deriveExternalLinkSummaryRows`'s in-memory pass over the full row
+ // set at build time. `status_desc_key` mirrors `viewer_pages`'s same
+ // column for the same reason: `docs/viewer-sql-query-plan.md`'s Stable
+ // Ordering rule keeps the `source_url_sort_key`/`edge_id` tie-breakers
+ // ascending even when the primary sort is `status desc` — a row-value
+ // keyset tuple comparison can't mix per-column directions, so the
+ // primary column is negated and walked ascending instead. See
+ // ARCHITECTURE.md「設計注意(viewer_anchor_facts read model、issue
+ // #114)」for the full read/write/storage rationale.
+ await trx.raw(`
+ CREATE TABLE viewer_anchor_facts (
+ edge_id integer primary key,
+ source_page_id integer not null,
+ dest_page_id integer not null,
+ source_url_sort_key text not null,
+ dest_url_sort_key text not null,
+ status integer,
+ status_sort_key integer not null,
+ status_desc_key integer not null,
+ count integer not null,
+ is_broken integer not null,
+ is_external_link integer not null
+ )
+ `);
+ await trx.raw(
+ 'CREATE INDEX vaf_broken_source ON viewer_anchor_facts(is_broken, source_url_sort_key, edge_id)',
+ );
+ await trx.raw(
+ 'CREATE INDEX vaf_broken_dest ON viewer_anchor_facts(is_broken, dest_url_sort_key, edge_id)',
+ );
+ await trx.raw(
+ 'CREATE INDEX vaf_broken_status ON viewer_anchor_facts(is_broken, status_sort_key, source_url_sort_key, edge_id)',
+ );
+ await trx.raw(
+ 'CREATE INDEX vaf_broken_status_desc ON viewer_anchor_facts(is_broken, status_desc_key, source_url_sort_key, edge_id)',
+ );
+ await trx.raw(
+ 'CREATE INDEX vaf_source ON viewer_anchor_facts(source_page_id, edge_id)',
+ );
+ await trx.raw('CREATE INDEX vaf_dest ON viewer_anchor_facts(dest_page_id, edge_id)');
}
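
Reviewer note: the comment above explains that `status_desc_key` exists because a row-value keyset comparison cannot mix per-column directions. A rough sketch of what the all-ascending tuple comparison `(status_desc_key, source_url_sort_key, edge_id) > (?, ?, ?)` decides is shown below. `EdgeCursor` and `isAfterCursor` are hypothetical names, and JavaScript string comparison only approximates SQLite's default bytewise collation.

```typescript
// Hypothetical cursor shape mirroring the `vaf_broken_status_desc` index
// columns after the `is_broken` prefix.
interface EdgeCursor {
	statusDescKey: number;
	sourceUrlSortKey: string;
	edgeId: number;
}

// Lexicographic "row > cursor" test, every column ascending.
function isAfterCursor(row: EdgeCursor, cursor: EdgeCursor): boolean {
	if (row.statusDescKey !== cursor.statusDescKey) {
		return row.statusDescKey > cursor.statusDescKey;
	}
	if (row.sourceUrlSortKey !== cursor.sourceUrlSortKey) {
		return row.sourceUrlSortKey > cursor.sourceUrlSortKey;
	}
	return row.edgeId > cursor.edgeId;
}
```

Because the primary column is the negated status, walking this tuple upward visits statuses from high to low while the URL and `edge_id` tie-breakers stay ascending.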
diff --git a/packages/@nitpicker/query/src/viewer-read-model/derive-external-link-summary-rows.spec.ts b/packages/@nitpicker/query/src/viewer-read-model/derive-external-link-summary-rows.spec.ts
new file mode 100644
index 00000000..93c720a8
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-read-model/derive-external-link-summary-rows.spec.ts
@@ -0,0 +1,99 @@
+import type { AnchorFactInsertRow } from './types.js';
+
+import { describe, expect, it } from 'vitest';
+
+import { deriveExternalLinkSummaryRows } from './derive-external-link-summary-rows.js';
+
+/**
+ * Builds a minimal {@link AnchorFactInsertRow} with sensible defaults,
+ * overridable per test.
+ * @param overrides - Fields to override.
+ * @returns The constructed row.
+ */
+function makeFact(overrides: Partial<AnchorFactInsertRow>): AnchorFactInsertRow {
+ return {
+ source_page_id: 1,
+ dest_page_id: 100,
+ source_url_sort_key: 'https://example.com/page',
+ dest_url_sort_key: 'https://ads.example.com',
+ status: 200,
+ status_sort_key: 200,
+ status_desc_key: -200,
+ count: 1,
+ is_broken: 0,
+ is_external_link: 1,
+ ...overrides,
+ };
+}
+
+describe('deriveExternalLinkSummaryRows', () => {
+ it('returns an empty array when there are no external-link facts', () => {
+ const facts = [makeFact({ is_external_link: 0 })];
+ expect(deriveExternalLinkSummaryRows(facts)).toEqual([]);
+ });
+
+ it('excludes broken (non-external) facts from the summary', () => {
+ const facts = [
+ makeFact({ source_page_id: 1, is_external_link: 0, is_broken: 1 }),
+ makeFact({ source_page_id: 2, is_external_link: 1 }),
+ ];
+ expect(deriveExternalLinkSummaryRows(facts)).toEqual([
+ {
+ dest_page_id: 100,
+ dest_url: 'https://ads.example.com',
+ status: 200,
+ referrer_count: 1,
+ },
+ ]);
+ });
+
+ it('counts referrer_count as the number of distinct-source edge rows sharing a destination', () => {
+ const facts = [
+ makeFact({ source_page_id: 1 }),
+ makeFact({ source_page_id: 2 }),
+ makeFact({ source_page_id: 3 }),
+ ];
+ const [summary] = deriveExternalLinkSummaryRows(facts);
+ expect(summary).toMatchObject({ dest_page_id: 100, referrer_count: 3 });
+ });
+
+ it('does not inflate referrer_count using the edge-level count column (duplicate anchors already collapsed upstream)', () => {
+ const facts = [makeFact({ source_page_id: 1, count: 5 })];
+ const [summary] = deriveExternalLinkSummaryRows(facts);
+ expect(summary).toMatchObject({ referrer_count: 1 });
+ });
+
+ it('produces one summary row per unique dest_page_id', () => {
+ const facts = [
+ makeFact({
+ source_page_id: 1,
+ dest_page_id: 100,
+ dest_url_sort_key: 'https://ads.example.com',
+ }),
+ makeFact({
+ source_page_id: 1,
+ dest_page_id: 200,
+ dest_url_sort_key: 'https://tracking.example.com',
+ status: 404,
+ }),
+ ];
+ const summaries = deriveExternalLinkSummaryRows(facts);
+ expect(summaries).toHaveLength(2);
+ expect(summaries).toEqual(
+ expect.arrayContaining([
+ {
+ dest_page_id: 100,
+ dest_url: 'https://ads.example.com',
+ status: 200,
+ referrer_count: 1,
+ },
+ {
+ dest_page_id: 200,
+ dest_url: 'https://tracking.example.com',
+ status: 404,
+ referrer_count: 1,
+ },
+ ]),
+ );
+ });
+});
diff --git a/packages/@nitpicker/query/src/viewer-read-model/derive-external-link-summary-rows.ts b/packages/@nitpicker/query/src/viewer-read-model/derive-external-link-summary-rows.ts
new file mode 100644
index 00000000..a4af50e0
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-read-model/derive-external-link-summary-rows.ts
@@ -0,0 +1,37 @@
+import type { AnchorFactInsertRow, ExternalLinkInsertRow } from './types.js';
+
+/**
+ * Derives `viewer_external_links` rows from already-computed
+ * {@link AnchorFactInsertRow} rows — no `anchors` scan of its own.
+ *
+ * `AnchorFactInsertRow` is already deduplicated one row per unique
+ * `(source_page_id, dest_page_id)` pair, so the number of `is_external_link`
+ * rows sharing a `dest_page_id` IS the distinct-referrer count — equivalent
+ * to `COUNT(DISTINCT source.id)` over the raw `anchors` table, but computed
+ * by counting already-grouped rows instead of a second aggregation pass.
+ * @param anchorFacts - The full `viewer_anchor_facts` row set for this
+ * build (as computed by `computeAnchorFactRows`).
+ * @returns One row per unique external destination.
+ */
+export function deriveExternalLinkSummaryRows(
+ anchorFacts: readonly AnchorFactInsertRow[],
+): ExternalLinkInsertRow[] {
+ const summaries = new Map<number, ExternalLinkInsertRow>();
+ for (const fact of anchorFacts) {
+ if (!fact.is_external_link) {
+ continue;
+ }
+ const existing = summaries.get(fact.dest_page_id);
+ if (existing) {
+ existing.referrer_count += 1;
+ } else {
+ summaries.set(fact.dest_page_id, {
+ dest_page_id: fact.dest_page_id,
+ dest_url: fact.dest_url_sort_key,
+ status: fact.status,
+ referrer_count: 1,
+ });
+ }
+ }
+ return [...summaries.values()];
+}
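
Reviewer note: the docs above claim that counting already-grouped edge rows per destination equals `COUNT(DISTINCT source.id)` over raw anchors. A small sketch of that equivalence follows. `RawAnchor`, `countDistinctReferrers`, and `countEdgeRowsPerDest` are hypothetical helpers written for illustration only, not part of the PR.

```typescript
// Hypothetical minimal anchor shape: one entry per <a> occurrence.
interface RawAnchor {
	sourcePageId: number;
	destPageId: number;
}

// The COUNT(DISTINCT source) formulation: one Set of sources per destination.
function countDistinctReferrers(anchors: readonly RawAnchor[]): Map<number, number> {
	const sourcesByDest = new Map<number, Set<number>>();
	for (const anchor of anchors) {
		const sources = sourcesByDest.get(anchor.destPageId) ?? new Set<number>();
		sources.add(anchor.sourcePageId);
		sourcesByDest.set(anchor.destPageId, sources);
	}
	const counts = new Map<number, number>();
	for (const [dest, sources] of sourcesByDest) {
		counts.set(dest, sources.size);
	}
	return counts;
}

// The edge-row formulation: dedupe to (source, dest) pairs first, then count
// one per surviving pair.
function countEdgeRowsPerDest(anchors: readonly RawAnchor[]): Map<number, number> {
	const seenEdges = new Set<string>();
	const counts = new Map<number, number>();
	for (const anchor of anchors) {
		const key = `${anchor.sourcePageId}:${anchor.destPageId}`;
		if (seenEdges.has(key)) {
			continue;
		}
		seenEdges.add(key);
		counts.set(anchor.destPageId, (counts.get(anchor.destPageId) ?? 0) + 1);
	}
	return counts;
}
```

Both functions return the same per-destination counts for any input, which is why the derived summary can skip a second aggregation pass.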
diff --git a/packages/@nitpicker/query/src/viewer-read-model/drop-viewer-read-model-tables.spec.ts b/packages/@nitpicker/query/src/viewer-read-model/drop-viewer-read-model-tables.spec.ts
index f873e94e..75fa4b37 100644
--- a/packages/@nitpicker/query/src/viewer-read-model/drop-viewer-read-model-tables.spec.ts
+++ b/packages/@nitpicker/query/src/viewer-read-model/drop-viewer-read-model-tables.spec.ts
@@ -37,7 +37,7 @@ describe('dropViewerReadModelTables', () => {
).resolves.toBeUndefined();
});
- it('drops all 8 tables after they were created', async () => {
+ it('drops all 9 tables after they were created', async () => {
const knex = archive.getKnex();
await knex.transaction((trx) => createViewerReadModelTables(trx));
for (const table of [
@@ -49,6 +49,7 @@ describe('dropViewerReadModelTables', () => {
'viewer_directory_nodes',
'viewer_directory_pages',
'viewer_external_links',
+ 'viewer_anchor_facts',
]) {
expect(await knex.schema.hasTable(table)).toBe(true);
}
@@ -63,6 +64,7 @@ describe('dropViewerReadModelTables', () => {
'viewer_directory_nodes',
'viewer_directory_pages',
'viewer_external_links',
+ 'viewer_anchor_facts',
]) {
expect(await knex.schema.hasTable(table)).toBe(false);
}
diff --git a/packages/@nitpicker/query/src/viewer-read-model/drop-viewer-read-model-tables.ts b/packages/@nitpicker/query/src/viewer-read-model/drop-viewer-read-model-tables.ts
index 5f665079..60666545 100644
--- a/packages/@nitpicker/query/src/viewer-read-model/drop-viewer-read-model-tables.ts
+++ b/packages/@nitpicker/query/src/viewer-read-model/drop-viewer-read-model-tables.ts
@@ -1,16 +1,17 @@
import type { Knex } from 'knex';
/**
- * Drops all 8 viewer-read-model tables if present, against the given
+ * Drops all 9 viewer-read-model tables if present, against the given
* connection. Shared by `buildViewerReadModel` (which drops before
* recreating, inside its own rebuild transaction) and
- * `dropViewerReadModel` (which drops with no recreate), so the 8-table
+ * `dropViewerReadModel` (which drops with no recreate), so the 9-table
* list only needs to be kept in sync with `createViewerReadModelTables`
* in one place.
* @param trx - An open Knex transaction (a plain `Knex` instance also
* works, e.g. in tests).
*/
 export async function dropViewerReadModelTables(trx: Knex): Promise<void> {
+ await trx.schema.dropTableIfExists('viewer_anchor_facts');
await trx.schema.dropTableIfExists('viewer_external_links');
await trx.schema.dropTableIfExists('viewer_directory_pages');
await trx.schema.dropTableIfExists('viewer_directory_nodes');
diff --git a/packages/@nitpicker/query/src/viewer-read-model/null-status-sentinel.ts b/packages/@nitpicker/query/src/viewer-read-model/null-status-sentinel.ts
new file mode 100644
index 00000000..32bb96c3
--- /dev/null
+++ b/packages/@nitpicker/query/src/viewer-read-model/null-status-sentinel.ts
@@ -0,0 +1,26 @@
+/**
+ * Sentinel `status_sort_key` value substituted for `null` status (errored /
+ * not-yet-classified rows, or destinations never fetched). Chosen smaller
+ * than any real HTTP status code (100-599) so unknown-status rows keep
+ * sorting first in ascending order — matching the legacy write-model
+ * queries' prior behavior of ordering directly on the nullable `status`
+ * column, where SQLite treats `NULL` as smaller than any value.
+ *
+ * Deliberately distinct from `-1`, which `Database.resetFailedPages` already
+ * uses as the "hard failure" HTTP status sentinel (see that function's docs)
+ * — reusing `-1` here would conflate two different populations of rows in
+ * `status_sort_key` ordering and in any future `status = -1` equality filter.
+ *
+ * Keyset cursor comparisons need a NEVER-`null` sort-key column: SQL's
+ * three-valued logic makes `NULL > x` / `NULL < x` always evaluate to
+ * `NULL` (never true), which would silently break tuple comparisons like
+ * `(status_sort_key, url_sort_key, page_id) > (?, ?, ?)` for rows whose
+ * status is unknown. Substituting a sentinel keeps every row on this column
+ * strictly orderable.
+ *
+ * Shared by `viewer_pages` (`build-viewer-read-model.ts`) and
+ * `viewer_anchor_facts` (`compute-anchor-fact-rows.ts`) so the same
+ * status-ordering convention holds across both keyset-paginated read
+ * models.
+ */
+export const NULL_STATUS_SENTINEL = -32_768;
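
Reviewer note: a quick sketch of how the sentinel keeps unknown-status rows orderable in both directions. The constant value is taken from the file above; `statusSortKeys` is a hypothetical helper that mirrors the `status_sort_key` / `status_desc_key` derivation in `computeAnchorFactRows`.

```typescript
// Value copied from null-status-sentinel.ts.
const NULL_STATUS_SENTINEL = -32_768;

// Hypothetical helper: derives both never-null sort keys from a nullable status.
function statusSortKeys(status: number | null): {
	status_sort_key: number;
	status_desc_key: number;
} {
	const sortKey = status ?? NULL_STATUS_SENTINEL;
	return { status_sort_key: sortKey, status_desc_key: -sortKey };
}

const statuses: (number | null)[] = [404, null, 200, 500];

// Ascending on status_sort_key: unknown status sorts first, like NULL in SQLite.
const ascending = [...statuses].sort(
	(a, b) => statusSortKeys(a).status_sort_key - statusSortKeys(b).status_sort_key,
);

// Ascending on status_desc_key: equivalent to `status desc`, unknown status last.
const descending = [...statuses].sort(
	(a, b) => statusSortKeys(a).status_desc_key - statusSortKeys(b).status_desc_key,
);
```

Here `ascending` is `[null, 200, 404, 500]` and `descending` is `[500, 404, 200, null]`, so every row stays comparable without any `NULL` ever entering a keyset tuple.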
diff --git a/packages/@nitpicker/query/src/viewer-read-model/types.ts b/packages/@nitpicker/query/src/viewer-read-model/types.ts
index 954c3b7d..4aef84e3 100644
--- a/packages/@nitpicker/query/src/viewer-read-model/types.ts
+++ b/packages/@nitpicker/query/src/viewer-read-model/types.ts
@@ -161,7 +161,8 @@ export interface DirectoryTreeBuildResult {
/**
* One row to insert into `viewer_external_links`, one per unique canonical
* (redirect-resolved) external destination. Produced by
- * `computeExternalLinkRows`.
+ * `deriveExternalLinkSummaryRows` from the already-computed
+ * {@link AnchorFactInsertRow} set — no separate `anchors` scan.
*/
export interface ExternalLinkInsertRow {
/** `COALESCE(canonical.id, dest.id)` — the canonical destination's `pages.id`. */
@@ -171,10 +172,55 @@ export interface ExternalLinkInsertRow {
/** `COALESCE(canonical.status, dest.status)` — the canonical destination's HTTP status, or `null` if unknown. */
status: number | null;
/**
- * `COUNT(DISTINCT source.id)` — the number of distinct internal pages
- * linking to this destination. Must stay in the same counting grain as
- * `getPageDetail.inboundLinks` (see that function's docs, #71) —
- * multiple anchors from the same page count once.
+ * The number of distinct internal pages linking to this destination —
+ * the count of {@link AnchorFactInsertRow} rows sharing this
+ * `dest_page_id`, since those rows are already deduplicated one-per-
+ * `(source_page_id, dest_page_id)` pair. Must stay in the same counting
+ * grain as `getPageDetail.inboundLinks` (see that function's docs, #71)
+ * — multiple anchors from the same page count once.
*/
referrer_count: number;
}
+
+/**
+ * One row to insert into `viewer_anchor_facts`, one per unique
+ * `(source_page_id, dest_page_id)` pair — duplicate anchor observations
+ * between the same pair collapse into a single row via `count`. Produced by
+ * `computeAnchorFactRows`.
+ */
+export interface AnchorFactInsertRow {
+ /** `anchors.pageId` — the referring page's `pages.id`. */
+ source_page_id: number;
+ /** `COALESCE(canonical.id, dest.id)` — the canonical destination's `pages.id`. */
+ dest_page_id: number;
+ /**
+ * The referring page's URL, verbatim — copied at build time so indexed
+ * `ORDER BY`/keyset comparisons don't need a pre-join, the same
+ * rationale as `viewer_pages.url_sort_key`.
+ */
+ source_url_sort_key: string;
+ /** `COALESCE(canonical.url, dest.url)`, verbatim — same rationale as {@link source_url_sort_key}. */
+ dest_url_sort_key: string;
+ /** `COALESCE(canonical.status, dest.status)` — the canonical destination's HTTP status, or `null` if unknown. */
+ status: number | null;
+ /** `status`, or `NULL_STATUS_SENTINEL` when `status` is `null` — see that constant's docs. */
+ status_sort_key: number;
+ /**
+ * The negation of {@link status_sort_key} — walking this column
+ * ascending yields `status desc` display order while keeping the
+ * `source_url_sort_key`/`edge_id` tie-breakers ascending too, the same
+ * `viewer_pages.status_desc_key` rationale (a row-value keyset tuple
+ * comparison can't mix per-column directions).
+ */
+ status_desc_key: number;
+ /** Number of raw anchor observations collapsed into this `(source_page_id, dest_page_id)` row. */
+ count: number;
+ /** `1` iff the canonical destination's status is `404` (see `list-links.ts`'s broken-link scope note — 403/5xx/unknown never count). */
+ is_broken: number;
+ /**
+ * `1` iff the canonical destination is external. Not indexed — consumed
+ * only by `deriveExternalLinkSummaryRows`'s in-memory pass at build
+ * time, never by an indexed read query.
+ */
+ is_external_link: number;
+}
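
Review note: the `referrer_count` docs above say the count can be derived from the already-deduplicated anchor-fact rows without a second `anchors` scan. The sketch below shows one way such an in-memory pass could look. It is not the PR's `deriveExternalLinkSummaryRows` (that function is not shown in this diff); `AnchorFactLike`, `ExternalLinkSummary`, `summarizeExternalLinks`, and the `dest_url` field are hypothetical names.

```typescript
// Hypothetical sketch: not the PR's deriveExternalLinkSummaryRows.
// Field names follow AnchorFactInsertRow where the diff shows them.
interface AnchorFactLike {
	source_page_id: number;
	dest_page_id: number;
	dest_url_sort_key: string;
	status: number | null;
	is_external_link: number;
}

interface ExternalLinkSummary {
	dest_page_id: number;
	dest_url: string; // assumed field name
	status: number | null;
	referrer_count: number;
}

/**
 * Groups external facts by destination. Each fact is assumed to be unique
 * per (source_page_id, dest_page_id), so counting facts per destination
 * equals counting distinct referring pages.
 */
function summarizeExternalLinks(facts: readonly AnchorFactLike[]): ExternalLinkSummary[] {
	const byDest = new Map<number, ExternalLinkSummary>();
	for (const fact of facts) {
		if (fact.is_external_link !== 1) continue;
		const existing = byDest.get(fact.dest_page_id);
		if (existing) {
			existing.referrer_count += 1;
		} else {
			byDest.set(fact.dest_page_id, {
				dest_page_id: fact.dest_page_id,
				dest_url: fact.dest_url_sort_key,
				status: fact.status,
				referrer_count: 1,
			});
		}
	}
	return [...byDest.values()];
}

const sampleFacts: AnchorFactLike[] = [
	{ source_page_id: 1, dest_page_id: 10, dest_url_sort_key: 'https://ext.example.com/', status: 200, is_external_link: 1 },
	{ source_page_id: 2, dest_page_id: 10, dest_url_sort_key: 'https://ext.example.com/', status: 200, is_external_link: 1 },
	{ source_page_id: 1, dest_page_id: 3, dest_url_sort_key: 'https://example.com/about', status: 200, is_external_link: 0 },
];
const summaries = summarizeExternalLinks(sampleFacts);
console.log(summaries);
```

The counting grain only matches `getPageDetail.inboundLinks` as long as the input really is one row per source/destination pair, which is the invariant the doc comment relies on.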
diff --git a/packages/@nitpicker/query/src/viewer-read-model/viewer-read-model-schema-version.ts b/packages/@nitpicker/query/src/viewer-read-model/viewer-read-model-schema-version.ts
index 978f7a00..19fee1bc 100644
--- a/packages/@nitpicker/query/src/viewer-read-model/viewer-read-model-schema-version.ts
+++ b/packages/@nitpicker/query/src/viewer-read-model/viewer-read-model-schema-version.ts
@@ -7,4 +7,4 @@
* `viewer_read_model_meta.schema_version` to decide whether a rebuild is
* needed.
*/
-export const VIEWER_READ_MODEL_SCHEMA_VERSION = 5;
+export const VIEWER_READ_MODEL_SCHEMA_VERSION = 6;
diff --git a/packages/@nitpicker/viewer/src/routes/register-links-route.spec.ts b/packages/@nitpicker/viewer/src/routes/register-links-route.spec.ts
index e9d55294..ae351bea 100644
--- a/packages/@nitpicker/viewer/src/routes/register-links-route.spec.ts
+++ b/packages/@nitpicker/viewer/src/routes/register-links-route.spec.ts
@@ -120,6 +120,12 @@ async function buildFixture(workingDir: string, withReadModel: boolean) {
title: null,
textContent: 'Ad sidebar',
},
+ {
+ href: parseUrl('https://example.com/broken')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Broken link',
+ },
],
imageList: [],
isSkipped: false,
@@ -140,6 +146,22 @@ async function buildFixture(workingDir: string, withReadModel: boolean) {
imageList: [],
isSkipped: false,
});
+ await archive.setPage({
+ url: parseUrl('https://example.com/broken')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 404,
+ statusText: 'Not Found',
+ contentType: 'text/html',
+ contentLength: 0,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
if (withReadModel) {
await buildViewerReadModel(archive);
@@ -221,3 +243,228 @@ describe('registerLinksRoute — /api/links?type=external (integration)', () =>
});
});
});
+
+describe('registerLinksRoute — /api/links?type=broken (integration)', () => {
+ describe('fast path (viewer_anchor_facts read model built)', () => {
+ const workingDir = path.resolve(
+ __dirname,
+ '__test_fixtures_register_links_route_broken_fast__',
+ );
+ let fixture: Awaited<ReturnType<typeof buildFixture>>;
+
+ beforeAll(async () => {
+ fixture = await buildFixture(workingDir, true);
+ });
+
+ afterAll(async () => {
+ await fixture.manager.closeAll();
+ const { rmSync } = await import('node:fs');
+ rmSync(workingDir, { recursive: true, force: true });
+ });
+
+ it('returns the broken-link shape with a nextCursor contract', async () => {
+ const res = await fixture.app.request('/api/links?type=broken');
+ const body = (await res.json()) as {
+ items: { sourceUrl: string; destUrl: string; status: number | null }[];
+ total: number;
+ nextCursor: string | null;
+ prevCursor: string | null;
+ };
+ expect(body.total).toBe(1);
+ expect(body.items).toEqual([
+ {
+ sourceUrl: 'https://example.com/page-b',
+ destUrl: 'https://example.com/broken',
+ status: 404,
+ isExternal: false,
+ textContent: null,
+ },
+ ]);
+ expect(body.nextCursor).toBeNull();
+ expect(body.prevCursor).toBeNull();
+ });
+
+ it('forces the legacy fallback when urlPattern is set, since no single index covers source-OR-dest matching', async () => {
+ const res = await fixture.app.request(
+ `/api/links?type=broken&urlPattern=${encodeURIComponent('%page-b%')}`,
+ );
+ const body = (await res.json()) as {
+ items: { sourceUrl: string }[];
+ total: number;
+ };
+ expect(body.total).toBe(1);
+ expect(body.items[0]!.sourceUrl).toBe('https://example.com/page-b');
+ });
+ });
+
+ describe('fast path — sortBy outside the read model’s narrower union', () => {
+ const workingDir = path.resolve(
+ __dirname,
+ '__test_fixtures_register_links_route_broken_unsupported_sort__',
+ );
+ let fixture: Awaited<ReturnType<typeof buildFixture>>;
+
+ beforeAll(async () => {
+ const { mkdirSync } = await import('node:fs');
+ mkdirSync(workingDir, { recursive: true });
+ const archive = await Archive.create({
+ filePath: path.resolve(workingDir, 'fixture.nitpicker'),
+ cwd: workingDir,
+ });
+ await archive.setConfig(BASE_CONFIG);
+
+ // `s1`'s broken destination is external, `s2`'s is internal.
+ // Sorting by `sourceUrl` (the fast path's silent fallback if the
+ // unsupported-sort guard were missing) would place `s1` before
+ // `s2` (alphabetical). Sorting by `isExternal` ascending (only
+ // `listLinks`, the legacy path, supports this) places the
+ // internal destination (`s2`) first instead — a result only
+ // reachable by actually forcing the legacy fallback.
+ await archive.setPage({
+ url: parseUrl('https://example.com/s1')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [
+ {
+ href: parseUrl('https://ext.example.com/e1')!,
+ isExternal: true,
+ title: null,
+ textContent: 'External broken',
+ },
+ ],
+ imageList: [],
+ isSkipped: false,
+ });
+ await archive.setPage({
+ url: parseUrl('https://example.com/s2')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 200,
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 100,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [
+ {
+ href: parseUrl('https://example.com/i1')!,
+ isExternal: false,
+ title: null,
+ textContent: 'Internal broken',
+ },
+ ],
+ imageList: [],
+ isSkipped: false,
+ });
+ await archive.setPage({
+ url: parseUrl('https://ext.example.com/e1')!,
+ redirectPaths: [],
+ isExternal: true,
+ isTarget: false,
+ status: 404,
+ statusText: 'Not Found',
+ contentType: 'text/html',
+ contentLength: 0,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+ await archive.setPage({
+ url: parseUrl('https://example.com/i1')!,
+ redirectPaths: [],
+ isExternal: false,
+ isTarget: true,
+ status: 404,
+ statusText: 'Not Found',
+ contentType: 'text/html',
+ contentLength: 0,
+ responseHeaders: {},
+ html: '',
+ meta: META,
+ anchorList: [],
+ imageList: [],
+ isSkipped: false,
+ });
+
+ await buildViewerReadModel(archive);
+
+ const manager = new ArchiveManager();
+ const { archiveId, mode } = await manager.open(archive.tmpDir);
+ const app = createApp({
+ context: {
+ manager,
+ archiveId,
+ filePath: archive.tmpDir,
+ mode,
+ crawlerLockHolder: null,
+ },
+ publicDir: '/tmp/no-such-dir-register-links-route-spec',
+ });
+ fixture = { app, archive, manager };
+ });
+
+ afterAll(async () => {
+ await fixture.manager.closeAll();
+ const { rmSync } = await import('node:fs');
+ rmSync(workingDir, { recursive: true, force: true });
+ });
+
+ it('forces the legacy fallback for sortBy=isExternal, which viewer_anchor_facts has no index for', async () => {
+ const res = await fixture.app.request(
+ '/api/links?type=broken&sortBy=isExternal&sortOrder=asc',
+ );
+ const body = (await res.json()) as { items: { sourceUrl: string }[] };
+ expect(body.items.map((item) => item.sourceUrl)).toEqual([
+ 'https://example.com/s2',
+ 'https://example.com/s1',
+ ]);
+ });
+ });
+
+ describe('legacy fallback path (no read model built)', () => {
+ const workingDir = path.resolve(
+ __dirname,
+ '__test_fixtures_register_links_route_broken_legacy__',
+ );
+ let fixture: Awaited<ReturnType<typeof buildFixture>>;
+
+ beforeAll(async () => {
+ fixture = await buildFixture(workingDir, false);
+ });
+
+ afterAll(async () => {
+ await fixture.manager.closeAll();
+ const { rmSync } = await import('node:fs');
+ rmSync(workingDir, { recursive: true, force: true });
+ });
+
+ it('returns the same broken-link shape via the legacy live query, with an offset-string nextCursor', async () => {
+ const res = await fixture.app.request('/api/links?type=broken');
+ const body = (await res.json()) as {
+ items: { sourceUrl: string; destUrl: string; status: number | null }[];
+ total: number;
+ nextCursor: string | null;
+ };
+ expect(body.total).toBe(1);
+ expect(body.items[0]).toMatchObject({
+ sourceUrl: 'https://example.com/page-b',
+ destUrl: 'https://example.com/broken',
+ status: 404,
+ });
+ expect(body.nextCursor).toBeNull();
+ });
+ });
+});
diff --git a/packages/@nitpicker/viewer/src/routes/register-links-route.ts b/packages/@nitpicker/viewer/src/routes/register-links-route.ts
index e06d3c19..84824344 100644
--- a/packages/@nitpicker/viewer/src/routes/register-links-route.ts
+++ b/packages/@nitpicker/viewer/src/routes/register-links-route.ts
@@ -5,14 +5,34 @@ import {
isViewerReadModelCurrent,
listExternalLinks,
listLinks,
+ listViewerBrokenLinks,
listViewerExternalLinks,
} from '@nitpicker/query';
+import { buildLegacyPagesCursors } from '../query-params/build-legacy-pages-cursors.js';
+import { parseLegacyPagesCursor } from '../query-params/parse-legacy-pages-cursor.js';
import { toNumber } from '../query-params/to-number.js';
/** Valid `type` values for the links route. */
const VALID_LINK_TYPES = ['broken', 'external'] as const;
+/** Default page size, matching `listLinks`/`listViewerBrokenLinks`'s own default. */
+const DEFAULT_LIMIT = 100;
+
+/**
+ * `sortBy` values `listViewerBrokenLinks` supports — a strict subset of
+ * `listLinks`'s 5 (`sourceUrl`/`destUrl`/`status`/`isExternal`/
+ * `textContent`), since `viewer_anchor_facts` has no index on
+ * `is_external_link` and stores no anchor text at all (see
+ * `list-viewer-broken-links.ts`'s docs). A request for `isExternal`/
+ * `textContent` must force the legacy fallback rather than silently
+ * falling through `getAnchorFactsSortSpec`'s `sourceUrl` default — a
+ * bookmarked/shared `?sortBy=isExternal` URL must sort the same way
+ * whether or not the read model happens to be current, not silently
+ * change order depending on internal cache state.
+ */
+const BROKEN_LINKS_FAST_PATH_SORT_KEYS = new Set(['sourceUrl', 'destUrl', 'status']);
+
/**
* Registers `GET /api/links?type=broken|external` — link analysis.
*
@@ -27,22 +47,38 @@ const VALID_LINK_TYPES = ['broken', 'external'] as const;
* `sourceUrl`/`isExternal`/`textContent` sort keys, an added
* `referrerCount` sort key).
*
- * `external` dispatches to one of two backends per request, the same
- * two-layer pattern `register-pages-route.ts` uses for `/api/pages`:
+ * Both `external` and `broken` dispatch to one of two backends per request,
+ * the same two-layer pattern `register-pages-route.ts` uses for
+ * `/api/pages`:
*
- * - `listViewerExternalLinks` (the `viewer_external_links` read-model fast
- * path) when the read model is built and current. Unlike `/api/pages`,
- * there is no filter that forces a legacy fallback: `urlPattern`/`status`
- * both map directly onto `viewer_external_links` columns.
- * - `listExternalLinks` (the legacy live `anchors` JOIN + `GROUP BY` query)
- * otherwise — covers archives predating the read model. Both share the
- * same options/response shape, so callers see no difference beyond speed.
+ * - `external`: `listViewerExternalLinks` (the `viewer_external_links`
+ * read-model fast path) when the read model is current — no filter forces
+ * a legacy fallback here, since `urlPattern`/`status` both map directly
+ * onto `viewer_external_links` columns. Otherwise `listExternalLinks`
+ * (the legacy live `anchors` JOIN + `GROUP BY` query).
+ * - `broken`: `listViewerBrokenLinks` (the `viewer_anchor_facts` read-model
+ * fast path, cursor-paginated) when the read model is current AND none of
+ * `urlPattern`, `includeRedirectSources`, or an unsupported `sortBy`
+ * (`isExternal`/`textContent` — see `BROKEN_LINKS_FAST_PATH_SORT_KEYS`) is
+ * set — `urlPattern` matches source OR destination across two columns,
+ * which no single index can satisfy; `includeRedirectSources` has no
+ * read-model equivalent (`viewer_anchor_facts` only ever stores the
+ * canonical destination); and the fast path's narrower `sortBy` union
+ * means an unsupported value must force the legacy fallback rather than
+ * silently resolving to a different sort. Otherwise `listLinks` (legacy,
+ * anchor-scan-bound, offset-based). The legacy path's `cursor` is a plain
+ * decimal offset string (see `buildLegacyPagesCursors`), not the fast
+ * path's opaque keyset token, but the legacy response exposes the same
+ * `nextCursor`-only contract, so `useLinksInfinite`'s virtual scroll
+ * keeps paginating past the first page regardless of which backend
+ * served it.
* @param app - The Hono application.
* @param context - The opened archive context.
*/
export function registerLinksRoute(app: Hono, context: ArchiveContext): void {
app.get('/api/links', async (c) => {
- const type = c.req.query('type');
+ const q = c.req.query();
+ const type = q.type;
if (!type || !(VALID_LINK_TYPES as readonly string[]).includes(type)) {
return c.json(
{
@@ -52,11 +88,11 @@ export function registerLinksRoute(app: Hono, context: ArchiveContext): void {
);
}
const accessor = context.manager.get(context.archiveId);
- const limit = toNumber(c.req.query('limit'));
- const offset = toNumber(c.req.query('offset'));
- const urlPattern = c.req.query('urlPattern');
- const status = toNumber(c.req.query('status'));
- const sortOrder = c.req.query('sortOrder') as 'asc' | 'desc' | undefined;
+ const limit = toNumber(q.limit);
+ const offset = toNumber(q.offset);
+ const urlPattern = q.urlPattern;
+ const status = toNumber(q.status);
+ const sortOrder = q.sortOrder as 'asc' | 'desc' | undefined;
if (type === 'external') {
const params = {
@@ -64,11 +100,7 @@ export function registerLinksRoute(app: Hono, context: ArchiveContext): void {
offset,
urlPattern,
status,
- sortBy: c.req.query('sortBy') as
- | 'destUrl'
- | 'status'
- | 'referrerCount'
- | undefined,
+ sortBy: q.sortBy as 'destUrl' | 'status' | 'referrerCount' | undefined,
sortOrder,
};
const result = (await isViewerReadModelCurrent(accessor))
@@ -77,15 +109,36 @@ export function registerLinksRoute(app: Hono, context: ArchiveContext): void {
return c.json(result);
}
- const includeRedirectSources = c.req.query('includeRedirectSources') === 'true';
- const result = await listLinks(accessor, {
+ const includeRedirectSources = q.includeRedirectSources === 'true';
+ const usesUnsupportedSort = Boolean(
+ q.sortBy && !BROKEN_LINKS_FAST_PATH_SORT_KEYS.has(q.sortBy),
+ );
+ const usesWideTableOnlyFilter = Boolean(
+ urlPattern || includeRedirectSources || usesUnsupportedSort,
+ );
+ if (!usesWideTableOnlyFilter && (await isViewerReadModelCurrent(accessor))) {
+ const result = await listViewerBrokenLinks(accessor, {
+ limit,
+ offset,
+ status,
+ sortBy: q.sortBy as 'sourceUrl' | 'destUrl' | 'status' | undefined,
+ sortOrder,
+ cursor: q.cursor || undefined,
+ direction: q.direction === 'prev' ? 'prev' : undefined,
+ });
+ return c.json(result);
+ }
+
+ const legacyLimit = limit ?? DEFAULT_LIMIT;
+ const legacyOffset = parseLegacyPagesCursor(q.cursor, offset ?? 0);
+ const legacyResult = await listLinks(accessor, {
type: 'broken',
- limit,
- offset,
+ limit: legacyLimit,
+ offset: legacyOffset,
includeRedirectSources,
urlPattern,
status,
- sortBy: c.req.query('sortBy') as
+ sortBy: q.sortBy as
| 'sourceUrl'
| 'destUrl'
| 'status'
@@ -94,6 +147,12 @@ export function registerLinksRoute(app: Hono, context: ArchiveContext): void {
| undefined,
sortOrder,
});
- return c.json(result);
+ const { nextCursor, prevCursor } = buildLegacyPagesCursors({
+ offset: legacyOffset,
+ itemCount: legacyResult.items.length,
+ total: legacyResult.total,
+ limit: legacyLimit,
+ });
+ return c.json({ ...legacyResult, nextCursor, prevCursor });
});
}
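
Review note: the backend choice for `type=broken` can be read as a small pure predicate. The sketch below restates the handler's conditions in isolation. `BrokenLinksQuery` and `canUseBrokenLinksFastPath` are hypothetical names introduced for illustration; the conditions themselves mirror the handler in the diff above.

```typescript
// Sketch only: `BrokenLinksQuery` and `canUseBrokenLinksFastPath` are
// hypothetical names; the conditions mirror the route handler in the diff.
const FAST_PATH_SORT_KEYS = new Set<string>(['sourceUrl', 'destUrl', 'status']);

interface BrokenLinksQuery {
	urlPattern?: string;
	includeRedirectSources?: string;
	sortBy?: string;
}

/**
 * Returns true when the read-model fast path may serve the request:
 * the read model is current and no legacy-only option is set.
 */
function canUseBrokenLinksFastPath(q: BrokenLinksQuery, readModelCurrent: boolean): boolean {
	const usesUnsupportedSort = Boolean(q.sortBy && !FAST_PATH_SORT_KEYS.has(q.sortBy));
	const needsLegacy =
		Boolean(q.urlPattern) || q.includeRedirectSources === 'true' || usesUnsupportedSort;
	return !needsLegacy && readModelCurrent;
}

console.log(canUseBrokenLinksFastPath({ sortBy: 'status' }, true));
console.log(canUseBrokenLinksFastPath({ sortBy: 'isExternal' }, true));
```

Keeping the decision in one predicate makes the point of the doc comment easy to check: a given query string always resolves to the same sort semantics, and only the speed depends on whether the read model is current.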
diff --git a/packages/@nitpicker/viewer/web/api/use-links-infinite.ts b/packages/@nitpicker/viewer/web/api/use-links-infinite.ts
index 117ef4ff..1d55163d 100644
--- a/packages/@nitpicker/viewer/web/api/use-links-infinite.ts
+++ b/packages/@nitpicker/viewer/web/api/use-links-infinite.ts
@@ -1,10 +1,9 @@
import type { InfiniteQueryOptions } from './infinite-query-options.js';
-import type { LinkEntry } from '@nitpicker/query';
+import type { CursorPaginatedLinkList, LinkEntry } from '@nitpicker/query';
import { useInfiniteQuery } from '@tanstack/react-query';
import { apiGet } from './api-client.js';
-import { getNextOffset } from './get-next-offset.js';
import { PAGE_SIZE } from './page-size.js';
/**
@@ -32,16 +31,15 @@ export interface LinksFilter {
sortOrder?: string;
}
-/** Paginated link analysis response shape. */
-interface LinksPage {
- /** Rows for this page. */
- items: LinkRow[];
- /** Total matching rows. */
- total: number;
-}
-
/**
- * Infinite-scrolling broken-link analysis.
+ * Infinite-scrolling broken-link analysis. Fetches `PAGE_SIZE` rows per
+ * request and advances via the server-issued `nextCursor` (keyset
+ * pagination) rather than a growing `offset` — the same contract
+ * `usePagesInfinite` uses for `/api/pages`. `/api/links?type=broken` serves
+ * this from the `viewer_anchor_facts` read model when available, falling
+ * back to the legacy anchor-scan path (whose `nextCursor` is a plain
+ * offset-as-string, per `buildLegacyPagesCursors`) otherwise — this hook
+ * never needs to know which backend served a given page.
* @param type - The link analysis type.
* @param filter
* @param options - Optional flags (`enabled`).
@@ -54,16 +52,15 @@ export function useLinksInfinite(
) {
return useInfiniteQuery({
queryKey: ['links', type, filter],
- initialPageParam: 0,
+ initialPageParam: null as string | null,
queryFn: ({ pageParam }) =>
- apiGet('/api/links', {
+ apiGet('/api/links', {
type,
...filter,
limit: PAGE_SIZE,
- offset: pageParam,
+ cursor: pageParam ?? undefined,
}),
- getNextPageParam: (lastPage, _allPages, lastPageParam) =>
- getNextOffset(lastPage, lastPageParam),
+ getNextPageParam: (lastPage) => lastPage.nextCursor ?? undefined,
enabled: options?.enabled ?? true,
});
}
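
Review note: the hook comment above describes the legacy path's `nextCursor` as a plain offset-as-string. A minimal sketch of that scheme follows. These are hypothetical stand-ins: the real `parseLegacyPagesCursor` and `buildLegacyPagesCursors` are not shown in this diff and may behave differently.

```typescript
// Hypothetical stand-ins: the real parseLegacyPagesCursor and
// buildLegacyPagesCursors are not shown in this diff and may differ.
interface OffsetWindow {
	offset: number;
	itemCount: number;
	total: number;
	limit: number;
}

/** Reads a decimal offset cursor, falling back when it is absent or malformed. */
function parseOffsetCursor(cursor: string | undefined, fallback: number): number {
	if (!cursor || !/^\d+$/.test(cursor)) return fallback;
	return Number(cursor);
}

/** Encodes the neighbouring windows' offsets as strings, or null at either end. */
function buildOffsetCursors(w: OffsetWindow): {
	nextCursor: string | null;
	prevCursor: string | null;
} {
	const nextOffset = w.offset + w.itemCount;
	return {
		nextCursor: w.itemCount > 0 && nextOffset < w.total ? String(nextOffset) : null,
		prevCursor: w.offset > 0 ? String(Math.max(0, w.offset - w.limit)) : null,
	};
}

const firstPage = buildOffsetCursors({ offset: 0, itemCount: 100, total: 250, limit: 100 });
const secondPage = buildOffsetCursors({
	offset: parseOffsetCursor(firstPage.nextCursor ?? undefined, 0),
	itemCount: 100,
	total: 250,
	limit: 100,
});
console.log(firstPage, secondPage);
```

Under this scheme the client treats the cursor as opaque in both cases, so `getNextPageParam` can simply forward `nextCursor` without knowing whether it encodes a keyset position or an offset.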
diff --git a/scripts/bench-viewer-anchor-facts.mjs b/scripts/bench-viewer-anchor-facts.mjs
new file mode 100644
index 00000000..1c98c8a6
--- /dev/null
+++ b/scripts/bench-viewer-anchor-facts.mjs
@@ -0,0 +1,310 @@
+#!/usr/bin/env node
+/**
+ * Benchmarks `/api/links?type=broken`'s `viewer_anchor_facts` read-model
+ * fast path (issue #114) on a synthetic archive with hundreds of thousands
+ * of anchor records — no real customer archive is ever read or referenced.
+ *
+ * Records, per `docs/viewer-implementation-plan.md`'s Benchmark Contract:
+ *
+ * - page/anchor row counts, read-model build time, added DB size
+ * - `/api/links?type=broken` cold (first request after the just-built
+ * DB) and warm p50/p95 timing, per sort combination
+ * - `EXPLAIN QUERY PLAN` for each combination's read query
+ *
+ * "Cold"/"warm" follow the same convention as
+ * `bench-viewer-pages-read-model.mjs` and CLAUDE.md's `getSummary` cache
+ * note.
+ *
+ * USAGE
+ * -----
+ *
+ * yarn build && node scripts/bench-viewer-anchor-facts.mjs
+ *
+ * Sizes (page counts) default to {50,000}; override via `BENCH_SIZES=…`
+ * (comma separated). Each page gets a fixed anchor fan-out, so the anchor
+ * row count is 8x the page count (fewer viewer_anchor_facts rows after dedup). Always
+ * disk-backed (never `:memory:`) — the whole point is measuring realistic
+ * cold-cache I/O, which an in-memory DB can't produce.
+ */
+
+/* eslint-disable no-console, import-x/no-extraneous-dependencies */
+
+import { mkdirSync, rmSync, statSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import path from 'node:path';
+import process from 'node:process';
+
+import knex from 'knex';
+
+import { initSchema } from '../packages/@nitpicker/crawler/lib/archive/init-schema.js';
+import { LibsqlDialect } from '../packages/@nitpicker/crawler/lib/archive/libsql-dialect.js';
+import { listViewerBrokenLinks } from '../packages/@nitpicker/query/lib/list-viewer-broken-links.js';
+import { buildViewerReadModel } from '../packages/@nitpicker/query/lib/viewer-read-model/build-viewer-read-model.js';
+import { createApp } from '../packages/@nitpicker/viewer/lib/create-app.js';
+
+const SIZES = process.env.BENCH_SIZES
+ ? process.env.BENCH_SIZES.split(',').map((s) => Number(s.trim()))
+ : [50_000];
+
+/** Anchors created per page — tunes the anchor:page row-count ratio. */
+const ANCHOR_FANOUT = 8;
+
+/** Repeated warm requests per matrix entry, for p50/p95. */
+const WARM_ITERATIONS = 30;
+
+/**
+ * Sort combinations benchmarked per `broken-links-view.tsx`'s exposed sort
+ * controls (`sourceUrl`/`destUrl`/`status`, both directions).
+ */
+const MATRIX = [
+ { label: 'default (sourceUrl asc)', query: 'type=broken&limit=100' },
+ {
+ label: 'sourceUrl desc',
+ query: 'type=broken&limit=100&sortBy=sourceUrl&sortOrder=desc',
+ },
+ { label: 'destUrl asc', query: 'type=broken&limit=100&sortBy=destUrl&sortOrder=asc' },
+ { label: 'status asc', query: 'type=broken&limit=100&sortBy=status&sortOrder=asc' },
+ { label: 'status desc', query: 'type=broken&limit=100&sortBy=status&sortOrder=desc' },
+];
+
+/**
+ * Materialises a disk-backed synthetic archive DB with `n` `pages` rows and
+ * `n * ANCHOR_FANOUT` `anchors` rows. Status mix (200/301/404/500/null)
+ * matches `bench-viewer-pages-read-model.mjs`'s real-world skew. Anchor
+ * targets are deterministic offsets from the source page index, including
+ * one guaranteed duplicate target per page (exercises `count` dedup) and a
+ * regular hit rate on 404 destinations (exercises `is_broken`).
+ * @param {number} n - The number of page rows to insert.
+ * @returns {Promise<{db: import('knex').Knex, dbFilePath: string, cleanupDir: string, anchorRowCount: number}>}
+ * The seeded Knex instance, its backing file/dir (for size + cleanup),
+ * and the total anchor row count inserted.
+ */
+async function makeDb(n) {
+ const cleanupDir = path.join(
+ tmpdir(),
+ `nitpicker-bench-viewer-anchor-facts-${n}-${process.pid}`,
+ );
+ rmSync(cleanupDir, { recursive: true, force: true });
+ mkdirSync(cleanupDir, { recursive: true });
+ const dbFilePath = path.join(cleanupDir, 'db.sqlite');
+
+ const db = knex({
+ client: LibsqlDialect,
+ connection: { filename: dbFilePath },
+ useNullAsDefault: true,
+ });
+ await initSchema(db);
+
+ const STATUSES = [200, 200, 200, 200, 301, 404, 500, null];
+ const CHUNK = 100;
+
+ const pageRows = [];
+ for (let i = 0; i < n; i++) {
+ const padded = String(i).padStart(8, '0');
+ pageRows.push({
+ url: `https://example.com/page-${padded}`,
+ scraped: 1,
+ isTarget: 1,
+ isExternal: 0,
+ isSkipped: 0,
+ redirectDestId: null,
+ status: STATUSES[i % STATUSES.length],
+ statusText: 'OK',
+ contentType: 'text/html',
+ contentLength: 1000,
+ title: `Page ${padded}`,
+ source: 'crawled',
+ tag_count: 0,
+ jsonld_count: 0,
+ });
+ if (pageRows.length >= CHUNK) {
+ await db('pages').insert(pageRows);
+ pageRows.length = 0;
+ }
+ }
+ if (pageRows.length > 0) {
+ await db('pages').insert(pageRows);
+ }
+
+ const idRows = await db('pages').select('id').orderBy('id');
+ const idByIndex = idRows.map((row) => row.id);
+
+ let anchorRowCount = 0;
+ const anchorRows = [];
+ for (let i = 0; i < n; i++) {
+ const sourceId = idByIndex[i];
+ for (let k = 0; k < ANCHOR_FANOUT; k++) {
+ // A fixed prime-step walk spreads targets across the whole page
+ // set deterministically; k === ANCHOR_FANOUT - 1 repeats the k=0
+ // target on purpose, so every page has at least one duplicate
+ // (source,dest) pair collapsing into a viewer_anchor_facts row
+ // with count=2.
+ const step = k === ANCHOR_FANOUT - 1 ? 0 : k;
+ const targetIndex = (i + 1 + step * 97) % n;
+ anchorRows.push({ pageId: sourceId, hrefId: idByIndex[targetIndex] });
+ anchorRowCount++;
+ }
+ if (anchorRows.length >= CHUNK) {
+ await db('anchors').insert(anchorRows);
+ anchorRows.length = 0;
+ }
+ }
+ if (anchorRows.length > 0) {
+ await db('anchors').insert(anchorRows);
+ }
+
+ return { db, dbFilePath, cleanupDir, anchorRowCount };
+}
+
+/**
+ * Builds the viewer read model against the seeded DB, timing the build and
+ * measuring the DB file's size delta.
+ * @param {import('knex').Knex} db - The seeded Knex instance.
+ * @param {string} dbFilePath - The DB's backing file path (for `statSync`).
+ * @returns {Promise<{buildMs: number, sizeBeforeBytes: number, sizeAfterBytes: number, anchorFactRowCount: number}>}
+ * Build timing and size metrics.
+ */
+async function buildReadModel(db, dbFilePath) {
+ const sizeBeforeBytes = statSync(dbFilePath).size;
+ const accessorStub = { readOnly: false, getKnex: () => db };
+ const start = process.hrtime.bigint();
+ await buildViewerReadModel(accessorStub);
+ const buildMs = Number(process.hrtime.bigint() - start) / 1e6;
+ const sizeAfterBytes = statSync(dbFilePath).size;
+
+ const anchorFactRowCount = await db('viewer_anchor_facts').count('* as count');
+
+ return {
+ buildMs,
+ sizeBeforeBytes,
+ sizeAfterBytes,
+ anchorFactRowCount: Number(anchorFactRowCount[0]?.count ?? 0),
+ };
+}
+
+/**
+ * Runs `EXPLAIN QUERY PLAN` for one matrix entry's window query, built via
+ * `db.raw` against the same `is_broken = 1` + `ORDER BY` shape
+ * `list-viewer-broken-links.ts`'s `readAnchorFactsWindow` issues.
+ * @param {import('knex').Knex} db - The Knex instance.
+ * @param {string} orderByColumns - The `ORDER BY` column list (no `is_broken` — that's a fixed `WHERE`).
+ * @returns {Promise<string>} One `|`-joined line of `EXPLAIN QUERY PLAN` detail rows.
+ */
+async function explainMatrixEntry(db, orderByColumns) {
+ const sql = `SELECT edge_id FROM viewer_anchor_facts WHERE is_broken = 1 ORDER BY ${orderByColumns} LIMIT 101`;
+ const plan = await db.raw(`EXPLAIN QUERY PLAN ${sql}`);
+ return plan.map((row) => row.detail).join(' | ');
+}
+
+/**
+ * Times `iterations` sequential HTTP round-trips through the real Hono app
+ * for one query string, returning p50/p95 in milliseconds.
+ * @param {import('hono').Hono} app - The app under test.
+ * @param {string} query - The `/api/links` query string (no leading `?`).
+ * @param {number} iterations - Number of warm requests to time.
+ * @returns {Promise<{p50: number, p95: number}>} Warm latency percentiles.
+ */
+async function timeWarmRequests(app, query, iterations) {
+ const timings = [];
+ for (let i = 0; i < iterations; i++) {
+ const start = process.hrtime.bigint();
+ const res = await app.request(`/api/links?${query}`);
+ await res.text();
+ timings.push(Number(process.hrtime.bigint() - start) / 1e6);
+ }
+ timings.sort((a, b) => a - b);
+ const p50 = timings[Math.floor(timings.length * 0.5)];
+ const p95 = timings[Math.floor(timings.length * 0.95)];
+ return { p50, p95 };
+}
+
+const EXPLAIN_ORDER_BY = {
+ 'default (sourceUrl asc)': 'source_url_sort_key, edge_id',
+ 'sourceUrl desc': 'source_url_sort_key DESC, edge_id DESC',
+ 'destUrl asc': 'dest_url_sort_key, edge_id',
+ 'status asc': 'status_sort_key, source_url_sort_key, edge_id',
+ 'status desc': 'status_desc_key, source_url_sort_key, edge_id',
+};
+
+/**
+ * Runs the full matrix (EXPLAIN + cold/warm HTTP timing) against one
+ * already-built read model, printing a results table and a copy-pasteable
+ * Markdown summary block.
+ * @param {import('knex').Knex} db - The Knex instance with a built read model.
+ * @param {number} n - The page-row count this DB was seeded with (for the report header).
+ */
+async function runMatrix(db, n) {
+ const accessorStub = { getKnex: () => db };
+ const app = createApp({
+ context: { archiveId: 'bench', manager: { get: () => accessorStub } },
+ publicDir: '/tmp/no-such-dir-bench',
+ });
+
+ const results = [];
+ for (const entry of MATRIX) {
+ const explain = await explainMatrixEntry(db, EXPLAIN_ORDER_BY[entry.label]);
+ const coldStart = process.hrtime.bigint();
+ const coldRes = await app.request(`/api/links?${entry.query}`);
+ await coldRes.text();
+ const coldMs = Number(process.hrtime.bigint() - coldStart) / 1e6;
+ const { p50, p95 } = await timeWarmRequests(app, entry.query, WARM_ITERATIONS);
+ results.push({ ...entry, coldMs, p50, p95, explain });
+ }
+
+ console.log('\n sort cold p50 p95');
+ for (const r of results) {
+ console.log(
+ ` ${r.label.padEnd(35)} ${`${r.coldMs.toFixed(1)}ms`.padStart(8)} ${`${r.p50.toFixed(1)}ms`.padStart(8)} ${`${r.p95.toFixed(1)}ms`.padStart(8)}`,
+ );
+ console.log(` EXPLAIN: ${r.explain}`);
+ }
+
+ console.log(
+ '\n### Markdown summary (paste into PR/ARCHITECTURE.md, no archive-identifying details)\n',
+ );
+ console.log(
+ `\`${n.toLocaleString()} synthetic pages\` — /api/links?type=broken viewer_anchor_facts fast path:\n`,
+ );
+ console.log('| sort | cold | warm p50 | warm p95 | EXPLAIN QUERY PLAN |');
+ console.log('| --- | --- | --- | --- | --- |');
+ for (const r of results) {
+ console.log(
+ `| ${r.label} | ${r.coldMs.toFixed(1)}ms | ${r.p50.toFixed(1)}ms | ${r.p95.toFixed(1)}ms | ${r.explain} |`,
+ );
+ }
+
+ // listViewerBrokenLinks function-level sanity check — confirms the HTTP
+ // numbers above aren't dominated by Hono/JSON overhead alone.
+ const directStart = process.hrtime.bigint();
+ await listViewerBrokenLinks(accessorStub, { limit: 100 });
+ const directMs = Number(process.hrtime.bigint() - directStart) / 1e6;
+ console.log(
+ `\nDirect \`listViewerBrokenLinks\` call (no HTTP layer), default sort: ${directMs.toFixed(1)}ms`,
+ );
+}
+
+for (const n of SIZES) {
+ console.log(
+ `\n══════════ ${n.toLocaleString()} pages (~${(n * ANCHOR_FANOUT).toLocaleString()} anchors) ══════════`,
+ );
+ const { db, dbFilePath, cleanupDir, anchorRowCount } = await makeDb(n);
+ try {
+ const seedSizeBytes = statSync(dbFilePath).size;
+ console.log(` seeded DB size: ${(seedSizeBytes / 1024 / 1024).toFixed(1)} MiB`);
+ console.log(` anchors inserted: ${anchorRowCount.toLocaleString()}`);
+
+ const { buildMs, sizeBeforeBytes, sizeAfterBytes, anchorFactRowCount } =
+ await buildReadModel(db, dbFilePath);
+ const addedBytes = sizeAfterBytes - sizeBeforeBytes;
+ console.log(` read-model build time: ${buildMs.toFixed(0)}ms`);
+ console.log(
+ ` read-model added DB size: ${(addedBytes / 1024 / 1024).toFixed(1)} MiB (viewer_anchor_facts rows after edge dedup: ${anchorFactRowCount.toLocaleString()})`,
+ );
+
+ await runMatrix(db, n);
+ } finally {
+ await db.destroy();
+ rmSync(cleanupDir, { recursive: true, force: true });
+ }
+}
+console.log('\nDone.');
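
Review note: the seeding loop's prime-step walk is easier to reason about in isolation. The sketch below reproduces just the target-index arithmetic from `makeDb` (written in TypeScript for consistency with the rest of the PR; `anchorTargetIndexes` is a hypothetical helper name) to show that each page emits 8 raw anchors but only 7 distinct destinations.

```typescript
// Sketch mirroring the seeding loop's arithmetic; `anchorTargetIndexes`
// is a hypothetical helper name, not part of the benchmark script.
const ANCHOR_FANOUT = 8;

/** Returns the target page indexes that source page `i` links to, in order. */
function anchorTargetIndexes(i: number, n: number): number[] {
	const targets: number[] = [];
	for (let k = 0; k < ANCHOR_FANOUT; k++) {
		// The last anchor repeats the k = 0 target to force one duplicate edge.
		const step = k === ANCHOR_FANOUT - 1 ? 0 : k;
		targets.push((i + 1 + step * 97) % n);
	}
	return targets;
}

const targets = anchorTargetIndexes(0, 50_000);
const distinctTargets = new Set(targets).size;
// 8 raw anchors, 7 distinct targets for page 0 of 50,000
console.log(targets.length, distinctTargets);
```

For the default size this suggests the deduplicated edge table holds about seven rows per page rather than eight, which is consistent with the script printing the post-dedup row count separately from the raw anchor count.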