fix: resolve backlinks through redirects (http/https merge, #71) #78
Merged
…throughId

getReferrersOfPage now counts an anchor pointing at a redirect source (e.g. http://x that 301s to https://x) as a referrer of the redirect's final destination, mirroring getPagesWithRels. redirectDestId is pre-flattened to the final destination, so COALESCE(target.redirectDestId, target.id) is a single hop. Also select target.url/id as through/throughId so the Page.getReferrers / getRequests fallbacks return the full Referrer shape (the report's "[REDIRECTED FROM]" note now works on this non-preloaded path too).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
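A minimal in-memory sketch of the single-hop resolution described above, assuming a simplified schema: the PageRow, AnchorRow and Referrer shapes and the function name referrersOf are hypothetical stand-ins, not the crawler's actual tables or API. Because redirectDestId is assumed to already point at the final destination, target.redirectDestId ?? target.id plays the role of COALESCE(target.redirectDestId, target.id), and through/throughId record the URL the anchor actually pointed at.

```typescript
// Hypothetical, simplified shapes; the real schema may differ.
interface PageRow {
  id: number;
  url: string;
  // Assumed pre-flattened: the FINAL destination of the redirect chain, or null.
  redirectDestId: number | null;
}

interface AnchorRow {
  pageId: number; // page containing the anchor (the referrer)
  targetId: number; // page the href points at (may be a redirect source)
}

interface Referrer {
  id: number;
  url: string;
  through: string; // URL the anchor actually pointed at
  throughId: number;
}

// Hypothetical helper: referrers of pageId, resolved through redirects.
function referrersOf(
  pages: Map<number, PageRow>,
  anchors: AnchorRow[],
  pageId: number,
): Referrer[] {
  const result: Referrer[] = [];
  for (const anchor of anchors) {
    const target = pages.get(anchor.targetId);
    const source = pages.get(anchor.pageId);
    if (target === undefined || source === undefined) continue;
    // Equivalent of COALESCE(target.redirectDestId, target.id): a single hop,
    // which is enough only because redirectDestId is already the final dest.
    const resolvedId = target.redirectDestId ?? target.id;
    if (resolvedId !== pageId) continue;
    result.push({
      id: source.id,
      url: source.url,
      through: target.url,
      throughId: target.id,
    });
  }
  return result;
}
```

With https://x/ as page 1, http://x/ as page 2 (redirectDestId 1), and two other pages linking to http://x/ and https://x/ respectively, referrersOf(pages, anchors, 1) returns both referrers (the first with through set to http://x/), while referrersOf(pages, anchors, 2) returns an empty array: the redirect source itself has no backlinks.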
getPageDetail.inboundLinks and listPageLinks.referrerCount now resolve incoming links through redirects, so links to a redirect source merge onto the canonical destination (#71) instead of splitting across the http/https pair. Same single-hop COALESCE semantics as the crawler's redirectTable(). Outbound links intentionally stay raw (an audit signal that a page links to a redirecting URL); this is documented inline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
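The inbound/outbound asymmetry can be sketched the same way. This assumes a hypothetical LinkRow table and counts distinct referring pages (an assumption; the actual referrerCount may count links differently). Inbound targets are resolved with one hop before counting; outbound targets are returned raw, so a page that links to a redirecting URL still shows that URL.

```typescript
// Hypothetical, simplified shapes; the real schema may differ.
interface PageRecord {
  id: number;
  url: string;
  redirectDestId: number | null; // assumed pre-flattened to the final destination
}

interface LinkRow {
  fromId: number;
  toId: number;
}

// One hop, like COALESCE(redirectDestId, id); unknown ids resolve to themselves.
function resolveId(pages: Map<number, PageRecord>, id: number): number {
  return pages.get(id)?.redirectDestId ?? id;
}

// Inbound: resolve each target through its redirect, then count distinct referrers.
function inboundReferrerCount(
  pages: Map<number, PageRecord>,
  links: LinkRow[],
  pageId: number,
): number {
  const referrers = new Set<number>();
  for (const link of links) {
    if (resolveId(pages, link.toId) === pageId) referrers.add(link.fromId);
  }
  return referrers.size;
}

// Outbound: intentionally NOT resolved, so a link to a redirect source stays visible.
function outboundUrls(
  pages: Map<number, PageRecord>,
  links: LinkRow[],
  pageId: number,
): string[] {
  return links
    .filter((link) => link.fromId === pageId)
    .map((link) => pages.get(link.toId)?.url ?? "(unknown)");
}
```

In this sketch, a page that links to both http://x/ and https://x/ counts once toward https://x/, while its outbound list still reports http://x/ as a link to a redirecting URL.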
A page linking to /redirect/start (which 301/302s to /redirect/dest) now shows up as a referrer of the final /redirect/dest, with through pointing at the redirect source. Without redirect-resolved referrers, the destination's backlinks would be empty. Uses the existing http->http redirect chain (the mechanism is scheme-agnostic).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
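The single hop only works because redirectDestId is pre-flattened. A rough sketch of that flattening, assuming raw per-hop redirect edges are available as a Map from source id to next-hop id (the function name and input shape are hypothetical, not the crawler's code): a 301 → 302 chain such as /redirect/start → (intermediate hop) → /redirect/dest collapses so every source points directly at the final destination.

```typescript
// Hypothetical input: one entry per recorded redirect hop, source id -> next-hop id.
// Returns source id -> final destination id for every redirecting page.
function flattenRedirects(nextHop: Map<number, number>): Map<number, number> {
  const finalDest = new Map<number, number>();
  nextHop.forEach((firstHop, start) => {
    const seen = new Set<number>([start]);
    let current = firstHop;
    // Follow hops until reaching a page that does not redirect any further.
    // The seen-set stops the walk on a redirect loop instead of spinning forever.
    while (nextHop.has(current) && !seen.has(current)) {
      seen.add(current);
      current = nextHop.get(current)!;
    }
    finalDest.set(start, current);
  });
  return finalDest;
}
```

With the chain flattened ahead of time like this, COALESCE(target.redirectDestId, target.id) at read time never needs a recursive walk.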
Add an ARCHITECTURE.md section explaining that incoming links resolve through redirects (single-hop COALESCE on the pre-flattened redirectDestId), the read-path consistency across getPagesWithRels / getReferrersOfPage / getPageDetail / listPageLinks, and the intentional inbound/outbound asymmetry (outbound stays raw for audit visibility; do not "unify" it).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
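Summary

Resolves incoming links (referrers) through redirects at read time, so backlinks that were split between a redirect source and its destination, such as http://x and https://x, are merged onto the canonical page (#71). URLs are not normalized; the redirect edges are kept as-is and the aggregation happens in the read layer. Previously only the report path (getPagesWithRels) resolved through redirects, while the viewer/mcp/cli paths (getReferrersOfPage / getPageDetail / listPageLinks) did not, so their backlinks were split. This PR gives all four paths the same semantics.

Changes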
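- getReferrersOfPage: resolves to the final destination in a single hop with COALESCE(target.redirectDestId, target.id). It also returns through / throughId (the URL the anchor actually pointed at, i.e. the redirect source), so the Page.getReferrers / getRequests fallbacks return the complete Referrer shape (the report's [REDIRECTED FROM] note now works on the non-preloaded path as well).
- getPageDetail.inboundLinks / listPageLinks.referrerCount: aggregated through redirects with the same single-hop resolution.

Tests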
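- database.spec / query get-page-detail.spec / list-page-links.spec: DB-level merge checks using real http/https URLs (backlinks are not split, through/throughId are populated, and the redirect source's own backlinks are empty).
- page.spec: through/throughId mapping on the non-preloaded fallback (resolves the ghost code).
- End-to-end: crawl → archive → getReferrers verifies that a page linking to /redirect/start (301 → 302 → dest) is merged as a referrer of the final /redirect/dest (the mechanism is scheme-agnostic).

Documentation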
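Adds a section to ARCHITECTURE.md, "Transparent redirect resolution of backlinks/references (#71)", covering the rationale for single-hop resolution, consistency across the read paths, and the design intent behind the inbound/outbound asymmetry.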
Review
Ran /code-review xhigh → /qa-engineer → /product-manager and addressed every finding (fixed the missing through/throughId, added tests for the fallback's ghost code, added the E2E test, added the documentation section).

Closes #71
🤖 Generated with Claude Code