diff --git a/SQL/Leetcode/Basic select/620. Not Boring Movies/gpt/Not_Boring_Movies_mysql.ipynb b/SQL/Leetcode/Basic select/620. Not Boring Movies/gpt/Not_Boring_Movies_mysql.ipynb new file mode 100644 index 00000000..7029ef0c --- /dev/null +++ b/SQL/Leetcode/Basic select/620. Not Boring Movies/gpt/Not_Boring_Movies_mysql.ipynb @@ -0,0 +1,178 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7b1caa6a", + "metadata": {}, + "source": [ + "# MySQL 8.0.40\n", + "\n", + "## 0) Assumptions\n", + "\n", + "* Engine: **MySQL 8**\n", + "* Ordering: this problem **requires descending order by specification** (`ORDER BY rating DESC`)\n", + "* Avoid `NOT IN` because of its NULL pitfall (not needed for this problem)\n", + "* Filtering is **ID-based**; output uses the column names and order given in the specification\n", + "\n", + "## 1) Problem\n", + "\n", + "* `From the movie table Cinema, select the movies whose ID is odd and whose description is not \"boring\", and return them in descending order of rating.`\n", + "* Example input table: `Cinema(id, movie, description, rating)`\n", + "* Output specification: return `id, movie, description, rating` in **descending order of rating**\n", + "\n", + "## 2) Best solution (single query)\n", + "\n", + "> No window functions are needed. A simple filter plus a descending sort does it in one query.\n", + "\n", + "```sql\n", + "SELECT\n", + "    id,\n", + "    movie,\n", + "    description,\n", + "    rating\n", + "FROM Cinema\n", + "WHERE (id % 2) = 1 -- odd IDs only\n", + "  AND description <> 'boring' -- exclude by string comparison (NULL rows drop out automatically)\n", + "ORDER BY rating DESC; -- descending order required by the spec\n", + "\n", + "-- Runtime 241 ms\n", + "-- Beats 34.15%\n", + "\n", + "```\n", + "\n", + "## 3) Alternative solution\n", + "\n", + "> Equivalent. Only the odd check changes to a bitwise operation; the execution plan and result are the same.\n", + "\n", + "```sql\n", + "SELECT\n", + "    id,\n", + "    movie,\n", + "    description,\n", + "    rating\n", + "FROM Cinema\n", + "WHERE (id & 1) = 1\n", + "  AND description <> 'boring'\n", + "ORDER BY rating DESC;\n", + "\n", + "-- Runtime 249 ms\n", + "-- Beats 25.20%\n", + "\n", + "```\n", + "\n", + "## 4) Key points\n", + "\n", + "* **Odd check**: either `id % 2 = 1` or `(id & 1) = 1` works. Both are safe for a positive integer primary key (for a negative `id`, `id % 2` yields `-1`).\n", + "* **Excluding \"boring\"**: `<> 'boring'` does not keep rows where `description IS NULL` (the comparison evaluates to `UNKNOWN`, so WHERE drops them). To keep NULL rows, add `OR description IS NULL`.\n", + "* **Ordering**: this template says to \"prefer unordered output\", but this problem **requires descending order** by specification, so `ORDER BY rating DESC` 
is included.\n", + "\n", + "## 5) Complexity (rough estimate)\n", + "\n", + "* Full table scan: **O(N)**. The arithmetic on `id` (PK) is constant time per row, and so is `description <> 'boring'`. Sorting the M surviving rows adds O(M log M).\n", + "* Indexes: the predicates have low selectivity, so an index helps little (the sort on `rating` tends to trigger a filesort).\n", + "\n", + "## 6) Diagram (Mermaid, ultra-conservative version)\n", + "\n", + "```mermaid\n", + "flowchart TD\n", + "    A[Input Cinema] --> B[Filter: id is odd]\n", + "    B --> C[\"Filter: description is not 'boring'\"]\n", + "    C --> D[Sort by rating descending]\n", + "    D --> E[Output id, movie, description, rating]\n", + "```\n", + "\n", + "Within the scope of this problem (query only, no schema changes), this is already close to the shortest path. The 241 ms vs 249 ms gap is within the margin of error, and any difference between `%2` and `&1` is measurement noise.\n", + "Still, here are some practical improvements aimed at making the query slightly better or clearer.\n", + "\n", + "## Tweaks possible with the query alone\n", + "\n", + "1. **Make the order deterministic (stabilize ties)**\n", + "   When movies share the same rating, the order in which they come back can vary. Add a tie-breaker if a judge or reproducibility matters.\n", + "\n", + "```sql\n", + "SELECT id, movie, description, rating\n", + "FROM Cinema\n", + "WHERE (id & 1) = 1\n", + "  AND description <> 'boring'\n", + "ORDER BY rating DESC, id DESC; -- tie-breaker for a stable order\n", + "\n", + "-- Runtime 222 ms\n", + "-- Beats 67.10%\n", + "\n", + "```\n", + "\n", + "Note: almost no speed impact; the gain is readability and reproducibility.\n", + "\n", + "2. **Make the NULL handling explicit**\n", + "   If \"not boring\" is read strictly as \"NULL is not included\", spelling that out is safer.\n", + "\n", + "```sql\n", + "... AND description IS NOT NULL\n", + "    AND description <> 'boring'\n", + "```\n", + "\n", + "Conversely, to **allow NULL as well**, add `OR description IS NULL`.\n", + "\n", + "3. **Pin down case sensitivity**\n", + "   Whether values such as \"Boring\" match depends on the collation in effect. Use a binary comparison to make the match case-sensitive.\n", + "\n", + "```sql\n", + "... AND BINARY description <> 'boring'\n", + "-- or\n", + "... 
AND description COLLATE utf8mb4_bin <> 'boring'\n", + "```\n", + "\n", + "> That covers the query-only improvements that are realistic on LeetCode. Speed barely changes; the main benefit is **stable results and clearer intent**.\n", + "\n", + "## Optimizations for environments where schema changes are allowed (for reference)\n", + "\n", + "> For when the table is large and you want to push for speed in production.\n", + "\n", + "* **Index to avoid filesort** (satisfies `ORDER BY rating DESC` through index order)\n", + "\n", + "  ```sql\n", + "  CREATE INDEX idx_cinema_rating_desc ON Cinema (rating DESC, id);\n", + "  ```\n", + "\n", + "  *Effect*: can remove most of the sorting cost (`description <> 'boring'` is still evaluated as a residual predicate).\n", + "  *Note*: the selected columns `movie, description` are looked up from the secondary index via the PK (InnoDB behavior).\n", + "\n", + "* **Make the even/odd check sargable with a functional index**\n", + "\n", + "  ```sql\n", + "  CREATE INDEX idx_cinema_odd_rating ON Cinema ( (id & 1), rating DESC );\n", + "  -- and the query uses WHERE (id & 1) = 1 AND description <> 'boring'\n", + "  ```\n", + "\n", + "  *Effect*: `(id & 1)=1` first halves the range, then the index is scanned in `rating DESC` order and\n", + "  `description <> 'boring'` is applied as a filter. This tends to pay off on large data sets.\n", + "\n", + "* **When description is highly selective**\n", + "  How useful this is depends on how common \"boring\" is, but if the goal is to **reduce the amount read**:\n", + "\n", + "  ```sql\n", + "  CREATE INDEX idx_cinema_desc_rating ON Cinema (description, rating DESC);\n", + "  ```\n", + "\n", + "  *Caution*: `<> 'boring'` becomes two ranges (`< 'boring'` and `> 'boring'`), so\n", + "  the benefit depends heavily on the data distribution, and rows read across a `description` range do not come back in `rating` order. Measure and check EXPLAIN before deciding.\n", + "\n", + "## Summary\n", + "\n", + "* **The query by itself is already near optimal**. The measured 241 ms vs 249 ms difference is noise; either form is fine.\n", + "* Small quality improvements: **add a tie-breaker** and **make the NULL and case handling explicit**.\n", + "* For production speed, a **descending index for `ORDER BY`** and a **functional index on (id & 1)** can help.\n", + "\n", + "As a next step, `EXPLAIN` output and the index options can be examined in more depth once a data volume and distribution are assumed.\n", + "\n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/SQL/Leetcode/Basic select/620. Not Boring Movies/gpt/Not_Boring_Movies_pandas.ipynb b/SQL/Leetcode/Basic select/620. Not Boring Movies/gpt/Not_Boring_Movies_pandas.ipynb new file mode 100644 index 00000000..3672a8ce --- /dev/null +++ b/SQL/Leetcode/Basic select/620. 
Not Boring Movies/gpt/Not_Boring_Movies_pandas.ipynb @@ -0,0 +1,214 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "040332a5", + "metadata": {}, + "source": [ + "# For pandas 2.2.2\n", + "\n", + "## 0) Assumptions\n", + "\n", + "* Environment: **Python 3.10.15 / pandas 2.2.2**\n", + "* **Follow the given signature exactly** (function name, argument names, returned columns, order)\n", + "* No I/O, no unnecessary `print`, and no `sort_values` (sorting uses `nlargest`)\n", + "\n", + "## 1) Problem\n", + "\n", + "* `From Cinema, select the movies whose ID is odd and whose description is not \"boring\", and return them in descending order of rating.`\n", + "* Input DF: `Cinema(id: int, movie: str, description: str, rating: float)`\n", + "* Output: columns `id, movie, description, rating`, in **descending order of rating** (tie order is arbitrary)\n", + "\n", + "## 2) Implementation (given signature followed exactly)\n", + "\n", + "> Column minimization, then filtering, then descending order via `nlargest` (to honor the no-`sort_values` rule)\n", + "\n", + "```python\n", + "import pandas as pd\n", + "\n", + "def select_non_boring_odd_movies(cinema: pd.DataFrame) -> pd.DataFrame:\n", + "    \"\"\"\n", + "    Returns:\n", + "        pd.DataFrame: column names and order are ['id', 'movie', 'description', 'rating'],\n", + "        sorted by rating descending (tie order is arbitrary)\n", + "    \"\"\"\n", + "    # Select only the needed columns (column minimization)\n", + "    cols = ['id', 'movie', 'description', 'rating']\n", + "    c = cinema.loc[:, cols]\n", + "\n", + "    # Condition: odd ID and description is not 'boring' (and not NULL)\n", + "    # NULL is excluded per the spec, so the non-null check is stated explicitly\n", + "    mask = (c['id'] % 2 == 1) & c['description'].notna() & (c['description'] != 'boring')\n", + "    kept = c.loc[mask]\n", + "\n", + "    # Ordering: sort_values is not allowed, so nlargest provides the descending order\n", + "    # Taking n == len(kept) is equivalent to a full sort by rating DESC\n", + "    out = kept.nlargest(len(kept), columns='rating')\n", + "\n", + "    # Return: columns and order as specified\n", + "    return out[['id', 'movie', 'description', 'rating']]\n", + "\n", + "# LeetCode submission result\n", + "# Runtime 276 ms\n", + "# Beats 35.04%\n", + "# Memory 67.14 MB\n", + "# Beats 68.84%\n", + "\n", + "```\n", + "\n", + "## 3) Algorithm notes\n", + "\n", + "* APIs used\n", + "\n", + "  * Boolean mask: `Series %`, `Series.notna()`, comparison `!=`\n", + "  * Column minimization: `DataFrame.loc[:, cols]`\n", + "  * Descending retrieval: `DataFrame.nlargest(n, 'rating')` (avoids `sort_values` 
as required)\n", + "* **NULL / duplicates / types**\n", + "\n", + "  * `description.notna()` excludes `NULL` (stated explicitly to match the spec)\n", + "  * `rating` must be a numeric column (float). If strings are mixed in, consider `to_numeric` beforehand\n", + "  * `id` is assumed to be the primary key, so duplicate rows are not expected\n", + "\n", + "## 4) Complexity (rough estimate)\n", + "\n", + "* Filter (boolean mask): **O(N)**\n", + "* `nlargest(len(kept), 'rating')`: internally a selection algorithm plus a partial sort, **O(M log M)** (M is the number of remaining rows)\n", + "\n", + "  * Same order as a full descending sort, but the minimal way to meet the requirement without `sort_values`\n", + "\n", + "## 5) Diagram (Mermaid, ultra-conservative version)\n", + "\n", + "```mermaid\n", + "flowchart TD\n", + "    A[Input Cinema DF] --> B[Column minimization id,movie,description,rating]\n", + "    B --> C[Filter: id odd AND description not NULL AND != 'boring']\n", + "    C --> D[nlargest for rating descending]\n", + "    D --> E[Output id,movie,description,rating]\n", + "```\n", + "\n", + "There is still room to tighten this by **reducing pandas overhead** and **reducing copies**. While keeping the no-`sort_values` constraint, applying `nlargest` to a **Series** and reordering by index, or **ranking with NumPy alone**, tends to be faster.\n", + "\n", + "---\n", + "\n", + "## 1) Low-risk version (pandas-centric, minimal change)\n", + "\n", + "Key points\n", + "\n", + "* Drop the intermediate `c = cinema.loc[:, cols]` copy\n", + "* Build the mask from **NumPy arrays** (`to_numpy()`)\n", + "* For ordering, get the index from **Series.nlargest**, then project the columns at the end\n", + "\n", + "```python\n", + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "def select_non_boring_odd_movies(cinema: pd.DataFrame) -> pd.DataFrame:\n", + "    \"\"\"\n", + "    Returns:\n", + "        pd.DataFrame: ['id', 'movie', 'description', 'rating'] sorted by rating descending\n", + "    \"\"\"\n", + "    id_arr = cinema['id'].to_numpy()\n", + "    desc_arr = cinema['description'].to_numpy()\n", + "    # Exclude NULL (pd.notna catches None as well as NaN) + exclude 'boring' + odd IDs\n", + "    mask = ((id_arr & 1) == 1) & pd.notna(desc_arr) & (desc_arr != 'boring')\n", + "\n", + "    # Index labels of the selected rows\n", + "    idx = cinema.index[mask]\n", + "    # Get the rating-descending index via Series.nlargest (lighter than DataFrame.nlargest)\n", + "    top_idx = cinema.loc[idx, 'rating'].nlargest(idx.size).index\n", + "\n", + "    return cinema.loc[top_idx, ['id', 'movie', 'description', 'rating']]\n", + "\n", + "# LeetCode submission result\n", + "# Runtime 261 ms\n", + "# Beats 
65.91%\n", + "# Memory 67.39 MB\n", + "# Beats 28.60%\n", + "\n", + "```\n", + "\n", + "**Rationale**\n", + "\n", + "* `Series.nlargest` works on the target column only, so it touches less memory than `DataFrame.nlargest` and tends to be faster.\n", + "* No intermediate DataFrame is created, which cuts copies (memory and CPU both tend to go down).\n", + "\n", + "---\n", + "\n", + "## 2) Aggressive fastest version (NumPy-driven)\n", + "\n", + "Key points\n", + "\n", + "* Delegate the sort entirely to **`np.argsort`** (minimizes the cost of building pandas indexers)\n", + "* Read `rating` for all rows into an array only once\n", + "\n", + "```python\n", + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "def select_non_boring_odd_movies(cinema: pd.DataFrame) -> pd.DataFrame:\n", + "    \"\"\"\n", + "    Returns:\n", + "        pd.DataFrame: ['id', 'movie', 'description', 'rating'] sorted by rating descending\n", + "    \"\"\"\n", + "    id_arr = cinema['id'].to_numpy()\n", + "    desc_arr = cinema['description'].to_numpy()\n", + "    rate_arr = cinema['rating'].to_numpy()\n", + "\n", + "    mask = ((id_arr & 1) == 1) & pd.notna(desc_arr) & (desc_arr != 'boring')\n", + "    sel = np.flatnonzero(mask)\n", + "\n", + "    # Positions in rating-descending order (selected rows only)\n", + "    order_in_sel = np.argsort(rate_arr[sel])[::-1]\n", + "    row_pos = sel[order_in_sel]\n", + "\n", + "    return cinema.iloc[row_pos, :][['id', 'movie', 'description', 'rating']]\n", + "\n", + "# LeetCode submission result\n", + "# Runtime 256 ms\n", + "# Beats 76.61%\n", + "# Memory 67.11 MB\n", + "# Beats 68.84%\n", + "\n", + "```\n", + "\n", + "**Rationale**\n", + "\n", + "* `nlargest(len)` is effectively a full descending sort, and even though it uses selection plus a partial sort internally, it can be costly.\n", + "* `np.argsort` runs fast on plain arrays, and the path from a **row-position array to `iloc`** is very light.\n", + "\n", + "---\n", + "\n", + "## 3) Additional minor optimization hints\n", + "\n", + "* **Review dtypes**\n", + "\n", + "  * If `id` can be downcast to `int32` and `rating` to `float32`, memory drops (better CPU cache efficiency).\n", + "  * With many strings, `string[pyarrow]` (when the pyarrow backend is available) shrinks the footprint; plain `pd.StringDtype()` with Python storage gives a stricter string type rather than a smaller one.\n", + "* **Consistent column access**\n", + "\n", + "  * When the same column is used several times, **convert it to an array once and reuse it** (call `to_numpy()` only once, as in the code above).\n", + "* **Order of conditions**\n", + "\n", + "  * Evaluating the more selective condition first (here the `description` check often filters more than `id & 1`) changes little, because NumPy 
boolean operations do not short-circuit. Building the mask in one pass over arrays is faster.\n", + "\n", + "---\n", + "\n", + "## 4) Expected effect (rough guide)\n", + "\n", + "* Low-risk version: dropping the intermediate copy and using `Series.nlargest` is aimed at a **10 to 25% reduction**; the LeetCode runs recorded above went from 276 ms to 261 ms (about 5%), which is likely within run-to-run noise.\n", + "* NumPy version: depending on data size and column count, it may improve by **a further few ms to a few tens of ms**; the run recorded above was 256 ms.\n", + "\n", + "> Try the **low-risk version** first, then the **NumPy version** if the gain is not enough.\n", + "> If more is still needed, consider dtype optimization or filtering at the preprocessing stage (for example, passing only odd IDs from upstream).\n", + "\n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/SQL/Leetcode/Basic select/620. Not Boring Movies/gpt/Not_Boring_Movies_posgres.ipynb b/SQL/Leetcode/Basic select/620. Not Boring Movies/gpt/Not_Boring_Movies_posgres.ipynb new file mode 100644 index 00000000..c97fa31a --- /dev/null +++ b/SQL/Leetcode/Basic select/620. Not Boring Movies/gpt/Not_Boring_Movies_posgres.ipynb @@ -0,0 +1,196 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "43489eb0", + "metadata": {}, + "source": [ + "# PostgreSQL 16.6+\n", + "\n", + "## 0) Assumptions\n", + "\n", + "* Engine: **PostgreSQL 16.6+**\n", + "* Ordering: this problem **requires descending order by specification** (`ORDER BY rating DESC`)\n", + "* Avoid `NOT IN` (not needed for this problem)\n", + "* Filtering is **ID-based**; output follows the specification\n", + "\n", + "## 1) Problem\n", + "\n", + "* `From the movie table, select the movies whose ID is odd and whose description is not 'boring', and return them in descending order of rating.`\n", + "* Input: `Cinema(id int, movie text/varchar, description text/varchar, rating numeric/real)`\n", + "* Output: return `id, movie, description, rating` in **descending order of rating** (ties stabilized by id)\n", + "\n", + "## 2) Best solution (single query)\n", + "\n", + "> No window functions are needed. In PostgreSQL a simple filter plus a descending sort is the fastest and clearest approach.\n", + "\n", + "```sql\n", + "SELECT\n", + "    id,\n", + "    movie,\n", + "    description,\n", + "    rating\n", + "FROM cinema\n", + "WHERE (id % 2) = 1 -- odd IDs\n", + "  AND description IS NOT NULL\n", + "  AND description <> 'boring' -- exclude 'boring' (NULL is not included)\n", + "ORDER BY rating DESC, id DESC; -- ties stabilized by id\n", + "\n", + "-- Runtime 173 ms\n", + "-- Beats 86.40%\n", + "\n", + "```\n", + "\n", + "### Notes (equivalent forms)\n", + "\n", + "* A bitwise check also works: `(id & 1) = 
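1`\n", + "* Comparisons are already case-sensitive under PostgreSQL's default (deterministic) collations; to force a byte-wise comparison regardless of the column collation, use `COLLATE \"C\"` (fast for ASCII-only data)\n", + "  `AND description COLLATE \"C\" <> 'boring'`\n", + "\n", + "## 3) Key points\n", + "\n", + "* **Odd check**: `id % 2 = 1` and `(id & 1) = 1` are equivalent here. Both are safe for a positive integer PK (for a negative `id`, `id % 2` yields `-1`).\n", + "* **NULL handling**: `IS NOT NULL` is written out to make the reading of the spec explicit (to keep NULL rows, use `OR description IS NULL` instead).\n", + "* **Stable ordering**: `ORDER BY rating DESC, id DESC` is recommended to make the output reproducible.\n", + "* **Identifier case**: PostgreSQL folds unquoted identifiers to lowercase, so unless the table was created as quoted `\"Cinema\"`, `FROM cinema` is correct (unquoted `FROM Cinema` resolves to the same table). If it was created quoted, use `FROM \"Cinema\"`.\n", + "\n", + "## 4) Complexity (rough estimate)\n", + "\n", + "* **O(N)** assuming a full scan. When `ORDER BY rating DESC` requires a sort, **O(N log N)**.\n", + "* Suitable indexes can reduce I/O (see below).\n", + "\n", + "### Indexes that help in production (for reference)\n", + "\n", + "> For a pattern like this one (filter coarsely, then sort by rating descending), an **expression index** and a **descending index** help.\n", + "\n", + "```sql\n", + "-- 1) Parity plus sort optimization (expression index; the expression must match the query's (id % 2))\n", + "CREATE INDEX IF NOT EXISTS idx_cinema_odd_rating_desc\n", + "ON cinema ( (id % 2), rating DESC, id DESC );\n", + "\n", + "-- 2) Leave out 'boring' rows up front (partial index)\n", + "CREATE INDEX IF NOT EXISTS idx_cinema_not_boring_rating_desc\n", + "ON cinema (rating DESC, id DESC)\n", + "WHERE description IS NOT NULL AND description <> 'boring';\n", + "```\n", + "\n", + "*Rule of thumb by data distribution*\n", + "\n", + "* The more rows \"boring\" excludes, the smaller partial index 2 becomes and the more it helps; when \"boring\" is rare it mainly serves the sort order\n", + "* The parity check keeps about half of all rows: index 1 halves the scan range and also covers the sort\n", + "\n", + "The best approach is to measure with `EXPLAIN (ANALYZE, BUFFERS)` and check which index is used and whether a sort occurs (`Sort Method`, `Index Only Scan`).\n", + "\n", + "## 5) Diagram (Mermaid, ultra-conservative version)\n", + "\n", + "```mermaid\n", + "flowchart TD\n", + "    A[Input Cinema] --> B[Filter: id is odd]\n", + "    B --> C[Filter: description NOT NULL and not boring]\n", + "    C --> D[Sort by rating descending, ties by id]\n", + "    D --> E[Output id, movie, description, rating]\n", + "```\n", + "\n", + "For a simple filter plus descending sort of this size, the **query text alone** has nearly hit its ceiling. Going further means looking at physical design and execution plans.\n", + "\n", + "---\n", + "\n", + "## Room for improvement (production)\n", + "\n", + "### 1) Partial + expression + descending \"covering\" index\n", + "\n", + "Prune the target rows in advance, satisfy `ORDER BY`, and aim for an **Index Only Scan**.\n", + "\n", + "```sql\n", + "-- Index only the odd, non-boring rows, 
in sort order (predicate written to match the query text)\n", + "CREATE INDEX IF NOT EXISTS idx_cinema_hot\n", + "ON cinema (rating DESC, id DESC) INCLUDE (movie, description)\n", + "WHERE (id % 2) = 1\n", + "  AND description IS NOT NULL\n", + "  AND description <> 'boring';\n", + "```\n", + "\n", + "* `WHERE` **shrinks the set of rows scanned**\n", + "* `rating DESC, id DESC` means **no sort is needed** (rows come out in index order)\n", + "* `INCLUDE (movie, description)` allows an **Index Only Scan** (when the visibility map cooperates)\n", + "\n", + "> The existing query works unchanged (no rewrite needed), because the index predicate uses the same `(id % 2) = 1` expression as the query; a predicate written as `(id & 1) = 1` would not be matched by the planner.\n", + "\n", + "### 2) Keep the visibility map current so Index Only Scans stay index-only\n", + "\n", + "Run `VACUUM (ANALYZE)`, and schedule it regularly if updates are frequent.\n", + "If pages are not marked all-visible, an **Index Only Scan** degrades into index access plus heap fetches.\n", + "\n", + "```sql\n", + "VACUUM (ANALYZE) cinema;\n", + "-- After recent bulk updates, also consider tuning the autovacuum thresholds\n", + "```\n", + "\n", + "### 3) Statistics and parallelism tuning (light)\n", + "\n", + "* `ANALYZE cinema;` (always after creating a new index)\n", + "* Where parallelism helps,\n", + "  set `max_parallel_workers_per_gather` to 2 to 4.\n", + "  On small tables the parallel overhead backfires, so verify with **EXPLAIN (ANALYZE, BUFFERS)**.\n", + "\n", + "### 4) Improve cache hit rate through physical layout (optional)\n", + "\n", + "* Occasionally run `CLUSTER` on a full (non-partial) index over `(rating DESC, id DESC)`; `CLUSTER` rejects partial indexes such as `idx_cinema_hot`, and it takes an exclusive lock, so plan for downtime.\n", + "  Alternatively use `pg_repack`. Ordered access becomes contiguous and I/O gets smoother.\n", + "\n", + "---\n", + "\n", + "## What to check in the execution plan\n", + "\n", + "```sql\n", + "EXPLAIN (ANALYZE, BUFFERS)\n", + "SELECT id, movie, description, rating\n", + "FROM cinema\n", + "WHERE (id % 2) = 1\n", + "  AND description IS NOT NULL\n", + "  AND description <> 'boring'\n", + "ORDER BY rating DESC, id DESC;\n", + "```\n", + "\n", + "Things to look for:\n", + "\n", + "* Whether it shows `Index Only Scan using idx_cinema_hot`\n", + "* **No** `Sort` node (rows are fetched in index order)\n", + "* `Heap Fetches` near 0 (the visibility map is effective)\n", + "* Mostly `Buffers: shared hit` (high cache hit rate)\n", + "\n", + "---\n", + "\n", + "## Query text (final form)\n", + "\n", + "```sql\n", + "SELECT\n", + "    id,\n", + "    movie,\n", + "    description,\n", + "    rating\n", + "FROM cinema\n", + "WHERE (id % 2) = 1\n", + "  AND description IS NOT NULL\n", + "  AND description <> 'boring'\n", + "ORDER BY rating DESC, id DESC;\n", + "\n", + "-- Runtime 175 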
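ms\n", + "-- Beats 79.80%\n", + "\n", + "```\n", + "\n", + "> The shortest route is to leave the query text as is and **gain speed through index design and maintenance**.\n", + "> As a rough guide, when the partial index and an Index Only Scan both kick in, **double-digit ms** is within reach (depending on data volume and I/O conditions).\n", + "\n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}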