diff --git a/SQL/Leetcode/Basic join/1075. Project Employees I/gpt 5.1 thinking customized/Project_Employees_I_pandas.ipynb b/SQL/Leetcode/Basic join/1075. Project Employees I/gpt 5.1 thinking customized/Project_Employees_I_pandas.ipynb
new file mode 100644
index 00000000..88b226ce
--- /dev/null
+++ b/SQL/Leetcode/Basic join/1075. Project Employees I/gpt 5.1 thinking customized/Project_Employees_I_pandas.ipynb
@@ -0,0 +1,324 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "dd3c3113",
+ "metadata": {},
+ "source": [
+ "## 0) Prerequisites\n",
+ "\n",
+ "* Environment: **Python 3.10.15 / pandas 2.2.2**\n",
+ "* **Follow the specified signature exactly**\n",
+ "\n",
+ "  * Function name: `project_employees`\n",
+ "  * Argument names: `project`, `employee`\n",
+ "  * Returned columns: `[\"project_id\", \"average_years\"]`\n",
+ "  * Column order: as listed above\n",
+ "* No I/O (files / standard output); do not use `print` or `sort_values`\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 1) Problem\n",
+ "\n",
+ "* Problem statement:\n",
+ "  For each project, compute the average `experience_years` of the employees\n",
+ "  assigned to it, **rounded to 2 decimal places**.\n",
+ "\n",
+ "* Input DataFrames:\n",
+ "\n",
+ "  * `project: pd.DataFrame`\n",
+ "\n",
+ "    | column      | dtype |\n",
+ "    | ----------- | ----- |\n",
+ "    | project_id  | int   |\n",
+ "    | employee_id | int   |\n",
+ "\n",
+ "    Each row means that employee `employee_id` belongs to project `project_id`.\n",
+ "\n",
+ "  * `employee: pd.DataFrame`\n",
+ "\n",
+ "    | column           | dtype  |\n",
+ "    | ---------------- | ------ |\n",
+ "    | employee_id      | int    |\n",
+ "    | name             | object |\n",
+ "    | experience_years | int    |\n",
+ "\n",
+ "    Each row describes one employee. `experience_years` is never NULL.\n",
+ "\n",
+ "* Output:\n",
+ "\n",
+ "  * Return value: `pd.DataFrame`\n",
+ "\n",
+ "  * Columns and meaning:\n",
+ "\n",
+ "    * `project_id`: project ID\n",
+ "    * `average_years`: mean `experience_years` of the employees on that project (rounded to 2 decimal places)\n",
+ "\n",
+ "  * One row per `project_id`\n",
+ "\n",
+ "  * Row order is arbitrary (no sorting, since `sort_values` is disallowed)\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 2) Implementation (specified signature)\n",
+ "\n",
+ "> With columns kept to a minimum, the processing order is `merge` → 
`groupby.mean` → `round`.\n",
+ "> No within-group ranking or conditional filtering is needed here, so a plain aggregation is enough.\n",
+ "\n",
+ "```python\n",
+ "import pandas as pd\n",
+ "\n",
+ "def project_employees(project: pd.DataFrame, employee: pd.DataFrame) -> pd.DataFrame:\n",
+ "    \"\"\"\n",
+ "    Compute the average years of experience for each project.\n",
+ "\n",
+ "    Args:\n",
+ "        project (pd.DataFrame): columns ['project_id', 'employee_id']\n",
+ "        employee (pd.DataFrame): columns ['employee_id', 'name', 'experience_years']\n",
+ "\n",
+ "    Returns:\n",
+ "        pd.DataFrame: columns, in order, ['project_id', 'average_years']\n",
+ "    \"\"\"\n",
+ "    # 1) Minimize columns: keep only what the average needs from employee\n",
+ "    emp_exp = employee[[\"employee_id\", \"experience_years\"]]\n",
+ "\n",
+ "    # 2) JOIN: attach experience_years to each project row\n",
+ "    merged = project.merge(emp_exp, on=\"employee_id\", how=\"left\")\n",
+ "\n",
+ "    # 3) Compute the mean per project\n",
+ "    out = (\n",
+ "        merged\n",
+ "        .groupby(\"project_id\", as_index=False)[\"experience_years\"]\n",
+ "        .mean()\n",
+ "    )\n",
+ "\n",
+ "    # 4) Rename the column per the spec and round to 2 decimal places\n",
+ "    out = out.rename(columns={\"experience_years\": \"average_years\"})\n",
+ "    out[\"average_years\"] = out[\"average_years\"].round(2)\n",
+ "\n",
+ "    return out\n",
+ "\n",
+ "# LeetCode result\n",
+ "# Runtime 283 ms\n",
+ "# Beats 62.61%\n",
+ "# Memory 69.14 MB\n",
+ "# Beats 18.00%\n",
+ "\n",
+ "```\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 3) Algorithm notes\n",
+ "\n",
+ "### APIs used\n",
+ "\n",
+ "* `DataFrame[...]`\n",
+ "  → Takes a column subset for **column minimization** (less memory, since unused columns are not carried along).\n",
+ "* `DataFrame.merge`\n",
+ "  → Joins `project` and `employee` on `employee_id`, attaching each employee's experience to the project rows.\n",
+ "* `DataFrame.groupby` + `GroupBy.mean`\n",
+ "  → Computes the mean of `experience_years` per `project_id`.\n",
+ "* `DataFrame.rename`\n",
+ "  → Renames the output column to `average_years`, as the problem requires.\n",
+ "* `Series.round`\n",
+ "  → Rounds the mean to 2 decimal places.\n",
+ "\n",
+ "### Handling of NULLs / duplicates / dtypes\n",
+ "\n",
+ "* The problem guarantees `experience_years` has **no NULLs**, so `mean()` needs no NULL handling.\n",
+ "* On the `project` side, `(project_id, employee_id)` is the primary key,\n",
+ "  
so the same employee never appears more than once in a project, and duplicates cannot cause double counting.\n",
+ "* `mean()` returns `float64`.\n",
+ "  `round(2)` rounds the stored value to 2 decimal places; a value such as `2.0` carries no trailing zeros, so whether it prints as `2.00` depends on pandas display settings.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 4) Complexity (rough estimate)\n",
+ "\n",
+ "Let `N = len(project)` and `M = len(employee)`.\n",
+ "\n",
+ "* Column minimization: `employee[[\"employee_id\", \"experience_years\"]]`\n",
+ "  → **O(M)**\n",
+ "* `merge` (assuming a hash join): `project.merge(emp_exp, on=\"employee_id\")`\n",
+ "  → about **O(N + M)**\n",
+ "* `groupby(\"project_id\").mean()`\n",
+ "  → **O(N)** (hash-based group aggregation)\n",
+ "\n",
+ "Overall this is roughly **O(N + M)** time, and memory is dominated by\n",
+ "the temporary joined DataFrame (about N rows, 3 columns).\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 5) Diagram (Mermaid, ultra-conservative syntax)\n",
+ "\n",
+ "```mermaid\n",
+ "flowchart TD\n",
+ "    A[project<br>project_id, employee_id]\n",
+ "    B[employee<br>employee_id, experience_years, name]\n",
+ "    C[Minimize columns<br>employee → employee_id, experience_years only]\n",
+ "    D[merge<br>on employee_id]\n",
+ "    E[\"groupby project_id<br>mean(experience_years)\"]\n",
+ "    F[\"Rename column + round(2)<br>average_years\"]\n",
+ "    G[Output<br>project_id, average_years]\n",
+ "\n",
+ "    B --> C\n",
+ "    A --> D\n",
+ "    C --> D\n",
+ "    D --> E\n",
+ "    E --> F\n",
+ "    F --> G\n",
+ "```\n",
+ "\n",
+ "In this form it should paste directly into the pandas version of LeetCode's \"Project Employees I\".\n",
+ "\n",
+ "To start with the conclusion:\n",
+ "\n",
+ "* **Asymptotically this is close to the limit, so a large speedup is unlikely**\n",
+ "* However, **switching from merge to map leaves room for a small memory and runtime improvement**\n",
+ "* If the goal is to improve memory usage (Beats 18%), the key is to make the join itself lighter\n",
+ "\n",
+ "That is the short version.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 1) Recap of the current approach\n",
+ "\n",
+ "Your current code, in summary:\n",
+ "\n",
+ "```python\n",
+ "emp_exp = employee[[\"employee_id\", \"experience_years\"]]\n",
+ "\n",
+ "merged = project.merge(emp_exp, on=\"employee_id\", how=\"left\")\n",
+ "\n",
+ "out = (\n",
+ "    merged\n",
+ "    .groupby(\"project_id\", as_index=False)[\"experience_years\"]\n",
+ "    .mean()\n",
+ ")\n",
+ "\n",
+ "out = out.rename(columns={\"experience_years\": \"average_years\"})\n",
+ "out[\"average_years\"] = out[\"average_years\"].round(2)\n",
+ "```\n",
+ "\n",
+ "What it does is correct, and algorithmically it is\n",
+ "\n",
+ "* join: O(N + M)\n",
+ "* groupby: O(N)\n",
+ "\n",
+ "so **no change from here can improve the asymptotic order**.\n",
+ "\n",
+ "The LeetCode figure of 283 ms / Beats 62% is also\n",
+ "on the \"good enough\" side once environment noise is taken into account.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 2) Improvement: lighter join with `merge` → `map`\n",
+ "\n",
+ "Since `project` already contains `employee_id`,\n",
+ "\n",
+ "> rather than materializing a new joined DataFrame with `merge`,\n",
+ "> **build an `employee_id → experience_years` mapping and apply it with `map`**\n",
+ "\n",
+ "which can be slightly better for memory.\n",
+ "\n",
+ "### Revised code (`map`-based)\n",
+ "\n",
+ "```python\n",
+ "import pandas as pd\n",
+ "\n",
+ "def project_employees_i(project: pd.DataFrame, employee: pd.DataFrame) -> pd.DataFrame:\n",
+ "    \"\"\"\n",
+ "    Compute the average years of experience for each project.\n",
+ "\n",
+ "    Args:\n",
+ "        project (pd.DataFrame): columns ['project_id', 'employee_id']\n",
+ "        employee (pd.DataFrame): columns ['employee_id', 'name', 'experience_years']\n",
+ "\n",
+ "    Returns:\n",
+ "        pd.DataFrame: columns, in order, ['project_id', 'average_years']\n",
+ "    \"\"\"\n",
+ "    # 1) 
Build an employee_id -> experience_years lookup (minimal columns, indexed by key)\n",
+ "    emp_exp = employee.set_index(\"employee_id\")[\"experience_years\"]\n",
+ "\n",
+ "    # 2) Attach the matching experience column to project (map instead of merge)\n",
+ "    #    Column minimization: use only the needed columns from project as well\n",
+ "    proj = project[[\"project_id\", \"employee_id\"]].copy()\n",
+ "    proj[\"experience_years\"] = proj[\"employee_id\"].map(emp_exp)\n",
+ "\n",
+ "    # 3) Drop employee_id, which the aggregation does not need (a small memory saving)\n",
+ "    proj = proj[[\"project_id\", \"experience_years\"]]\n",
+ "\n",
+ "    # 4) Compute the mean per project\n",
+ "    out = (\n",
+ "        proj\n",
+ "        .groupby(\"project_id\", as_index=False)[\"experience_years\"]\n",
+ "        .mean()\n",
+ "    )\n",
+ "\n",
+ "    # 5) Rename the column and round to 2 decimal places\n",
+ "    out = out.rename(columns={\"experience_years\": \"average_years\"})\n",
+ "    out[\"average_years\"] = out[\"average_years\"].round(2)\n",
+ "\n",
+ "    return out\n",
+ "\n",
+ "# LeetCode result\n",
+ "# Runtime 283 ms\n",
+ "# Beats 62.61%\n",
+ "# Memory 68.14 MB\n",
+ "# Beats 98.61%\n",
+ "\n",
+ "```\n",
+ "\n",
+ "### What this version is aiming for\n",
+ "\n",
+ "* `merge` builds a new DataFrame carrying the columns of both sides, so it tends to use more memory\n",
+ "* `map` only looks up one column of values from a Series, so it is comparatively light as joins go\n",
+ "* The `employee` side is turned into a `Series` with `set_index`,\n",
+ "  the minimal form of a **key → value** map\n",
+ "\n",
+ "This may not cut memory dramatically, but\n",
+ "\n",
+ "* the temporary objects are somewhat smaller\n",
+ "* runtime may improve slightly\n",
+ "\n",
+ "so it may lift the LeetCode memory percentile by a few points.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 3) How far is it worth optimizing?\n",
+ "\n",
+ "Frankly, for this problem\n",
+ "\n",
+ "* the input size is not extreme\n",
+ "* the algorithm tops out at \"JOIN + GROUP BY\"\n",
+ "\n",
+ "so it is **already well optimized as a query**.\n",
+ "\n",
+ "A figure like 283 ms / Beats 62% will not change\n",
+ "dramatically from minor code tweaks, so prioritize\n",
+ "\n",
+ "* readability\n",
+ "* straightforwardness (no odd tricks)\n",
+ "* resistance to bugs\n",
+ "\n",
+ "and add only light improvements such as the `map` version above; that is a good place to stop.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "If you later run into\n",
+ "\n",
+ "* other LeetCode pandas problems\n",
+ "* heavier group operations (multiple conditions / top k / window-function-like processing)\n",
+ "\n",
+ "those are good practice for combining **`groupby.transform` / 
`rank` / `merge` strategies** into one solution.\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/SQL/Leetcode/Basic join/1075. Project Employees I/gpt 5.1 thinking customized/Project_Employees_I_postgres.ipynb b/SQL/Leetcode/Basic join/1075. Project Employees I/gpt 5.1 thinking customized/Project_Employees_I_postgres.ipynb
new file mode 100644
index 00000000..6958b6d3
--- /dev/null
+++ b/SQL/Leetcode/Basic join/1075. Project Employees I/gpt 5.1 thinking customized/Project_Employees_I_postgres.ipynb
@@ -0,0 +1,374 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "e334eb64",
+ "metadata": {},
+ "source": [
+ "## 0) Prerequisites\n",
+ "\n",
+ "* Engine: **PostgreSQL 16.6+**\n",
+ "* Row order: arbitrary\n",
+ "* `NOT IN` is not needed for this problem (nor are `EXISTS` / `LEFT JOIN ... IS NULL`)\n",
+ "* Aggregation is per `project_id`; the output follows the specification\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 1) Problem\n",
+ "\n",
+ "* Problem statement:\n",
+ "  For each project, compute the **average years of experience (`experience_years`)** of the employees on that project and output it rounded to **2 decimal places**.\n",
+ "\n",
+ "* Input tables:\n",
+ "\n",
+ "  **Project**\n",
+ "\n",
+ "  | Column Name | Type |\n",
+ "  | ----------- | ---- |\n",
+ "  | project_id  | int  |\n",
+ "  | employee_id | int  |\n",
+ "\n",
+ "  * `(project_id, employee_id)` is the primary key\n",
+ "  * `employee_id` is a foreign key to `Employee.employee_id`\n",
+ "\n",
+ "  **Employee**\n",
+ "\n",
+ "  | Column Name      | Type    |\n",
+ "  | ---------------- | ------- |\n",
+ "  | employee_id      | int     |\n",
+ "  | name             | varchar |\n",
+ "  | experience_years | int     |\n",
+ "\n",
+ "  * `employee_id` is the primary key\n",
+ "  * `experience_years` is NOT NULL\n",
+ "\n",
+ "* Output:\n",
+ "\n",
+ "  | Column Name   | Description |\n",
+ "  | ------------- | ----------- |\n",
+ "  | project_id    | Project ID |\n",
+ "  | average_years | Average years of experience of the employees on that project, rounded to 2 decimal places |\n",
+ "\n",
+ "  * One row per `project_id`\n",
+ "  * Row order is arbitrary\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 
2) Optimal solution (single query)\n",
+ "\n",
+ "> As instructed, this version is built with CTEs + a window function (a plain `GROUP BY` also solves this problem, but the window form is used here).\n",
+ "\n",
+ "```sql\n",
+ "WITH pre AS (\n",
+ "    SELECT\n",
+ "        p.project_id,\n",
+ "        e.experience_years\n",
+ "    FROM project AS p\n",
+ "    JOIN employee AS e\n",
+ "        ON e.employee_id = p.employee_id\n",
+ "),\n",
+ "win AS (\n",
+ "    SELECT\n",
+ "        project_id,\n",
+ "        AVG(experience_years) OVER (\n",
+ "            PARTITION BY project_id\n",
+ "        ) AS avg_experience_years\n",
+ "    FROM pre\n",
+ ")\n",
+ "SELECT DISTINCT\n",
+ "    project_id,\n",
+ "    ROUND(avg_experience_years::numeric, 2) AS average_years\n",
+ "FROM win;\n",
+ "\n",
+ "-- LeetCode: Runtime 446 ms\n",
+ "-- LeetCode: Beats 64.43%\n",
+ "\n",
+ "```\n",
+ "\n",
+ "### (Reference) A more straightforward form\n",
+ "\n",
+ "Without the template constraint, this form is simpler and easier to read in practice.\n",
+ "\n",
+ "```sql\n",
+ "SELECT\n",
+ "    p.project_id,\n",
+ "    ROUND(AVG(e.experience_years)::numeric, 2) AS average_years\n",
+ "FROM project AS p\n",
+ "JOIN employee AS e\n",
+ "    ON e.employee_id = p.employee_id\n",
+ "GROUP BY\n",
+ "    p.project_id;\n",
+ "\n",
+ "-- LeetCode: Runtime 439 ms\n",
+ "-- LeetCode: Beats 72.22%\n",
+ "\n",
+ "```\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 3) Key points\n",
+ "\n",
+ "1. **Attaching employees (`pre` CTE)**\n",
+ "   `project` and `employee` are joined on `employee_id` to produce\n",
+ "   one detail row per project and employee, carrying that employee's years of experience.\n",
+ "\n",
+ "   ```sql\n",
+ "   SELECT\n",
+ "       p.project_id,\n",
+ "       e.experience_years\n",
+ "   FROM project AS p\n",
+ "   JOIN employee AS e\n",
+ "       ON e.employee_id = p.employee_id;\n",
+ "   ```\n",
+ "\n",
+ "2. **Per-project average via a window aggregate (`win` CTE)**\n",
+ "   `AVG(experience_years) OVER (PARTITION BY project_id)` attaches\n",
+ "   the same average (the window aggregate result) to every row of a given `project_id`.\n",
+ "\n",
+ "   ```sql\n",
+ "   AVG(experience_years) OVER (\n",
+ "       PARTITION BY project_id\n",
+ "   ) AS avg_experience_years\n",
+ "   ```\n",
+ "\n",
+ "3. **`DISTINCT` to get one row per project**\n",
+ "   `win` has as many rows per project as it has employees, so\n",
+ "   `SELECT DISTINCT project_id, avg_experience_years` collapses them to one row.\n",
+ "\n",
+ "4. 
**Rounding to 2 places (`ROUND`)**\n",
+ "   In PostgreSQL the two-argument `ROUND(numeric, 2)` is what rounds to 2 decimal places, so\n",
+ "   the window result is cast to `numeric` before rounding.\n",
+ "\n",
+ "   ```sql\n",
+ "   ROUND(avg_experience_years::numeric, 2) AS average_years\n",
+ "   ```\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 4) Complexity (rough estimate)\n",
+ "\n",
+ "Assume `Project` has `n` rows and `Employee` has `m` rows.\n",
+ "\n",
+ "* `JOIN` (hash join / nested loop, assuming an index on `employee_id`)\n",
+ "  → roughly **O(n + m)** to **O(n log m)**\n",
+ "* Window function `AVG() OVER (PARTITION BY project_id)`\n",
+ "  → with `n_g` rows per project, **O(Σ n_g) = O(n)** for the aggregation itself, plus at most one sort on `project_id` if the plan needs it\n",
+ "* Overall, with the indexes in use, this should stay **close to linear, about O(n + m)**\n",
+ "\n",
+ "At this scale (LeetCode level), either form is fast enough.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 5) Diagram (Mermaid, ultra-conservative syntax)\n",
+ "\n",
+ "```mermaid\n",
+ "flowchart TD\n",
+ "    A[Project<br>project_id, employee_id]\n",
+ "    B[Employee<br>employee_id, experience_years]\n",
+ "    C[CTE pre<br>project_id + experience_years]\n",
+ "    D[\"CTE win<br>AVG(...) OVER (PARTITION BY project_id)\"]\n",
+ "    E[\"SELECT DISTINCT<br>project_id, ROUND(avg, 2)\"]\n",
+ "    F[Output<br>project_id, average_years]\n",
+ "\n",
+ "    A -->|JOIN employee_id| C\n",
+ "    B -->|JOIN employee_id| C\n",
+ "    C --> D\n",
+ "    D --> E\n",
+ "    E --> F\n",
+ "```\n",
+ "\n",
+ "So the flow is \"JOIN → average per project → round and output\".\n",
+ "\n",
+ "To start with the conclusion:\n",
+ "\n",
+ "* **The query is already close to optimal**\n",
+ "* 446 ms vs 439 ms is **within measurement error (environment noise)**\n",
+ "* Any improvement is about **simplicity and removing waste, not the algorithm**\n",
+ "\n",
+ "That is about all there is to it.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 1. Assessment of the two queries\n",
+ "\n",
+ "### ① CTE + window version\n",
+ "\n",
+ "```sql\n",
+ "WITH pre AS (\n",
+ "    SELECT\n",
+ "        p.project_id,\n",
+ "        e.experience_years\n",
+ "    FROM project AS p\n",
+ "    JOIN employee AS e\n",
+ "        ON e.employee_id = p.employee_id\n",
+ "),\n",
+ "win AS (\n",
+ "    SELECT\n",
+ "        project_id,\n",
+ "        AVG(experience_years) OVER (\n",
+ "            PARTITION BY project_id\n",
+ "        ) AS avg_experience_years\n",
+ "    FROM pre\n",
+ ")\n",
+ "SELECT DISTINCT\n",
+ "    project_id,\n",
+ "    ROUND(avg_experience_years::numeric, 2) AS average_years\n",
+ "FROM win;\n",
+ "```\n",
+ "\n",
+ "* `AVG(...) OVER (PARTITION BY ...)` **attaches the same average to every row**\n",
+ "* `DISTINCT` then collapses the duplicates to one row\n",
+ "  → This is **functionally correct**, but since only one row per project is wanted,\n",
+ "  *replicating the value with a window and then removing the copies with DISTINCT* is a detour.\n",
+ "\n",
+ "### ② GROUP BY version\n",
+ "\n",
+ "```sql\n",
+ "SELECT\n",
+ "    p.project_id,\n",
+ "    ROUND(AVG(e.experience_years)::numeric, 2) AS average_years\n",
+ "FROM project AS p\n",
+ "JOIN employee AS e\n",
+ "    ON e.employee_id = p.employee_id\n",
+ "GROUP BY\n",
+ "    p.project_id;\n",
+ "```\n",
+ "\n",
+ "* This one **produces one row per group directly**\n",
+ "* The processing flow is shorter both logically and physically, and easier for the planner to optimize\n",
+ "\n",
+ "The runtime difference is small, most likely because at this input size\n",
+ "the extra window + DISTINCT work is cheap.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 2. 
On the 446 ms vs 439 ms runtimes\n",
+ "\n",
+ "Runtimes on LeetCode-style platforms\n",
+ "\n",
+ "* **routinely vary by about ±10 to 20 ms across runs** of the same query\n",
+ "* depend on load from other users and on the state of the environment\n",
+ "\n",
+ "so **446 ms vs 439 ms can safely be treated as noise**.\n",
+ "\n",
+ "\"Beats 64% → 72%\" is likewise a relative comparison within the platform, so\n",
+ "\n",
+ "* different DB engines (MySQL / Oracle, etc.)\n",
+ "* different index assumptions\n",
+ "* differences in language implementation and driver overhead\n",
+ "\n",
+ "are all mixed into it.\n",
+ "For PostgreSQL, the current GROUP BY query is already in good shape.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 3. Small things that can still be tidied (PostgreSQL-specific)\n",
+ "\n",
+ "### 3.1 Remove the unnecessary cast\n",
+ "\n",
+ "In PostgreSQL\n",
+ "\n",
+ "* `AVG(int)` **already returns `numeric`**\n",
+ "\n",
+ "so the `::numeric` cast is not needed.\n",
+ "\n",
+ "```sql\n",
+ "SELECT\n",
+ "    p.project_id,\n",
+ "    ROUND(AVG(e.experience_years), 2) AS average_years\n",
+ "FROM project AS p\n",
+ "JOIN employee AS e\n",
+ "    ON e.employee_id = p.employee_id\n",
+ "GROUP BY\n",
+ "    p.project_id;\n",
+ "```\n",
+ "\n",
+ "The meaning is exactly the same.\n",
+ "(There is almost no performance difference; it is simply cleaner not to do unnecessary work.)\n",
+ "\n",
+ "### 3.2 Slightly more readable with `USING`\n",
+ "\n",
+ "PostgreSQL supports `JOIN ... USING (...)`, and personally I like this form too.\n",
+ "\n",
+ "```sql\n",
+ "SELECT\n",
+ "    p.project_id,\n",
+ "    ROUND(AVG(e.experience_years), 2) AS average_years\n",
+ "FROM project AS p\n",
+ "JOIN employee AS e USING (employee_id)\n",
+ "GROUP BY\n",
+ "    p.project_id;\n",
+ "\n",
+ "-- LeetCode: Runtime 449 ms\n",
+ "-- LeetCode: Beats 61.92%\n",
+ "\n",
+ "```\n",
+ "\n",
+ "Performance is again about the same, but\n",
+ "\n",
+ "* it reads well when the join condition is a single key\n",
+ "* the repetitive `ON p.employee_id = e.employee_id` goes away\n",
+ "\n",
+ "which are the benefits.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 4. Is there any substantive room left for tuning?\n",
+ "\n",
+ "Given the structure of this problem, what the database can do comes down to two things.\n",
+ "\n",
+ "1. **How efficiently the JOIN is performed**\n",
+ "2. 
**How efficiently the per-`project_id` average is aggregated**\n",
+ "\n",
+ "Under the stated assumptions:\n",
+ "\n",
+ "* `Employee.employee_id` is the PK → **it is indexed**\n",
+ "* `Project(project_id, employee_id)` is also a PK → **there is an index on (project_id, employee_id)**\n",
+ "* `project_id`, the `GROUP BY` key, is the leading column of that index\n",
+ "\n",
+ "so the execution plan is already in a favorable position, and\n",
+ "**any further runtime reduction belongs to DB tuning (memory, parallelism, statistics, and so on)**.\n",
+ "That cannot be touched in an environment like LeetCode, so there is almost no room left at the query level.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 5. Summary\n",
+ "\n",
+ "* **CTE + window version**\n",
+ "\n",
+ "  * The logic is clean, but it takes a detour for this problem\n",
+ "* **GROUP BY version**\n",
+ "\n",
+ "  * The most direct fit for the problem, and the form usually treated as best practice in real work\n",
+ "  * Runtime is on par (the measured difference is within noise), and the planner handles it easily\n",
+ "\n",
+ "As the final form, I think this one is fine to adopt:\n",
+ "\n",
+ "```sql\n",
+ "SELECT\n",
+ "    p.project_id,\n",
+ "    ROUND(AVG(e.experience_years), 2) AS average_years\n",
+ "FROM project AS p\n",
+ "JOIN employee AS e USING (employee_id)\n",
+ "GROUP BY\n",
+ "    p.project_id;\n",
+ "```\n",
+ "\n",
+ "As a next step toward a sharper sense for optimization, practice reading\n",
+ "\n",
+ "* the output of `EXPLAIN (ANALYZE, BUFFERS)`\n",
+ "* `HashAggregate` / `GroupAggregate` / which indexes are used\n",
+ "* the runtime breakdown (is the JOIN or the aggregation the heavy part?)\n",
+ "\n",
+ "which builds skills that carry over to real work 💪"
+ ]
+ }
+ ],
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}