|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "dd3c3113", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "## 0) Prerequisites\n", |
| 9 | + "\n", |
| 10 | + "* Environment: **Python 3.10.15 / pandas 2.2.2**\n", |
| 11 | + "* **Follow the specified signature exactly**\n", |
| 12 | + "\n", |
| 13 | + " * Function name: `project_employees`\n", |
| 14 | + " * Argument names: `project`, `employee`\n", |
| 15 | + " * Returned columns: `[\"project_id\", \"average_years\"]`\n", |
| 16 | + " * Column order: as listed above\n", |
| 17 | + "* No I/O (files or standard output); do not use `print` or `sort_values`\n", |
| 18 | + "\n", |
| 19 | + "---\n", |
| 20 | + "\n", |
| 21 | + "## 1) Problem\n", |
| 22 | + "\n", |
| 23 | + "* `{{PROBLEM_STATEMENT}}`\n", |
| 24 | + " For each project, compute the **average experience (`experience_years`)** of the\n", |
| 25 | + " employees assigned to it, **rounded to 2 decimal places**.\n", |
| 26 | + "\n", |
| 27 | + "* Input DataFrames: `{{INPUT_DATAFRAMES}}`\n", |
| 28 | + "\n", |
| 29 | + " * `project: pd.DataFrame`\n", |
| 30 | + "\n", |
| 31 | + " | column | dtype |\n", |
| 32 | + " | ----------- | ----- |\n", |
| 33 | + " | project_id | int |\n", |
| 34 | + " | employee_id | int |\n", |
| 35 | + "\n", |
| 36 | + " Each row means that employee `employee_id` belongs to project `project_id`.\n", |
| 37 | + "\n", |
| 38 | + " * `employee: pd.DataFrame`\n", |
| 39 | + "\n", |
| 40 | + " | column | dtype |\n", |
| 41 | + " | ---------------- | ------ |\n", |
| 42 | + " | employee_id | int |\n", |
| 43 | + " | name | object |\n", |
| 44 | + " | experience_years | int |\n", |
| 45 | + "\n", |
| 46 | + " Each row describes one employee. `experience_years` contains no NULLs.\n", |
| 47 | + "\n", |
| 48 | + "* Output: `{{OUTPUT_COLUMNS_AND_RULES}}`\n", |
| 49 | + "\n", |
| 50 | + " * Return value: `pd.DataFrame`\n", |
| 51 | + "\n", |
| 52 | + " * Columns and meaning:\n", |
| 53 | + "\n", |
| 54 | + " * `project_id`: project ID\n", |
| 55 | + " * `average_years`: mean of `experience_years` over the employees in that project (rounded to 2 decimal places)\n", |
| 56 | + "\n", |
| 57 | + " * One row per `project_id`\n", |
| 58 | + "\n", |
| 59 | + " * Row order is arbitrary (no sorting, because `sort_values` is forbidden)\n", |
| 60 | + "\n", |
| 61 | + "---\n", |
| 62 | + "\n", |
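| 63 | + "## 2) Implementation (specified signature)\n", |
| 64 | + "\n", |
| 65 | + "> Process in the order `merge` → `groupby.mean` → `round`, carrying only the columns needed.\n", |
| 66 | + "> No within-group ranking or conditional filtering is needed here, so a plain aggregation is enough.\n", |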
| 67 | + "\n", |
| 68 | + "```python\n", |
| 69 | + "import pandas as pd\n", |
| 70 | + "\n", |
| 71 | + "def project_employees(project: pd.DataFrame, employee: pd.DataFrame) -> pd.DataFrame:\n", |
| 72 | + "    \"\"\"\n", |
| 73 | + "    Compute the average years of experience for each project.\n", |
| 74 | + "\n", |
| 75 | + "    Args:\n", |
| 76 | + "        project (pd.DataFrame): columns ['project_id', 'employee_id']\n", |
| 77 | + "        employee (pd.DataFrame): columns ['employee_id', 'name', 'experience_years']\n", |
| 78 | + "\n", |
| 79 | + "    Returns:\n", |
| 80 | + "        pd.DataFrame: column names and order are ['project_id', 'average_years']\n", |
| 81 | + "    \"\"\"\n", |
| 82 | + "    # 1) Column pruning: keep only the employee columns needed for the average\n", |
| 83 | + "    emp_exp = employee[[\"employee_id\", \"experience_years\"]]\n", |
| 84 | + "\n", |
| 85 | + "    # 2) JOIN: attach experience_years to each project row\n", |
| 86 | + "    merged = project.merge(emp_exp, on=\"employee_id\", how=\"left\")\n", |
| 87 | + "\n", |
| 88 | + "    # 3) Compute the mean per project\n", |
| 89 | + "    out = (\n", |
| 90 | + "        merged\n", |
| 91 | + "        .groupby(\"project_id\", as_index=False)[\"experience_years\"]\n", |
| 92 | + "        .mean()\n", |
| 93 | + "    )\n", |
| 94 | + "\n", |
| 95 | + "    # 4) Rename the column per the spec and round to 2 decimal places\n", |
| 96 | + "    out = out.rename(columns={\"experience_years\": \"average_years\"})\n", |
| 97 | + "    out[\"average_years\"] = out[\"average_years\"].round(2)\n", |
| 98 | + "\n", |
| 99 | + "    return out\n", |
| 100 | + "\n", |
| 101 | + "# Analyze Complexity (LeetCode submission result)\n", |
| 102 | + "# Runtime 283 ms\n", |
| 103 | + "# Beats 62.61%\n", |
| 104 | + "# Memory 69.14 MB\n", |
| 105 | + "# Beats 18.00%\n", |
| 106 | + "\n", |
| 107 | + "```\n", |
| 108 | + "\n", |
| 109 | + "---\n", |
| 110 | + "\n", |
| 111 | + "## 3) Algorithm walkthrough\n", |
| 112 | + "\n", |
| 113 | + "### APIs used\n", |
| 114 | + "\n", |
| 115 | + "* `DataFrame[...]`\n", |
| 116 | + " → Takes a column subset for **column pruning** (less memory by not carrying unused columns).\n", |
| 117 | + "* `DataFrame.merge`\n", |
| 118 | + " → Joins `project` and `employee` on `employee_id`, attaching each employee's experience to the project rows.\n", |
| 119 | + "* `DataFrame.groupby` + `GroupBy.mean`\n", |
| 120 | + " → Computes the mean of `experience_years` per `project_id`.\n", |
| 121 | + "* `DataFrame.rename`\n", |
| 122 | + " → Renames the output column to `average_years` as the problem specifies.\n", |
| 123 | + "* `Series.round`\n", |
| 124 | + " → Rounds the mean to 2 decimal places.\n", |
| 125 | + "\n", |
| 126 | + "### Handling NULLs, duplicates, and dtypes\n", |
| 127 | + "\n", |
| 128 | + "* The problem states that `experience_years` has **no NULLs**, so `mean()` needs no NULL handling.\n", |
| 129 | + "* In `project`, `(project_id, employee_id)` is the primary key, so\n", |
| 130 | + " the same employee never appears twice in the same project, and no double counting occurs.\n", |
| 131 | + "* `mean()` returns `float64`.\n", |
| 132 | + " `round(2)` rounds the stored value to 2 decimal places; whether `2.0` is shown as `2.00` depends on the pandas display settings.\n", |
| 133 | + "\n", |
| 134 | + "---\n", |
| 135 | + "\n", |
| 136 | + "## 4) Complexity (rough estimate)\n", |
| 137 | + "\n", |
| 138 | + "Let `N = len(project)` and `M = len(employee)`.\n", |
| 139 | + "\n", |
| 140 | + "* Column pruning: `employee[[\"employee_id\", \"experience_years\"]]`\n", |
| 141 | + " → **O(M)**\n", |
| 142 | + "* `merge` (assuming a hash join): `project.merge(emp_exp, on=\"employee_id\")`\n", |
| 143 | + " → roughly **O(N + M)**\n", |
| 144 | + "* `groupby(\"project_id\").mean()`\n", |
| 145 | + " → **O(N)** (hash-based group aggregation)\n", |
| 146 | + "\n", |
| 147 | + "Overall, time is therefore roughly **O(N + M)**, and memory is dominated by\n", |
| 148 | + "the temporary merged DataFrame (about N rows, 3 columns).\n", |
| 149 | + "\n", |
| 150 | + "---\n", |
| 151 | + "\n", |
| 152 | + "## 5) Diagram (Mermaid, ultra-conservative version)\n", |
| 153 | + "\n", |
| 154 | + "```mermaid\n", |
| 155 | + "flowchart TD\n", |
| 156 | + " A[project<br/>project_id, employee_id]\n", |
| 157 | + " B[employee<br/>employee_id, experience_years, name]\n", |
| 158 | + " C[Column pruning<br/>employee → employee_id, experience_years only]\n", |
| 159 | + " D[merge<br/>on employee_id]\n", |
| 160 | + " E[\"groupby project_id<br/>mean(experience_years)\"]\n", |
| 161 | + " F[\"rename + round(2)<br/>average_years\"]\n", |
| 162 | + " G[Output<br/>project_id, average_years]\n", |
| 163 | + "\n", |
| 164 | + " B --> C\n", |
| 165 | + " A --> D\n", |
| 166 | + " C --> D\n", |
| 167 | + " D --> E\n", |
| 168 | + " E --> F\n", |
| 169 | + " F --> G\n", |
| 170 | + "```\n", |
| 171 | + "\n", |
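| 172 | + "In this form, the code should paste directly into the pandas version of LeetCode's \"Project Employees I\".\n", |
| 173 | + "\n", |
| 174 | + "In short:\n", |
| 175 | + "\n", |
| 176 | + "* **At the complexity level this is already close to the limit, so no large speedup should be expected**\n", |
| 177 | + "* However, **switching from merge to map leaves room to reduce memory and slightly improve runtime**\n", |
| 178 | + "* If the goal is to improve memory usage in particular (Beats 18%), the key is a lighter way of joining\n", |
| 179 | + "\n", |
| 180 | + "That is the overall picture.\n", |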
| 181 | + "\n", |
| 182 | + "---\n", |
| 183 | + "\n", |
| 184 | + "## 1) Summary of the current approach\n", |
| 185 | + "\n", |
| 186 | + "Your current code, in summary:\n", |
| 187 | + "\n", |
| 188 | + "```python\n", |
| 189 | + "emp_exp = employee[[\"employee_id\", \"experience_years\"]]\n", |
| 190 | + "\n", |
| 191 | + "merged = project.merge(emp_exp, on=\"employee_id\", how=\"left\")\n", |
| 192 | + "\n", |
| 193 | + "out = (\n", |
| 194 | + " merged\n", |
| 195 | + " .groupby(\"project_id\", as_index=False)[\"experience_years\"]\n", |
| 196 | + " .mean()\n", |
| 197 | + ")\n", |
| 198 | + "\n", |
| 199 | + "out = out.rename(columns={\"experience_years\": \"average_years\"})\n", |
| 200 | + "out[\"average_years\"] = out[\"average_years\"].round(2)\n", |
| 201 | + "```\n", |
| 202 | + "\n", |
| 203 | + "What it does is entirely correct, and algorithmically it costs\n", |
| 204 | + "\n", |
| 205 | + "* join: O(N + M)\n", |
| 206 | + "* groupby: O(N)\n", |
| 207 | + "\n", |
| 208 | + "so **no change from here can improve the asymptotic order**.\n", |
| 209 | + "\n", |
| 210 | + "The LeetCode figures of 283 ms / Beats 62% are also\n", |
| 211 | + "on the \"good enough\" side once environment noise is taken into account.\n", |
| 212 | + "\n", |
| 213 | + "---\n", |
| 214 | + "\n", |
| 215 | + "## 2) Improvement: lighten the join with `merge` → `map`\n", |
| 216 | + "\n", |
| 217 | + "`project` already contains `employee_id`. In that case,\n", |
| 218 | + "\n", |
| 219 | + "> instead of materializing a new joined DataFrame with `merge`,\n", |
| 220 | + "> **building an `employee_id → experience_years` mapping and applying it with `map`**\n", |
| 221 | + "\n", |
| 222 | + "can be slightly more memory-efficient.\n", |
| 223 | + "\n", |
| 224 | + "### Revised code (`map`-based)\n", |
| 225 | + "\n", |
| 226 | + "```python\n", |
| 227 | + "import pandas as pd\n", |
| 228 | + "\n", |
| 229 | + "def project_employees(project: pd.DataFrame, employee: pd.DataFrame) -> pd.DataFrame:\n", |
| 230 | + "    \"\"\"\n", |
| 231 | + "    Compute the average years of experience for each project.\n", |
| 232 | + "\n", |
| 233 | + "    Args:\n", |
| 234 | + "        project (pd.DataFrame): columns ['project_id', 'employee_id']\n", |
| 235 | + "        employee (pd.DataFrame): columns ['employee_id', 'name', 'experience_years']\n", |
| 236 | + "\n", |
| 237 | + "    Returns:\n", |
| 238 | + "        pd.DataFrame: column names and order are ['project_id', 'average_years']\n", |
| 239 | + "    \"\"\"\n", |
| 240 | + "    # 1) Build an employee_id -> experience_years map (column pruning + indexing)\n", |
| 241 | + "    emp_exp = employee.set_index(\"employee_id\")[\"experience_years\"]\n", |
| 242 | + "\n", |
| 243 | + "    # 2) Attach the matching experience to each project row (map instead of merge)\n", |
| 244 | + "    #    Column pruning: use only the needed project columns\n", |
| 245 | + "    proj = project[[\"project_id\", \"employee_id\"]].copy()\n", |
| 246 | + "    proj[\"experience_years\"] = proj[\"employee_id\"].map(emp_exp)\n", |
| 247 | + "\n", |
| 248 | + "    # 3) Drop employee_id, which the aggregation does not need (a small memory saving)\n", |
| 249 | + "    proj = proj[[\"project_id\", \"experience_years\"]]\n", |
| 250 | + "\n", |
| 251 | + "    # 4) Compute the mean per project\n", |
| 252 | + "    out = (\n", |
| 253 | + "        proj\n", |
| 254 | + "        .groupby(\"project_id\", as_index=False)[\"experience_years\"]\n", |
| 255 | + "        .mean()\n", |
| 256 | + "    )\n", |
| 257 | + "\n", |
| 258 | + "    # 5) Rename the column and round to 2 decimal places\n", |
| 259 | + "    out = out.rename(columns={\"experience_years\": \"average_years\"})\n", |
| 260 | + "    out[\"average_years\"] = out[\"average_years\"].round(2)\n", |
| 261 | + "\n", |
| 262 | + "    return out\n", |
| 263 | + "\n", |
| 264 | + "# Analyze Complexity (LeetCode submission result)\n", |
| 265 | + "# Runtime 283 ms\n", |
| 266 | + "# Beats 62.61%\n", |
| 267 | + "# Memory 68.14 MB\n", |
| 268 | + "# Beats 98.61%\n", |
| 269 | + "\n", |
| 270 | + "```\n", |
| 271 | + "\n", |
| 272 | + "### What this version aims for\n", |
| 273 | + "\n", |
| 274 | + "* `merge` builds a new DataFrame that carries the columns of both the left and right sides, so it tends to use more memory\n", |
| 275 | + "* `map` only looks up the values for a single column from a Series, so it is relatively light as joins go\n", |
| 276 | + "* `employee` is turned into a `Series` via `set_index`,\n", |
| 277 | + " which is the minimal form of a **key → value** map\n", |
| 278 | + "\n", |
| 279 | + "This may not reduce memory dramatically, but\n", |
| 280 | + "\n", |
| 281 | + "* the temporary objects are somewhat smaller\n", |
| 282 | + "* runtime may also improve slightly\n", |
| 283 | + "\n", |
| 284 | + "so it may push the LeetCode memory percentile up by a few points.\n", |
| 285 | + "\n", |
| 286 | + "---\n", |
| 287 | + "\n", |
| 288 | + "## 3) How far should the optimization go?\n", |
| 289 | + "\n", |
| 290 | + "Frankly, for this problem\n", |
| 291 | + "\n", |
| 292 | + "* the input size is not that extreme\n", |
| 293 | + "* the algorithm tops out at \"JOIN + GROUP BY\"\n", |
| 294 | + "\n", |
| 295 | + "so the solution is **already about as efficient as this query gets**.\n", |
| 296 | + "\n", |
| 297 | + "A figure like 283 ms / Beats 62% will not change dramatically\n", |
| 298 | + "from minor code tweaks, so\n", |
| 299 | + "\n", |
| 300 | + "* readability\n", |
| 301 | + "* straightforwardness (no odd tricks)\n", |
| 302 | + "* resistance to bugs\n", |
| 303 | + "\n", |
| 304 | + "should take priority; adding a light improvement such as the `map` version above is a good place to stop.\n", |
| 305 | + "\n", |
| 306 | + "---\n", |
| 307 | + "\n", |
| 308 | + "If you later run into\n", |
| 309 | + "\n", |
| 310 | + "* other LeetCode pandas problems\n", |
| 311 | + "* heavier group operations (multiple conditions / top-k / window-function-style processing)\n", |
| 312 | + "\n", |
| 313 | + "those make good practice for combining **`groupby.transform` / `rank` / `merge` strategies**.\n" |
| 314 | + ] |
| 315 | + } |
| 316 | + ], |
| 317 | + "metadata": { |
| 318 | + "language_info": { |
| 319 | + "name": "python" |
| 320 | + } |
| 321 | + }, |
| 322 | + "nbformat": 4, |
| 323 | + "nbformat_minor": 5 |
| 324 | +} |