{
"cells": [
{
"cell_type": "markdown",
"id": "dd3c3113",
"metadata": {},
"source": [
"## 0) Prerequisites\n",
"\n",
"* Environment: **Python 3.10.15 / pandas 2.2.2**\n",
"* **Follow the specified signature exactly**\n",
"\n",
" * Function name: `project_employees`\n",
" * Argument names: `project`, `employee`\n",
" * Returned columns: `[\"project_id\", \"average_years\"]`\n",
" * Column order: as listed above\n",
"* No I/O (files / standard output); do not use `print` or `sort_values`\n",
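"\n",
"The output rules above can also be checked mechanically. Below is a minimal sketch. It assumes a hypothetical helper `check_output_spec`, which is not part of the problem or of pandas. You would run it on the function's return value in a local test, not inside the submitted function:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Required output columns, in order (from the spec above)\n",
"EXPECTED_COLUMNS = [\"project_id\", \"average_years\"]\n",
"\n",
"\n",
"def check_output_spec(out: pd.DataFrame) -> None:\n",
"    \"\"\"Hypothetical helper: raise AssertionError if `out` breaks the output spec.\"\"\"\n",
"    # Exact column names in the required order\n",
"    assert list(out.columns) == EXPECTED_COLUMNS, list(out.columns)\n",
"    # Exactly one row per project_id\n",
"    assert out[\"project_id\"].is_unique, \"duplicate project_id\"\n",
"    # The mean of integers is a float, so average_years should be a float column\n",
"    assert pd.api.types.is_float_dtype(out[\"average_years\"]), out[\"average_years\"].dtype\n",
"\n",
"\n",
"# Usage: a hand-made frame that satisfies the spec passes silently\n",
"check_output_spec(pd.DataFrame({\"project_id\": [1, 2], \"average_years\": [2.0, 2.5]}))\n",
"```\n",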
"\n",
"---\n",
"\n",
"## 1) Problem\n",
"\n",
"* Problem statement:\n",
" For each project, compute the **average experience (`experience_years`) of the employees assigned to it, rounded to 2 decimal places**.\n",
"\n",
"* Input DataFrames:\n",
"\n",
" * `project: pd.DataFrame`\n",
"\n",
" | column | dtype |\n",
" | ----------- | ----- |\n",
" | project_id | int |\n",
" | employee_id | int |\n",
"\n",
" Each row means \"employee `employee_id` belongs to project `project_id`\".\n",
"\n",
" * `employee: pd.DataFrame`\n",
"\n",
" | column | dtype |\n",
" | ---------------- | ------ |\n",
" | employee_id | int |\n",
" | name | object |\n",
" | experience_years | int |\n",
"\n",
" Each row holds one employee's information. `experience_years` contains no NULLs.\n",
"\n",
"* Output:\n",
"\n",
" * Return value: `pd.DataFrame`\n",
"\n",
" * Columns and meaning:\n",
"\n",
" * `project_id`: project ID\n",
" * `average_years`: mean `experience_years` of the employees assigned to that project (rounded to 2 decimal places)\n",
"\n",
" * One row per `project_id`\n",
"\n",
" * Row order is arbitrary (no sorting, since `sort_values` is not allowed)\n",
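"\n",
"To make the expected output concrete, the sketch below computes it without pandas on small hypothetical toy data. The rows, the IDs, and the name `reference_average_years` are illustrative assumptions. It sums and counts experience per project and then rounds the mean, which is what the pandas solution should reproduce:\n",
"\n",
"```python\n",
"from collections import defaultdict\n",
"\n",
"# Hypothetical toy data: (project_id, employee_id) rows and employee_id -> experience_years\n",
"project_rows = [(1, 10), (1, 20), (2, 10), (2, 30), (2, 40)]\n",
"experience = {10: 3, 20: 4, 30: 1, 40: 3}\n",
"\n",
"\n",
"def reference_average_years(rows, exp):\n",
"    \"\"\"Return {project_id: mean experience rounded to 2 decimals}.\"\"\"\n",
"    totals = defaultdict(int)\n",
"    counts = defaultdict(int)\n",
"    for project_id, employee_id in rows:\n",
"        totals[project_id] += exp[employee_id]\n",
"        counts[project_id] += 1\n",
"    return {p: round(totals[p] / counts[p], 2) for p in totals}\n",
"\n",
"\n",
"print(reference_average_years(project_rows, experience))  # {1: 3.5, 2: 2.33}\n",
"```\n",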
"\n",
"---\n",
"\n",
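"## 2) Implementation (following the specified signature)\n",
"\n",
"> Keep only the columns that are needed, then process in the order `merge` → `groupby.mean` → `round`.\n",
"> No within-group ranking or conditional filtering is needed here, so a simple aggregation is enough.\n",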
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def project_employees(project: pd.DataFrame, employee: pd.DataFrame) -> pd.DataFrame:\n",
"    \"\"\"\n",
"    Compute the average years of experience for each project.\n",
"\n",
"    Args:\n",
"        project (pd.DataFrame): columns ['project_id', 'employee_id']\n",
"        employee (pd.DataFrame): columns ['employee_id', 'name', 'experience_years']\n",
"\n",
"    Returns:\n",
"        pd.DataFrame: columns, in order, ['project_id', 'average_years']\n",
"    \"\"\"\n",
"    # 1) Column pruning: keep only the employee columns needed for the mean\n",
"    emp_exp = employee[[\"employee_id\", \"experience_years\"]]\n",
"\n",
"    # 2) JOIN: attach experience_years to each project row\n",
"    merged = project.merge(emp_exp, on=\"employee_id\", how=\"left\")\n",
"\n",
"    # 3) Compute the mean per project\n",
"    out = (\n",
"        merged\n",
"        .groupby(\"project_id\", as_index=False)[\"experience_years\"]\n",
"        .mean()\n",
"    )\n",
"\n",
"    # 4) Rename the column per the spec and round to 2 decimal places\n",
"    out = out.rename(columns={\"experience_years\": \"average_years\"})\n",
"    out[\"average_years\"] = out[\"average_years\"].round(2)\n",
"\n",
"    return out\n",
"```\n",
"\n",
"LeetCode submission result:\n",
"\n",
"* Runtime: 283 ms (Beats 62.61%)\n",
"* Memory: 69.14 MB (Beats 18.00%)\n",
"\n",
"---\n",
"\n",
"## 3) How the algorithm works\n",
"\n",
"### APIs used\n",
"\n",
"* `DataFrame[...]`\n",
" → selects a subset of columns (**column pruning**); not carrying unneeded columns saves memory.\n",
"* `DataFrame.merge`\n",
" → joins `project` and `employee` on `employee_id`, attaching each employee's years of experience to the project rows.\n",
"* `DataFrame.groupby` + `GroupBy.mean`\n",
" → computes the mean of `experience_years` for each `project_id`.\n",
"* `DataFrame.rename`\n",
" → renames the output column to `average_years`, as the problem specifies.\n",
"* `Series.round`\n",
" → rounds the means to 2 decimal places.\n",
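"\n",
"To see what each of these calls produces, the following sketch runs the same steps on a tiny hypothetical input. It keeps every intermediate frame in its own variable. The data and variable names are illustrative only:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Tiny hypothetical input; `name` is included only to show that it gets pruned\n",
"project = pd.DataFrame({\"project_id\": [1, 1, 2], \"employee_id\": [10, 20, 10]})\n",
"employee = pd.DataFrame({\n",
"    \"employee_id\": [10, 20],\n",
"    \"name\": [\"A\", \"B\"],\n",
"    \"experience_years\": [3, 4],\n",
"})\n",
"\n",
"# DataFrame[...]: column pruning\n",
"emp_exp = employee[[\"employee_id\", \"experience_years\"]]\n",
"# DataFrame.merge: one row per project row, with experience attached\n",
"merged = project.merge(emp_exp, on=\"employee_id\", how=\"left\")\n",
"# DataFrame.groupby + GroupBy.mean: one row per project_id\n",
"grouped = merged.groupby(\"project_id\", as_index=False)[\"experience_years\"].mean()\n",
"# DataFrame.rename + Series.round: final column name and 2 decimal places\n",
"renamed = grouped.rename(columns={\"experience_years\": \"average_years\"})\n",
"renamed[\"average_years\"] = renamed[\"average_years\"].round(2)\n",
"\n",
"print(merged)\n",
"print(renamed)\n",
"```\n",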
"\n",
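"### Handling NULLs, duplicates, and types\n",
"\n",
"* Per the problem statement, `experience_years` has **no NULLs**, so `mean()` needs no special NULL handling (and `GroupBy.mean` excludes NaN anyway).\n",
"* In `project`, `(project_id, employee_id)` is the primary key, so the same employee never appears twice in the same project,\n",
" and nothing is double-counted.\n",
"* `mean()` returns `float64`, and `round(2)` keeps that dtype.\n",
" A value like `2.0` is stored as `2.0`; whether it appears as `2.00` is only a matter of display formatting (pandas display settings), not of the stored value.\n",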
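"\n",
"Both type-related points can be checked on hypothetical values. The NaN below stands for what a left join would produce for an `employee_id` missing from `employee`, a case the problem rules out. `GroupBy.mean` skips that NaN, and `round(2)` leaves the dtype as `float64`:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical joined frame; the NaN stands for an employee_id with no match\n",
"merged = pd.DataFrame({\n",
"    \"project_id\": [1, 1, 2],\n",
"    \"experience_years\": [2.0, np.nan, 3.0],\n",
"})\n",
"\n",
"avg = merged.groupby(\"project_id\")[\"experience_years\"].mean()\n",
"print(avg.tolist())        # [2.0, 3.0]: the NaN row is ignored, not counted as 0\n",
"print(avg.round(2).dtype)  # float64: rounding does not change the dtype\n",
"\n",
"# Trailing zeros appear only when you format the value for display\n",
"print(f\"{avg.round(2).iloc[0]:.2f}\")  # 2.00\n",
"```\n",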
"\n",
"---\n",
"\n",
"## 4) Complexity (rough estimate)\n",
"\n",
"Let `N = len(project)` and `M = len(employee)`.\n",
"\n",
"* Column pruning: `employee[[\"employee_id\", \"experience_years\"]]`\n",
" → **O(M)**\n",
"* `merge` (assuming a hash join): `project.merge(emp_exp, on=\"employee_id\")`\n",
" → roughly **O(N + M)**\n",
"* `groupby(\"project_id\").mean()`\n",
" → **O(N)** (hash-based group aggregation)\n",
"\n",
"Overall, the time is roughly **O(N + M)**. The extra memory is about the size of the temporary joined DataFrame: N rows (`employee_id` is unique in `employee`, so the join cannot multiply rows) and 3 columns.\n",
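"\n",
"The row-count argument relies on `employee_id` being unique in `employee`. You can have pandas enforce that assumption instead of trusting it: `merge` accepts `validate=\"many_to_one\"`, which raises `pandas.errors.MergeError` if the right-hand keys are not unique. The sketch below uses hypothetical data. The check adds a uniqueness test, so it may be more useful during local testing than in the submitted function:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"from pandas.errors import MergeError\n",
"\n",
"# Hypothetical inputs with unique employee_id on the right-hand side\n",
"project = pd.DataFrame({\"project_id\": [1, 1, 2], \"employee_id\": [10, 20, 10]})\n",
"emp_exp = pd.DataFrame({\"employee_id\": [10, 20], \"experience_years\": [3, 4]})\n",
"\n",
"# Keys are unique on the right, so the join keeps exactly len(project) rows\n",
"merged = project.merge(emp_exp, on=\"employee_id\", how=\"left\", validate=\"many_to_one\")\n",
"assert len(merged) == len(project)\n",
"\n",
"# A duplicated employee_id on the right would multiply rows; validate rejects it\n",
"dup = pd.concat([emp_exp, emp_exp.iloc[[0]]], ignore_index=True)\n",
"try:\n",
"    project.merge(dup, on=\"employee_id\", how=\"left\", validate=\"many_to_one\")\n",
"except MergeError as exc:\n",
"    print(\"rejected:\", exc)\n",
"```\n",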
"\n",
"---\n",
"\n",
"## 5) Diagram (minimal, conservative Mermaid)\n",
"\n",
"```mermaid\n",
"flowchart TD\n",
" A[project<br/>project_id, employee_id]\n",
" B[employee<br/>employee_id, experience_years, name]\n",
" C[Column pruning<br/>employee → employee_id, experience_years only]\n",
" D[merge<br/>on employee_id]\n",
" E[\"groupby project_id<br/>mean(experience_years)\"]\n",
" F[\"Rename + round(2)<br/>average_years\"]\n",
" G[Output<br/>project_id, average_years]\n",
"\n",
" B --> C\n",
" A --> D\n",
" C --> D\n",
" D --> E\n",
" E --> F\n",
" F --> G\n",
"```\n",
"\n",
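"In this form, the code should paste straight into the pandas version of LeetCode's \"Project Employees I\".\n",
"\n",
"As for further speedups, in short:\n",
"\n",
"* **Asymptotically this is already about as good as it gets, so a big speedup is unlikely**\n",
"* However, **switching from `merge` to `map` leaves some room for lower memory use and a slightly faster run**\n",
"* In particular, to improve memory usage (Beats 18%), the key is to make the join step lighter\n",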
"\n",
"---\n",
"\n",
"## 1) Reviewing the current approach\n",
"\n",
"Your current code (summarized):\n",
"\n",
"```python\n",
"emp_exp = employee[[\"employee_id\", \"experience_years\"]]\n",
"\n",
"merged = project.merge(emp_exp, on=\"employee_id\", how=\"left\")\n",
"\n",
"out = (\n",
" merged\n",
" .groupby(\"project_id\", as_index=False)[\"experience_years\"]\n",
" .mean()\n",
")\n",
"\n",
"out = out.rename(columns={\"experience_years\": \"average_years\"})\n",
"out[\"average_years\"] = out[\"average_years\"].round(2)\n",
"```\n",
"\n",
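"It is completely correct, and algorithmically it costs\n",
"\n",
"* join: O(N + M)\n",
"* groupby: O(N)\n",
"\n",
"Every row of both inputs has to be read at least once, so **no rewrite can improve on this order**.\n",
"\n",
"Allowing for environment noise, the LeetCode numbers (283 ms / Beats 62%) are also on the \"good enough\" side.\n",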
"\n",
"---\n",
"\n",
"## 2) Improvement: lighten the join by replacing `merge` with `map`\n",
"\n",
"`project` already contains `employee_id`. So\n",
"\n",
"> instead of building a new joined frame with `merge`,\n",
"> **build an `employee_id → experience_years` mapping and apply it with `map`**.\n",
"\n",
"That can be slightly better for memory.\n",
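"\n",
"Concretely, when `Series.map` is given another `Series`, it looks each value up in that Series' index, much like a dictionary. Values with no match become NaN. A small sketch with hypothetical IDs:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Lookup table: index = employee_id, values = experience_years (hypothetical data)\n",
"emp_exp = pd.Series([3, 4], index=pd.Index([10, 20], name=\"employee_id\"), name=\"experience_years\")\n",
"\n",
"employee_ids = pd.Series([10, 20, 10, 99])  # 99 has no entry in the lookup\n",
"looked_up = employee_ids.map(emp_exp)\n",
"\n",
"print(looked_up.tolist())  # [3.0, 4.0, 3.0, nan]: the miss turns the column into float64\n",
"```\n",
"\n",
"This lookup needs the mapping's index (`employee_id`) to be unique; pandas raises an error when the keys are duplicated.\n",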
"\n",
"### Revised code (`map`-based)\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def project_employees(project: pd.DataFrame, employee: pd.DataFrame) -> pd.DataFrame:\n",
"    \"\"\"\n",
"    Compute the average years of experience for each project.\n",
"\n",
"    Args:\n",
"        project (pd.DataFrame): columns ['project_id', 'employee_id']\n",
"        employee (pd.DataFrame): columns ['employee_id', 'name', 'experience_years']\n",
"\n",
"    Returns:\n",
"        pd.DataFrame: columns, in order, ['project_id', 'average_years']\n",
"    \"\"\"\n",
"    # 1) Build an employee_id -> experience_years lookup (column pruning + indexing)\n",
"    emp_exp = employee.set_index(\"employee_id\")[\"experience_years\"]\n",
"\n",
"    # 2) Attach the matching experience to each project row (map instead of merge)\n",
"    #    Column pruning: use only the needed columns from project as well\n",
"    proj = project[[\"project_id\", \"employee_id\"]].copy()\n",
"    proj[\"experience_years\"] = proj[\"employee_id\"].map(emp_exp)\n",
"\n",
"    # 3) Drop employee_id, which the aggregation does not need (a small memory saving)\n",
"    proj = proj[[\"project_id\", \"experience_years\"]]\n",
"\n",
"    # 4) Compute the mean per project\n",
"    out = (\n",
"        proj\n",
"        .groupby(\"project_id\", as_index=False)[\"experience_years\"]\n",
"        .mean()\n",
"    )\n",
"\n",
"    # 5) Rename the column and round to 2 decimal places\n",
"    out = out.rename(columns={\"experience_years\": \"average_years\"})\n",
"    out[\"average_years\"] = out[\"average_years\"].round(2)\n",
"\n",
"    return out\n",
"```\n",
"\n",
"LeetCode submission result:\n",
"\n",
"* Runtime: 283 ms (Beats 62.61%)\n",
"* Memory: 68.14 MB (Beats 98.61%)\n",
"\n",
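"### Why write it this way\n",
"\n",
"* `merge` builds a new DataFrame that carries the full set of columns from both sides, so it tends to use more memory\n",
"* `map` just looks values up and drops them into a single new column, which makes it relatively light among the ways to join\n",
"* `employee` is reduced via `set_index` to a `Series`, i.e. the minimal\n",
" **key → value** lookup\n",
"\n",
"It may not cut memory dramatically, but\n",
"\n",
"* the temporary objects it creates are somewhat smaller\n",
"* the runtime may also improve slightly\n",
"\n",
"so it can lift the LeetCode memory percentile. Across the two submissions above, runtime stayed at 283 ms, while memory went from 69.14 MB (Beats 18.00%) to 68.14 MB (Beats 98.61%). The absolute saving is only about 1 MB. The large percentile jump therefore most likely reflects how tightly submissions cluster around similar memory figures, plus run-to-run noise, rather than a large real reduction.\n",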
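"\n",
"To confirm that the two approaches agree, and to inspect intermediate sizes on your own machine, you can compare the `merge`-based and `map`-based intermediates. The sketch below does this on hypothetical random data with the problem's schema. `DataFrame.memory_usage(deep=True)` reports bytes per column. The numbers depend on the data and the pandas version, so none are quoted here:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"rng = np.random.default_rng(0)\n",
"n_emp, n_rows = 1_000, 10_000\n",
"\n",
"# Hypothetical random inputs with the problem's schema\n",
"employee = pd.DataFrame({\n",
"    \"employee_id\": np.arange(n_emp),\n",
"    \"name\": [f\"e{i}\" for i in range(n_emp)],\n",
"    \"experience_years\": rng.integers(0, 30, n_emp),\n",
"})\n",
"project = pd.DataFrame({\n",
"    \"project_id\": rng.integers(0, 100, n_rows),\n",
"    \"employee_id\": rng.integers(0, n_emp, n_rows),\n",
"}).drop_duplicates(ignore_index=True)  # keep (project_id, employee_id) unique\n",
"\n",
"# merge-based intermediate\n",
"merged = project.merge(employee[[\"employee_id\", \"experience_years\"]], on=\"employee_id\", how=\"left\")\n",
"\n",
"# map-based intermediate (only the columns the aggregation needs)\n",
"lookup = employee.set_index(\"employee_id\")[\"experience_years\"]\n",
"proj = project[[\"project_id\"]].copy()\n",
"proj[\"experience_years\"] = project[\"employee_id\"].map(lookup)\n",
"\n",
"# Both routes must give identical per-project averages\n",
"by_merge = merged.groupby(\"project_id\")[\"experience_years\"].mean().round(2)\n",
"by_map = proj.groupby(\"project_id\")[\"experience_years\"].mean().round(2)\n",
"pd.testing.assert_series_equal(by_merge, by_map)\n",
"\n",
"print(\"merge intermediate bytes:\", merged.memory_usage(deep=True).sum())\n",
"print(\"map intermediate bytes:  \", proj.memory_usage(deep=True).sum())\n",
"```\n",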
"\n",
"---\n",
"\n",
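"## 3) How far is it worth optimizing?\n",
"\n",
"Honestly, in this problem\n",
"\n",
"* the input sizes are not extreme\n",
"* the algorithm tops out at \"JOIN + GROUP BY\"\n",
"\n",
"so the query is **already in the \"close enough to optimal\" category**.\n",
"\n",
"Small code tweaks will not dramatically change numbers like 283 ms / Beats 62%, so prioritize\n",
"\n",
"* readability\n",
"* straightforwardness (no clever tricks)\n",
"* resistance to bugs\n",
"\n",
"and add a \"light improvement\" such as the `map` version above. That is a very good place to stop.\n",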
"\n",
"---\n",
"\n",
"If you later run into\n",
"\n",
"* other LeetCode pandas problems\n",
"* heavier group operations (multiple conditions / top-k / window-function-like processing)\n",
"\n",
"they make good practice for combining **`groupby.transform` / `rank` / `merge` strategies**.\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}