@@ -0,0 +1,238 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a12bd06c",
"metadata": {},
"source": [
"# MySQL 8.0.40\n",
"\n",
"## 0) 前提\n",
"\n",
"* エンジン: **MySQL 8**\n",
"* 並び順: 任意(`ORDER BY` を付けない)\n",
"* `NOT IN` は NULL 罠のため回避\n",
"* 判定は **ID 基準**、表示は仕様どおりの列名と順序\n",
"\n",
"## 1) 問題\n",
"\n",
"* `MyNumbers` から **ちょうど1回だけ出現する数(single number)**のうち **最大の数**を1行で返す。存在しなければ `null` を返す。\n",
"\n",
"* 入力テーブル例:\n",
"\n",
" ```\n",
" Table: MyNumbers\n",
" +-------------+------+\n",
" | Column Name | Type |\n",
" +-------------+------+\n",
" | num | int |\n",
" +-------------+------+\n",
" -- 重複あり得る\n",
" ```\n",
"\n",
"* 出力仕様:\n",
"\n",
" ```\n",
" +-----+\n",
" | num |\n",
" +-----+\n",
" | 6 | -- single number の最大。存在しなければ NULL\n",
" +-----+\n",
" ```\n",
"\n",
"## 2) 最適解(単一クエリ)\n",
"\n",
"> ウィンドウ関数で「出現回数」を各行に載せ、そのうち `cnt = 1` の `num` の **最大値**を投影。\n",
"\n",
"```sql\n",
"WITH win AS (\n",
" SELECT\n",
" num,\n",
" COUNT(*) OVER (PARTITION BY num) AS cnt\n",
" FROM MyNumbers\n",
")\n",
"SELECT\n",
" MAX(num) AS num\n",
"FROM win\n",
"WHERE cnt = 1;\n",
"\n",
"Runtime 392 ms\n",
"Beats 64.64%\n",
"\n",
"```\n",
"\n",
"* `MAX(num)` により **並び替え不要**で最大の single number を1行で取得\n",
"* single number が存在しない場合、`MAX` の母集団が空になり **`NULL` を返す**(要件どおり)\n",
"\n",
"## 3) 代替解\n",
"\n",
"> 集約のみで十分なサイズなら、`GROUP BY ... HAVING` → その最大値を返す。\n",
"\n",
"```sql\n",
"SELECT\n",
" MAX(num) AS num\n",
"FROM (\n",
" SELECT num\n",
" FROM MyNumbers\n",
" GROUP BY num\n",
" HAVING COUNT(*) = 1\n",
") s;\n",
"\n",
"Runtime 418 ms\n",
"Beats 39.99%\n",
"\n",
"```\n",
"\n",
"* `NOT IN` 不要、`NULL` でも安全\n",
"* インデックスがないと全表スキャンになる点はウィンドウ版と同様\n",
"\n",
"## 4) 要点解説\n",
"\n",
"* **方針**:\n",
"\n",
" 1. 各 `num` の出現回数を計算(ウィンドウ or 集約)\n",
" 2. `= 1`(single)に絞る\n",
" 3. **最大値**だけを返す → `ORDER BY`・`LIMIT` 不要\n",
"* **NULL / 重複**:\n",
"\n",
" * `num` に `NULL` があっても `COUNT(*)` は `NULL` を数えるため、`num IS NULL` は single になり得る。ただし問題の意図は整数なので通常は非NULL前提。もし `NULL` 行があっても `MAX(num)` は `NULL` を無視するため影響しない。\n",
"* **安定性**:\n",
"\n",
" * 出力は1行のみで順序不要。`ORDER BY` を付けないほうが速い。\n",
"\n",
"## 5) 計算量(概算)\n",
"\n",
"* ウィンドウ版: `COUNT() OVER (PARTITION BY num)` は **O(N)**~**O(N log N)**(実装依存・ソート/ハッシュ)\n",
"* 集約版: `GROUP BY num` は **O(N)**~**O(N log N)**\n",
"* 推奨インデックス: `INDEX(num)` があればハッシュ/ツリー集約が効きやすい\n",
"\n",
"## 6) 図解(Mermaid 超保守版)\n",
"\n",
"```mermaid\n",
"flowchart TD\n",
" A[入力 MyNumbers] --> B[出現回数を算出 cnt]\n",
" B --> C[cnt が 1 の行に絞る]\n",
" C --> D[最大 num を求める]\n",
" D --> E[出力 列 num だけ]\n",
"```\n",
"\n",
"いい感じの結果です(特にウィンドウ版で ~65% 上回り)が、**もう少し縮められる余地**はあります。要点だけ手短に👇\n",
"\n",
"---\n",
"\n",
"## まずはインデックス\n",
"\n",
"```sql\n",
"CREATE INDEX ix_mynumbers_num ON MyNumbers(num);\n",
"```\n",
"\n",
"* `GROUP BY num` / `COUNT(*)` が**インデックス順走査**でまとまりやすくなり、\n",
" 一時テーブルやファイルソートの発生を抑制できます(環境次第で体感差が大きいところ)。\n",
"\n",
"---\n",
"\n",
"## 速度重視の実戦解(早期終了を効かせる)\n",
"\n",
"> 並び順が任意という仕様でしたが、**パフォーマンス最優先**なら `ORDER BY ... DESC LIMIT 1` による **早期終了**が効きます。`INDEX(num)` があると特に強いです。\n",
"\n",
"```sql\n",
"-- 早いことが多い版(上位1件だけ取りに行く)\n",
"SELECT num\n",
"FROM MyNumbers\n",
"GROUP BY num\n",
"HAVING COUNT(*) = 1\n",
"ORDER BY num DESC\n",
"LIMIT 1;\n",
"\n",
"Wrong Answer\n",
"13 / 18 testcases passed\n",
"```\n",
"\n",
"* 右端(最大値側)から**逆順インデックス走査**し、最初に見つかった「出現1回」のグループで終わるため、\n",
" データ分布によっては **大幅短縮**します(特に「大きい値ほどユニークが出やすい」分布)。\n",
"\n",
"> 出力は1行だけで、外側に `MAX` をかける必要はありません。\n",
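 出力は1行">
"\n",
"The `ORDER BY ... LIMIT 1` query returns no row when no number occurs exactly once, while the problem asks for a single `NULL` row. A minimal sketch of one way to close that gap, assuming the same `MyNumbers` table, is to use the query as a scalar subquery, which evaluates to `NULL` when it produces no row:\n",
"\n",
"```sql\n",
"-- The outer SELECT always returns exactly one row.\n",
"-- The scalar subquery evaluates to NULL when the inner query finds no single number.\n",
"SELECT (\n",
"  SELECT num\n",
"  FROM MyNumbers\n",
"  GROUP BY num\n",
"  HAVING COUNT(*) = 1\n",
"  ORDER BY num DESC\n",
"  LIMIT 1\n",
") AS num;\n",
"```\n",
"\n",
"This keeps the `ORDER BY ... LIMIT 1` shape of the inner query; whether it is actually faster than the `MAX` form depends on the data and the index.\n",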
"\n",
"---\n",
"\n",
"## `ORDER BY` を避けたい場合の最適形\n",
"\n",
"あなたの代替解は正攻法で、インデックス追加だけでも十分効きます。書式はそのままでOK:\n",
"\n",
"```sql\n",
"-- あなたの代替解(INDEXあり想定)\n",
"SELECT\n",
" MAX(num) AS num\n",
"FROM (\n",
" SELECT num\n",
" FROM MyNumbers\n",
" GROUP BY num\n",
" HAVING COUNT(*) = 1\n",
") s;\n",
"\n",
"Runtime 396 ms\n",
"Beats 60.49%\n",
"\n",
"```\n",
"\n",
"* ウィンドウ版より **`GROUP BY` 直集約**のほうが MySQL では速く出ることが多いです(特に `INDEX(num)` あり)。\n",
"\n",
"---\n",
"\n",
"## 重複が極端に多い場合の小技(重複集合を先に作る)\n",
"\n",
"> 「ほとんどが重複で、ユニークが少ない」分布なら、**重複集合だけ先に抽出**して引き算すると速いことがあります。\n",
"\n",
"```sql\n",
"WITH dup AS (\n",
" SELECT num\n",
" FROM MyNumbers\n",
" GROUP BY num\n",
" HAVING COUNT(*) > 1\n",
"),\n",
"uniq AS (\n",
" SELECT DISTINCT num\n",
" FROM MyNumbers\n",
" LEFT JOIN dup USING (num)\n",
" WHERE dup.num IS NULL\n",
")\n",
"SELECT MAX(num) AS num\n",
"FROM uniq;\n",
"\n",
"Runtime 414 ms\n",
"Beats 43.42%\n",
"\n",
"```\n",
"\n",
"* `dup` のサイズが小さくなる分、以降の結合・探索が軽くなります(分布依存)。\n",
"\n",
"---\n",
"\n",
"## 実務メモ\n",
"\n",
"* `EXPLAIN ANALYZE` で\n",
" *「Using index」でグループ化できているか*、\n",
" *一時テーブル/ファイルソートが消えているか* を確認。\n",
"* `ANALYZE TABLE MyNumbers;` で統計を更新しておくとプランが安定。\n",
"* `num` に `NULL` が混じっていても、`GROUP BY` + `HAVING COUNT(*)=1` は正しく動き、\n",
" `MAX(num)` も `NULL` を無視するので問題ありません。\n",
"\n",
"---\n",
"\n",
"### まとめ\n",
"\n",
"* まずは `INDEX(num)` を追加。\n",
"* 速度をさらに取りに行くなら **`GROUP BY ... HAVING COUNT(*)=1 ORDER BY num DESC LIMIT 1`**。\n",
"* `ORDER BY` を使わない方針を守るなら、あなたの **集約版 + インデックス** が最善に近いです。\n",
"\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}