feat: Add comprehensive Bash 10th line extraction tutorial with visua… #240
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,363 @@ | ||
| { | ||
| "cells": [ | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "43f342e3", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "# 192. Word Frequency - Bash Solution\n", | ||
| "\n", | ||
| "## Problem Overview\n", | ||
| "\n", | ||
| "Count how often each word occurs in the text file `words.txt` and print the words in descending order of frequency.\n", | ||
| "\n", | ||
| "---\n", | ||
| "\n", | ||
| "## Solution (Script Version)\n", | ||
| "\n", | ||
| "`wordfreq.sh` (Bash, POSIX tools only)\n", | ||
| "\n", | ||
| "```bash\n", | ||
| "#!/usr/bin/env bash\n", | ||
| "set -euo pipefail\n", | ||
| "\n", | ||
| "# Usage: ./wordfreq.sh [path/to/words.txt]\n", | ||
| "# Reads ./words.txt when no argument is given\n", | ||
| "input=\"${1:-words.txt}\"\n", | ||
| "\n", | ||
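| "# 1) Convert all whitespace (spaces, tabs, newlines, etc.) to newlines, squeezing runs into one\n", | ||
| "# 2) Sort\n", | ||
| "# 3) Count frequencies with uniq -c\n", | ||
| "# 4) Sort numerically by frequency (column 1), descending\n", | ||
| "# 5) Reformat as \"word count\"\n", | ||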
| "LC_ALL=C tr -s '[:space:]' '\\n' < \"$input\" \\\n", | ||
| " | sort \\\n", | ||
| " | uniq -c \\\n", | ||
| " | sort -nr \\\n", | ||
| " | awk '{print $2, $1}'\n", | ||
| "```\n", | ||
| "\n", | ||
| "### How to Run\n", | ||
| "\n", | ||
| "```bash\n", | ||
| "chmod +x wordfreq.sh\n", | ||
| "./wordfreq.sh # counts words.txt in the current directory\n", | ||
| "# or\n", | ||
| "./wordfreq.sh /path/to/words.txt\n", | ||
| "```\n", | ||
| "\n", | ||
| "---\n", | ||
| "\n", | ||
| "## Solution (Pipe-Only One-Liner)\n", | ||
| "\n", | ||
| "```bash\n", | ||
| "LC_ALL=C tr -s '[:space:]' '\\n' < words.txt | sort | uniq -c | sort -nr | awk '{print $2, $1}'\n", | ||
| "```\n", | ||
| "\n", | ||
| "---\n", | ||
| "\n", | ||
| "## Input/Output Example\n", | ||
| "\n", | ||
| "### Input (`words.txt`)\n", | ||
| "\n", | ||
| "```text\n", | ||
| "the day is sunny the the\n", | ||
| "the sunny is is\n", | ||
| "```\n", | ||
| "\n", | ||
| "### Output\n", | ||
| "\n", | ||
| "```text\n", | ||
| "the 4\n", | ||
| "is 3\n", | ||
| "sunny 2\n", | ||
| "day 1\n", | ||
| "```\n", | ||
| "\n", | ||
| "---\n", | ||
| "\n", | ||
| "## Processing Flow Diagram\n", | ||
| "\n", | ||
| "```mermaid\n", | ||
| "flowchart LR\n", | ||
| " A[\"words.txt<br/>input file\"] --> B[\"<code>tr -s [:space:] \\\\n</code><br/>all whitespace → newline<br/>squeeze repeats into one\"]\n", | ||
| " B --> C[\"<code>sort</code><br/>sort lexicographically\"]\n", | ||
| " C --> D[\"<code>uniq -c</code><br/>count adjacent identical words\"]\n", | ||
| " D --> E[\"<code>sort -nr</code><br/>sort by frequency, descending\"]\n", | ||
| " E --> F[\"<code>awk {print $2, $1}</code><br/>reorder to word, then count\"]\n", | ||
| " F --> G[\"output\"]\n", | ||
| "```\n", | ||
| "\n", | ||
| "---\n", | ||
| "\n", | ||
| "## Step-by-Step Details\n", | ||
| "\n", | ||
| "### Input Data\n", | ||
| "\n", | ||
| "```text\n", | ||
| "the day is sunny the the\n", | ||
| "the sunny is is\n", | ||
| "```\n", | ||
| "\n", | ||
| "### Step 1: `tr -s '[:space:]' '\\n'`\n", | ||
| "\n", | ||
| "Converts every whitespace character (space, tab, newline) to a newline and squeezes each run of whitespace into a single newline.\n", | ||
| "\n", | ||
| "```text\n", | ||
| "the\n", | ||
| "day\n", | ||
| "is\n", | ||
| "sunny\n", | ||
| "the\n", | ||
| "the\n", | ||
| "the\n", | ||
| "sunny\n", | ||
| "is\n", | ||
| "is\n", | ||
| "```\n", | ||
| "\n", | ||
| "### Step 2: `sort`\n", | ||
| "\n", | ||
| "Sorts the words lexicographically (required because `uniq -c` only counts adjacent identical lines).\n", | ||
| "\n", | ||
| "```text\n", | ||
| "day\n", | ||
| "is\n", | ||
| "is\n", | ||
| "is\n", | ||
| "sunny\n", | ||
| "sunny\n", | ||
| "the\n", | ||
| "the\n", | ||
| "the\n", | ||
| "the\n", | ||
| "```\n", | ||
| "\n", | ||
| "### Step 3: `uniq -c`\n", | ||
| "\n", | ||
| "Counts each run of adjacent identical words.\n", | ||
| "\n", | ||
| "```text\n", | ||
| " 1 day\n", | ||
| " 3 is\n", | ||
| " 2 sunny\n", | ||
| " 4 the\n", | ||
| "```\n", | ||
| "\n", | ||
| "### Step 4: `sort -nr`\n", | ||
| "\n", | ||
| "Sorts numerically by frequency (column 1) in descending order.\n", | ||
| "\n", | ||
| "```text\n", | ||
| " 4 the\n", | ||
| " 3 is\n", | ||
| " 2 sunny\n", | ||
| " 1 day\n", | ||
| "```\n", | ||
| "\n", | ||
| "### Step 5: `awk '{print $2, $1}'`\n", | ||
| "\n", | ||
| "Reformats each line as \"word count\".\n", | ||
| "\n", | ||
| "```text\n", | ||
| "the 4\n", | ||
| "is 3\n", | ||
| "sunny 2\n", | ||
| "day 1\n", | ||
| "```\n", | ||
| "\n", | ||
| "---\n", | ||
| "\n", | ||
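| "## How the Algorithm Works\n", | ||
| "\n", | ||
| "### Why This Order?\n", | ||
| "\n", | ||
| "1. **Normalize with `tr`**\n", | ||
| " - Handles the different whitespace characters (space, tab, newline) uniformly\n", | ||
| " - Squeezing runs of whitespace prevents empty lines between words\n", | ||
| "\n", | ||
| "2. **The first `sort` is required**\n", | ||
| " - `uniq -c` only counts **adjacent** identical lines\n", | ||
| " - Sorting first places identical words next to each other\n", | ||
| "\n", | ||
| "3. **Count with `uniq -c`**\n", | ||
| " - Counts the occurrences in each run of identical words\n", | ||
| " - Output format: `<count> <word>`\n", | ||
| "\n", | ||
| "4. **Descending order with `sort -nr`**\n", | ||
| " - `-n`: sort numerically\n", | ||
| " - `-r`: reverse (descending) order\n", | ||
| "\n", | ||
| "5. **Format with `awk`**\n", | ||
| " - Swaps the column order: `$2 $1` → `<word> <count>`\n", | ||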
| "\n", | ||
| "---\n", | ||
| "\n", | ||
| "## Alternative Solution (awk-Centric)\n", | ||
| "\n", | ||
| "An approach using `awk` associative arrays:\n", | ||
| "\n", | ||
| "```bash\n", | ||
| "awk '{for(i=1;i<=NF;i++) c[$i]++} END{for(w in c) print w, c[w]}' words.txt \\\n", | ||
| " | LC_ALL=C sort -k2,2nr\n", | ||
| "```\n", | ||
| "\n", | ||
| "### Processing Flow\n", | ||
| "\n", | ||
| "1. Count each word with `awk`\n", | ||
| " - `NF`: number of fields in the line (whitespace-separated)\n", | ||
| " - `c[$i]++`: count using an associative array\n", | ||
| "\n", | ||
| "2. Print in the `END` block\n", | ||
| " - `for(w in c)`: loop over every word\n", | ||
| " - `print w, c[w]`: print the word and its count\n", | ||
| "\n", | ||
| "3. Sort by frequency, descending, with `sort -k2,2nr`\n", | ||
| " - `-k2,2`: sort on column 2 (the count)\n", | ||
| " - `n`: numeric sort\n", | ||
| " - `r`: descending order\n", | ||
| "\n", | ||
| "---\n", | ||
| "\n", | ||
| "## Performance Optimization Tips\n", | ||
| "\n", | ||
| "### 1. Locale Setting\n", | ||
| "\n", | ||
| "```bash\n", | ||
| "LC_ALL=C\n", | ||
| "```\n", | ||
| "\n", | ||
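| "- In the C locale `sort` compares raw bytes, which is typically faster than locale-aware collation\n", | ||
| "- Byte-wise comparison also gives the same order on every system; note that a `LC_ALL=C` prefix affects only the command it precedes, so `export` it or prefix `sort` as well\n", | ||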
| "\n", | ||
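| "### 2. Removing Empty Lines (If Needed)\n", | ||
| "\n", | ||
| "With `tr -s` this is usually unnecessary, but input that begins with whitespace still yields one leading empty line, so as a safeguard:\n", | ||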
| "\n", | ||
| "```bash\n", | ||
| "... | grep -v '^$' | ...\n", | ||
| "```\n", | ||
| "\n", | ||
| "### 3. Flexible Input File\n", | ||
| "\n", | ||
| "The script version accepts the file path as an argument:\n", | ||
| "\n", | ||
| "```bash\n", | ||
| "input=\"${1:-words.txt}\"\n", | ||
| "```\n", | ||
| "\n", | ||
| "---\n", | ||
| "\n", | ||
| "## Applications\n", | ||
| "\n", | ||
| "### Processing a Compressed File\n", | ||
| "\n", | ||
| "```bash\n", | ||
| "zcat compressed.txt.gz | tr -s '[:space:]' '\\n' | sort | uniq -c | sort -nr | awk '{print $2, $1}'\n", | ||
| "```\n", | ||
| "\n", | ||
| "### Stream Processing\n", | ||
| "\n", | ||
| "```bash\n", | ||
| "curl -s https://example.com/text.txt | tr -s '[:space:]' '\\n' | sort | uniq -c | sort -nr | awk '{print $2, $1}'\n", | ||
|
Comment on lines
+259
to
+262
Contributor
🧹 Nitpick | 🔵 Trivial Consider making the stream-processing example more robust.
curl -sf https://example.com/text.txt | tr -s '[:space:]' '\n' | ...
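One way the more robust variant could look, as a rough sketch rather than a definitive fix: `-f` makes curl exit non-zero on HTTP errors instead of passing an error page downstream, and `pipefail` lets that failure reach the caller. The function names `count_words` and `fetch_and_count` are illustrative and do not appear in the PR; the `sed` filter for empty lines and the tie-breaking sort key are also additions of this sketch.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the PR): count words read from stdin,
# most frequent first, ties broken alphabetically.
count_words() {
  LC_ALL=C tr -s '[:space:]' '\n' \
    | sed '/^$/d' \
    | LC_ALL=C sort \
    | uniq -c \
    | LC_ALL=C sort -k1,1nr -k2,2 \
    | awk '{print $2, $1}'
}

# Hypothetical wrapper (not in the PR): fetch a URL and count its words.
#   -s  silence the progress meter
#   -S  still print an error message when the transfer fails
#   -f  exit non-zero on HTTP errors (4xx/5xx)
fetch_and_count() {
  curl -sSf "$1" | count_words
}
```

With this in place, `fetch_and_count https://example.com/text.txt` prints the counts on success; if the request fails, the pipeline returns a non-zero status because of `pipefail`, so the script stops instead of reporting counts for an error page.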
|
||
| "```\n", | ||
| "\n", | ||
| "### Case-Insensitive Counting\n", | ||
| "\n", | ||
| "```bash\n", | ||
| "LC_ALL=C tr -s '[:space:]' '\\n' < words.txt \\\n", | ||
| " | tr '[:upper:]' '[:lower:]' \\\n", | ||
| " | sort \\\n", | ||
| " | uniq -c \\\n", | ||
| " | sort -nr \\\n", | ||
| " | awk '{print $2, $1}'\n", | ||
| "```\n", | ||
| "\n", | ||
| "---\n", | ||
| "\n", | ||
| "## Frequently Asked Questions\n", | ||
| "\n", | ||
| "### Q1: What does the `-s` option of `tr -s` do?\n", | ||
| "\n", | ||
| "**A:** `-s` (squeeze) collapses each run of a repeated character into a single occurrence.\n", | ||
| "\n", | ||
| "```bash\n", | ||
| "# Example: collapse consecutive spaces into one\n", | ||
| "echo \"a   b   c\" | tr -s ' '\n", | ||
| "# Output: a b c\n", | ||
| "```\n", | ||
| "\n", | ||
| "### Q2: Why use `LC_ALL=C`?\n", | ||
| "\n", | ||
| "**A:** \n", | ||
| "- Avoids locale-dependent character comparison\n", | ||
| "- Byte-wise comparison is faster\n", | ||
| "- Prevents behavior from differing between environments\n", | ||
| "\n", | ||
| "### Q3: What is the output format of `uniq -c`?\n", | ||
| "\n", | ||
| "**A:** Each line is printed as `<count><space><word>`.\n", | ||
| "\n", | ||
| "```text\n", | ||
| " 4 the\n", | ||
| " 3 is\n", | ||
| "```\n", | ||
| "\n", | ||
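| "The count is padded with leading spaces, but `awk` ignores leading whitespace when splitting fields, so `$1` is the count and `$2` is the word when the columns are swapped.\n", | ||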
| "\n", | ||
| "---\n", | ||
| "\n", | ||
| "## Notes on Mermaid Diagrams\n", | ||
| "\n", | ||
| "Safe notation for Mermaid labels that contain commands:\n", | ||
| "\n", | ||
| "### Escaping Special Characters\n", | ||
| "\n", | ||
| "- Square brackets `[` `]` → `&#91;` `&#93;`\n", | ||
| "- Curly braces `{` `}` → `&#123;` `&#125;`\n", | ||
| "- Backslash `\\` → `\\\\`\n", | ||
| "- Single quote `'` → `&#39;` (when needed)\n", | ||
| "\n", | ||
| "### Recommended Notation\n", | ||
| "\n", | ||
| "```mermaid\n", | ||
| "flowchart LR\n", | ||
| " A[\"Node name\"] --> B[\"<code>command</code><br/>description\"]\n", | ||
| "```\n", | ||
| "\n", | ||
| "- Wrap the entire label in double quotes: `[\"...\"]`\n", | ||
| "- Wrap the command part in `<code>` tags\n", | ||
| "- Use `<br/>` for line breaks\n", | ||
| "\n", | ||
| "---\n", | ||
| "\n", | ||
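| "## Summary\n", | ||
| "\n", | ||
| "Key points of this solution:\n", | ||
| "\n", | ||
| "1. **`tr`** normalizes whitespace\n", | ||
| "2. **`sort`** places identical words next to each other\n", | ||
| "3. **`uniq -c`** counts frequencies\n", | ||
| "4. **`sort -nr`** sorts by frequency, descending\n", | ||
| "5. **`awk`** formats the output\n", | ||
| "\n", | ||
| "A combination of simple POSIX tools handles this efficiently.\n", | ||
| "\n", | ||
| "Main improvements:\n", | ||
| "1. Removed duplicated sections entirely\n", | ||
| "2. Organized the structure logically (problem → solution → details → applications)\n", | ||
| "3. Consolidated the Mermaid diagrams into one (using the safe notation)\n", | ||
| "4. Added a frequently asked questions section\n", | ||
| "5. Expanded the application examples\n", | ||
| "6. Collected the Mermaid notation notes at the end" | ||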
|
Comment on lines
+346
to
+352
Contributor
Development notes have been left in the document. The "Main improvements" section (items 1-6) appears to be an internal memo from the refactoring. It is not needed in an end-user tutorial, so I recommend removing it.
🔧 Suggested fix
A combination of simple POSIX tools handles this efficiently.
-\n",
- "\n",
- "Main improvements:\n",
- "1. Removed duplicated sections entirely\n",
- "2. Organized the structure logically (problem → solution → details → applications)\n",
- "3. Consolidated the Mermaid diagrams into one (using the safe notation)\n",
- "4. Added a frequently asked questions section\n",
- "5. Expanded the application examples\n",
- "6. Collected the Mermaid notation notes at the end"
+" |
||
| ] | ||
| } | ||
| ], | ||
| "metadata": { | ||
| "language_info": { | ||
| "name": "python" | ||
| } | ||
| }, | ||
|
Comment on lines
+356
to
+360
Contributor
🧹 Nitpick | 🔵 Trivial Regarding the notebook metadata.
|
||
| "nbformat": 4, | ||
| "nbformat_minor": 5 | ||
| } | ||
🧹 Nitpick | 🔵 Trivial
I recommend adding a check that the input file exists.
With `set -e` the script does exit when the file does not exist, but the resulting error message is not user-friendly. Adding explicit validation makes debugging easier.
♻️ Suggested fix
 input="${1:-words.txt}"
+# Check that the input file exists
+if [[ ! -f "$input" ]]; then
+  echo "Error: File '$input' not found." >&2
+  exit 1
+fi
+
 # 1) Convert all whitespace (spaces, tabs, newlines, etc.) to newlines, squeezing runs into one
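To see how such a check behaves in both cases, here is a minimal sketch that wraps it in a function. The name `require_file` is hypothetical and not part of the PR, and the extra readability test with `-r` goes slightly beyond the suggestion above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the PR): fail with a clear message
# when the given path is not a readable regular file.
require_file() {
  local path="$1"
  if [[ ! -f "$path" ]]; then
    echo "Error: File '$path' not found." >&2
    return 1
  fi
  if [[ ! -r "$path" ]]; then
    echo "Error: File '$path' is not readable." >&2
    return 1
  fi
}
```

In `wordfreq.sh` this would be called as `require_file "$input"` right after `input` is assigned; under `set -e` a failed check stops the script with the message above instead of the shell's generic redirection error.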