Ever stared at a terminal wondering if your commit should be feat:, fix:, or chore:? Stop guessing.
This tool acts as your personalized, AI-powered commit assistant. Rather than forcing you into generic Conventional Commits types, it fetches your organization's historical commit data, uses machine learning (K-means clustering in ML.NET) to find natural patterns in how your team works, and uses Google Gemini to automatically assign human-readable labels to those patterns. Finally, it provides an interactive prompt that formats your new commits on the fly!
- Fully Autonomous ML Pipeline: Adapts automatically to your data. A grid search scores candidate cluster counts with the Davies-Bouldin index to find the best number of categories (clusters) for your specific repositories. It also penalizes cluster counts outside the 4-8 range, so results lean toward a readable set of 4-8 tags.
- Smart & Configurable AI Labeling: Uses Google's gemini-2.5-flash model to semantically label clusters (e.g., "Dependency Updates", "UI Fixes"). Four configurable execution modes (Hybrid, SinglePrompt, PerCluster, LocalOnly) let you control API quota usage, with local heuristic labeling available as a fallback.
- Reliability & Caching: Computes a SHA256 hash of your dataset. If the data hasn't changed, ML training is skipped entirely and the model and labels load from disk in milliseconds. Exponential backoff on Gemini requests protects against server rate limiting.
- Interactive Output: Drops you into a real-time terminal loop that predicts the cluster for your next commit message and formats it with the matching label.
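The caching behavior above amounts to comparing a stored digest of the dataset with a fresh one before deciding whether to retrain. A minimal sketch in Python follows (the project itself is C# on ML.NET, so this only illustrates the idea); the helper names sha256_of_file, cache_is_valid, and save_hash and the hash-file location are hypothetical, not taken from the project.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the hex SHA256 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_is_valid(dataset: Path, hash_file: Path, model_file: Path) -> bool:
    """True if a trained model exists and the dataset hash matches the stored one.

    hash_file and model_file are hypothetical paths, not the project's real layout.
    """
    if not (hash_file.exists() and model_file.exists()):
        return False  # nothing cached yet: train from scratch
    return hash_file.read_text().strip() == sha256_of_file(dataset)


def save_hash(dataset: Path, hash_file: Path) -> None:
    """Record the dataset hash; call this only after training has succeeded."""
    hash_file.write_text(sha256_of_file(dataset))
```

A caller would load the saved model when cache_is_valid(...) returns True, and otherwise train, save the model, and only then call save_hash(...), so a failed training run never leaves behind a hash that marks stale output as valid.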
Prerequisites:
- .NET 9 SDK
- A GitHub Personal Access Token (to securely fetch organization commits)
- A Google Gemini API key (used to generate semantic names for the clusters)
Installing a prebuilt release:
- Install .NET: Ensure you have the .NET 9 Runtime or SDK installed.
- Download: Go to the v0.9-alpha release and download the .zip file for your platform.
- Unzip & Run: Extract the zip, set up your .env file (see below), and run the executable directly in the unzipped folder.
Building from source:
- Clone the repository:
    git clone https://github.com/abhitrueprogrammer/git-commit-categorize.git
    cd git-commit-categorize
- Set up your environment variables: Create a .env file in your root workspace containing the following:
    GITHUB_TOKEN=your_github_personal_access_token_here
    GEMINI_API_KEY=your_google_gemini_api_key_here
- Restore Packages & Build:
    cd ConsoleApp2
    dotnet restore
    dotnet build

Usage:
Run the project directly via the .NET CLI:
    dotnet run

- The app pulls and caches your JSON dataset.
- If the data is new or uncached, ML.NET prepares the data (80/20 train/validation split) and extracts text features. Otherwise, the app skips training and reloads the cached model from disk.
- A grid search evaluates candidate cluster counts $K$, applies penalties to values outside $4 \le K \le 8$, selects the best valid clustering, and saves the trained ML model (kmeans_model.zip).
- Gemini (or the local heuristic fallback) assigns a descriptive, human-readable label to each of the new clusters.
- You will enter the Interactive Labeler:
========================================
--- Interactive Commit Formatting ---
Enter a raw commit message to see it mapped to its AI cluster label.
Type 'exit' to quit.
========================================
Enter commit message: fix padding on the login button
Formatted => UI Bug Fixes: fix padding on the login button
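The grid-search step above can be illustrated with a small, self-contained Python sketch. It is not the project's ML.NET pipeline: it clusters plain numeric vectors instead of featurized commit text, uses a minimal Lloyd's K-means with deterministic farthest-point seeding, and the additive penalty (PENALTY_PER_STEP per step outside 4-8) is a hypothetical form, since this README does not say how the penalty is computed. Lower Davies-Bouldin values mean tighter, better-separated clusters, so the search keeps the lowest penalized score.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]

PREFERRED_MIN, PREFERRED_MAX = 4, 8  # readable tag range described in this README
PENALTY_PER_STEP = 0.25              # hypothetical penalty weight


def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's K-means with farthest-point seeding; returns (centroids, assignments)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Seed with the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # converged
        assign = new_assign
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, assign


def davies_bouldin(points: List[Point], centroids: List[List[float]], assign: List[int]) -> float:
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid-distance ratio."""
    k = len(centroids)
    scatter = []
    for c in range(k):
        members = [p for p, a in zip(points, assign) if a == c]
        scatter.append(sum(math.dist(p, centroids[c]) for p in members) / len(members) if members else 0.0)
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if i != j:
                sep = math.dist(centroids[i], centroids[j])
                worst = max(worst, math.inf if sep == 0 else (scatter[i] + scatter[j]) / sep)
        total += worst
    return total / k


def range_penalty(k: int) -> float:
    """Additive penalty growing linearly with distance from the preferred range."""
    if k < PREFERRED_MIN:
        return PENALTY_PER_STEP * (PREFERRED_MIN - k)
    if k > PREFERRED_MAX:
        return PENALTY_PER_STEP * (k - PREFERRED_MAX)
    return 0.0


def grid_search(points: List[Point], k_values: Sequence[int]) -> Optional[Tuple[float, int]]:
    """Return (penalized score, k) for the best candidate; lower is better."""
    best: Optional[Tuple[float, int]] = None
    for k in k_values:
        if not 2 <= k <= len(points):
            continue  # Davies-Bouldin needs at least two clusters
        centroids, assign = kmeans(points, k)
        score = davies_bouldin(points, centroids, assign) + range_penalty(k)
        if best is None or score < best[0]:
            best = (score, k)
    return best
```

For instance, grid_search(vectors, range(2, 13)) returns the winning (score, k) pair; a real pipeline would then keep or retrain the model for that k and persist it.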
Project structure:
- Program.cs: Orchestrates data loading, cache/hash invalidation checks, and ML model loading, then starts the interactive terminal loop.
- CommitFetcher.cs: Authenticates with GitHub via Octokit and downloads commits through the GitHub API.
- Analyser.cs: Contains the isolated ML.NET workflow (data splitting, text featurization, penalized grid search over K values).
- AiClusterLabeler.cs: Contains the Gemini API pipeline, local fallback heuristics, the execution modes that limit inference calls, and retry logic with exponential backoff.
- CommitInteractiveLabeler.cs: Handles real-time prediction and formatting of the commit messages you enter.
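The retry behavior attributed to AiClusterLabeler.cs follows the standard exponential-backoff pattern. Below is a minimal Python sketch of that pattern, not the project's C# code: RateLimitedError is a hypothetical stand-in for whatever rate-limit error the Gemini client raises, and the retry count and delays are illustrative defaults. The sleep function is injectable so the logic can be tested without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical stand-in for a rate-limit (e.g. HTTP 429) error from the Gemini API."""


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    jitter: bool = True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimitedError with exponentially growing waits."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitedError:
            if attempt == max_retries:
                raise  # out of retries; the caller could fall back to local heuristics
            delay = min(max_delay, base_delay * 2 ** attempt)  # 1, 2, 4, 8, ... seconds
            if jitter:
                delay *= random.uniform(0.5, 1.0)  # randomize to avoid synchronized retries
            sleep(delay)
    raise AssertionError("unreachable")
```

Wrapping a single labeling request would look like with_backoff(lambda: label_cluster(cluster)), where label_cluster is a hypothetical function that calls Gemini; if every retry fails, the exception propagates and the caller can switch to the local heuristics.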



