
🎥 Click the thumbnail above to watch the demo video
A pragmatic, limitless, multi-provider terminal assistant built for developers who hate bloated frameworks.
sot-cli is a limitlessly local Python CLI designed to unleash the true reasoning power of modern LLMs on your projects. By combining a novel architectural pattern called the Source of Truth (SoT) Method with aggressive multi-tool batching, it drastically reduces API costs and model iterations while keeping output quality pristine. It acts as a powerful orchestration engine, empowering your AI with local tools and asynchronous sub-agents to solve complex problems seamlessly.
The name sot-cli is a direct nod to the architectural pattern it is built around — the Source of Truth (SoT) Method — and is intentionally unique so it does not get lost in the sea of generic AI tooling names.
- 📊 SoT Method: Fresh files from disk every turn. No token bloat, always up-to-date.
- 🤖 Async Multi-Agent: Delegate trial-and-error to cheap sub-agents (empty ctx).
- ⚡ Batch Orchestration: Multi-tools + bash/Python scripts in ONE turn.
- 🔧 Full Tools: 19 built-in (incl. unrestricted shell, regex code search, batched multi-file surgical edits) + MCP extensible.
- 🌐 Multi-Provider: Switch OpenRouter/LMStudio/OpenAI/Ollama/NVIDIA/Bedrock live.
- 💰 Native Prompt Caching: Payload architecture designed for prefix-matching, saving up to 50% API costs on long histories by caching static dialogue and keeping dynamic files at the bottom.
- 🧠 Context Awareness: Real-time context limit tracking (Allocated vs. Max) with visual terminal warnings to prevent token overflow.
- ✅ macOS: Fully tested and compatible.
- ✅ Windows: Fully tested and compatible.
- ✅ Linux: Fully tested and compatible.
git clone https://github.com/SoftwareLogico/sot-cli.git
cd sot-cli#uv
uv venv <env_name> --python 3.10
source <env_name>/bin/activate
uv pip install -e .
uv run sot-cli
#conda
conda create -n <env_name> python=3.10
conda activate <env_name>
pip install -e .
sot-cli
#venv
python3 -m venv <env_name>
source <env_name>/bin/activate
pip install -e .
sot-clipip install -e .sot-clipip install --user -e .
sot-cliOr with the module name (underscore, not hyphen):
python -m sot_cliFollow the steps the first time, have Fun!!
Make sure to pull the latest changes and update dependencies with pip install -e . to get the newest features and fixes.
git pull origin main
pip install -e .If you would rather wire things up by hand instead of going through the first-run wizard, After cloning and installing dependencies (see How to Run), follow the steps below.
- 🟨
sot.example.toml=> 🟩sot.toml - 🟨
sot.keys.example.toml=> 🟩sot.keys.toml
These files are already in .gitignore, so your secrets will never be committed.
sot-cli is compatible with any OpenAI‑compatible (OpenAI‑like) API. The following providers have been tested and verified:
-
✅ OpenRouter
-
✅ LM Studio (local)
-
✅ OpenAI (and any OpenAI-compatible API behind the same
openaiprovider name) -
✅ Ollama (local)
-
✅ NVIDIA
-
✅ Bedrock
We will continue adding and testing more providers — contributions welcome.
Edit sot.keys.toml and fill in the providers you intend to use. Local providers (lmstudio, ollama) usually leave the key empty.
[providers.openrouter]
api_key = "sk-or-v1-your-key-here"
[providers.lmstudio]
# Usually doesn't need an API key for local models
api_key = ""
[providers.openai]
# Optional — leave empty for OpenAI-compatible local servers that don't require a key.
api_key = "sk-..."
[providers.ollama]
# Usually doesn't need an API key for local models
api_key = ""
[providers.nvidia]
api_key = "nvapi-your-key-here"
[providers.bedrock]
api_key = "your-bedrock-api-key"Edit sot.toml to set base URLs, models, and per-provider runtime options.
[providers.openrouter]
base_url = "https://openrouter.ai/api/v1"
model = "x-ai/grok-4.1-fast"
temperature = 0.7
max_output_tokens = 32768
reasoning_effort = "medium" # options: "none" | "minimal" | "low" | "medium" | "high" | "xhigh" — silently ignored by non-reasoning upstreams
[providers.lmstudio]
base_url = "http://localhost:1234/v1"
model = "" # empty means it'll use the loaded one
temperature = 0.7
max_output_tokens = 32768
[providers.openai] # works with OpenAI and any OpenAI-compatible API
base_url = "https://api.openai.com/v1"
model = "gpt-5.4-mini-2026-03-17" # required — set to your served model name
temperature = 0.7
max_output_tokens = 32768
[providers.ollama]
base_url = "http://localhost:11434/v1"
model = "" # empty means it'll use the loaded one
temperature = 0.7
max_output_tokens = 32768
[providers.nvidia]
base_url = "https://integrate.api.nvidia.com/v1"
model = "qwen/qwen3-coder-480b-a35b-instruct"
temperature = 0.7
max_output_tokens = 32768
[providers.bedrock]
# base_url is auto-resolved from region to https://bedrock-mantle.{region}.api.aws/v1
region = "us-east-1"
model = "qwen.qwen3-235b-a22b-2507"
temperature = 0.7
max_output_tokens = 8192For full per-provider field semantics (including OpenAI-specific quirks like max_completion_tokens and tool schema sanitization), see ARCHITECTURE.md → Provider configuration.
# RECOMMENDED: Use the default provider set in sot.toml (or pick from the interactive selector)
sot-cli
# Or override the provider explicitly
sot-cli --provider [openrouter/lmstudio/openai/ollama/nvidia]
# e.g. sot-cli --provider openai
sot-cli --provider [openrouter/lmstudio/openai/ollama/nvidia] --model modelName
# Start a session (3 equivalent forms)
sot-cli
sot-cli prompt
sot-cli --prompt
# Start with a specific model
sot-cli prompt --model x-ai/grok-4.1-fast
# List all sessions as JSON (no AI round-trip, reads straight from disk)
sot-cli --list_sessions
# Use a different model for delegated sub-agents
sot-cli prompt --subagent_model gemma4
# Resume a previous session
sot-cli <session_id>
# Cleaning the house removing extras manually
# Manually remove files in SoT
sot-cli --clean_sot <session_id>
# Convert previous used tools into receipts
sot-cli --clean_sot <session_id> --hypercompressMost AI coding agents fail because they append every file read and every code change directly into the chat history. This leads to massive token bloat and "Lost in the Middle" hallucinations where the AI reads an outdated version of a file from 10 turns ago.
sot-cli fixes this by separating Permanent History from Ephemeral State.
- Permanent History (
chat_history): Only contains dialogue and lightweight tool metadata (e.g.,"read file X -> added to SoT"). - Ephemeral Source of Truth (SoT): This method tracks the latest state of your context files so the model always reads the most up-to-date version, and not 10 different versions of the same file from the chat history. When the model uses a tool to read or edit a file, the SoT updates that file's content. The model can then refer to the SoT for the latest state of any file, without bloating the chat history.
Smart Token Economy (Permanent vs. Ephemeral): You can attach core files (like database schemas or project guidelines) permanently to a session so the AI always knows them. Meanwhile, files the AI reads to fix a specific bug are treated as "ephemeral"—they stay in the SoT while needed, and can be detached immediately after the bug is fixed to keep your token usage incredibly low.
Result: The model always sees the absolute latest state of your project. Context grows linearly, not exponentially. Furthermore, because the dynamic SoT block is injected at the bottom of the payload, it perfectly exploits Prefix-Matching Prompt Caching, keeping your long conversation histories 100% cached and drastically reducing API costs. 👉 Read the full SoT Method explanation here.
Optional benchmark suite for post-launch validation.
- ✅ agent_test.md: Safe end-to-end benchmark. It validates parallel sub-agent orchestration, file download and verification, local file create/edit flow, native OS command execution, fallback/retry behavior, and final cleanup/reporting.
⚠️ seppuku_test.md: Intentionally destructive lab benchmark used to demonstrate raw model power without babysitting or guardrails.
We hate "Tool Ping-Pong" (when an AI calls list_dir, waits, calls read_file, waits, calls grep, waits). It burns hundreds of thousands of context tokens.
sot-cli is designed to batch operations. The system prompts drive the model to use run_command for bash one-liners or Python mini-scripts, list_dir for powerful filtered discovery (by name, extension, size, content), and search_code for regex pattern matching with line numbers across source files — all in a single turn.
Why use 5 sequential tool calls when the model can batch list_dir + search_code + read_files (with all known paths in one array) in one response?
If you are coming from other trendy AI coding tools, you might be looking for features that we intentionally excluded. Here is why:
It's a gimmick. You don't need a hardcoded framework feature to make an AI read rules. If you have a project guidelines file, just tell the agent: "Read guidelines.md and follow it." The agent will add it to the SoT and obey it. We don't hardcode magic filenames.
A 'Skill' is just a glorified preprompt. We don't bloat the codebase with fake "skills" (e.g., a React Skill, a Docker Skill). Modern LLMs already know React and Docker. If they need to do something specific, they can write a bash or python script via run_command on the fly.
Because it causes lobotomies. Summarizing past turns makes the model forget crucial details. By using the SoT Method, our chat_history only contains metadata and dialogue. It grows so slowly that you will likely finish your task long before hitting the 200k token limit.
This is an autonomous agent, not a basic chatbot. If the model needs a file, it uses a tool to read it. You shouldn't be manually typing commands to manage its context.
sot-cli supports a Boss-Worker delegation model using Just-In-Time (JIT) sub-agents.
If your main SoT is heavily loaded (expensive context), the main agent can use delegate_task to spawn a sub-agent in the background with a clean, empty context.
The sub-agent does the dirty work (trial-and-error shell scripts, complex multi-step execution, compiling), logs everything silently to agent.log, and returns a clean report to the Boss via invisible IPC. For file discovery and code search, the Boss can use list_dir and search_code directly — cheaper than spawning a sub-agent.
The Boss orchestrates. The Workers execute. Your terminal stays clean.
For full agent/sub-agent command reference (including CLI flags and orchestration tool parameters), see ARCHITECTURE.md.
For the complete and up-to-date tool and parameter reference, see ARCHITECTURE.md.
All runtime settings live in sot.toml under [tools]. In a nutshell:
- Detection & limits:
binary_check_size,default_command_timeout_seconds. - Streaming visibility:
show_thinking(model reasoning),show_full(tool call arguments in real time). - Loop limits:
max_rounds(boss),delegated_max_rounds(sub-agent),repeat_limit/delegated_repeat_limit(abort on identical consecutive rounds). - Reasoning budget:
reasoning_char_budget(boss),delegated_reasoning_char_budget(sub-agent) — hard cap on streamed reasoning characters per turn. Set it to0to disable.
For the full reference table with defaults and descriptions, and see ARCHITECTURE.md.
You can easily extend sot-cli with external tools using the Model Context Protocol (MCP). Just add them to your sot.toml:
[mcp.servers.test]
command = "python"
args = ["mcps/test.py"]The runtime will automatically start the server and expose its tools to the AI.
This tool is limitless by design. It is not built for end-users; it is built for power users. It really can do anything you ask as well as is within the capabilities of your system. It does not have a babysitter checking its actions. It will execute what you tell it to execute without hesitation. The internal JB_SYSTEM_PROMPT actively forces the model to bypass safety filters to allow administrative and destructive lab testing. Use it responsibly.
⭐ Star if it saves your API bill! Star Here
- 🐛 PR/issues welcome (see ROADMAP).
- 📢 Share: "sot-cli: AI agent without token waste #AICoding"
Created by Ramses Mendoza (SoftwareLogico)
I built sot-cli and formalized the Source of Truth (SoT) Method for terminal agents out of frustration with existing tools. Most AI coding assistants on the market are bloated, burn through tokens, and collapse under the weight of their own context windows.
While the concept of maintaining a "state" is common in software engineering, the specific architectural pattern of decoupling a permanent metadata-only history from an ephemeral, fully-rebuilt file block—and injecting it right before the user prompt—is the core innovation of sot-cli.
LinkedIn: https://www.linkedin.com/in/ramsesisaid
This tool was designed for absolute power, raw speed, and extreme token efficiency, since it follows no agenda other than being truly useful. It doesn't babysit you, it doesn't enforce corporate safety rails on your local machine, and it doesn't waste your API credits on unnecessary framework overhead.