"Ask questions in English. Get answers in charts."
Nexus SQL is an autonomous AI Data Analyst that transforms natural language questions into optimized SQL queries, executes them safely, and instantly visualizes the results — all through a premium Streamlit interface. Powered by DeepSeek-V3 and LangChain's ReAct agent, it reduces time-to-insight by eliminating the need to know SQL entirely.
✨ Features · 🏗️ Architecture · 🚀 Setup · 💡 Examples
Business intelligence is bottlenecked by a single skill: SQL fluency.
- Analysts wait days for engineering to answer data questions
- Non-technical stakeholders can't self-serve — every insight requires a ticket
- Writing complex JOINs, GROUP BYs, and subqueries is error-prone even for experienced developers
- Raw query results are meaningless without visualization — creating another handoff to a BI tool
- No safety net — a single
DROP TABLEorDELETE FROMcan cause irreversible damage
Nexus SQL solves all five. Type a question in English, get a chart, a summary, and the full chain of thought — with two layers of SQL injection protection ensuring your data is never at risk.
Powered by LangChain's create_sql_agent with a Zero-Shot ReAct Description agent type:
- Schema Inspection First: The agent always inspects database structure before writing SQL — using exact table and column names, never guessing
- Multi-Step Planning: Complex questions trigger multi-step reasoning chains — the agent plans JOINs, decides groupings, and iterates on its own queries before returning an answer
- Explicit Foreign Key Logic: The system prompt enforces
explicit JOINsand correct foreign key resolution (e.g.,orders.product_id = products.product_id) - Business Interpretation: After returning data, the agent provides an executive summary explaining the business significance
Context-aware follow-up queries without repeating yourself:
- Sliding Window: The agent maintains a history of the last 5 Q&A pairs in memory
- Context Injection: Follow-up questions automatically receive the last 3 exchanges as context — so "Break that down by country" works after "Show total revenue"
- Session History Panel: The sidebar displays all previous queries with one-click Re-run buttons
The ResultVisualizer class in visualizer.py automatically selects the optimal chart type:
| Data Pattern Detected | Chart Type | When It Triggers |
|---|---|---|
| Date/time column + numeric column | 📈 Timeline (spline line) | date, time, month in column name + any numeric column |
| Categorical + numeric (≤5 categories) | 🍩 Donut Chart | e.g., revenue per country (3–5 rows) |
| Categorical + numeric (>5 categories) | 📊 Bar Chart | e.g., product-level breakdown (6+ rows) |
| Two numeric columns | 🔵 Scatter Plot | e.g., quantity vs. price correlation |
| Single numeric column | 📊 Bar Chart (fallback) | Any single-metric result set |
All charts use a futuristic color palette (#00f2ff, #7000ff, #ff007a, #00ffab, #ffea00) with plotly_dark template — transparent backgrounds, hidden grid lines, Inter font, and custom hover labels matching the UI.
Summary Statistics are also auto-generated: totals, averages, and currency formatting (detects columns with price, amount, total in the name).
Layer 1 — Agent Pre-Audit (agent.py):
Before any query reaches the LLM, _is_potentially_malicious() scans the raw user input for 8 forbidden keywords:
delete, drop, truncate, update, insert, alter, grant, revoke
If any are found, the request is immediately rejected — the agent never sees it.
Layer 2 — Tool-Level Execution Guard (tools.py):
The SafeSQLExecutor LangChain tool enforces a second line of defense at execution time:
- Only
SELECTandWITH(CTE) statements are allowed - 10 dangerous SQL keywords are blocked:
DELETE,DROP,TRUNCATE,UPDATE,INSERT,ALTER,CREATE,EXEC,GRANT,REVOKE - All queries are automatically limited to 100 rows if no
LIMITclause is present - Results are formatted through Pandas for clean table output
The QueryComplexityAnalyzer tool (tools.py) scores any SQL query for performance risk:
| Factor | Score | Warning Threshold |
|---|---|---|
| Each JOIN | +2 | >3 JOINs: "High number of joins. May be slow." |
SELECT * usage |
+1 | "Consider selecting only necessary columns." |
Each subquery (SELECT... |
+3 | >1 subquery: "Multiple subqueries detected." |
GROUP BY |
+1 | (no warning) |
Rating: Simple (0–4) · Moderate (5–9) · Complex (10+)
Zero-config database migration via environment variables:
DATABASE_TYPE=postgresql
POSTGRES_DB_URL=postgresql://user:password@host:5432/dbnameWhen DATABASE_TYPE=postgresql is set, the entire system — agent, tools, visualizer — automatically switches from SQLite to the production PostgreSQL connection. The Streamlit UI also updates the Node Status panel and disables the Hard Reset button (which is SQLite-only).
Every answer includes 👍 Correct / 👎 Incorrect buttons. Feedback is logged to evaluations.csv with:
- Timestamp
- Original question
- SQL query (if captured)
- Agent answer
- Human feedback
This creates a ground-truth dataset for future model fine-tuning.
A fully custom design system injected via st.markdown() CSS:
- Radial gradient background:
#1a1c2c → #0a0b10 - Glassmorphism sidebar:
backdrop-filter: blur(10px)with subtle border - Gradient headings:
linear-gradient(90deg, #fff 0%, #a5b4fc 100%) - Gradient buttons:
linear-gradient(135deg, #7000ff 0%, #4a00ff 100%)with hover lift (translateY(-2px)) - Code blocks: JetBrains Mono font on dark
#1e1e2ebackground - Fade-in animation: Results container uses
@keyframes fadeInwith translate - Three-tab result layout:
📊 Analytics,📋 Source Data,🧠 Neural Logic
Pre-loaded example queries in the sidebar for instant exploration:
| Label | Natural Language Query |
|---|---|
| 📈 Monthly Revenue Trend | "Show me total sales by month for the last 6 months" |
| 🌐 Regional Performance | "What is the total revenue per country?" |
| 👑 VIP Customers | "Who are the top 5 customers by total spending?" |
| 📦 Inventory Health | "Which product categories have the highest and lowest average prices?" |
| 🚛 Order Logistics | "List the 10 most recent orders with customer name and product details" |
┌──────────────────────────────────────────────────────────────────────┐
│ Streamlit Frontend (app.py) │
│ │
│ ┌────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Query Input │ │ Discovery Hub │ │ Session History │ │
│ │ (text_area) │ │ (5 examples) │ │ (Re-run btns) │ │
│ └───────┬────────┘ └────────┬─────────┘ └──────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Layer 1: Pre-Audit Security Check │ │
│ │ _is_potentially_malicious(question) │ │
│ │ Blocks: delete, drop, truncate, update... │ │
│ └─────────────────────────┬───────────────────────────────────┘ │
│ │ ✅ Clean │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ SQLQueryAgent (agent.py) │ │
│ │ │ │
│ │ ┌───────────────────────────┐ ┌────────────────────────┐ │ │
│ │ │ Conversational Memory │ │ DeepSeek V3 LLM │ │ │
│ │ │ Last 5 Q&A pairs │ │ model: deepseek-chat │ │ │
│ │ │ Last 3 injected as │ │ temperature: 0 │ │ │
│ │ │ context for follow-ups │ │ top_p: 0.1 │ │ │
│ │ └───────────────────────────┘ └────────────────────────┘ │ │
│ │ │ │
│ │ LangChain create_sql_agent (zero-shot-react-description) │ │
│ │ SQLDatabaseToolkit → Schema + Sample Rows (3 per table) │ │
│ └─────────────────────────┬───────────────────────────────────┘ │
│ │ ReAct Tool Loop │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Layer 2: SafeSQLExecutor (tools.py) │ │
│ │ │ │
│ │ • Validates: SELECT/WITH only │ │
│ │ • Blocks: DELETE, DROP, CREATE, EXEC, GRANT... (10 KWs) │ │
│ │ • Auto-appends: LIMIT 100 │ │
│ │ • Executes via: SQLAlchemy + Pandas │ │
│ └─────────────────────────┬───────────────────────────────────┘ │
│ │ DataFrame result │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ ResultVisualizer (visualizer.py) │ │
│ │ │ │
│ │ • Detects data shape (date, categorical, numeric) │ │
│ │ • Selects chart: Timeline / Bar / Donut / Scatter │ │
│ │ • Generates summary stats with currency formatting │ │
│ └─────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Three-Tab Result Panel │ │
│ │ 📊 Analytics: Chart + Key Metrics + 👍/👎 Feedback │ │
│ │ 📋 Source Data: Raw DataFrame table │ │
│ │ 🧠 Neural Logic: Full Chain-of-Thought step expansion │ │
│ └──────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
| File | Lines | Responsibility |
|---|---|---|
agent.py |
110 | SQLQueryAgent class — DeepSeek LLM init, LangChain SQL agent setup, conversation memory (5-entry sliding window), input pre-audit, schema exposure |
tools.py |
127 | SafeSQLExecutor — read-only SQL execution with LIMIT 100 auto-append · QueryComplexityAnalyzer — JOIN/subquery scoring · SchemaInspector — table info dump |
app.py |
333 | Full Streamlit UI — CSS design system, sidebar Discovery Hub, result tabs (Analytics / Source Data / Neural Logic), feedback logger, PostgreSQL toggle, @st.cache_resource agent singleton |
visualizer.py |
104 | ResultVisualizer — auto chart selection (Timeline / Bar / Donut / Scatter), futuristic color palette, get_summary_stats() with currency detection |
setup_db.py |
119 | Demo e-commerce database genesis — 3 tables (customers, products, orders), 40 customers, 12 products, 250 orders with realistic time-series pricing |
test_agent.py |
35 | Smoke test — initializes agent, prints schema, runs a sample query |
The setup_db.py script creates a realistic e-commerce dataset:
customers (40 rows)
├── customer_id INTEGER PRIMARY KEY
├── name TEXT
├── email TEXT UNIQUE
├── segment TEXT -- Corporate, Retail, Wholesale, Affiliate
├── country TEXT -- USA, UK, Germany, Canada, France, Japan, Australia, Brazil
└── signup_date DATE
products (12 rows)
├── product_id INTEGER PRIMARY KEY
├── product_name TEXT -- Quantum Laptop Pro, Neural Mouse, ErgoDesk 3000...
├── category TEXT -- Electronics, Furniture, Appliances
├── sub_category TEXT -- Computing, Peripherals, Office, Kitchen, Audio, Mobile
└── base_price DECIMAL
orders (250 rows)
├── order_id INTEGER PRIMARY KEY
├── customer_id INTEGER → FK customers
├── product_id INTEGER → FK products
├── order_date DATE -- Last 180 days
├── quantity INTEGER -- 1–5
├── unit_price DECIMAL -- ±10% variation for realism
├── total_amount DECIMAL
└── shipping_status TEXT -- Delivered, Shipped, Processing, Cancelled- Python
3.10+ - A DeepSeek API Key — get one here (uses OpenAI-compatible endpoint)
git clone https://github.com/Ismail-2001/SQL-Query-Agent.git
cd SQL-Query-Agentpython -m venv venv
source venv/bin/activate # macOS / Linux
# venv\Scripts\Activate.ps1 # Windows (PowerShell)pip install -r requirements.txtCreate a .env file in the project root:
DEEPSEEK_API_KEY=your_deepseek_api_key_here
# Optional — for production PostgreSQL:
# DATABASE_TYPE=postgresql
# POSTGRES_DB_URL=postgresql://user:password@host:5432/dbname# Create the demo e-commerce database (40 customers, 12 products, 250 orders)
python setup_db.py
# Launch the Streamlit app
streamlit run app.pyThe app will open at http://localhost:8501.
Note: If the demo database doesn't exist when you launch
app.py, it will be automatically created on first load — no need to runsetup_db.pymanually.
Type any of these into the query box:
| Category | Example Query | What Happens Under the Hood |
|---|---|---|
| Aggregation | "What is the total revenue for each country?" | SELECT country, SUM(total_amount) FROM orders JOIN customers... GROUP BY country |
| Top-N | "Who are the top 5 customers by spending?" | Multi-table JOIN with ORDER BY ... DESC LIMIT 5 |
| Trend Analysis | "Show monthly revenue for the last 6 months" | Date grouping + Timeline chart auto-generated |
| Filtering | "List all cancelled orders from the USA" | Filtered JOIN with WHERE shipping_status = 'Cancelled' AND country = 'USA' |
| Comparison | "Compare average order values between Corporate and Retail segments" | GROUP BY segment + Bar/Donut chart |
| Follow-up | "Break that down by product category" | Conversational memory injects previous context — no need to repeat the original question |
Every successful query produces a three-tab result panel:
| Tab | Content |
|---|---|
| 📊 Analytics | Executive summary (text), auto-generated Plotly chart, key metrics summary, and 👍/👎 feedback buttons |
| 📋 Source Data | The raw DataFrame rendered as an interactive Streamlit table |
| 🧠 Neural Logic | Full Chain-of-Thought — every ReAct step the agent took, including schema inspections, SQL drafts, and field observations |
python test_agent.pyThis initializes the agent, prints the full database schema, and runs a sample question ("How many customers do we have in total?") — useful for validating your API key and database setup before launching the UI.
SQL-Query-Agent/
│
├── agent.py # SQLQueryAgent class
│ # • DeepSeek V3 via OpenAI-compatible endpoint
│ # • LangChain create_sql_agent (zero-shot-react)
│ # • 5-entry conversational memory (sliding window)
│ # • Pre-audit malicious input scanner (8 forbidden KWs)
│ # • Custom system prompt enforcing schema-first reasoning
│
├── tools.py # LangChain BaseTool implementations
│ # • SafeSQLExecutor: read-only guard + LIMIT 100
│ # • QueryComplexityAnalyzer: JOIN/subquery scoring
│ # • SchemaInspector: table info with sample rows
│
├── app.py # Streamlit application
│ # • Full CSS design system (radial gradients, glassmorphism)
│ # • 3-tab result panel (Analytics / Source Data / Neural Logic)
│ # • Sidebar: Discovery Hub, Session History, Node Status
│ # • @st.cache_resource agent singleton
│ # • @st.cache_data query caching (1hr TTL)
│ # • Feedback logger → evaluations.csv
│ # • PostgreSQL env var toggle
│
├── visualizer.py # ResultVisualizer class
│ # • Auto chart selection: Timeline / Bar / Donut / Scatter
│ # • Futuristic 5-color palette on plotly_dark template
│ # • Currency-aware summary statistics
│
├── setup_db.py # Demo database generator
│ # • 3 tables: customers, products, orders
│ # • 40 customers × 12 products × 250 orders
│ # • Time-series friendly order dates (last 180 days)
│ # • Realistic pricing with ±10% variance
│
├── test_agent.py # Smoke test: schema dump + sample query
├── requirements.txt # Python dependencies (13 packages)
├── evaluations.csv # (Generated) Human feedback log
└── .env # Private API key configuration
| Category | Library | Purpose |
|---|---|---|
| Language | Python 3.10+ |
Core runtime |
| LLM Provider | langchain-openai → DeepSeek V3 |
SQL generation and reasoning via OpenAI-compatible API (api.deepseek.com/v1) |
| Agent Framework | LangChain create_sql_agent |
Zero-Shot ReAct agent with SQLDatabaseToolkit |
| Schema Access | langchain-community SQLDatabase |
Schema introspection with 3 sample rows per table |
| Tool Validation | Pydantic BaseModel |
Typed args_schema for all custom tools |
| Database | SQLAlchemy + SQLite (demo) / PostgreSQL (prod) | Universal database abstraction |
| Data Processing | Pandas | Query result formatting + DataFrame manipulation |
| Visualization | Plotly Express + Graph Objects | Interactive chart generation with dark theme |
| Frontend | Streamlit | App framework with @st.cache_resource and @st.cache_data |
| Config | python-dotenv |
Secure .env file loading |
| PostgreSQL Driver | psycopg2-binary |
Production database driver |
| Columnar Engine | DuckDB | Available for analytical query acceleration |
- Push your repo to GitHub
- Go to share.streamlit.io
- Select your repository → set Main file path to
app.py - Add Secrets in the Streamlit Cloud dashboard:
DEEPSEEK_API_KEY = "your_key_here"
- Click Deploy ✅
# render.yaml (Render)
services:
- type: web
name: nexus-sql
runtime: python
buildCommand: pip install -r requirements.txt && python setup_db.py
startCommand: streamlit run app.py --server.port $PORT --server.headless true
envVars:
- key: DEEPSEEK_API_KEY
sync: falseFROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
RUN python setup_db.py
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.headless=true"]- DeepSeek V3 via OpenAI-compatible endpoint
- LangChain Zero-Shot ReAct SQL agent with
SQLDatabaseToolkit - Two-layer SQL security (input pre-audit + execution guard)
- Automatic LIMIT 100 injection on all queries
- Conversational memory (5-entry sliding window, last 3 as context)
- Auto-visualization engine (Timeline / Bar / Donut / Scatter)
- Query Complexity Analyzer with JOIN/subquery scoring
- Premium glassmorphism Streamlit UI with 3-tab results
- Demo e-commerce database (3 tables, 250 orders)
- Discovery Hub with 5 pre-loaded example queries
- Human feedback loop →
evaluations.csv - PostgreSQL environment toggle
-
@st.cache_resourceagent singleton +@st.cache_dataquery caching (1hr TTL)
- Multi-Agent Orchestration: Separate agents for SQL generation vs. data visualization strategy
- Vector-Search Schema Pruning: Embedding-based table selection for 1000+ table databases
- Export Engine: PDF and Excel report generation for business meetings
- Query History Dashboard: Full session analytics with re-run, compare, and trend views
- Role-Based Access Control: Read-only vs. analyst vs. admin permission tiers
- Preditive Anomaly Detection: Proactive alerts when revenue, orders, or metrics deviate from trend
- Slack / Teams Integration: Query your database from a chat message
- Scheduled Reports: Cron-based automated briefings delivered via email
- Multi-Database Connections: Query across multiple databases in a single session
- Data Lineage Visualization: Trace column dependencies and transformation chains
- Fine-Tuned SQL Model: Train on
evaluations.csvfeedback data for domain-specific accuracy - Natural Language Dashboard Builder: Generate full Streamlit dashboards from conversation history
Contributions are welcome! High-impact areas:
- New visualization types in
visualizer.py(e.g., heatmaps, treemaps, funnel charts) - Additional safety rules in
tools.pySafeSQLExecutor - New example queries for the Discovery Hub in
app.py - PostgreSQL test coverage with real connection handling
- New demo datasets via
setup_db.py(e.g., healthcare, logistics, SaaS metrics)
To contribute:
- Fork the repository
- Create a feature branch:
git checkout -b feature/your-feature - Commit with Conventional Commits:
git commit -m "feat: add heatmap chart type to visualizer" - Push and open a Pull Request against
main
Distributed under the MIT License. See LICENSE for details.
Built so anyone can talk to their data — no SQL required.
If Nexus SQL changed how you think about data access, star ⭐ the repo.
Built with ❤️ by Ismail Sajid