Business lead generation, lead intelligence, and local business data enrichment platform
Python desktop app + scraping engine + local database + browser signals
[THE SYSTEM IS STILL UNDER DEVELOPMENT]
- Overview
- Current Working Version (Root)
- How Atlas Engine Works
- System Architecture Diagrams
- Key Features (Current Version)
- Version Folders & Differences
- Problems, Risks & Disadvantages
- Why This System
- Who Can Benefit
- Install & Run (Current Version)
- Running Older Versions
- Browser Extension
- Data, Privacy & Compliance
- SEO Keywords
- License
Atlas Engine is a desktop lead generation and lead enrichment system that collects business data from public sources, scores lead quality, and stores everything in a local SQLite database for review and export. It targets local businesses, enriches contact data (phone, email, website, socials), and provides a lead intelligence UI for approval, rejection, and notes. This repository also includes multiple version folders that show the evolution from API-based discovery to public data scraping and a Streamlit-based UI.
This is the active version in the root folder (main.py, app/, extension/).
What it includes (based on current code):
- Desktop UI (PySide6) with login screen and admin credential update.
- Lead discovery engine that builds niche/city/area query sets.
- Public web discovery using Google search HTML + DuckDuckGo + JustDial category pages.
- Website crawling & extraction (JSON-LD + HTML text patterns for email/phone/socials/address).
- Lead scoring & tiers (Hot / Warm / Cold) with configurable filters.
- Website health checks (HTTPS, missing title/description, thin content, block detection).
- Local SQLite database (atlas.sqlite3) for leads, statuses, notes, and page signals.
- Review tools: approve/reject, copy JSON, open source page, export CSV.
- Local HTTP bridge on 127.0.0.1:8765 to receive browser extension signals.
- Browser extension that detects scrapable business pages and reports signals.
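The website extraction step listed above (JSON-LD plus HTML text patterns) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in app/: the names `JsonLdCollector` and `extract_contacts` and both regular expressions are assumptions made for this example, and the patterns are deliberately loose, so real pages would need stricter validation.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical, deliberately loose patterns; the real extractor may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def extract_contacts(html):
    """Return name, phones, and emails found in JSON-LD and in the raw HTML."""
    collector = JsonLdCollector()
    collector.feed(html)

    name, phones, emails = None, set(), set()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is common; skip it
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            name = name or item.get("name")
            if item.get("telephone"):
                phones.add(item["telephone"])
            if item.get("email"):
                emails.add(item["email"])

    # Fallback: pattern matching over the raw markup (may yield false positives).
    emails.update(EMAIL_RE.findall(html))
    phones.update(match.strip() for match in PHONE_RE.findall(html))
    return {"name": name, "phones": sorted(phones), "emails": sorted(emails)}
```

Structured JSON-LD fields are read first because they are less ambiguous than pattern matches; the regex pass only fills in what the structured data omits.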
- User defines niche + location in the UI.
- Query generator expands search phrases and sources.
- Discovery engine gathers candidate URLs from public search & directories.
- Scraper & extractor parse pages, JSON-LD, and link signals.
- Lead scoring calculates a quality score and tier.
- Storage writes leads and signals to SQLite.
- Review & export in the dashboard: approve, reject, add notes, export CSV.
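The query generation step above could look something like the sketch below. The function name `build_queries` and the phrase templates are hypothetical and chosen only to illustrate how a niche, a city, and optional areas might expand into a de-duplicated query set; the actual templates used by the engine are not documented here.

```python
from itertools import product

# Hypothetical phrase templates; the real generator may use different ones.
DEFAULT_TEMPLATES = (
    "{niche} in {place}",
    "best {niche} {place}",
    "{niche} near {place} contact",
)


def build_queries(niche, city, areas=(), templates=DEFAULT_TEMPLATES):
    """Expand niche/city/area into an ordered, de-duplicated list of search queries."""
    # Use "area, city" when areas are given, otherwise fall back to the city alone.
    places = [f"{area}, {city}" for area in areas] or [city]

    seen, queries = set(), []
    for template, place in product(templates, places):
        # Normalize whitespace and case so near-identical queries collapse.
        query = " ".join(template.format(niche=niche.strip(), place=place).split()).lower()
        if query not in seen:
            seen.add(query)
            queries.append(query)
    return queries


if __name__ == "__main__":
    for q in build_queries("Dentist", "Austin", ["Downtown", "East Side"]):
        print(q)
```

Keeping the templates as data rather than code makes it easy to add source-specific phrasing later without touching the expansion logic.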
flowchart LR
UI[PySide6 Desktop UI] --> Engine[Scrape Worker]
Engine --> Sources[Public Sources]
Sources -->|Google HTML + DuckDuckGo| WebSearch
Sources -->|JustDial category pages| Directory
Engine --> Extract[HTML + JSON-LD Extractors]
Extract --> Score[Lead Scoring]
Score --> DB[(SQLite: atlas.sqlite3)]
DB --> UI
Extension[Chrome Extension] --> Bridge[Local Bridge :8765]
Bridge --> DB
sequenceDiagram
participant User
participant UI as Atlas UI
participant Engine as Scrape Worker
participant Web as Public Web
participant DB as SQLite
User->>UI: Enter niche/city/area + filters
UI->>Engine: Start scraping
Engine->>Web: Search + directory queries
Web-->>Engine: Candidate pages
Engine->>Web: Crawl target pages
Web-->>Engine: HTML + JSON-LD
Engine->>Engine: Extract & score
Engine->>DB: Save leads + status
UI->>DB: Load dashboard & filters
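The "Save leads + status" step in the sequence above might be sketched as follows. The table layout, column names, and the helpers `open_db`, `save_lead`, and `set_status` are assumptions for illustration; the real schema of atlas.sqlite3 may differ. The upsert syntax requires SQLite 3.24 or newer.

```python
import sqlite3

# Hypothetical schema; the actual atlas.sqlite3 layout is not documented here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS leads (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    website TEXT UNIQUE,
    phone   TEXT,
    score   INTEGER NOT NULL DEFAULT 0,
    tier    TEXT NOT NULL DEFAULT 'Cold',
    status  TEXT NOT NULL DEFAULT 'new',
    notes   TEXT NOT NULL DEFAULT ''
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the leads table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.executescript(SCHEMA)
    return conn


def save_lead(conn, name, website, phone, score, tier):
    """Insert a lead, or refresh it if the same website was already stored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """
            INSERT INTO leads (name, website, phone, score, tier)
            VALUES (?, ?, ?, ?, ?)
            ON CONFLICT(website) DO UPDATE SET
                name = excluded.name,
                phone = excluded.phone,
                score = excluded.score,
                tier = excluded.tier
            """,
            (name, website, phone, score, tier),
        )


def set_status(conn, website, status, note=""):
    """Record a review decision (for example 'approved' or 'rejected') and a note."""
    with conn:
        conn.execute(
            "UPDATE leads SET status = ?, notes = ? WHERE website = ?",
            (status, note, website),
        )
```

Keying the upsert on the website means a re-scrape updates the score without discarding an earlier review decision, since status and notes are left untouched on conflict.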
- Multi-source discovery: Google search HTML, DuckDuckGo fallback, JustDial pages.
- Contact enrichment: phones, emails, socials, website detection.
- Lead scoring: quality scoring + Hot/Warm/Cold tiers.
- Lead management UI: approve/reject, notes, quick actions.
- Website health analysis: HTTPS checks, thin-content detection.
- Local-first: no cloud dependency; data stays on the device.
- Browser signals: extension detects business-like pages and sends signals to the app.
- CSV export: built-in export for pipelines or CRM imports.
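As a rough illustration of the scoring and tiering feature listed above, the sketch below assigns points for each contact signal and maps the total to Hot, Warm, or Cold. The weights, the thresholds, and the names `Lead`, `score_lead`, and `tier_for` are all assumptions made for this example; the actual scoring rules in the app are configurable and may be quite different.

```python
from dataclasses import dataclass

# Hypothetical weights that sum to 100; not taken from the Atlas Engine code.
WEIGHTS = {"phone": 30, "email": 30, "website": 20, "socials": 10, "https": 10}


@dataclass
class Lead:
    name: str
    phone: str = ""
    email: str = ""
    website: str = ""
    socials: tuple = ()
    uses_https: bool = False


def score_lead(lead):
    """Add up the weight of every signal that is present on the lead."""
    score = 0
    if lead.phone:
        score += WEIGHTS["phone"]
    if lead.email:
        score += WEIGHTS["email"]
    if lead.website:
        score += WEIGHTS["website"]
    if lead.socials:
        score += WEIGHTS["socials"]
    if lead.uses_https:
        score += WEIGHTS["https"]
    return score


def tier_for(score, hot=70, warm=40):
    """Map a numeric score to a tier using assumed, adjustable thresholds."""
    if score >= hot:
        return "Hot"
    if score >= warm:
        return "Warm"
    return "Cold"
```

Exposing the thresholds as parameters mirrors the idea of configurable filters: the same scores can be re-tiered without re-scraping.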
| Folder | UI | Data Sources | Enrichment | Storage/Export | Notes |
|---|---|---|---|---|---|
| Root (Current) | PySide6 desktop + browser signals | Google HTML + DuckDuckGo + JustDial | JSON-LD + HTML scraping + website health | SQLite + CSV export | Includes local bridge + extension signals |
| V1_COLLECTTS_FULL_DATA | PySide6 desktop | Google Places API + Google CSE API + OSM (Overpass/Nominatim) | API + website crawl | SQLite + run tracking | Requires API keys; more API-driven |
| USA_SCRAPPER_VERSION_2 | PySide6 desktop | OpenStreetMap (Overpass + Nominatim) | Optional website crawl | SQLite + CSV/JSON | Focused on USA/Canada public data |
| USA_SCRAPPER_VERSION_3 | Streamlit web UI | OpenStreetMap (Overpass + Nominatim) | Website crawl + scoring | CSV + Excel | Dashboard filters, Streamlit export |
- V1 = API-first (Google Places + CSE + OSM), requires keys and quotas.
- Current = scraping-first (public search + JustDial), no API keys required, but higher risk of blocks.
- V1 has run tracking + settings table; current has browser extension signals and website health checks.
- Current adds review workflows (approve/reject, notes) and live dashboard widgets.
- Scraping fragility: Google/JustDial HTML changes can break extraction.
- Anti-bot risk: search engines may throttle or block requests.
- No proxy/rotation layer: large-scale scraping can fail or be rate-limited.
- Local-only storage: no built-in cloud sync or multi-user access.
- Security concern: default login (root/1234) is weak unless changed.
- Extension issue: popup.html references popup.js, but the file is missing, so popup UI cannot function fully.
- Limited compliance tooling: no built-in consent, GDPR workflows, or audit logs.
- No automated tests: regression risk when modifying scraping logic.
- Platform bias: UI tested mainly on Windows; macOS/Linux may need tweaks.
- Local-first lead intelligence: keep sensitive lead data on your machine.
- Rapid lead discovery: generate and score leads without paid APIs.
- Modular pipeline: easy to swap or extend sources and extraction logic.
- UI-driven workflow: validate, score, and export in one desktop app.
- Sales & lead gen teams building outbound lists.
- Local marketing agencies targeting SMBs.
- Business development reps needing quick prospect research.
- Freelancers or growth teams without budget for paid lead APIs.
- Researchers working with local business datasets.
# 1) Install dependencies
py -3.13 -m pip install -r requirements.txt
# 2) Run the app
py -3.13 main.py
Default login: root / 1234 (change in Settings after first run)
# V1
cd V1_COLLECTTS_FULL_DATA
pip install -r requirements.txt
python main.py
# USA Scrapper v2
cd USA_SCRAPPER_VERSION_2
pip install -r requirements.txt
python main.py
# USA Scrapper v3 (Streamlit)
cd USA_SCRAPPER_VERSION_3
pip install -r requirements.txt
streamlit run app.py
The Chrome extension lives in extension/ and posts page signals to the local bridge.
- Bridge endpoint: http://127.0.0.1:8765/page-signal
- Purpose: detect business-like pages and mark them as scrapable
Note: the popup references popup.js, which is missing, so the popup UI is currently incomplete.
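To make the bridge idea concrete, here is a minimal sketch of a loopback HTTP endpoint that accepts a JSON page signal, plus a small client for trying it out. Only the /page-signal path and port 8765 come from this README; the payload fields (`url`, `title`, `scrapable`), the response body, and the names `BridgeHandler`, `start_bridge`, and `post_signal` are assumptions, and the real bridge may store signals in SQLite rather than in a list.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received_signals = []  # stand-in for writing signals to the database


class BridgeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/page-signal":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        received_signals.append(payload)
        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet in this sketch


def start_bridge(port=8765):
    """Serve on loopback only, in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), BridgeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_signal(port, signal):
    """Send one page signal to the bridge and return its decoded JSON reply."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/page-signal",
        data=json.dumps(signal).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler keeps loopback traffic away from any system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())
```

Binding to 127.0.0.1 rather than 0.0.0.0 keeps the endpoint unreachable from other machines, which fits the local-first design. A real browser extension calling this endpoint may also need CORS headers, which this sketch omits.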
- All leads are stored locally in SQLite.
- Data is gathered from public sources; ensure compliance with website terms and local regulations.
- You are responsible for ethical use, consent, and compliance (GDPR/CCPA/etc).
Business lead generation, lead intelligence software, local business scraper, lead enrichment tool, B2B lead collection, business directory scraping, offline lead database, PySide6 lead management app.
Apache License 2.0. See LICENSE.