Atlas Engine

Business lead generation, lead intelligence, and local business data enrichment platform

Python desktop app + scraping engine + local database + browser signals

[THE SYSTEM IS STILL UNDER DEVELOPMENT]

Table of Contents

Overview
Current Working Version (Root)
How Atlas Engine Works
System Architecture Diagrams
Key Features (Current Version)
Version Folders & Differences
Problems, Risks & Disadvantages
Why This System
Who Can Benefit
Install & Run (Current Version)
Running Older Versions
Browser Extension
Data, Privacy & Compliance
SEO Keywords
License

Overview

Atlas Engine is a desktop lead generation and lead enrichment system that collects business data from public sources, scores lead quality, and stores everything in a local SQLite database for review and export. It targets local businesses, enriches contact data (phone, email, website, socials), and provides a lead intelligence UI for approval, rejection, and notes. This repository also includes multiple version folders that show the evolution from API-based discovery to public data scraping and a Streamlit-based UI.

Current Working Version (Root)

This is the active version in the root folder (main.py, app/, extension/).

What it includes (based on current code):

Desktop UI (PySide6) with login screen and admin credential update.
Lead discovery engine that builds niche/city/area query sets.
Public web discovery using Google search HTML + DuckDuckGo + JustDial category pages.
Website crawling & extraction (JSON-LD + HTML text patterns for email/phone/socials/address).
Lead scoring & tiers (Hot / Warm / Cold) with configurable filters.
Website health checks (HTTPS, missing title/description, thin content, block detection).
Local SQLite database (atlas.sqlite3) for leads, statuses, notes, and page signals.
Review tools: approve/reject, copy JSON, open source page, export CSV.
Local HTTP bridge on 127.0.0.1:8765 to receive browser extension signals.
Browser extension that detects scrapable business pages and reports signals.
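
As an illustration of the website health checks listed above, here is a minimal sketch using only the standard library. The function name `check_health`, the 150-word threshold, and the block-marker phrases are assumptions made for this example, not values taken from the Atlas Engine code.

```python
import re
from urllib.parse import urlparse

# Hypothetical thresholds and marker phrases (not taken from the project).
THIN_CONTENT_WORDS = 150
BLOCK_MARKERS = ("captcha", "access denied", "unusual traffic")

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.I | re.S)
# Assumes the name attribute appears before content inside the meta tag.
DESC_RE = re.compile(
    r'<meta[^>]+name=["\']description["\'][^>]*content=["\'](.*?)["\']',
    re.I | re.S,
)
TAG_RE = re.compile(r"<[^>]+>")


def check_health(url, html):
    """Return a list of issue codes for one fetched page."""
    issues = []
    if urlparse(url).scheme != "https":
        issues.append("no_https")

    title = TITLE_RE.search(html)
    if not title or not title.group(1).strip():
        issues.append("missing_title")

    desc = DESC_RE.search(html)
    if not desc or not desc.group(1).strip():
        issues.append("missing_description")

    # Rough visible-text estimate: strips tags but keeps script/style text.
    text = TAG_RE.sub(" ", html)
    if len(text.split()) < THIN_CONTENT_WORDS:
        issues.append("thin_content")

    lowered = text.lower()
    if any(marker in lowered for marker in BLOCK_MARKERS):
        issues.append("possibly_blocked")
    return issues
```

Returning issue codes rather than a single pass/fail flag keeps each signal available for filtering or scoring later on.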

How Atlas Engine Works

User defines niche + location in the UI.
Query generator expands search phrases and sources.
Discovery engine gathers candidate URLs from public search & directories.
Scraper & extractor parse pages, JSON-LD, and link signals.
Lead scoring calculates a quality score and tier.
Storage writes leads and signals to SQLite.
Review & export in the dashboard: approve, reject, add notes, export CSV.
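
The query generation step could look roughly like the sketch below. The phrase templates and the `build_queries` name are hypothetical; the real generator may expand phrases and sources differently.

```python
from itertools import product

# Hypothetical phrase templates; the real generator may use other wording.
TEMPLATES = (
    "{niche} in {area} {city}",
    "best {niche} near {area} {city}",
    "{niche} {city} contact number",
)


def build_queries(niches, cities, areas=("",)):
    """Expand every niche/city/area combination into unique search phrases."""
    queries = []
    seen = set()
    for niche, city, area in product(niches, cities, areas):
        for template in TEMPLATES:
            raw = template.format(niche=niche, city=city, area=area)
            # Collapse the double space left behind when area is empty.
            phrase = " ".join(raw.split())
            key = phrase.lower()
            if key not in seen:
                seen.add(key)
                queries.append(phrase)
    return queries
```

For example, `build_queries(["dentist"], ["Pune"], ["Baner", "Kothrud"])` yields five phrases, because the template without an area produces the same phrase for both areas and is only kept once.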

System Architecture Diagrams

1) High-Level Architecture

flowchart LR
  UI[PySide6 Desktop UI] --> Engine[Scrape Worker]
  Engine --> Sources[Public Sources]
  Sources -->|Google HTML + DuckDuckGo| WebSearch
  Sources -->|JustDial category pages| Directory
  Engine --> Extract[HTML + JSON-LD Extractors]
  Extract --> Score[Lead Scoring]
  Score --> DB[(SQLite: atlas.sqlite3)]
  DB --> UI
  Extension[Chrome Extension] --> Bridge[Local Bridge :8765]
  Bridge --> DB
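
The Extract stage in the diagram above combines JSON-LD parsing with HTML text patterns. A minimal standard-library sketch of that idea follows. The class and function names and both regular expressions are assumptions for illustration; the patterns are deliberately loose and would need tuning against real pages.

```python
import json
import re
from html.parser import HTMLParser

# Loose, illustrative patterns (assumptions, not the project's own).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


class JsonLdCollector(HTMLParser):
    """Collects parsed <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed blocks instead of failing the page
            self.items.extend(parsed if isinstance(parsed, list) else [parsed])


def extract_contacts(html):
    """Merge structured JSON-LD fields with regex matches from the raw HTML."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()

    record = {"name": None, "phones": set(), "emails": set()}
    for item in collector.items:
        if not isinstance(item, dict):
            continue
        record["name"] = record["name"] or item.get("name")
        if item.get("telephone"):
            record["phones"].add(item["telephone"])
        if item.get("email"):
            record["emails"].add(item["email"])

    # Fallback: scan the raw HTML for anything that looks like a contact.
    record["emails"].update(EMAIL_RE.findall(html))
    record["phones"].update(match.strip() for match in PHONE_RE.findall(html))
    return record
```

Using sets means a phone number that appears both in JSON-LD and in the page body is stored once.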

2) Data Pipeline Flow

sequenceDiagram
  participant User
  participant UI as Atlas UI
  participant Engine as Scrape Worker
  participant Web as Public Web
  participant DB as SQLite

  User->>UI: Enter niche/city/area + filters
  UI->>Engine: Start scraping
  Engine->>Web: Search + directory queries
  Web-->>Engine: Candidate pages
  Engine->>Web: Crawl target pages
  Web-->>Engine: HTML + JSON-LD
  Engine->>Engine: Extract & score
  Engine->>DB: Save leads + status
  UI->>DB: Load dashboard & filters
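
The "Save leads + status" and "Load dashboard" steps in the flow above can be sketched with Python's built-in `sqlite3` and `csv` modules. The single `leads` table, its columns, and the helper names are a simplified assumption; the actual atlas.sqlite3 schema is not documented here and likely differs.

```python
import csv
import io
import sqlite3

# Hypothetical, simplified schema (one table instead of the real layout).
SCHEMA = """
CREATE TABLE IF NOT EXISTS leads (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    website TEXT UNIQUE,
    phone   TEXT DEFAULT '',
    score   INTEGER DEFAULT 0,
    status  TEXT DEFAULT 'new',
    notes   TEXT DEFAULT ''
)
"""
COLUMNS = ["name", "website", "phone", "score", "status", "notes"]


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_lead(conn, name, website, phone="", score=0):
    """Update the row for this website if it exists, otherwise insert it."""
    cur = conn.execute(
        "UPDATE leads SET name = ?, phone = ?, score = ? WHERE website = ?",
        (name, phone, score, website),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO leads (name, website, phone, score) VALUES (?, ?, ?, ?)",
            (name, website, phone, score),
        )
    conn.commit()


def set_status(conn, website, status, note=""):
    """Record a review decision (for example approved or rejected) and a note."""
    conn.execute(
        "UPDATE leads SET status = ?, notes = ? WHERE website = ?",
        (status, note, website),
    )
    conn.commit()


def export_csv(conn, status="approved"):
    """Return leads with the given status as CSV text, best score first."""
    rows = conn.execute(
        f"SELECT {', '.join(COLUMNS)} FROM leads WHERE status = ? ORDER BY score DESC",
        (status,),
    ).fetchall()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return out.getvalue()
```

Keying on the website means re-scraping the same business updates its row while leaving any earlier review status and notes in place.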

Key Features (Current Version)

Multi-source discovery: Google search HTML, DuckDuckGo fallback, JustDial pages.
Contact enrichment: phones, emails, socials, website detection.
Lead scoring: quality scoring + Hot/Warm/Cold tiers.
Lead management UI: approve/reject, notes, quick actions.
Website health analysis: HTTPS checks, thin-content detection.
Local-first: no cloud dependency; data stays on the device.
Browser signals: extension detects business-like pages and sends signals to the app.
CSV export: built-in export for pipelines or CRM imports.
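
To make the scoring and tier feature concrete, here is one possible shape for it. The weights, the cut-offs, and the idea that more complete contact data means a hotter lead are all assumptions for this sketch; the project's actual scoring rules may be quite different.

```python
from dataclasses import dataclass


@dataclass
class Lead:
    name: str
    phone: str = ""
    email: str = ""
    website: str = ""
    socials: tuple = ()
    uses_https: bool = False


# Hypothetical weights and tier cut-offs (not taken from the project).
WEIGHTS = {"phone": 30, "email": 30, "website": 20, "socials": 10, "https": 10}
HOT_MIN = 70
WARM_MIN = 40


def score_lead(lead):
    """Add up the weight of every signal that is present on the lead."""
    score = 0
    if lead.phone:
        score += WEIGHTS["phone"]
    if lead.email:
        score += WEIGHTS["email"]
    if lead.website:
        score += WEIGHTS["website"]
    if lead.socials:
        score += WEIGHTS["socials"]
    if lead.uses_https:
        score += WEIGHTS["https"]
    return score


def tier_for(score):
    """Map a numeric score onto the Hot / Warm / Cold tiers."""
    if score >= HOT_MIN:
        return "Hot"
    if score >= WARM_MIN:
        return "Warm"
    return "Cold"
```

Keeping the weights and cut-offs in module-level constants is one simple way to make them configurable, as the filters in the UI would require.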

Version Folders & Differences

Comparison Table

Folder	UI	Data Sources	Enrichment	Storage/Export	Notes
Root (Current)	PySide6 desktop + browser signals	Google HTML + DuckDuckGo + JustDial	JSON-LD + HTML scraping + website health	SQLite + CSV export	Includes local bridge + extension signals
V1_COLLECTTS_FULL_DATA	PySide6 desktop	Google Places API + Google CSE API + OSM (Overpass/Nominatim)	API + website crawl	SQLite + run tracking	Requires API keys; more API-driven
USA_SCRAPPER_VERSION_2	PySide6 desktop	OpenStreetMap (Overpass + Nominatim)	Optional website crawl	SQLite + CSV/JSON	Focused on USA/Canada public data
USA_SCRAPPER_VERSION_3	Streamlit web UI	OpenStreetMap (Overpass + Nominatim)	Website crawl + scoring	CSV + Excel	Dashboard filters, Streamlit export

Key Differences vs V1

V1 = API-first (Google Places + CSE + OSM), requires keys and quotas.
Current = scraping-first (public search + JustDial), no API keys required, but higher risk of blocks.
V1 has run tracking + settings table; current has browser extension signals and website health checks.
Current adds review workflows (approve/reject, notes) and live dashboard widgets.

Problems, Risks & Disadvantages

Scraping fragility: Google/JustDial HTML changes can break extraction.
Anti-bot risk: search engines may throttle or block requests.
No proxy/rotation layer: large-scale scraping can fail or be rate-limited.
Local-only storage: no built-in cloud sync or multi-user access.
Security concern: the default login (root / 1234) is trivially guessable and should be changed on first run.
Extension issue: popup.html references popup.js, but that file is missing, so the popup UI cannot function fully.
Limited compliance tooling: no built-in consent, GDPR workflows, or audit logs.
No automated tests: regression risk when modifying scraping logic.
Platform bias: UI tested mainly on Windows; macOS/Linux may need tweaks.

Why This System

Local-first lead intelligence: keep sensitive lead data on your machine.
Rapid lead discovery: generate and score leads without paid APIs.
Modular pipeline: easy to swap or extend sources and extraction logic.
UI-driven workflow: validate, score, and export in one desktop app.

Who Can Benefit

Sales & lead gen teams building outbound lists.
Local marketing agencies targeting SMBs.
Business development reps needing quick prospect research.
Freelancers or growth teams without budget for paid lead APIs.
Researchers working with local business datasets.

Install & Run (Current Version)

# 1) Install dependencies
py -3.13 -m pip install -r requirements.txt

# 2) Run the app
py -3.13 main.py

Default login: root / 1234 (change in Settings after first run)

Running Older Versions

# V1
cd V1_COLLECTTS_FULL_DATA
pip install -r requirements.txt
python main.py

# USA Scrapper v2
cd USA_SCRAPPER_VERSION_2
pip install -r requirements.txt
python main.py

# USA Scrapper v3 (Streamlit)
cd USA_SCRAPPER_VERSION_3
pip install -r requirements.txt
streamlit run app.py

Browser Extension

The Chrome extension lives in extension/ and posts page signals to the local bridge.

Bridge endpoint: http://127.0.0.1:8765/page-signal
Purpose: detect business-like pages and mark them as scrapable

Note: the popup references popup.js which is missing, so the popup UI is currently incomplete.
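
For orientation, a local bridge of this kind can be built on Python's `http.server`. The sketch below accepts JSON on the /page-signal path and keeps signals in memory. The handler, the `make_bridge` helper, the in-memory list, and the response body are assumptions; the real bridge writes to SQLite, and its payload format is not documented here.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# In-memory stand-in for the database write (assumption for this sketch).
RECEIVED_SIGNALS = []


class SignalHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/page-signal":
            self.send_error(404, "Unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers invalid JSON and undecodable bytes
            self.send_error(400, "Body must be JSON")
            return
        RECEIVED_SIGNALS.append(payload)

        body = json.dumps({"ok": True}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_bridge(host="127.0.0.1", port=8765):
    """Bind to loopback only, so other machines cannot reach the bridge."""
    return ThreadingHTTPServer((host, port), SignalHandler)
```

Calling `make_bridge().serve_forever()` would start listening on 127.0.0.1:8765. A desktop app would typically run this in a background thread so the UI stays responsive.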

Data, Privacy & Compliance

All leads are stored locally in SQLite.
Data is gathered from public sources; ensure compliance with website terms and local regulations.
You are responsible for ethical use, consent, and regulatory compliance (GDPR, CCPA, and similar laws).

SEO Keywords

Business lead generation, lead intelligence software, local business scraper, lead enrichment tool, B2B lead collection, business directory scraping, offline lead database, PySide6 lead management app.

License

Apache License 2.0. See LICENSE.
