Problem
The existing scan.mjs covers the Greenhouse, Ashby, and Lever APIs but misses Indeed Ireland, Ireland's largest job search website. Many local roles are posted exclusively on Indeed and never appear on the ATS platforms that scan.mjs can reach.
Indeed uses Cloudflare bot protection that blocks standard HTTP clients, so a different approach is needed.
Proposed Solution
A Python script (scan-indeed.py) that uses Scrapling to get past Cloudflare and scrape ie.indeed.com search results. It follows the same output contract as scan.mjs: results are written to pipeline.md and scan-history.tsv.
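As a rough illustration of the scraping target, the sketch below builds one search URL per query. The /jobs path and the q, l, and start parameter names are assumptions based on Indeed's public search URLs, and build_search_url is a hypothetical helper, not necessarily what scan-indeed.py does.

```python
from urllib.parse import urlencode

# Assumed search endpoint on the Irish Indeed domain.
BASE_URL = "https://ie.indeed.com/jobs"


def build_search_url(query: str, location: str = "", start: int = 0) -> str:
    """Return a search results URL for a single query.

    q (keywords), l (location) and start (result offset) are assumed
    parameter names; empty values are left out of the URL.
    """
    params = {"q": query}
    if location:
        params["l"] = location
    if start:
        params["start"] = start
    return f"{BASE_URL}?{urlencode(params)}"
```

Each URL would then be handed to the fetcher; keeping URL construction separate makes it easy to preview the requests without touching the network.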
Key features:
- Configurable queries via indeed_queries section in portals.yml
- Falls back to auto-generating queries from title_filter.positive if no config
- Applies same title_filter and dedup logic as scan.mjs
- --dry-run mode for previewing without writing
- Documented fallback escalation path (Scrapling → StealthyFetcher → CloakBrowser)
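The query fallback, title filtering, and dedup steps listed above could look roughly like the following. This is a minimal sketch under stated assumptions: indeed_queries is taken to be a flat list of strings, title_filter is assumed to also carry a negative list, matching is a case-insensitive substring check, and dedup is keyed on the job URL. The actual logic in scan.mjs may differ, and all function names here are hypothetical.

```python
def resolve_queries(config: dict) -> list[str]:
    """Use indeed_queries when configured, else fall back to title_filter.positive."""
    explicit = config.get("indeed_queries") or []
    if explicit:
        return list(explicit)
    return list(config.get("title_filter", {}).get("positive", []))


def title_matches(title: str, title_filter: dict) -> bool:
    """Case-insensitive substring match (assumed semantics)."""
    lowered = title.lower()
    positive = [k.lower() for k in title_filter.get("positive", [])]
    negative = [k.lower() for k in title_filter.get("negative", [])]  # assumed key
    if any(k in lowered for k in negative):
        return False
    # An empty positive list accepts every title that passed the negative check.
    return not positive or any(k in lowered for k in positive)


def filter_new_jobs(jobs: list[dict], title_filter: dict, seen_urls: set[str]) -> list[dict]:
    """Keep jobs that pass the title filter and whose URL has not been seen."""
    seen = set(seen_urls)  # copy so the caller's set is not mutated
    fresh = []
    for job in jobs:
        if job["url"] in seen:
            continue
        if not title_matches(job["title"], title_filter):
            continue
        seen.add(job["url"])
        fresh.append(job)
    return fresh
```

In this sketch, seen_urls would be loaded from scan-history.tsv before the run, and in --dry-run mode the result of filter_new_jobs would be printed rather than appended to pipeline.md.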
Dependencies
- scrapling (Python)
- pyyaml (Python)
Implementation
PR: #719