feat: Indeed Ireland scanner support #720

@kejiali

Problem

The existing scan.mjs covers the Greenhouse, Ashby, and Lever APIs but not Indeed Ireland (ie.indeed.com), Ireland's largest job search website. Many local roles are posted only on Indeed and never appear on the ATS platforms that scan.mjs can reach.

Indeed uses Cloudflare bot protection that blocks standard HTTP clients, so a different approach is needed.

Proposed Solution

Add a Python script (scan-indeed.py) that uses Scrapling to get past Cloudflare and scrape ie.indeed.com search results. It follows the same output contract as scan.mjs: results are written to pipeline.md and scan-history.tsv.

Key features:

  • Configurable queries via indeed_queries section in portals.yml
  • Falls back to auto-generating queries from title_filter.positive if no indeed_queries section is configured
  • Applies the same title_filter and dedup logic as scan.mjs
  • --dry-run mode for previewing without writing
  • Documented fallback escalation path (Scrapling → StealthyFetcher → CloakBrowser)
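
To make the query fallback, title filtering, and dedup behaviour concrete, here is a minimal sketch. It is not the code from the PR. It assumes portals.yml has already been parsed into a dict, that indeed_queries is a list of plain search strings, that title_filter may also carry a negative list, that matching is a case-insensitive substring check, and that jobs are dicts with a url key. The function names (resolve_queries, title_passes, dedup) are hypothetical.

```python
def resolve_queries(config):
    """Return configured indeed_queries, else derive them from title_filter.positive."""
    explicit = config.get("indeed_queries") or []
    if explicit:
        return list(explicit)
    positive = (config.get("title_filter") or {}).get("positive") or []
    seen = set()
    queries = []
    for keyword in positive:
        cleaned = keyword.strip()
        key = cleaned.lower()
        # Keep the first spelling of each keyword; skip blanks and case-insensitive repeats.
        if key and key not in seen:
            seen.add(key)
            queries.append(cleaned)
    return queries


def title_passes(title, title_filter):
    """Assumed rule: reject on any negative keyword, else require a positive one."""
    lowered = title.lower()
    positive = [k.lower() for k in title_filter.get("positive", [])]
    negative = [k.lower() for k in title_filter.get("negative", [])]
    if any(k in lowered for k in negative):
        return False
    # An empty positive list is treated as "accept everything".
    return not positive or any(k in lowered for k in positive)


def dedup(jobs, seen_urls):
    """Return jobs whose URL is not yet in seen_urls, and record the new URLs."""
    fresh = []
    for job in jobs:
        if job["url"] not in seen_urls:
            seen_urls.add(job["url"])
            fresh.append(job)
    return fresh
```

With this shape, a --dry-run mode could call the same three functions and print the surviving jobs instead of appending them to pipeline.md and scan-history.tsv.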

Dependencies

  • scrapling (Python)
  • pyyaml (Python)

Implementation

PR: #719
