# web-scraping

Here are 682 public repositories matching this topic...

firecrawl / firecrawl

🔥 Search, scrape, and clean the web for AI agents.

markdown crawler scraper ai html-to-markdown web-crawler scraping web-scraper web-scraping data-extraction webscraping web-data-extraction ai-agents web-search ai-search web-data llm ai-crawler ai-scraping

Updated May 21, 2026
TypeScript
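
Firecrawl advertises cleaning web pages for AI agents, and its tags include html-to-markdown. The sketch below shows roughly what that conversion step involves. The `htmlToMarkdown` helper is hypothetical and is not Firecrawl's implementation. It uses regular expressions to stay short, whereas a real converter would parse the DOM.

```typescript
// Hypothetical helper: a deliberately small HTML-to-Markdown converter.
// It handles only a few common tags; production tools parse the DOM instead.

function decodeEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" decodes to "&lt;" rather than "<"
}

function htmlToMarkdown(html: string): string {
  const md = html
    // Remove elements whose contents are never useful as page text.
    .replace(/<(script|style|noscript)[^>]*>[\s\S]*?<\/\1>/gi, "")
    // <h1>..</h1> -> "# ..", <h2> -> "## ..", and so on.
    .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi,
      (_m: string, level: string, text: string) => `\n\n${"#".repeat(Number(level))} ${text.trim()}\n\n`)
    // <a href="url">text</a> -> [text](url)
    .replace(/<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi,
      (_m: string, href: string, text: string) => `[${text.trim()}](${href})`)
    // List items become "- item" lines (the (?:\s...) guard avoids matching <link>).
    .replace(/<li(?:\s[^>]*)?>([\s\S]*?)<\/li>/gi, (_m: string, text: string) => `\n- ${text.trim()}`)
    // Paragraph ends and <br> become line breaks.
    .replace(/<\/p>/gi, "\n\n")
    .replace(/<br\s*\/?>/gi, "\n")
    // Strip every remaining tag.
    .replace(/<[^>]+>/g, "");

  return decodeEntities(md)
    .split("\n")
    .map((line) => line.trim()) // drop indentation left over from the HTML source
    .join("\n")
    .replace(/\n{3,}/g, "\n\n") // collapse runs of blank lines
    .trim();
}

console.log(htmlToMarkdown('<h1>Hello</h1><p>Read <a href="https://example.com">this</a>.</p>'));
// Output:
// # Hello
//
// Read [this](https://example.com).
```

Order matters here: headings and links are rewritten before the final tag strip, so any inline tags nested inside them (such as `<b>`) are still removed afterwards.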

apify / crawlee

Crawlee: a web scraping and browser automation library for building reliable Node.js crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.

nodejs javascript npm crawler scraper automation typescript web-crawler headless scraping crawling web-scraping web-crawling headless-chrome apify puppeteer playwright

Updated May 21, 2026
TypeScript
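
Crawlee's description centers on building reliable crawlers. At the core of any crawler is a request queue that deduplicates URLs and limits how far the crawl spreads. The sketch below shows that idea as a breadth-first loop. `crawl`, `FetchLinks`, and `CrawlOptions` are hypothetical names, not Crawlee's API. Link fetching is passed in as a function, so the example runs without network access.

```typescript
// Hypothetical signature: fetch a page and return the absolute URLs it links to.
type FetchLinks = (url: string) => Promise<string[]>;

interface CrawlOptions {
  maxDepth: number;    // maximum number of link hops from the start URL
  maxRequests: number; // cap on the number of pages successfully fetched
}

// Breadth-first crawl with URL deduplication and depth/size limits.
async function crawl(startUrl: string, fetchLinks: FetchLinks, opts: CrawlOptions): Promise<string[]> {
  const seen = new Set<string>([startUrl]); // every URL is enqueued at most once
  const queue: { url: string; depth: number }[] = [{ url: startUrl, depth: 0 }];
  const visited: string[] = [];

  while (queue.length > 0 && visited.length < opts.maxRequests) {
    const { url, depth } = queue.shift()!; // FIFO order gives breadth-first traversal
    let links: string[];
    try {
      links = await fetchLinks(url);
    } catch {
      continue; // a production crawler would retry with backoff; this sketch skips the URL
    }
    visited.push(url);
    if (depth >= opts.maxDepth) continue; // do not expand pages at the depth limit

    for (const link of links) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push({ url: link, depth: depth + 1 });
      }
    }
  }
  return visited;
}
```

Because fetching is passed in, the same loop could be driven by a plain HTTP client or by a headless browser. This loosely mirrors the way such libraries let you choose between raw HTTP and Playwright or Puppeteer.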

getmaxun / maxun

🔥 The open-source no-code platform for web scraping, crawling, search and AI data extraction • Turn websites into structured APIs in minutes 🔥

api crawler scraper automation crawling web-scraper self-hosted web-scraping data-extraction webscraping agents browser-automation no-code web-search rpa robotic-process-automation nocode playwright

Updated May 21, 2026
TypeScript

JCodesMore / ai-website-cloner-template

Clone any website with one command using AI coding agents

react template boilerplate automation typescript ai clone skills nextjs reverse-engineering web-scraping developer-tools ai-agents claude tailwindcss website-clone ai-tools shadcn-ui claude-code

Updated May 7, 2026
TypeScript

jaypyles / Scraperr

Self-hosted webscraper.

python docker kubernetes opensource helm scraping webscraper web-scraper self-hosted web-scraping web-scrapers webscraping playwright

Updated Oct 12, 2025
TypeScript

Kaliiiiiiiiii-Vinyzu / patchright

Undetected version of the Playwright testing and automation library.

Updated May 21, 2026
TypeScript

saifyxpro / HeadlessX

The undetected self-hosted browser automation platform. Powered by Camoufox (Firefox) for 0% detection rates. Built for speed, privacy, and scalability.

automation headless web-scraping chromedriver data-extraction automation-api chrome-headless browser-automation headless-chrome browser-testing web-automation puppeteer browserless automation-platform playwright headless-service scraping-service playwright-automation container-automation

Updated May 19, 2026
TypeScript

firecrawl / open-scouts

🔥 AI-powered web monitoring platform. Create automated scouts that search the web and send email alerts when they find what you're looking for.

react open-source alerts automation typescript nextjs web-scraping openai email-notifications resend ai-agents web-monitoring tailwindcss vercel posthog supabase firecrawl

Updated May 21, 2026
TypeScript

intoli / user-agents

A JavaScript library for generating random user agents with data that's updated daily.

javascript user-agent random randomization navigator web-scraping browsers browser-automation user-agent-spoofer

Updated May 21, 2026
TypeScript
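
The user-agents library describes generating random user agents from data that is updated daily. A common way to keep such picks realistic is to weight each string by how often it appears in real traffic. The sketch below assumes that approach. `pickUserAgent` and the sample strings are hypothetical placeholders, not the library's API or dataset.

```typescript
interface WeightedAgent {
  userAgent: string;
  weight: number; // relative frequency, e.g. share of observed traffic
}

// Hypothetical sample data; a real dataset would be refreshed regularly.
const SAMPLE_AGENTS: WeightedAgent[] = [
  { userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0", weight: 60 },
  { userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/1.0", weight: 30 },
  { userAgent: "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0", weight: 10 },
];

// Weighted random choice: each agent is picked with probability weight / total.
function pickUserAgent(agents: WeightedAgent[], rand: () => number = Math.random): string {
  if (agents.length === 0) throw new Error("no user agents to choose from");
  const total = agents.reduce((sum, a) => sum + a.weight, 0);
  let r = rand() * total; // uniform in [0, total)
  for (const a of agents) {
    r -= a.weight;
    if (r < 0) return a.userAgent;
  }
  return agents[agents.length - 1].userAgent; // guard against floating-point rounding
}

console.log(pickUserAgent(SAMPLE_AGENTS));
```

The random source is a parameter so the selection can be tested deterministically. It defaults to `Math.random`.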

Kaliiiiiiiiii-Vinyzu / patchright-nodejs

Undetected Node.js version of the Playwright testing and automation library.

bot chrome automation webdriver browser bots chromium cloudflare web-scraping chromedriver stealth webscraping botting webautomation undetected undetectable cloudflare-bypass playwright web-auto

Updated May 10, 2026
TypeScript

web-agent-master / google-search

A Playwright-based Node.js tool that bypasses search engine anti-scraping mechanisms to execute Google searches. Local alternative to SERP APIs with MCP server integration.

ai web-scraping google-search llm mcp-server

Updated Apr 6, 2025
TypeScript

lumpinif / deepcrawl

100% free and fully open-source edge alternative to Firecrawl, with better link extraction for agents, that you can deploy to Cloudflare or Vercel yourself.

typescript nextjs html-to-markdown crawling web-scraper web-scraping hono html-cleaner ai-sdk cloudflare-workers links-extraction links-tree better-auth ai-agent-tools orpc nextjs16 deepcrawl

Updated Mar 12, 2026
TypeScript
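
deepcrawl highlights link extraction for agents. A minimal version of that step resolves each `href` against the page URL, drops non-HTTP schemes and `#fragment` suffixes, removes duplicates, and splits the results into same-origin and external links. The `extractLinks` helper below is a hypothetical sketch that uses the standard `URL` class and a simple regex. It is not deepcrawl's implementation, and it will miss some links a real HTML parser would catch, such as unquoted `href` values.

```typescript
interface ExtractedLinks {
  internal: string[]; // same origin as the page
  external: string[]; // any other origin
}

// Hypothetical helper: collect absolute, deduplicated http(s) links from an HTML string.
function extractLinks(html: string, pageUrl: string): ExtractedLinks {
  const base = new URL(pageUrl);
  const internal = new Set<string>();
  const external = new Set<string>();
  // Matches quoted href attributes on <a> tags; unquoted values are not handled.
  const hrefPattern = /<a\s(?:[^>]*?\s)?href\s*=\s*["']([^"']+)["']/gi;

  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    let url: URL;
    try {
      url = new URL(match[1], base); // resolve relative hrefs against the page URL
    } catch {
      continue; // skip hrefs that are not valid URLs
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") continue; // mailto:, javascript:, ...
    url.hash = ""; // "#section" points into the same document
    (url.origin === base.origin ? internal : external).add(url.href);
  }
  return { internal: Array.from(internal), external: Array.from(external) };
}
```

Clearing the fragment before deduplicating means `/docs` and `/docs#intro` count as one link. For crawling purposes that is usually the right behavior.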

vakra-dev / reader

Open-source web infrastructure for AI: scrape, crawl, and automate the web, with clean Markdown output and browser sessions ready for your agents.

Updated May 8, 2026
TypeScript

figranium / figranium

Build complex browser workflows visually and execute them via API.

api automation headless web-scraping browser-automation headless-browser playwright agentic-tasks

Updated Apr 21, 2026
TypeScript

graphlit / graphlit-mcp-server

Model Context Protocol (MCP) server for the Graphlit platform

web-crawler web-scraping data-collection content-extraction search-api claude unstructured-data content-ingestion llm-tools model-context-protocol mcp-server

Updated Jan 12, 2026
TypeScript

teng-lin / agent-fetch

Full-content web fetcher for AI agents, with Chrome TLS fingerprinting, browser impersonation, and multi-strategy article extraction

nodejs typescript html-to-markdown web-scraping readability fetcher content-extraction ai-agents tls-fingerprint anti-bot-detection httpcloak

Updated Mar 15, 2026
TypeScript

BrowserCash / teracrawl

High-performance web crawler API optimized for LLMs. Turn any search or website into clean Markdown using remote browsers. A Firecrawl alternative.

html-to-markdown web-scraper web-scraping data-extraction browser-automation ai-agents web-search google-serp ai-search serpapi browser-agent ai-crawler antibot-bypass ai-scraping crawl4ai firecrawl-mcp firecrawl-api firecrawl-alternative

Updated Dec 3, 2025
TypeScript

redf0x1 / camofox-browser

Anti-detection browser server for AI agents: a REST API wrapping the Camoufox engine, with OpenClaw plugin support

Updated May 13, 2026
TypeScript

minhlucvan / n8n-nodes-browserless

n8n node to interact with a Browserless instance

web-scraping browser-automation browserless n8n n8n-nodes n8n-community-node-package

Updated Oct 3, 2024
TypeScript

ayakashi-io / ayakashi

⚡ Ayakashi.io - The next-generation web scraping framework

data-mining automation web-scraping web-crawling headless-chrome

Updated Jun 29, 2023
TypeScript