🦉 Data Versioning and ML Experiments
Updated May 18, 2026 - Python
Refine high-quality datasets and visual AI models
A system for agentic LLM-powered data processing and ETL
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
The Context Layer for unstructured data: typed, versioned datasets over S3, GCS, Azure
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
Nomic Developer API SDK
ContextGem: Effortless LLM extraction from documents
AI-Powered Data Processing: Use LOTUS to process all of your datasets with LLMs and embeddings. Enjoy up to 1000x speedups with fast, accurate query processing that is as simple as writing Pandas code
Get clean data from tricky documents, powered by vision-language models ⚡
Humans and AI agents, building knowledge bases together. Self-hosted document annotation, version control, semantic search, and MCP.
Curate better data for LLMs
NucliaDB, The AI Search database for RAG
Continuously updated paper list on advancements in Data Agents. Companion repo to our paper "A Survey of Data Agents: Emerging Paradigm or Overstated Hype?"
Embedding Studio is a framework that allows you to transform your vector database into a feature-rich search engine.
Python implementation of jordansissel's grok regular expression library
Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more.
Home of the AI workforce - Multi-agent system, AI agents & tools
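RAG-QA-Generator is an automated knowledge-base construction and management tool for retrieval-augmented generation (RAG) systems. It reads document data, uses large language models to generate high-quality question-answer (QA) pairs, and inserts them into a database, automating the construction and management of the RAG system's knowledge base.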