Java PDF table extraction & OCR library. Extract structured tables from text-based and scanned PDFs using stream, lattice (OpenCV-style grid detection), and hybrid parsing.
Updated Mar 15, 2026 - Java
Open-source document management platform leveraging AWS managed services. RESTful API for document storage, processing, full-text search, and metadata management. Multi-tenant serverless architecture with auto-scaling... deployed entirely in your AWS account.
A lightweight, framework-agnostic Java library for adding watermarks to various file types, including PDFs and videos
Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem
Open-source RAG backend for document ingestion and AI-powered chat with on-premise LLMs
Multi-language SDKs (TypeScript, Python, Go, Java, C#, Ruby, Rust, Swift, PHP) for AI-powered document processing
Spring AI DocumentReader integration for Kreuzberg document extraction engine
AI assistant microservice built with Spring Boot and Spring AI, based on RAG. Integrates OpenAI and pgvector for document ingestion, vector search, and generation of domain-contextualized answers.
Event-driven file upload & search demo on OpenShift
State-machine driven Document Processing Orchestration Service built with Spring Boot. Orchestrates OCR, Document Classification, and Named Entity Recognition pipelines with async processing, retry support, logging, and Dockerized deployment.
A simple Java CLI tool for batch-converting PDF files to TXT format. Supports file filtering by filename wildcards and last modified date.
Intelligent Data Extractor: Converts unstructured documents into structured JSON using LLM APIs. Spring Boot, Hexagonal Architecture, OpenAI/Gemini/Groq.
Exploring Spring AI 2.0 capabilities: RAG pipeline with intelligent agents, query routing, answer evaluation, and multi-format document ingestion using pgvector
XY.AI Workbench – Eclipse RCP solution for LLM-augmented workflows. Token-driven intelligence with tool orchestration, RAG, feedback loops, and semantic validation for reliable AI-assisted document processing.
This project is a document processing tool that converts HWP, PDF, DOCX, and other formats into HTML.
Java development with focus on legal document processing, enterprise systems, and environmental data applications. Applying software engineering principles to complex domain problems.
Enterprise Java document processing system with AI extraction, classification and workflow routing
AI-powered enterprise knowledge assistant — upload documents, ask questions, and get context-aware answers using RAG, Spring Boot 3.5, Spring AI, and pgvector.