Security optimization for AI agent systems.
Updated May 7, 2026 - Python
Autonomous skill improvement loop for Claude Code plugins — inspired by Karpathy's autoresearch. Modify → evaluate → keep/discard → repeat until convergence. Zero-touch quality iteration at scale.
AI-augmented QA platform for spec-driven development and testing, RAG-grounded analysis, eval-driven development and contract validation across Python, Go, Rust and Solidity.
Modular self-referencing Markdown grounding system for agentic AI software engineering and architecture
Multilingual GenAI evaluation service across 5 task types and 3 languages, with regression-trend dashboard
Most AI plugins hope they work. These prove it. Eval-driven Claude plugins for product teams.
Eval-driven development for LLM accounting skills. 50 automated test cases. 66% to 94% in 5 iterations. AI bias mitigation techniques.
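One of the entries above describes its loop as modify → evaluate → keep/discard → repeat until convergence. As a rough illustration only, that pattern can be sketched as a greedy search over candidates. Nothing here is taken from any listed repository: the function name `improve`, the parameters `mutate`, `evaluate`, `patience` and `max_steps`, and the choice of "no improvement for `patience` rounds" as the convergence test are all assumptions made for this sketch.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def improve(
    candidate: T,
    mutate: Callable[[T], T],        # hypothetical: proposes a modified candidate
    evaluate: Callable[[T], float],  # hypothetical: eval score, higher is better
    patience: int = 20,              # assumed convergence rule: stop after this many misses in a row
    max_steps: int = 1000,           # hard cap so the loop always terminates
) -> Tuple[T, float]:
    """Greedy modify -> evaluate -> keep/discard loop (illustrative sketch)."""
    best_score = evaluate(candidate)
    stale = 0  # consecutive rounds without improvement
    for _ in range(max_steps):
        if stale >= patience:
            break  # treat a long run of discards as convergence
        trial = mutate(candidate)    # modify
        score = evaluate(trial)      # evaluate
        if score > best_score:       # keep only strict improvements
            candidate, best_score = trial, score
            stale = 0
        else:                        # discard and try again
            stale += 1
    return candidate, best_score


# Toy usage: the "skill" is an integer and the eval score peaks at 3.
best, score = improve(0, lambda x: x + 1, lambda x: -(x - 3) ** 2, patience=3)
# best == 3, score == 0
```

In a real setting `evaluate` would presumably run a suite of eval cases and `mutate` would edit a prompt or skill file; both are stand-ins here. Keeping only strict improvements means the score never regresses, at the cost of possibly stalling at a local optimum.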