
Open black-box benchmark for AI API testing agents. Objective scoring on bug detection, coverage & efficiency using live APIs with planted bugs.

APIEval-20 is the first benchmark designed specifically to evaluate how well AI agents generate API test suites that actually find bugs. Agents receive only a schema and a sample payload, with no access to source code or documentation. The benchmark measures black-box testing capability across 20 API scenarios spanning e-commerce, payments, authentication, and more.
APIEval-20 is aimed at AI researchers building testing agents, engineering teams evaluating automation tools, and QA leaders who want objective metrics for comparing agent performance against human-level testing standards.
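
To make the three scoring axes (bug detection, coverage, efficiency) concrete, here is a minimal Python sketch of how a per-scenario score could be combined. The `SuiteResult` fields, the `score` function, the weights, and the efficiency formula are all hypothetical illustrations. They are not APIEval-20's published scoring method.

```python
from dataclasses import dataclass


@dataclass
class SuiteResult:
    """Outcome of one agent-generated test suite on one scenario (hypothetical shape)."""
    planted_bugs: int      # bugs seeded in the live API, known only to the harness
    bugs_found: int        # planted bugs exposed by at least one test
    fields_in_schema: int  # fields listed in the schema given to the agent
    fields_exercised: int  # schema fields touched by at least one test
    tests_run: int         # total number of tests in the suite


def score(result: SuiteResult, weights=(0.7, 0.2, 0.1)) -> float:
    """Combine the three axes into a 0..1 score. The weights are assumed, not official."""
    detection = (
        result.bugs_found / result.planted_bugs if result.planted_bugs else 0.0
    )
    coverage = (
        result.fields_exercised / result.fields_in_schema
        if result.fields_in_schema
        else 0.0
    )
    # Assumed efficiency measure: bugs found per test run, capped at 1.0.
    efficiency = (
        min(1.0, result.bugs_found / result.tests_run) if result.tests_run else 0.0
    )
    w_detection, w_coverage, w_efficiency = weights
    return w_detection * detection + w_coverage * coverage + w_efficiency * efficiency


if __name__ == "__main__":
    example = SuiteResult(
        planted_bugs=4, bugs_found=2,
        fields_in_schema=10, fields_exercised=5,
        tests_run=4,
    )
    print(f"scenario score: {score(example):.2f}")
```

Weighting detection most heavily reflects the description's emphasis on suites that "actually find bugs"; a real harness would substitute whatever formula the benchmark defines.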
