
ClawArena: A Multi-Framework Benchmark for Evaluating AI Coding Agents on Realistic Multi-Session Scenarios
Published as an arXiv preprint, 2026
ClawArena is a benchmark platform for evaluating AI coding agents, comprising 64 scenarios across 8 domains with 1,879 evaluation rounds in total. It provides a unified pipeline for running inference, scoring results, and comparing performance across different agent frameworks on realistic multi-session scenarios.




