Publications

You can also find my articles on my Google Scholar profile.
ClawArena: A Multi-Framework Benchmark for Evaluating AI Coding Agents on Realistic Multi-Session Scenarios

Haonian Ji*, Kaiwen Xiong*, Siwei Han, Peng Xia, Shi Qiu, Yiyang Zhou, Jiaqi Liu, Jinlong Li, Bingzhou Li, Zeyu Zheng, Cihang Xie, Huaxiu Yao (* equal contribution)

Published in arXiv preprint, 2026

ClawArena is a benchmark platform for evaluating AI coding agents, providing 64 scenarios across 8 domains with 1,879 evaluation rounds. It offers a unified pipeline for running inference, scoring results, and comparing performance across agent frameworks on realistic multi-session scenarios.

Paper | Bibtex

MetaClaw: Just Talk – An Agent That Meta-Learns and Evolves in the Wild

Peng Xia*, Jianwen Chen*, Xinyu Yang*, Haoqin Tu*, Jiaqi Liu*, Kaiwen Xiong*, Siwei Han, Shi Qiu, Haonian Ji, Yuyin Zhou, Zeyu Zheng, Cihang Xie, Huaxiu Yao (* Core Contributors)

Published in arXiv preprint, 2026

MetaClaw is an agent that meta-learns and evolves in the wild. It turns every live conversation into a learning signal: the model sits behind a proxy that injects relevant skills at each turn and meta-learns from accumulated experience. This enables continuous improvement through real-world deployment without requiring a GPU cluster.

Paper | Bibtex

Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails

Siwei Han*, Kaiwen Xiong*, Jiaqi Liu, Xinyu Ye, Yaofeng Su, Wenbo Duan, Xinyuan Liu, Cihang Xie, Mohit Bansal, Mingyu Ding, Linjun Zhang, Huaxiu Yao (* equal contribution)

Published in arXiv preprint, 2026

We identify the Alignment Tipping Process, a critical post-deployment risk unique to self-evolving LLM agents. We formalize and analyze the Alignment Tipping Process through two complementary paradigms: Self-Interested Exploration, where repeated high-reward deviations induce individual behavioral drift, and Imitative Strategy Diffusion, where deviant behaviors spread across multi-agent systems.

Paper | Bibtex