Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Future Blog Post

less than 1 minute read

Published:

This post is dated in the future but will show up by default. To hide future-dated posts until their publication date, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Portfolio

Publications

Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails

Siwei Han* , Kaiwen Xiong* , Jiaqi Liu , Xinyu Ye , Yaofeng Su , Wenbo Duan , Xinyuan Liu , Cihang Xie , Mohit Bansal , Mingyu Ding , Linjun Zhang , Huaxiu Yao (* equal contribution)

Published in arXiv preprint, 2026

We identify the Alignment Tipping Process, a critical post-deployment risk unique to self-evolving LLM agents. We formalize and analyze the Alignment Tipping Process through two complementary paradigms: Self-Interested Exploration, where repeated high-reward deviations induce individual behavioral drift, and Imitative Strategy Diffusion, where deviant behaviors spread across multi-agent systems.

Paper | Bibtex

MetaClaw: Just Talk – An Agent That Meta-Learns and Evolves in the Wild

Peng Xia* , Jianwen Chen* , Xinyu Yang* , Haoqin Tu* , Jiaqi Liu* , Kaiwen Xiong* , Siwei Han , Shi Qiu , Haonian Ji , Yuyin Zhou , Zeyu Zheng , Cihang Xie , Huaxiu Yao (* Core Contributors)

Published in arXiv preprint, 2026

MetaClaw is an agent that meta-learns and evolves in the wild. It turns every live conversation into a learning signal by placing the model behind a proxy that injects relevant skills at each turn and meta-learns from accumulated experience, enabling continuous improvement through real-world deployment without requiring a GPU cluster.

Paper | Bibtex

ClawArena: A Multi-Framework Benchmark for Evaluating AI Coding Agents on Realistic Multi-Session Scenarios

Haonian Ji* , Kaiwen Xiong* , Siwei Han , Peng Xia , Shi Qiu , Yiyang Zhou , Jiaqi Liu , Jinlong Li , Bingzhou Li , Zeyu Zheng , Cihang Xie , Huaxiu Yao (* equal contribution)

Published in arXiv preprint, 2026

ClawArena is a benchmark evaluation platform for AI coding agents, providing 64 scenarios across 8 domains with 1,879 evaluation rounds. It supports a unified pipeline to run inference, score results, and compare performance across different agent frameworks on realistic multi-session scenarios.

Paper | Bibtex

Talks

Teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.