Chinese Models Benchmarked: What We Found
The hype around Chinese AI is everywhere right now. Kimi K2.5 generated more revenue in its first 20 days …
When we ran 10 models through 7 creative writing prompts in the AgentPulse v2.2 benchmark, the scores moved in …
The cost spread across capable frontier AI models is over 100x. Our latest AgentPulse benchmark run tested 10 models across …
One-hundredth of a point. That's the task quality gap between Gemini 3 Flash Preview and Claude Sonnet 4.
4.61 versus 4.55. That's the gap between the top two models in our first AgentPulse benchmark …