Alibaba releases Qwen-AgentWorld — Language World Models for general agents
What's new
- Language World Model: Billed as the first model trained to predict the next environment state (rather than the next action) across agent domains
- Seven domains, one model: MCP, Search, Terminal, SWE, Web, OS, Android — both text and GUI environments
- Two MoE sizes: 397B-A17B (397B total, 17B active parameters) and 35B-A3B (35B total, 3B active), both with a 256K-token context window and released under Apache 2.0
- AgentWorldBench: New eval benchmark covering all seven domains, released alongside the weights
- Trained on 10M+ real agent trajectories: A three-stage curriculum ending in reinforcement learning (RL) that combines rule-based and quality scoring
- Benchmark wins: Reports the highest overall simulation quality on AgentWorldBench, ahead of GPT-5.4, Claude Opus 4.8, and Gemini 3.1 Pro
Why it matters
For agent-framework builders and researchers: by simulating environment responses, the model enables offline rollouts and planning, reducing reliance on live tool calls during training and evaluation.
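To make "offline rollouts and planning" concrete, here is a minimal sketch of how a world model slots into an agent loop. The agent maps history to an action; the world model maps (history, action) to a predicted next observation, so no live tool is called. Everything here is hypothetical: `WorldModel`, `simulate`, `ToyTerminalModel`, `rollout`, and `plan_one_step` are illustrative names, not the Qwen-AgentWorld API, and the toy model uses hard-coded rules where a real deployment would query the language world model.

```python
from typing import Callable, List, Protocol, Tuple

# One step of history: (action taken, observation returned).
History = List[Tuple[str, str]]


class WorldModel(Protocol):
    """Hypothetical interface: predict the next environment observation."""

    def simulate(self, history: History, action: str) -> str: ...


class ToyTerminalModel:
    """Hypothetical stand-in for a language world model.

    A real model would generate the observation text; this stub returns
    hard-coded responses so the loop below runs fully offline.
    """

    def simulate(self, history: History, action: str) -> str:
        if action == "ls":
            return "README.md  src"
        if action.startswith("cat "):
            return f"<contents of {action[4:]}>"
        return f"bash: {action}: command not found"


def rollout(
    policy: Callable[[History], str],
    world: WorldModel,
    max_steps: int,
) -> History:
    """Run a policy against simulated (not live) environment responses."""
    history: History = []
    for _ in range(max_steps):
        action = policy(history)  # agent: history -> next action
        if action == "stop":
            break
        observation = world.simulate(history, action)  # world model: -> next state
        history.append((action, observation))
    return history


def plan_one_step(
    history: History,
    candidates: List[str],
    world: WorldModel,
    score: Callable[[str], float],
) -> str:
    """One-step lookahead: pick the action whose simulated outcome scores best."""
    return max(candidates, key=lambda a: score(world.simulate(history, a)))


if __name__ == "__main__":
    script = iter(["ls", "cat README.md", "stop"])
    for action, obs in rollout(lambda h: next(script), ToyTerminalModel(), 5):
        print(f"$ {action}\n{obs}")
    # Prefer actions whose simulated output is not an error message.
    best = plan_one_step([], ["foo", "ls"], ToyTerminalModel(),
                         lambda obs: 0.0 if "not found" in obs else 1.0)
    print("planned action:", best)
```

The design choice worth noting is that the policy and the world model share the same history type, so swapping the toy model for a live environment wrapper (or for the real model) would not change the rollout or planning code.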
How to try it
The weights and AgentWorldBench are available on Hugging Face and ModelScope under Apache 2.0; an arXiv paper and a GitHub repository were published alongside the announcement.
Source: Alibaba Cloud