reinforcement-learning
Qwen used human-feedback training to make its image AI follow directions better News
A new recipe applies the same reinforcement-learning approach that polished chatbots to an image generator, then merges separate skill models into one, improving how faithfully it follows prompts and editing instructions.
Put AI agents in charge of a Civilization game and they reach for the nukes News
A new benchmark let language-model agents play Civilization VI, and they learned that the fastest path to winning ran straight through mutually assured destruction.
A wave of new methods trains AI without a human answer key News
Several research groups landed on the same idea at once: improve a model by learning from its own attempts instead of expensive human labels. The field is debating whether this really removes the labeling burden or just hides it.
Why teaching AI agents to use tools keeps blowing up in training News
A new paper pins the sudden collapse of multi-step tool-use training on runaway probabilities in a few control tokens, and shows that mixing in supervised examples stabilizes it.
Alibaba's new models let AI agents practice in a world they imagine News
Qwen-AgentWorld trains a model to simulate the environment an agent acts in, then uses that simulation as a cheap, controllable place to learn, reporting gains beyond those from training in the real environment.
AI agents are learning to build the worlds they train in News
Three new open research projects point the same way: instead of only learning what to do, agents are learning to simulate the environment itself, so they can practice in their own imagination.
What are world models? Lesson
A world model is an AI system's internal understanding of how an environment works: not just what it sees right now, but what will happen after an action, and what would have happened had it acted differently. World models are central to planning, robotics, and the next generation of physical AI.
Qwen-AgentWorld Tool
Alibaba's open language world model that simulates agent environments (browser, terminal, phone, coding workspace and more) so other agents can be trained inside the simulation. Released with open weights and code in two sizes.