multimodal
Image generators can't plan. This one bolts on a brain that can. News
Qwen-Image-Agent wraps planning, reasoning, and memory around a text-to-image model so it can break a hard request into steps, and the local-AI crowd immediately asked whether it runs on a gaming GPU.
One model that listens, sees, and talks back in real time News
Wan-Streamer collapses the usual chain of separate speech and video tools into a single model built for live, two-way conversation.
This model's job is to make better training data for other models News
DataClaw0 turns the grind of cleaning and labeling training data into a learned skill: a small model that refines raw, messy multimodal streams into dense, purpose-built lessons.
Qwen-Image-2.0-Pro Tool
Alibaba's latest open image-generation model in the Qwen family, downloadable and runnable locally, part of a broad open-weight release wave that also refreshed the Qwen3.6 chat models.
MiniMax-M3 Tool
A natively multimodal open model trained on text, image, and video from the first step, with a million-token context and a sparse-attention design built for speed; downloadable for self-hosting and also offered through MiniMax's own API and agent platform.