News · 2026-06-30

Google ships a faster, cheaper image model and hands developers conversational video editing

Google released two new generative-media models this week: Nano Banana 2 Lite for fast, cheap image generation and Gemini Omni Flash for short video generation and editing. Both are designed for volume and speed rather than peak quality, targeting builders who need to produce large numbers of images and short clips affordably, and together they signal a shift in the economics of AI generation.

Key facts

What: A lightweight version of Google's image model now makes a picture in about four seconds for a fraction of a cent, while a new video model lets developers edit clips by talking to it.
When: 2026-06-30
Primary source: read the source

Both models were detailed in a blog post, where Google positioned them for builders who prioritize throughput over a single polished render.

Nano Banana 2 Lite, a stripped-down version of Google's image generator, turns a text prompt into a picture in about four seconds and costs roughly three cents per thousand images. That pricing is the real story. At those numbers, generating images stops being a treat you dole out carefully and becomes something you can do by the thousand: testing a hundred variations of a design, auto-illustrating every item in a catalog, or letting an app generate imagery on the fly for each user. When a capability gets an order of magnitude cheaper, people do not just do the old thing for less money; they do new things that were never affordable before. Google is clearly betting that cheap-and-fast unlocks a different class of use than slow-and-pristine. The model is available in Google's developer studio and its main AI programming interface, and it is rolling out to consumer products like the Gemini app and search.
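To make those numbers concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the two approximate figures quoted above (about three cents per thousand images, about four seconds per image); the function name and the `parallel_requests` parameter are hypothetical, introduced purely for illustration, and say nothing about Google's actual rate limits or billing rules.

```python
import math

# Approximate figures quoted in the article ("roughly", "about").
PRICE_PER_THOUSAND_USD = 0.03  # roughly three cents per thousand images
SECONDS_PER_IMAGE = 4.0        # about four seconds per image


def batch_estimate(num_images: int, parallel_requests: int = 1) -> tuple[float, float]:
    """Return (cost in USD, wall-clock seconds) for a batch of images.

    parallel_requests is a hypothetical concurrency level chosen by the
    caller; it is an assumption, not a documented limit.
    """
    if num_images < 0 or parallel_requests < 1:
        raise ValueError("num_images must be >= 0 and parallel_requests >= 1")
    cost = num_images * PRICE_PER_THOUSAND_USD / 1000
    # Each parallel slot works through its share one image at a time.
    rounds = math.ceil(num_images / parallel_requests)
    return cost, rounds * SECONDS_PER_IMAGE


# The two scenarios mentioned in the text: design variations and a catalog.
for label, count in [("100 design variations", 100), ("10,000 catalog items", 10_000)]:
    cost, seconds = batch_estimate(count)
    print(f"{label}: ${cost:.3f}, about {seconds / 3600:.1f} hours if run one at a time")
```

Under these assumptions, a 10,000-item catalog costs about 30 cents and takes roughly 11 hours if the images are generated one after another, so at this price the practical constraint is time and concurrency rather than budget.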

Gemini Omni Flash, the more novel of the two, handles video: it generates and, more interestingly, edits short clips of up to ten seconds. Google bills it as the first time developers get programmable access to conversational video editing. Traditional video editing means a timeline, tracks, and a mouse: you scrub to a frame and manually change things. Conversational editing means you describe the change in plain words (make the sky darker, slow the middle down, remove the person on the left) and the model produces the revised clip. Doing that through an API means developers can build that ability into their own apps, so their users can revise video by describing changes rather than learning editing software. With the fast image model alongside it, Google is sketching an end-to-end pipeline: generate a picture in seconds, turn it into a short clip, then refine the clip by conversation. Gemini Omni Flash is available in the same developer surfaces plus Google's video-creation tool.

The honest caveat is that the "lite" and "flash" labels are doing a lot of quiet work. A four-second image model priced to run by the thousand is, almost by definition, making tradeoffs against the slower, pricier flagship: in fine detail, in how reliably it renders text inside an image, in handling unusual or complex prompts. Ten-second clips are short, and the hardest parts of video generation (keeping a character consistent, physics that do not melt, coherence across a longer scene) get harder the longer the clip. None of that makes these models less useful; it means they are specialized tools for a specific job. The winners will be the builders who match the tool to the task: reach for cheap-and-fast when volume and iteration speed matter, and save the expensive flagship for the single hero image or the shot that has to be flawless. What Google actually shipped this week is less a leap in quality than a shift in economics, and in this field, the economics are often what decides which ideas get built at all.


Key questions

How fast and cheap is the new image model?

Google's Nano Banana 2 Lite generates a picture in about four seconds and costs roughly three cents per thousand images, aimed at rapid prototyping and high-volume use. It is the fastest and cheapest image model in Google's Gemini lineup.

What is new about the video model?

Google says Gemini Omni Flash gives developers their first programmable access to conversational video editing: generating and revising short clips of up to ten seconds by describing the change in words rather than using a timeline editor.

Where can developers get these?

Both are available in Google AI Studio, the Gemini API, and Google's enterprise agent platform, and they are rolling out to consumer surfaces like the Gemini app and search.

Topics: Google · Gemini · image-generation · video-generation · developer-tools
