vision-language-action

Everything on Ground Truth tagged “vision-language-action” — 2 items.

What Are Vision-Language-Action Models? (Lesson)

A vision-language-action (VLA) model is a single neural network that takes in camera images and a plain-language instruction and outputs the actual motor commands to carry it out, letting one model both understand a scene and physically act on it.

Robot AI Models Ace Colors but Flunk 'Is This Alive?' (News)

A new study shows that vision-language-action models lose most of their commonsense world knowledge when fine-tuned to control robots. They score near chance (coin-flip accuracy) on questions that their source models answered almost perfectly.