vision-language-action
Everything on Ground Truth tagged “vision-language-action” — 2 items.
What Are Vision-Language-Action Models? (Lesson)
A vision-language-action (VLA) model is a single neural network that takes camera images and a plain-language instruction as input and outputs the motor commands needed to carry out that instruction, so one model both interprets a scene and physically acts on it.
Robot AI Models Ace Colors but Flunk 'Is This Alive?' (News)
A new study shows vision-language-action models lose most of their commonsense world knowledge when fine-tuned to control robots, scoring near coin-flip on questions their source models answered almost perfectly.