Just as LLMs can do general-purpose verbal reasoning, video generation models can do general-purpose visual reasoning.