OpenAI Hyperrealistic AI Videos and AI Video Generation for World Simulators

by Brian Wang

OpenAI's Sora model can generate videos up to one minute long from text prompts while maintaining high visual quality.

Sora utilizes a diffusion model to evolve videos from static noise into coherent visual narratives, setting a new standard in AI technology.

OpenAI also published research on video generation models as world simulators. The researchers explore large-scale training of generative models on video data. Specifically, they train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect ratios, using a transformer architecture that operates on spacetime patches of video and image latent codes. The largest model, Sora, can generate a minute of high-fidelity video. The results suggest that scaling video generation models is a promising path toward building general-purpose simulators of the physical world.
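To make the idea of spacetime patches more concrete, here is a minimal NumPy sketch of how a video tensor might be cut into fixed-size blocks spanning both time and space, then flattened into a sequence of tokens for a transformer. This is an illustration only, not OpenAI's implementation: the function name, the patch sizes, the (frames, height, width, channels) layout and the example shapes are all assumptions, and Sora's actual patching is applied to compressed latent codes whose details are not given here.

```python
import numpy as np

def to_spacetime_patches(video, pt, ph, pw):
    """Split a (T, H, W, C) array into flattened spacetime patches.

    Hypothetical helper for illustration. Returns an array of shape
    (num_patches, pt * ph * pw * C), one row per patch.
    """
    T, H, W, C = video.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    # Split each axis into (number of patches, patch size).
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the patch-index axes to the front, within-patch axes to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch, each row holding that patch's values.
    return x.reshape(-1, pt * ph * pw * C)

# Assumed example: an 8-frame, 32x48 latent video with 4 channels.
video = np.random.rand(8, 32, 48, 4)
tokens = to_spacetime_patches(video, pt=2, ph=8, pw=8)
print(tokens.shape)  # (96, 512)

# A single image can be treated as a one-frame video.
image = np.random.rand(1, 32, 48, 4)
print(to_spacetime_patches(image, pt=1, ph=8, pw=8).shape)  # (24, 256)
```

Because the number of patches simply follows from the input size, the same routine handles clips of different durations, resolutions and aspect ratios, which is one plausible reading of why a patch-based representation suits joint training on videos and images.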

Hyperrealistic video can be used to generate hyper-useful AI training data. This is in line with the scaling of training compute by 100 times every year. By 2025, OpenAI's video generation could scale to many hours of video. By 2026, weeks of video could be generated every hour. Training data could then be generated at many multiples of real time.
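As a rough back-of-the-envelope check on the 100 times per year figure, the sketch below projects clip length under two loudly labeled assumptions that are not in the source: the one-minute baseline is dated to 2024, and generated video length grows in direct proportion to compute. It is one simple reading of the trend, not a forecast, and it is not meant to reproduce the estimates above exactly.

```python
BASE_YEAR = 2024        # assumption: year of the one-minute baseline
BASE_MINUTES = 1.0      # one-minute clip, as described above
GROWTH_PER_YEAR = 100   # the 100x yearly compute scaling cited above

def projected_minutes(year):
    """Clip length in minutes, assuming length scales linearly with compute."""
    return BASE_MINUTES * GROWTH_PER_YEAR ** (year - BASE_YEAR)

for year in (2024, 2025, 2026):
    minutes = projected_minutes(year)
    hours = minutes / 60
    days = minutes / (60 * 24)
    print(f"{year}: {minutes:,.0f} min = {hours:,.1f} h = {days:,.2f} days")
```

Under these assumptions, a single generation grows from one minute to about 1.7 hours after one year and to roughly a week of video after two years. Claims about throughput, such as weeks of video generated every hour, additionally depend on how many generations run in parallel, which this sketch does not model.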