Meta Text to Video Generator

by Brian Wang

Make-A-Video, Meta's text-to-video system, uses images with descriptions to learn what the world looks like and how it is often described. It also uses unlabeled videos to learn how the world moves. With this data, Make-A-Video lets you bring your imagination to life by generating whimsical, one-of-a-kind videos from just a few words or lines of text.

Make-A-Video has three advantages:

(1) it accelerates training of the text-to-video (T2V) model (it does not need to learn visual and multimodal representations from scratch),

(2) it does not require paired text-video data, and

(3) the generated videos inherit the vastness (diversity in aesthetic, fantastical depictions, etc.) of today's image generation models.