OpenAI, the research organization behind the popular chatbot ChatGPT and the text-to-image generator DALL-E, has unveiled its latest AI model: Sora.
Sora is a text-to-video model that can create realistic videos up to a minute long based on text prompts. The model can also generate videos from still images or extend existing videos with new content.
Sora is a breakthrough in the field of generative AI, as it can simulate the physical world in motion and produce videos that match the user’s instructions on both subject and style. For example, Sora can generate a video of woolly mammoths walking in the snow, a movie trailer featuring a spaceman, or a cooking tutorial, just by typing a few words.
OpenAI’s CEO Sam Altman has also shared some examples of Sora-generated videos on X, in response to users’ prompts.
According to OpenAI’s technical report, the model is trained on a large corpus of videos and images, drawn from publicly available sources and licensed from copyright owners. Sora is a diffusion model built on a transformer architecture: a compression network maps raw video into a lower-dimensional latent space, that latent representation is split into “spacetime patches” that serve as the transformer’s tokens, and the model learns to turn noisy patches into clean ones, conditioned on the text prompt.

“The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions,” OpenAI writes.

At generation time the process runs in reverse: given a prompt, Sora starts from patches of pure noise, denoises them step by step under the guidance of the text, and a decoder then maps the resulting latents back into pixel-space video frames.
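Sora’s internals have not been released as code, but the high-level flow described above — encode the prompt, generate latent video conditioned on it, then decode frames — can be sketched in miniature. Every name, dimension, and the toy “conditioning” below is invented for illustration and bears no relation to OpenAI’s actual implementation:

```python
import hashlib
import random

# Hypothetical stand-ins for the stages of a text-to-video pipeline.

def encode_text(prompt: str, dim: int = 8) -> list[float]:
    """Map a prompt to a deterministic pseudo-embedding
    (a stand-in for a real learned text encoder)."""
    seed = int.from_bytes(hashlib.sha256(prompt.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def generate_latent_video(text_embedding: list[float], num_frames: int = 4) -> list[list[float]]:
    """Produce one latent vector per frame, conditioned on the text embedding.
    A real model would run many iterative generation steps here."""
    frames = []
    for t in range(num_frames):
        # Toy "conditioning": nudge the embedding slightly per frame.
        frames.append([x + 0.01 * t for x in text_embedding])
    return frames

def decode_frames(latents: list[list[float]]) -> list[str]:
    """Stand-in for a decoder that turns latents into pixel frames."""
    return [f"frame[{i}] from latent of dim {len(z)}" for i, z in enumerate(latents)]

video = decode_frames(generate_latent_video(encode_text("woolly mammoths walking in the snow")))
print(len(video))  # 4
```

The point of the sketch is only the separation of concerns: text understanding, latent generation, and decoding are distinct stages, which is why the same prompt machinery can also condition on a still image or an existing clip.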
OpenAI Sora is not perfect, however. The model may struggle to capture the physics or spatial details of more complex scenes, which can lead to illogical or unnatural results. For instance, it may generate a person running in the wrong direction on a treadmill, morph a subject in weird ways, or make it disappear altogether.
Moreover, Sora may raise ethical and social issues, such as the potential for misuse, abuse, and disinformation. OpenAI says it will be “taking several important safety steps ahead of making Sora available in OpenAI’s products.” The company is “working with red teamers — domain experts in areas like misinformation, hateful content, and bias — who will be adversarially testing the model,” and is “building tools to help detect misleading content such as a detection classifier that can tell when a video was generated by Sora.”
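OpenAI has not published how its detection classifier works. As a rough illustration of what “a classifier that can tell when a video was generated” means in principle, here is a toy logistic-regression detector over two invented per-clip features; the features, weights, and data are all fabricated for the sketch:

```python
import math

# Hypothetical toy detector: labels a clip's feature vector as
# "real" (0) or "AI-generated" (1). A real detector would use
# learned deep features, not two hand-picked numbers.

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_detector(samples, labels, lr=0.5, epochs=200):
    """Logistic regression trained by stochastic gradient descent."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of log-loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x) -> float:
    """Probability that the clip is AI-generated."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Invented features, e.g. [temporal_flicker, texture_regularity].
real_clips = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25]]
fake_clips = [[0.8, 0.9], [0.9, 0.7], [0.85, 0.8]]
w, b = train_detector(real_clips + fake_clips, [0, 0, 0, 1, 1, 1])
print(predict(w, b, [0.9, 0.85]) > 0.5)  # True: flagged as generated
```

In practice such detectors are an arms race: as generators improve, the statistical artifacts a classifier relies on shrink, which is one reason OpenAI pairs detection with provenance metadata rather than relying on either alone.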
The company also says that should it choose to build the model into a public-facing product, it will ensure that provenance metadata is included in the generated outputs.
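OpenAI has not specified the exact mechanism, though the widely used standard for this kind of provenance is C2PA, which embeds signed provenance records inside the media file itself. A much simpler, purely illustrative sketch of the idea — binding a provenance record to a file’s exact contents via a hash, here in a JSON sidecar rather than embedded metadata — looks like this (all names hypothetical):

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Illustrative only: a JSON "sidecar" recording provenance for a
# generated video, bound to the file's contents by a SHA-256 digest.

def write_provenance(video_path: Path, generator: str) -> Path:
    digest = hashlib.sha256(video_path.read_bytes()).hexdigest()
    sidecar = video_path.with_suffix(".provenance.json")
    sidecar.write_text(json.dumps({
        "generator": generator,  # which model produced the file
        "sha256": digest,        # ties the record to this exact content
    }))
    return sidecar

def verify_provenance(video_path: Path) -> bool:
    """True only if the file still matches the recorded digest."""
    record = json.loads(video_path.with_suffix(".provenance.json").read_text())
    return record["sha256"] == hashlib.sha256(video_path.read_bytes()).hexdigest()

with tempfile.TemporaryDirectory() as d:
    clip = Path(d) / "clip.mp4"
    clip.write_bytes(b"\x00fake video bytes")
    write_provenance(clip, "example-model")
    print(verify_provenance(clip))  # True
    clip.write_bytes(b"tampered")
    print(verify_provenance(clip))  # False
```

The hash check shows why provenance metadata complements a detection classifier: a valid record positively identifies generated content, while any edit to the file invalidates the record and falls back to detection.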
Sora is currently available only to a small group of researchers, filmmakers, and video creators, who will provide feedback and test the model’s capabilities and limitations.