AI seems to be advancing at a rapid pace. Not too long ago, image generators were inspiring some creators and concerning others. Now, the natural next step has come upon us. OpenAI announced Sora, a new text-to-video AI model that can turn short text prompts into photorealistic videos that are up to a minute long—a feat charged with weighty implications for artists and users alike.
“We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction,” writes OpenAI. The company claims that the clips can maintain visual quality and adhere to the user’s prompt.
To add realism and depth to each video, Sora can create detailed backgrounds and a myriad of characters. The tech company adds, “The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.”
OpenAI describes Sora as a “diffusion model,” which generates a video by starting off with one that looks like static noise and gradually transforming it by removing the noise over many steps. “Sora is capable of generating entire videos all at once or extending generated videos to make them longer,” they say. “By giving the model foresight of many frames at a time, we’ve solved a challenging problem of making sure a subject stays the same even when it goes out of view temporarily.”
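For readers curious what "removing the noise over many steps" can look like in practice, here is a toy Python sketch. It is not Sora's code or architecture, which OpenAI has not published in this form. The names `toy_denoiser` and `generate` are hypothetical, the "video" is just a short list of frames holding a few numbers each, and the denoiser cheats by already knowing the clean result, whereas a real diffusion model learns to estimate the noise from training data. The loop only illustrates the basic idea: start from static, then refine every frame together a little at a time.

```python
import random


def toy_denoiser(noisy_video, clean_video):
    """Hypothetical stand-in for a trained network.

    Returns the estimated noise in every frame. A real model would predict
    this from data; this toy version simply compares against a known target.
    """
    return [
        [noisy - clean for noisy, clean in zip(noisy_frame, clean_frame)]
        for noisy_frame, clean_frame in zip(noisy_video, clean_video)
    ]


def generate(clean_video, steps=50, seed=0):
    """Start from pure noise and strip it away over `steps` iterations."""
    rng = random.Random(seed)
    # Begin with "static": random values in the same shape as the target video.
    video = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in clean_video]

    for step in range(steps):
        # The denoiser sees all frames at once, not one frame at a time.
        estimated_noise = toy_denoiser(video, clean_video)
        # Remove a growing share of the remaining noise; the final step removes all of it.
        fraction = 1.0 / (steps - step)
        video = [
            [value - fraction * noise for value, noise in zip(frame, noise_frame)]
            for frame, noise_frame in zip(video, estimated_noise)
        ]
    return video


if __name__ == "__main__":
    target = [[0.0, 0.5, 1.0], [0.1, 0.6, 0.9]]  # two tiny made-up "frames"
    result = generate(target, steps=20)
    print([[round(value, 3) for value in frame] for frame in result])
```

Because every frame passes through the denoiser in the same step, the sketch also hints at why working on many frames at a time could help keep a subject consistent across a clip.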
Still, the company acknowledges Sora is not perfect. “It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark,” OpenAI explains. The model can also mix up the spatial details of a prompt, and like most other AI tools, it seems to struggle to get hands and other human features just right.
Despite its weaknesses, OpenAI reiterates the milestone Sora represents: “The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video that accurately persist characters and visual style.”
To demonstrate how Sora works, OpenAI shared some clips created with different prompts. The prompts range from ones detailed down to the framing, setting, clothing, and what goes through the main character's mind to something as simple as “a corgi vlogging itself in tropical Maui.” When asked, Sora also delivered videos in the cartoonish 3D style that has come to characterize publicly accessible CGI animations.
Since a tool this capable can easily be misused, OpenAI has stated that it will take several important safety steps. It will work with experts in areas like misinformation, hateful content, and bias, who will adversarially test the model. The company will also build tools to help detect misleading content and to reject text prompts that violate its usage policies.
“We’ll be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology,” they conclude. “Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.”
OpenAI has unveiled an AI model called Sora, which can create video from text prompts.
The results seem unbelievably realistic.
Humans, animals, and landscapes are all realistically rendered in motion.
h/t: [PetaPixel]