OpenAI has developed a new neural network for video creation: Sora

Tech Feb 18, 2024

On February 15, OpenAI introduced Sora, a new generative artificial intelligence model that converts text into video. The tool has delighted social media users, but it still needs considerable improvement before a full launch.

Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W
Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf
— OpenAI (@OpenAI) February 15, 2024

Sora is capable of generating videos up to 60 seconds long with a resolution of up to 1080p based on simple text prompts. These may include multiple characters, specific types of movement, and precise details of the subject and background.

The tool builds on the research behind GPT and DALL-E 3. It is a so-called diffusion model: in training, the original image is turned into statistical noise, and the model learns to reverse that process, removing the noise step by step until a clean result emerges.
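The noise-and-denoise idea can be illustrated with a toy example. The sketch below is not Sora's code and makes several assumptions for illustration: the "image" is a short list of numbers, the noise schedule (signal_kept) is an arbitrary linear one, and the trained neural network is replaced by a hypothetical "oracle" that already knows the exact noise. A real model has to estimate that noise from what it learned in training.

```python
import math
import random

STEPS = 50  # number of noising / denoising steps (arbitrary for this toy)

def signal_kept(t):
    """Share of the clean signal's variance left after t noising steps (toy linear schedule)."""
    return 1.0 - 0.99 * t / STEPS

def add_noise(clean, t, rng):
    """Forward process: blend the clean signal with Gaussian noise."""
    a = signal_kept(t)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in clean]

def make_oracle(clean):
    """Hypothetical stand-in for the trained network: returns the exact noise in a sample.
    A real model never sees `clean` and must estimate the noise instead."""
    def predict_noise(noisy, t):
        a = signal_kept(t)
        return [(y - math.sqrt(a) * x) / math.sqrt(1.0 - a) for y, x in zip(noisy, clean)]
    return predict_noise

def denoise(noisy, predict_noise):
    """Reverse process: remove the predicted noise a little at a time."""
    current = list(noisy)
    for t in range(STEPS, 0, -1):
        a_now, a_next = signal_kept(t), signal_kept(t - 1)
        noise = predict_noise(current, t)
        # Estimate the clean signal, then re-blend it at the next, lower noise level.
        estimate = [(y - math.sqrt(1.0 - a_now) * n) / math.sqrt(a_now)
                    for y, n in zip(current, noise)]
        current = [math.sqrt(a_next) * e + math.sqrt(1.0 - a_next) * n
                   for e, n in zip(estimate, noise)]
    return current

rng = random.Random(0)
clean = [math.sin(i / 3.0) for i in range(16)]  # a one-dimensional "image"
noisy = add_noise(clean, STEPS, rng)            # almost pure noise
restored = denoise(noisy, make_oracle(clean))
max_error = max(abs(r - c) for r, c in zip(restored, clean))
```

Because the oracle's noise estimate is exact, the restored signal matches the original almost perfectly. With a learned estimate, the same loop would instead produce a new, plausible sample, which is the behavior a generative model relies on.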

Announcing Sora — our model which creates minute-long videos from a text prompt: https://t.co/SZ3OxPnxwz pic.twitter.com/0kzXTqK9bG
— Greg Brockman (@gdb) February 15, 2024

The developers admit that at this stage Sora still has a number of shortcomings. It struggles to accurately model the physics of a complex scene and gets confused by cause-and-effect relationships.

“For example, a person can bite off a piece of a cookie, but after that there may not be a bite mark left on it,” explains OpenAI.

The tool also has problems with spatial detail. The result may not match the directions given in the prompt, and the model may mix up left and right.

For now, Sora is available to the "red team" of testers, as well as select designers, artists and filmmakers.

Social media reaction

The tool has fascinated social media users and is already trending on X, with over 173,000 posts.

To clearly demonstrate the capabilities of the model, OpenAI CEO Sam Altman began accepting user requests for video generation. At the time of writing, he has shared a total of nine videos created by Sora.

https://t.co/uCuhUPv51N pic.twitter.com/nej4TIwgaP
— Sam Altman (@sama) February 15, 2024

AI experts said Sora's capabilities left them "speechless."

I don't even know what to say…
These clips generated by OpenAI's Sora model have me speechless.
We knew good AI text-to-video would come, but this quickly? Unreal.
We're stepping into a new world.
Buckle up. pic.twitter.com/zP7b5fKw5x
— Mckay Wrigley (@mckaywrigley) February 15, 2024

According to Nvidia senior scientist Jim Fan, Sora is much more than just another "creative toy" like DALL-E 3. He defined it as a "data-driven physics engine" because the AI model doesn't just generate abstract video, but also intuitively creates the physics of objects in the scene itself.

Along with this, a number of users expressed concerns that tools like Sora will worsen the problem of deepfakes.

According to one user, large social networks need to think about built-in protection against realistic fakes. He also highlighted the threat of falsified video evidence of crimes.

Another user called for "De-AI" reverse-engineering technology to keep content from being misinterpreted.

Previously, OpenAI began testing a “memory” feature for the ChatGPT chatbot, which stores information discussed in conversations to improve the user experience.

At the same time, the company actively opposes the use of its products for illegal purposes.