Advanced Computing in the Age of AI | Thursday, April 18, 2024

OpenAI Launches AI Text-to-Video Generator Sora 

OpenAI, the maker of ChatGPT and Dall-E, has joined the text-to-video AI content generation race by launching Sora, which can generate videos up to a minute long based on the user’s prompt.

The company showed several impressive videos created using Sora, including a woman walking down a street in Tokyo and historical footage of California during the gold rush era.

Sora is currently in preview: it is not yet available to the general public but is available to select groups, such as security experts and creators. The company has granted access to certain individuals to gather feedback on how to advance the model to be most helpful for creative professionals. The general release date has not been made public yet.

“We are working with red teamers  —  domain experts in areas like misinformation, hateful content, and bias  —  who will be adversarially testing the model,” the company said. “We’re also building tools to help detect misleading content such as a detection classifier that can tell when a video was generated by Sora.”

OpenAI is not the first company to launch this type of technology. Meta, Google, and several other companies have launched or are in the process of launching their own text-to-video generation applications. Some of the most popular solutions on the market come from Stability AI, Runway, Pika, and Google, with its Lumiere model. However, industry analysts have pointed to the quality of Sora’s videos as being better than that of most competitors. Perhaps this is why the Sora demonstration has generated so much hype.

According to OpenAI, the advantage of Sora compared to other models is its striking photorealism and its ability to produce longer clips from brief prompts. Sora is based on a deep understanding of language, enabling it to interpret prompts and generate characters and emotions.

The Sora demo showed its ability to generate video from a few words; however, it did not show its ability to generate videos from a single image or a sequence of frames.

The launch of Sora is causing excitement, but it has also raised a few concerns. Such technology can be used to produce deepfakes and spread misinformation. We can expect Sora to have some restrictions on content, such as inappropriate depictions of real people or the use of the platform to create content that contains pornography or violence.

“The solution to misinformation will involve some level of mitigations on our part, but it will also need understanding from society and for social media networks to adapt as well,” says Aditya Ramesh, lead researcher and head of the Dall-E team.

Another concern with Sora is that it could infringe on the copyrighted work of others. While OpenAI claims that the training data comes from content that is either licensed or publicly available, there is always some ambiguity about what is considered “publicly available.” If OpenAI is not able to address this issue, it could face a number of lawsuits.

There are also some issues with Sora's ability to accurately simulate the physics of a complex scene. For example, it may confuse the spatial details of a prompt.

Sora is set to empower the average user to make AI videos using text. While text-to-video technology has a long way to go before it threatens the filmmaking industry, these could be the baby steps that lead to a major disruption in the entertainment industry.

For now, OpenAI is unlikely to be thinking that far ahead. The company is more likely focused on improving the basic safety features of the platform by rejecting inappropriate content and misinformation and labeling Sora-created videos according to the C2PA guidelines.
