Sora — OpenAI’s next giant leap?

Rishabh Jogani
3 min read · Mar 1, 2024


On February 15th, OpenAI released Sora, its new AI model for video creation, which turns text prompts into realistic and imaginative videos.

Why is it important?

These models are trained to understand and simulate the physical world. Basically, the model is being trained to generate videos that obey the laws of physics and motion as we know them. Judging by the videos Sora has generated, it’s a clear step in that direction. The videos aren’t perfect, but they’re really impressive nonetheless. The gaming industry has spent years chasing realistic lighting with ray tracing, and the results have been great, yet Sora seems to generate videos that look as good, if not better, purely through prediction. If the output keeps improving at its current pace, you soon won’t be able to tell real videos from AI-generated ones.

Just to be clear, Sora doesn’t render videos using ray tracing; all it’s doing is predicting what the next frame or frames should look like and filling in the remaining gaps.
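To make that distinction concrete, here is a deliberately minimal sketch of learned frame prediction, written in PyTorch. This is not Sora’s actual architecture (OpenAI describes Sora as a diffusion model operating on spacetime patches of video); the model, layer sizes, and training setup here are all illustrative assumptions. The point is only that the next frame comes out of a trained network’s prediction, not out of simulating how light bounces around a scene:

```python
# Toy next-frame prediction: a network learns to guess frame t+1 from the
# previous `context` frames. Purely illustrative; Sora itself is a diffusion
# model over spacetime patches, not this architecture.
import torch
import torch.nn as nn

class NextFramePredictor(nn.Module):
    def __init__(self, context: int = 4):
        super().__init__()
        # Past frames are stacked along the channel axis: (B, 3*context, H, W)
        self.net = nn.Sequential(
            nn.Conv2d(3 * context, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),  # 3 channels: predicted RGB frame
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = frames.shape
        return self.net(frames.reshape(b, t * c, h, w))

model = NextFramePredictor(context=4)
clip = torch.randn(1, 4, 3, 64, 64)   # 4 past frames of a 64x64 video (random stand-in data)
target = torch.randn(1, 3, 64, 64)    # the real frame that followed (also a stand-in)

predicted = model(clip)                           # the network's guess at the next frame
loss = nn.functional.mse_loss(predicted, target)  # training drives this prediction error down
loss.backward()
print(predicted.shape)  # torch.Size([1, 3, 64, 64])
```

A renderer gets realism by explicitly simulating light; a model like this gets it only from patterns in its training data, which is exactly why the physics in generated videos is so impressive when it’s right and so uncanny when it’s wrong.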

Imperfections?

Some of the videos are so well generated that unless you focus on specific parts to look for abnormalities, you won’t find any on a first pass. Take OpenAI’s sample of two ships fighting in a coffee cup. It’s almost scary how good the model has already become with physics and context to produce such an output, but take a closer look at the ship on the right and you’ll soon find imperfections that really stand out once you’ve seen them.

The model also needs to be trained on certain specific cases. Take the white SUV video: while there aren’t major abnormalities in the physics, the locked-in camera angle makes the scene feel like a video game rather than a real-world shot.

While the examples above show only minor imperfections, OpenAI has highlighted plenty of cases where the model misses the plot completely. That was expected, as the model is at a very early stage in the grand scheme of the AI timeline. Nonetheless, Sora does give a glimpse of how much better it can become as it continues to be trained.

Personal Take

It’s been a good two weeks since the first look at Sora, and I am still amazed by how far it has come and how fast these models are getting better.

I’ve had multiple conversations with people about the impact of AI. People still don’t seem to grasp it, and it’s genuinely hard to fathom to what extent AI will replace tasks, people, or even entire industries. Take Sora alone: think of the impact it will have on stock photos and videos, which would no longer need to be licensed; you could just type in a prompt and be done. Who would invest in building large-scale sets for movie scenes when you can type in the specifics and get an almost indistinguishable output from models like these? The media, film, entertainment, and photography sectors are poised to watch advancements in text-to-video models closely, because these innovations could significantly affect not just their revenue streams but also the livelihoods of many people within those industries.

Another very interesting thing I am looking forward to is how humans behave around such AI-generated content. Will there be large-scale adoption? Or will there come a time when, no matter how good AI-generated videos get, humans actually prefer content created by other humans?

Imagine if content one day needed a “created by a human” watermark rather than the other way around.

Governments, regulators, businesses, creators, everyday people: the entire world is playing catch-up with a technology that is growing at an unfathomable pace. What the future holds, I have absolutely no clue, but it will be interesting, to say the least.
