Let's not do this again, please
OpenAI's text-to-video generator Sora is being hyped as a game-changer by an industry at a crossroads. This time we should know better
About this time in 2023, OpenAI was enjoying a crescendo of hype over its image and text generation products — hype that would be sustained nearly all year. ChatGPT passed standardized tests. AI-generated images of famous people dressed up in the style of other famous people streamed down our feeds. A New York Times columnist worried that an AI had tried to ruin his marriage. The future was coming so fast, were we ready?
On the back of all that buzz, OpenAI CEO Sam Altman became a global celebrity, embarking on an AI world tour in which he solemnly touted the grave dangers of the technology he was developing for commercial profit. The jig worked. All that hype was quickly translated into enormous investments in generative AI companies, federal lawmakers convening panels of Silicon Valley execs on how to harness this fearsome new tech — and a pervasive sense of inevitability.
It was an ideal place to be if you were, say, a company pitching AI automation software to managers and business leaders. For just about anyone else, it all happened so rapidly that we never really had a chance to ask: Is this something we actually want? Does it even work as well as advertised? What are the costs?
After all, it’s quickly becoming clear that AI content production could be devastating to creative labor, a huge drain on water and power, and a copyright nightmare for everyone involved. It’s almost as if there was an intentional playbook OpenAI was drawing from here — release the product, hype it relentlessly, and ask forgiveness later; move fast and break things, if you will.
So: let’s not do this again, please.
This week, OpenAI announced Sora, an AI model for creating video from text, and that announcement quickly and predictably went viral. One tech reporter wondered if it could be AI’s “holy shit” moment, many called it “amazing” or “completely insane,” and influential technologists and pundits worried it heralded the end of reality as we know it.
But let’s take a breath here, and look at what’s actually happening — both in terms of the technology itself, and in terms of the wider context in which this announcement is taking place.
First, as Alex Kantrowitz reported today, OpenAI’s flagship product, ChatGPT, is stagnating. That’s the conclusion of a SimilarWeb report, which found that web traffic to the once-world-beating tool has declined in five of the last eight months; it’s down 11% from its May 2023 high.
Meanwhile, OpenAI is desperately searching for a revenue model to support the energy- and compute-intensive business. As of last year, it cost well over a million dollars a day to run its servers, and that cost has almost certainly risen as it trains its models for video products like Sora, and will rise further still if it continues to expand.
That is one very talented and physics-defying dalmatian.
So, OpenAI has a distinct imperative to attract more attention and investment, as well as to distract from the fact that ChatGPT is already shedding users, and may have already peaked. This was evident not just in the announcement of Sora, but in reports that Altman is seeking $7 trillion in investment — yes, with a “t” — for chips to power his company’s AI ambitions, from sources including the United Arab Emirates.
If Altman wants to keep the money flowing, he has to keep moving the goalposts, producing shiny, exciting new tools and talking the biggest game this side of Elon Musk — to that end, Sora fits the bill.
As for the technology itself, some of those videos look pretty neat! Some of them definitely don’t! There are obviously a ton of small glitches, like disappearing and replicating limbs, and that dog above that defies physics to walk along a windowsill in Italy. But let’s not lose sight of the fact that however they look, much like the static images pumped out by Sora’s predecessors, DALL-E and Midjourney, they are less the products of “artificial intelligence” and more the products of sophisticated automation.
The difference matters, because as with the image generators, it’s often unclear to what extent the model is simply replicating — some would say “plagiarizing” — the source material it’s been trained on. Here’s a long thread documenting the distinct similarities between the Sora videos released so far and what you get from the image generator Midjourney with the same prompts.
And here’s one of the videos shared by OpenAI….
… compared to a video on the stock image and video site Shutterstock, with which OpenAI inked a partnership last year — precisely so it could use videos like this one for training purposes.
It’s obviously not the same exact video, but it’s pretty darn similar, right down to the background, and makes for a good snapshot, I think, for gauging how novel Sora’s output really is. It’s not that Sora is generating new and amazing scenes based on the words you’re typing — it’s automating the act of stitching together renderings of extant video and images. Which is not uncool, in a vacuum! It’s indeed impressive that technology can do this.
But consider why it’s not being presented that way. For one, it just sounds boring, and is less likely to inspire the Emiratis to give Sam Altman billions of dollars. For another, if this is seen as a tool for automating video editing and special effects production, then the implications are clear — it is a tool for replacing creative labor. Bosses will use it to try to eliminate people’s jobs.
If, instead, the narrative becomes “this tool will unleash creativity and make the impossible possible” — or even “this is the end of reality itself” — then the goalposts are successfully moved once again, and we aren’t seeing clearly what’s really happening at its dull, boring core: A tech company wants to concentrate as much capital and power as possible, its founder wants to be as famous and influential as possible, and it has built some tools that automate creative work, which it is using to achieve these ends.
If we accept OpenAI’s promotional narrative and get swept up in Sora, if we tremble at the forbidding power of its reality-distorting abilities, we make it much, much easier for the company to cash in on its still-proliferating mythologies.
“…this tool will unleash creativity and make the impossible possible…” That describes a human artist!
Indeed. In several remote ethnographic studies I did for organisations in the past year, few people were overly impressed with LLM tools, and the common sentiment was “Too early. Maybe later.” There is an increasing weariness with tech hype. These AI companies may be hurting themselves in the long run.