Vibey AI
We're measuring how good an AI model is on vibes now. Or: The beatings will continue until morale improves.
Being an IT practitioner over the past few weeks has been tiring. Among many other things, which others have written about more eloquently and precisely than I ever could, we’ve seen a full-steam-ahead push insisting that AI is the future, that we will all like it, and that the AI beatings will continue until morale improves.
The big news of the past couple of days is that OpenAI, the organization behind the popular ChatGPT system, has released the next iteration of its flagship model, GPT-4.5. And this is the big improvement that will bring us ever closer to this AI future where everything is good? If we’re measuring on technical stuff, no. If we’re measuring on vibes, then yes. From Benj Edwards at Ars Technica:
Upon 4.5’s release, OpenAI CEO Sam Altman did some expectation tempering on X, writing that the model is strong on vibes but low on analytical strength. “It is the first model that feels like talking to a thoughtful person to me,” he wrote. He then added further down in his post, “a heads up: this isn’t a reasoning model and won’t crush benchmarks. it’s a different kind of intelligence and there’s a magic to it i haven’t felt before.”
Ye gods. No wonder an unnamed AI expert quoted in the same article thinks GPT-4.5 is a lemon. And all that for something limited to the $200/month ChatGPT Pro account tier, or $75 per one million tokens via the OpenAI API. (A token is the unit that breaks natural language down into pieces a large language model can work with. Most AI models cost no more than $1 per one million tokens, with some of the higher-end models in the $2-$3 range.) Especially at that price, this has to be a letdown.
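To make that pricing gap concrete, here’s a minimal back-of-the-envelope sketch in Python. The per-million-token prices are the ones cited above; the 10-million-token monthly workload is purely a hypothetical assumption for illustration, not a figure from the article.

    # Rough monthly cost comparison at the per-million-token prices cited above.
    # The workload size is a hypothetical assumption for illustration only.
    PRICES_PER_MILLION_TOKENS = {
        "GPT-4.5 via OpenAI API": 75.00,  # figure cited above
        "typical model": 1.00,            # "no more than $1 per one million tokens"
        "higher-end model": 3.00,         # upper end of the $2-$3 range
    }

    MONTHLY_TOKENS = 10_000_000  # hypothetical: 10 million tokens per month

    for name, price in PRICES_PER_MILLION_TOKENS.items():
        cost = (MONTHLY_TOKENS / 1_000_000) * price
        print(f"{name}: ${cost:,.2f}/month")

Even at that modest hypothetical volume, the gap is hundreds of dollars a month versus pocket change.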
Two things can be true: I can be an incredible skeptic about AI and still use it myself. It’s been great for some things, but usually things with defined outcomes. Using it as a pair programmer within Visual Studio Code and Val Town helped me tremendously when I moved this website off of WordPress. It’s fun to run local AI models on my own devices and interact with them, rather than relying on online services with questionable data privacy.
But it’s also something that’s given me a lot of pause. For the past few years, I’ve been going to tech conferences here in Minnesota. The most frequent series is the Minnesota Microsoft 365 User Group, which meets online monthly and holds two all-day in-person sessions at the Microsoft offices in Edina. And while it’s a great conference series and I’ve learned a lot, their push toward all-Copilot™-all-the-time is getting tedious. I get that it’s the new hotness, but when your marketing is based on the fear of missing out rather than practical applications, you lose me.
I’m often hesitant to write about specific things we’re doing where I work, but we’ve scaled back our explorations into AI because, quite simply, it didn’t fit the service culture we’re trying to create, and it produced errors and omissions just often enough to be untrustworthy. As many have said: “AI’s an enthusiastic intern, but you have to check its work.” We were spending as much time checking the AI’s work as we would have spent just doing the work ourselves, using the methods we know work for us.
At least for me, I don’t see burning electricity on the scale of small nations, in data centers the size of small city-states, as worthwhile for “vibes.”