I noticed it first with GitHub Copilot. The billing model changed — you no longer buy a set number of executions. You now buy token capacity. Use an expensive frontier model and your allocation drains fast. Use a smaller model and it stretches. Seems fair. Also feels like a signal.
Then Anthropic quietly ran a pricing experiment. A subset of users landing on the pricing page saw Claude Code removed from the $20 plan — $100 to access it. Not a glitch. Not a mistake. A deliberate test to measure how many people would pay the higher price. Some did. The test ran, the data was collected, and the original pricing came back. But the intent was clear: they are actively looking for where the ceiling is.
Two changes, two companies, a few days apart. The message is the same: the subsidised ride is getting more expensive.
The Numbers Behind the Hype
There is a popular argument that goes: "AI companies make money on every request, so the economics are fine." The logic sounds reasonable until you look at what a request actually costs to produce.
Training a frontier model is not a one-time fixed cost that gets amortised cleanly. It is a continuous, compounding investment — infrastructure, researchers, compute clusters, electricity, the next training run, the one after that. The inference revenue from a single model version needs to carry all of that. For most of these companies right now, it does not.
OpenAI raised $120 billion at a burn rate somewhere between $5 and $7 billion a month. That money funds roughly 17 to 24 months of operations. That is not a company in a comfortable position; that is a company on a clock, raising more capital before the previous round runs out.
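The runway maths behind that claim is simple enough to sketch. The raise and burn figures below are the rough numbers quoted above, not official disclosures:

```python
# Back-of-envelope runway: how many months of operation a raise buys
# at a given burn rate. Figures are the article's rough estimates,
# not official financial disclosures.

def runway_months(raised_billions: float, burn_per_month_billions: float) -> float:
    """Months of operation before the raised capital is exhausted."""
    return raised_billions / burn_per_month_billions

# $120B raised, burning $5-7B a month:
fast_burn = runway_months(120, 7)   # ~17 months at the high-burn end
slow_burn = runway_months(120, 5)   # 24 months at the low-burn end
print(f"roughly {fast_burn:.0f} to {slow_burn:.0f} months of runway")
```

The point of the sketch is the sensitivity: a $2 billion swing in monthly burn moves the runway by more than half a year.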
Anthropic is in the same boat. Training and releasing a model like Opus costs an enormous amount. If the pricing on that model does not recoup the training investment, the model runs at a loss no matter how many API calls go through it. Revenue per call is real. Profit per call is a different question.
Why Google Is Playing a Different Game
Google has been noticeably quieter about the "AI will change everything" messaging compared to the others. If you're paying attention, that quietness is itself informative.
They pour over $100 billion a year into AI infrastructure and still turn a profit. They do not depend on external investors to survive. They are not on a fundraising timeline.
A lot of the loudest AI hype — the "you're about to be replaced," the AGI-is-months-away framing — is not purely marketing. It is fundraising. When you need to raise $120 billion from investors, you tell a story big enough to justify the number. Google does not need that story. They can just build quietly and wait for everyone else to figure out the economics.
Uber Is the Case Study Nobody Wanted
Uber told their engineers to use AI as much as possible. They tied performance reviews to AI adoption. And then, four months into the year, they had spent their entire annual AI budget.
This is the inevitable result of mandating maximum usage without understanding what usage costs. LLM spend is not like a SaaS seat licence: it is not flat. Every prompt, every long context window, every agent loop running in the background adds to the bill. Across thousands of engineers, that compounds fast.
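A rough spend model makes the compounding concrete. Everything here is invented for illustration, including the per-token prices, which are not any vendor's actual rates:

```python
# Illustrative only: how org-wide token spend scales with usage patterns.
# Prices and usage numbers are invented for the sketch, not vendor rates.

PRICE_PER_1K_INPUT = 0.01    # hypothetical $ per 1K input tokens
PRICE_PER_1K_OUTPUT = 0.03   # hypothetical $ per 1K output tokens

def monthly_spend(engineers: int, prompts_per_day: int,
                  input_tokens: int, output_tokens: int,
                  workdays: int = 21) -> float:
    """Monthly org-wide spend in dollars under flat per-token pricing."""
    per_prompt = (input_tokens / 1000) * PRICE_PER_1K_INPUT \
               + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    return engineers * prompts_per_day * workdays * per_prompt

# Same org, same prompt count; only the context size changes.
light = monthly_spend(2000, 30, input_tokens=2_000, output_tokens=500)
heavy = monthly_spend(2000, 30, input_tokens=50_000, output_tokens=2_000)
print(f"light usage: ${light:,.0f}/month, heavy: ${heavy:,.0f}/month")
```

Note what moves the number: the prompt count is identical in both scenarios, but long context windows and verbose agent output multiply the per-prompt cost by more than an order of magnitude. That is the dynamic that empties an annual budget in four months.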
The reaction will not be "stop using AI." It will be usage caps, tiered access by seniority, approval workflows for high-cost models, and a whole new category of tooling for tracking and optimising LLM spend. FinOps is already a discipline for cloud infrastructure. The same thing is coming for AI.
What I Am Actually Changing
I use AI daily — for writing test scripts, reviewing code, debugging weird edge cases, drafting copy. The tools are genuinely useful and I am not going back to hand-coding everything. But a few things I am now doing differently:
I reach for a smaller model first. The gap between a frontier model and a mid-tier model is narrower than the price gap suggests for most routine tasks. Test generation, summarisation, quick code review — a cheaper model handles this well. I only pull in the expensive one when the task actually needs it.
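The "smaller model first" habit can be made mechanical rather than a per-task judgment call. A minimal sketch, with hypothetical model names and a hypothetical task list:

```python
# A minimal sketch of "cheap model first" routing. The model names and
# the routine-task list are hypothetical; the point is the default.

ROUTINE_TASKS = {"test_generation", "summarisation", "quick_review"}

def pick_model(task: str, needs_frontier: bool = False) -> str:
    """Route routine tasks to the cheap model; escalate only for tasks
    outside the routine list or when explicitly flagged."""
    if needs_frontier or task not in ROUTINE_TASKS:
        return "frontier-model"   # expensive, pulled in sparingly
    return "mid-tier-model"       # cheap default for routine work

print(pick_model("summarisation"))        # mid-tier-model
print(pick_model("architecture_design"))  # frontier-model
```

The design choice worth noting: the cheap model is the default and escalation is the exception, so cost discipline does not depend on anyone remembering to economise.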
I treat API costs as a project input. If I am recommending an AI-assisted workflow to a client, or building something that calls an LLM, the token cost is in the estimate now. A year ago it was rounding error. It is not anymore.
I assume any flat-rate plan will tighten. Claude Code on a $20 plan today may not look the same in six months. Building a workflow that depends on current generous limits is building on ground that might shift.
The technology itself is not going anywhere. The productivity gains are real and the use cases will keep expanding. But the current pricing — heavily subsidised, deliberately underpriced to acquire users — was always a temporary state. The companies running these models have to become financially viable at some point.
That point is getting closer.