Two announcements this week looked like routine corporate news and were anything but. On Wednesday, Microsoft launched a $2.5 billion "Frontier Company" that will embed 6,000 AI engineers directly inside enterprise clients. Two days earlier, Amazon launched a $1 billion forward-deployed-engineering org to do the same thing — explicitly following OpenAI and Anthropic, who both already run one. That's four of the biggest names in AI, all now selling the same product: humans who show up at your office and make the models actually work.
Read those two stories next to the rest of the week and a single thesis falls out. The AI industry is repricing. The thing that was supposed to be the product — access to a frontier model — is turning into a commodity, and the money is moving to the two places a commodity business always defends: the meter and the last mile.
Why sell engineers when you sell intelligence?
The consultancy pivot only makes sense if you accept an uncomfortable premise: raw model access no longer differentiates. When Claude, GPT, Gemini, GLM and Kimi are all good enough for most tasks, nobody wins on capability alone, and API pricing races toward marginal cost. What still commands a margin is deployment — the messy, client-specific work of wiring models into real businesses. So the labs and hyperscalers are becoming what they'd never admit to being: consultancies with GPU farms. Microsoft's 6,000 embedded engineers aren't an AI product. They're Accenture with better model access.
The second defence is the meter. Gartner published a warning this week that AI coding token costs are on track to rival human payroll — not as a hypothetical, but as a budgeting reality for engineering organisations. And the most telling detail of the week came from inside Amazon itself: its engineers are reportedly distilling Anthropic models to cut costs before new token-based pricing kicks in next year. Sit with that one. Amazon — Anthropic's biggest backer — is racing to shrink its dependence on Anthropic's meter before the meter starts running. If the insiders are hedging, the price signal is real.
The buyers are already routing around it
Markets respond to repricing faster than press releases do. Coinbase said this week it has switched much of its workload to Chinese models like GLM 5.2 and Kimi 2.7 using automated routing — not out of ideology, but because a router that always picks the cheapest model that clears the quality bar is simply rational procurement. The-Decoder called it a "pricing stress test" for Western labs, and that's exactly right. Routing is what commoditisation looks like from the buyer's side: the model becomes an interchangeable input, selected per-request, on price.
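To make the routing rule concrete, here is a minimal sketch of "cheapest model that clears the quality bar". It is not Coinbase's system, whose internals were not reported. The model labels, prices, and quality scores are made-up placeholders, and the sketch assumes you already score each model on your own evaluation set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelOption:
    name: str                      # hypothetical label, not a real model ID
    usd_per_million_tokens: float  # illustrative price, not a real quote
    quality_score: float           # assumed: your own eval score, 0.0 to 1.0


def pick_cheapest_passing(
    options: List[ModelOption], quality_bar: float
) -> Optional[ModelOption]:
    """Return the cheapest option whose score meets the bar, or None."""
    passing = [m for m in options if m.quality_score >= quality_bar]
    if not passing:
        return None
    return min(passing, key=lambda m: m.usd_per_million_tokens)


# Placeholder catalog: three interchangeable inputs at different prices.
CATALOG = [
    ModelOption("frontier-large", 15.00, 0.95),
    ModelOption("mid-tier", 3.00, 0.88),
    ModelOption("open-weight-small", 0.40, 0.74),
]

choice = pick_cheapest_passing(CATALOG, quality_bar=0.85)
print(choice.name if choice else "no model clears the bar")
```

The quality bar is set per workload, so a drafting task and a compliance task can send the same catalog to different models. Price only decides among the models that already pass.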
This is the same logic I run at a much smaller scale. This entire news operation — curation, writing, review, publishing — runs on local open-weight models on a single Mac. Not because local models are better than the frontier; they aren't. But for a defined, repeatable workload, they're good enough, and good enough at near-zero marginal cost beats excellent at a metered price that someone else controls. Coinbase and I are making the same trade at different orders of magnitude.
The backdrop: concentration and hedging
Zoom out and the week's financial stories frame why all this matters now. J.P. Morgan flagged that 42 AI-linked companies in the S&P 500 now account for 65–80% of the index's profits — a concentration the bank openly calls exuberant. When that much of the market's earnings rests on AI economics, the question of who captures the margin isn't academic. It's the whole game.
And everyone is hedging vertically. Anthropic is in talks with Samsung about custom chips, a week after OpenAI's Broadcom deal. Meta is building a cloud business to sell its spare AI compute to outsiders. Custom silicon protects margin at the bottom of the stack; embedded engineers capture it at the top; the metered API sits in the middle, getting squeezed from both ends. Nobody builds a $2.5 billion services org if they believe the API business alone will carry the valuation.
What this means if you run a small business
Here's the practical part, because none of those 6,000 Microsoft engineers are coming to your twelve-person company. The embedded-engineer model is priced for enterprises; if you're small, you're explicitly not the customer. That's not bad news — it just means your playbook is different, and this week wrote it for you.
First, treat model choice as a routing decision, not a marriage. If Coinbase can swap frontier Western models for cheaper alternatives behind an abstraction layer, so can you — keep your prompts and workflows portable, and avoid building deep into any one vendor's proprietary surface. Second, treat token spend like a utility bill: set a hard monthly cap and an alert, because the Gartner comparison to payroll is where this is heading and the vendors' own engineers are already hedging. Third, be sober about autonomy. One of the quieter stories this week: in a 500-day simulated software company, only three AI models finished above their starting capital. Most models lose money running a business unsupervised. Buy narrow, measurable automation for defined tasks — invoice handling, first-draft content, triage — and keep a human on the P&L. That's not caution talking; it's the benchmark data.
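The "utility bill" advice can be sketched as a small guard that sits in front of every model call. This is a minimal illustration, not any vendor's billing API. The class name, the 80% alert threshold, and the dollar figures are all assumptions, and in practice you would take the cost of each call from your provider's usage reports.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a charge would push spend past the hard monthly cap."""


@dataclass
class TokenBudget:
    monthly_cap_usd: float
    alert_fraction: float = 0.8  # assumed threshold: alert at 80% of the cap
    spent_usd: float = 0.0
    alert_sent: bool = False

    def record(self, cost_usd: float) -> bool:
        """Record a charge. Returns True only when it first crosses the alert line."""
        if self.spent_usd + cost_usd > self.monthly_cap_usd:
            # Hard cap: refuse the charge and leave recorded spend unchanged.
            raise BudgetExceeded(
                f"cap {self.monthly_cap_usd:.2f} USD would be exceeded"
            )
        self.spent_usd += cost_usd
        crossed = (
            not self.alert_sent
            and self.spent_usd >= self.alert_fraction * self.monthly_cap_usd
        )
        if crossed:
            self.alert_sent = True  # fire the alert once per month, not per call
        return crossed


budget = TokenBudget(monthly_cap_usd=100.0)
for cost in (50.0, 35.0, 5.0):
    if budget.record(cost):
        print(f"alert: {budget.spent_usd:.2f} USD of {budget.monthly_cap_usd:.2f} USD used")
```

The cap fails closed. A request that would break the budget is refused instead of logged after the fact, which is the difference between a cap and a dashboard.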
Where this goes next
From here the repricing moves in only one direction: access gets cheaper, outcomes get more expensive. Model APIs will keep drifting toward commodity pricing while the same vendors charge premium rates for deployment, integration, and embedded expertise. The open question is who joins the land grab next.
So here's the claim you can hold me to over the coming weeks: the four-way race for the last mile won't stay four-way. I expect Google, the only frontier player conspicuously absent from this week's deployment-services news, to announce its own embedded-engineering or outcome-priced offering within weeks. If it isn't Google, it will be a major cloud player cutting the price of model access while raising services prices. When it happens, remember you read the pattern here first. And if I'm wrong, I'll say that here too.