AI use is being throttled. OpenAI has reduced what you can do with Codex (and introduced a more expensive subscription for the original capabilities). Anthropic has been accused of pinching Claude for weeks now, and was caught removing Claude Code from the Pro subscription entirely. Microsoft has halted new Copilot subscriptions. And my laptop got so hot trying to run Qwen 30B that I had to pause, a kind of natural session limit if you will.

How will we cope? Spend more? Do less? Do... different?

Assuming spending more will come anyway, and doing less is not an option, we have little choice but to do differently. That means reducing tokens in, tokens out, or both.

Tokens in

There are interesting initiatives happening, ranging from Nuno Maduro's PAO to "json prompting". Providing tools tailored to agent usage is a fantastic idea, now and in the future. Communities such as Laravel are well engaged, and you can include their efforts almost... effortlessly (mind the ads). Trying to improve your prompting is good, but it has two trapdoors I think we need to avoid.

First off, as has been pointed out, prompting in specific structures approaches coding and kind of defeats the purpose of prompting. It's not a human-friendly way of engaging, and there is no real way to verify whether it works better than whatever else you could be doing with your time and energy. That doesn't mean agents can't benefit from structure: I see a bright future for event modelling producing structured output that deterministic and non-deterministic agents can implement in a shared effort.

Secondly, and more importantly, the evolution of coding harnesses will not work with or towards your creative approach. They will work towards helping humans express themselves with low friction.

Tokens out

That brings us to the second option, where I see the biggest impact in the medium to long term. The future of software engineering, of product engineering, lies in verification and feedback loops. The agents need two things to make sure the tokens out are keepers: a good specification and a quality gate. Rather than optimizing for tokens in, we need to optimize for the quality of tokens out.
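To make the shape of that loop concrete, here is a minimal sketch. Everything in it is a stand-in: `run_agent` represents whatever coding agent you use, and the gate is just a handful of executable checks derived from a (toy) specification for an addition function. The point is the structure, not the implementation: generate, verify, feed failures back, repeat.

```python
# A minimal sketch of an "agent + quality gate" feedback loop.
# run_agent is a stand-in for a real coding agent; the gate is a set of
# executable checks derived from the specification.

def quality_gate(candidate) -> list[str]:
    """Return a list of failures; an empty list means the output is a keeper."""
    failures = []
    if candidate(2, 3) != 5:
        failures.append("2 + 3 should be 5")
    if candidate(-1, 1) != 0:
        failures.append("-1 + 1 should be 0")
    return failures

def run_agent(feedback: list[str]):
    # Stand-in: a real agent would regenerate code from the spec plus feedback.
    if feedback:
        return lambda a, b: a + b      # corrected attempt after feedback
    return lambda a, b: a + b + 1      # first, buggy attempt

feedback: list[str] = []
for attempt in range(3):
    candidate = run_agent(feedback)
    feedback = quality_gate(candidate)
    if not feedback:
        print(f"accepted on attempt {attempt + 1}")
        break
```

Notice that the quality gate, not the prompt, is what decides which tokens out survive.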

What is a good specification? (Un)fortunately nothing new. A good spec has low ambiguity, high decomposability (slicing), and is lean (low coupling). DDD, event modelling, and probably dozens of other initiatives all point the same way. What was true for human-executed projects is true for AI-executed projects.

Costs are "given", outcomes are not

Hired engineers are a fixed cost. We can even see a future where spun-up agents are a fixed cost. An engineering team of 5 hired "agent managers" and 5 or 10 or 200 agents per manager. The budget is what the budget is, and it is unlikely that you are prompting in a 10x manner. Sorry. Also, yaml is cheaper than json.
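On that last point, you can eyeball it yourself. The sketch below serializes the same (made-up) config both ways and compares character counts, a rough proxy for token counts, since exact token counts depend on the model's tokenizer; the YAML string is hand-written so no external library is needed.

```python
import json

# Hypothetical config used only for the size comparison.
config = {
    "service": "billing",
    "retries": 3,
    "endpoints": ["https://api.example.test/v1", "https://api.example.test/v2"],
}

as_json = json.dumps(config, indent=2)

# Equivalent hand-written YAML: no quotes, braces, or commas needed.
as_yaml = """\
service: billing
retries: 3
endpoints:
  - https://api.example.test/v1
  - https://api.example.test/v2
"""

print(f"json: {len(as_json)} chars, yaml: {len(as_yaml)} chars")
```

The YAML version drops the structural punctuation, which is exactly the kind of overhead you pay for on every request.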

Where actual research has shown engineering costs to balloon: rework. Fixes, changes, refactoring: they're all waste. There is nothing special about AI here. The same old patterns are just exaggerated, scaled up.

The same lesson needs to be learned. If you want similar outcomes for less cost, just wait, and the efficiency of product evolution will fix it for you. But if you want better outcomes for similar cost, then you need to up your game.

Understand your problems, through event storming and event modelling. Decompose them, and compose solutions. Stop the propagation of rework.