37 Comments
Mikko:

Lack of proper capitalization makes the text unreadable for me and other people. If you don't care to trivially make your text readable, then we for sure don't care to spend time to struggle through your text to see if there is any useful substance there.

eeeee:

I like it, this person doesn't speak for all of us

Nico Appel:

Yeah, gee, that’s the kind of content where comments should really focus on capitalizing letters.

Come on!

Great material.

Personally, I’m too OCD to type and publish like this. But substance over form any day.

Shjs:

I didn’t even notice. I had to go back and check to see that there were no capitals.

Samuel Hammond:

then take 2 seconds and burn some tokens having claude capitalize it for you.

Tobes:

dont capitalize a thing i say

Ffffffff:

Ok boomer

Vincent:

its a good article, mikko

Nathan:

Lots of good points, but I think you’re overly cynical on the utility of older, cheaper models. For a lot of things they’ll be really worthwhile; as you noted, the Haiku trick makes Claude Code a lot more cost-efficient. Not every task will need a GPT-5-level “worker”, just a GPT-5-level planner…

Devin was doing usage based pricing last time I checked. Those guys don’t shy away from swiping your card.

I think you’re right that burning cash to try and grow will blow up in most companies’ faces, but that’s the same as it ever was. Many in-betweeners will eventually raise their rates and, while the Claude Code 24/7 background-looper types who ruin the party for everyone will hem and haw, most business users especially will shrug and move on. I saw this happen with Docker. They were, and maybe still are, troubled, but they eventually clamped down hard on all the free stuff they were giving away, and everyone moved on.

Ethan Ding:

fair, altho for the consumer market of flat subscriptions, there seems to be no way through

Nathan:

Yes, I believe we're currently in a relatively egalitarian landscape for access to the tools. Once the belt has to tighten, I think we will see a landscape where a lot of people and businesses get caught having to pay ever-increasing rates for AIs to stay on top. It might create a weird stratification at a societal level, as "I can afford to get my kids $20K-a-year GPT-7 subscriptions" becomes the new "I can afford to pay for Harvard".

Tommy:

The last point is really interesting. Let's hope access to intelligence doesn't end up limited by a mile (the stretch of capabilities from open source to closed source).

Ben:

IMO he's not necessarily making a point based on the utility of those older models, but on folks' reluctance to use them when there's a (relatively) smarter one available. Intelligent down-switching to "dumber" models seems workable to me, but is high-risk in a commoditized product environment (especially when some of the players have wallets that'd make King Midas blush).

Nathan:

Perplexity seems to down-switch in some form sometimes now, and it's painful, so I'm somewhat inclined toward that argument.

Earl Lee:

This problem only exists if the tool allows the user to select the model they're using. But who actually wants to go through the hassle of picking the right model for the task in the first place? I can see most tools moving more towards how Claude Code works where you don't select the model directly. You can still pay up to have the best model in the repertoire but it doesn't have to mean you can always force the tool to use that best model.
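That kind of implicit routing can be sketched roughly like this; the tier names, plan ceilings, and complexity heuristic are all hypothetical, not any product's actual logic:

```python
# Hypothetical tiered-model router: the user never picks a model;
# the tool escalates based on a rough task-complexity estimate.

def estimate_complexity(prompt: str) -> int:
    p = prompt.lower()
    score = 0
    if len(prompt) > 500:                # long prompts tend to be harder
        score += 1
    if "refactor" in p or "debug" in p:  # multi-step coding work
        score += 1
    if any(w in p for w in ("plan", "architecture", "design")):
        score += 1
    return score

def pick_model(prompt: str, plan: str = "pro") -> str:
    tiers = ["small-fast", "mid", "frontier"]        # hypothetical tier names
    ceiling = {"free": 0, "pro": 1, "max": 2}[plan]  # paying raises the ceiling
    return tiers[min(estimate_complexity(prompt), ceiling)]

print(pick_model("fix this typo"))  # small-fast
print(pick_model("plan a large refactor of the billing architecture", plan="max"))  # frontier
```

The point of the ceiling is that paying more raises the best model the tool *may* use, without ever letting the user force every request onto it.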

Another observation: I'm a ChatGPT Pro subscriber who used to spam Deep Research and o1 Pro but honestly, recently, I've found myself valuing speed to response more often these days and preferring cheaper but faster models like 4o.

Claude L. Johnson Jr.:

I agree that people will use weaker models for simpler use cases. Cost will become a multi-variate equation based on which models went into generating the output. Still, the ultimate end game will be the foundation model providers partnering with the neoclouds (i.e., being bought out by or merging with them). We saw this with all of the dark fiber during the Dot Com Bust. Vertical integration is the only way out other than the token-cost short squeeze. If the model providers decide to raise their prices, they're betting that their portfolio of models can make them a one-stop shop for developers. If they merge with the infrastructure providers, model-use discounts can be built into the Pro, Max and Enterprise contracts.

Brian Balfour:

Just wanted to say that you are putting out some of the best writing on these topics I've seen. 10 out of 10.

Mert Deveci:

Very insightful; "neocloud" sounds interesting.

I've been through this: users definitely favour fixed flat pricing, otherwise it is way too complicated to deal with.

Some solutions:

1. Charge a flat subscription with metered billing on top. I believe this is what Cursor pivoted to. Why do you think this does not work? Power users will not like it, and you may be unprofitable on them, but overall you would be profitable. However, I do agree this is not a good mantra to build a business model on.

2. Charge on output. Users dislike this as well, but I found that as long as the pricing is very clear, they don't care which model you use, which lets you control your own costs. Example: outreach. You charge $1 per lead, including research, writing, and sending emails. As long as the results are good and you can forecast how many tokens each lead consumes, you can still use the best models occasionally.
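A quick sanity check of the option-2 unit economics; every number here (token counts, per-million rates) is an illustrative assumption, not real pricing:

```python
# Illustrative unit economics for "$1 per lead" output-based pricing,
# where a cheap model does most of the work and a frontier model
# handles the final draft. All rates are made up for the sketch.
PRICE_PER_LEAD = 1.00

tokens_cheap, tokens_frontier = 40_000, 5_000          # assumed usage per lead
cost_cheap_per_m, cost_frontier_per_m = 0.50, 15.00    # assumed $ per 1M tokens

cost_per_lead = (
    (tokens_cheap / 1e6) * cost_cheap_per_m
    + (tokens_frontier / 1e6) * cost_frontier_per_m
)
margin = PRICE_PER_LEAD - cost_per_lead
print(f"cost ${cost_per_lead:.3f}/lead, margin {margin / PRICE_PER_LEAD:.0%}")
```

Under these assumptions the token cost is a small fraction of the $1 price, which is what makes the forecast-and-charge-per-output model workable.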

Wdyt?

Peter Tanham:

Great piece!

When I worked in Telecoms back in the day we used to just say "Unlimited!!*" and then in really small print "*fair usage applies", then we'd kick off the <1% of people who abused the limits. It worked reasonably well.

I'd also wonder how strong the prisoner's dilemma is here, when there aren't very strong usage/network effects. If Anthropic is confident that their model is the best, they'll get people paying to use it up to the limit, then going elsewhere until the quota resets. I don't know how much that hurts them? How much token-generation share do they need to capture?
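The telecom-style fair-use clause reduces to a simple cutoff. A toy sketch, with the 10x-median threshold as an assumption:

```python
import statistics

# Hypothetical fair-use check: advertise "unlimited", but flag the tiny
# fraction of users whose usage dwarfs a typical user's. The 10x-median
# threshold is an assumption, not any provider's real policy.
def flag_abusers(usage_by_user: dict[str, float], multiple: float = 10.0) -> list[str]:
    median = statistics.median(usage_by_user.values())
    return [u for u, tokens in usage_by_user.items() if tokens > multiple * median]

usage = {"alice": 1.2e6, "bob": 0.8e6, "carol": 1.0e6, "looper": 25e6}
print(flag_abusers(usage))  # ['looper']
```

Using a multiple of the median (rather than a fixed cap) means the cutoff scales as typical usage grows, so only genuine outliers get kicked off.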

Dan (Aug 4, edited):

After reading this, I think the one thing you've sold me on is that self-hosted/on-prem models are probably the future, especially for low-spend prosumers and independent users. With the first real wave of AI-focused hardware finally hitting shelves, and as more and more of these companies struggle to keep their subscriptions reasonably priced, it looks like there will be huge opportunities for those of us willing to drop $2,000+ on a dedicated desktop to host agents for ourselves.

How this interacts with the "people only want the bleeding edge models" is probably going to be the most interesting thing to watch. If powerful consumer hardware with ample shared memory can run models close enough to the frontier, this could be a major shift in the near future.

stochastic parrot:

Your math is way off: it would be $1 × 3 × 24 = $72 per day at the same cost as current deep research, not $4,320.
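Spelled out, assuming $1 per deep-research-style task as in the post:

```python
# Always-on agents at current deep-research prices.
cost_per_task = 1   # dollars per task (assumed)
agents = 3          # agents running in parallel
hours = 24          # one task per agent per hour, around the clock
print(cost_per_task * agents * hours)  # 72 dollars/day, not 4,320
```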

Ethan Ding:

oh yea major brainfart moment LOL

I think I wrote $250 somewhere else. My bad, good catch.

D Little:

Feels like your assumption about power users being consumer grade is very strange.

If we assume power users are business/enterprise grade, metered pricing is moderately normalized (see: AWS, cloud, observability platforms, etc.).

Anthropic is aggressively targeting Enterprise and their API is metered from the drop.

Vipul Devluk:

If there is surplus compute available in the market (i.e., data center overbuild), token costs would end up much lower, mimicking the fiber/telecom boom and bust of the '90s/'00s. Flattening scaling laws would lead to an overbuild. How likely is that?

Odds get higher if there is a winning frontier model (or models).

Avenging Angels:

Great article Ethan! The economics you describe resemble the economics of broadband connections.

The solution there was some combination of a flat price plan with capped usage ("$50/month for 500 MB of data") transitioning to unlimited with a fair use policy ("unlimited data, but we kick you off if you're using 10x the median user").

Exciting times!

Josue A. Bogran:

Very, very solid article!

Ashwin Raghav:

Nice article!

What I’m curious about is whether today’s outdated models (yesterday’s frontiers) are actually sustainable for the foundation model companies to operate at their current prices. On the surface, it seems like building on top of these models improves margins from day one, but I don't think the unit economics of operating them are that much cheaper in practice. Wdyt?

Claude L. Johnson Jr.:

Even though I hadn't thought about the token "short squeeze" as a risk, the cost of compute for these models struck me recently. After Anthropic made their limits changes, it hit me. I came to the same conclusion you did. The only way for these companies to survive, if not by increasing their prices, is to vertically integrate. They will have to merge with neoclouds and other providers of infrastructure.

There's going to be a massive shakeout in this industry, much like the Dot Com infrastructure build led to. There will be neocloud roll-ups, private equity style. There will be writedowns on Nvidia GPUs en masse. There will be mergers of software providers like Anthropic with neoclouds like CoreWeave or Crusoe or Lambda so they can differentiate on the hosting and other ancillary services in an integrated way. Using Anthropic's models (or any foundation model provider outside of the open source providers) will incur additional cost; you'll get discounts if you're a customer of the infrastructure provider.

So glad I didn't invest into any of the SPVs or secondary syndicates for OpenAI stock. Sheesh. I guess I have to put my money on Oracle acquiring them for pennies.

kopmoc:

It's not that usage-based pricing can't work; I think most people just haven't realized the difference between AI consumption and traditional subscriptions. Netflix and Spotify both sell you a product of controlled quality: you buy it and you get the service you expect. AI is a blind box. You need AI to solve a solvable problem, spend time on multiple rounds of conversation, waste time and energy, and end up with an unusable result. If you found a fly in your food at a restaurant, would you still pay full price? The uncontrollability of AI output makes it unattractive to users when sold as a metered product. Billing per period softens this, so users don't feel their money being "lost" so directly. If you want metered billing, then give all users a small free daily quota to attract light, low-frequency users, and meter the heavy users, but also give users a limited number of dispute reports (say, for conversations under 10,000 tokens) where, if the report is upheld on review, the tokens are refunded to the user's balance. Once the user's downside is controlled, metered billing is no longer a problem.

Andrea Russo:

great read! thanks

Oliver Angélil, PhD:

I don't get it. Just use the models that are a few months old?
