Lack of proper capitalization makes the text unreadable for me and other people. If you don't care to trivially make your text readable, then we for sure don't care to spend time to struggle through your text to see if there is any useful substance there.
I like it, this person doesn't speak for all of us
Yeah, gee, that’s the kind of content where comments should really focus on capitalizing letters.
Come on!
Great material.
Personally, I’m too OCD to type and publish like this. But substance over form any day.
I didn’t even notice. I had to go back and check to see that there were no capitals.
then take 2 seconds and burn some tokens having claude capitalize it for you.
So instead of the writer using his tokens to format the text, taking 1 minute to do that as part of his writing process, you suggest hundreds or thousands of people each spend their own money (and extra time on an ad hoc procedure) to fix the issues the incompetent writer purposefully caused? What a marvelously stupid suggestion, I applaud you.
You are not wrong but you didn't have to say it 💀.
Ok boomer
its a good article, mikko
this isnt middle school. if you dont like the formatting, dont read it and move on. great material and great ideas here.
If you don't like my comment, don't read it, don't spend time suggesting something silly (so silly, that even you don't follow your own advice) and simply move on.
dont capitalize a thing i say
I don’t use caps for anything and I have never had a complaint. I have dysgraphia, so in theory I should champion proper punctuation for readability; in practice, read-aloud is how I viewed this article. I also get annoyed at grammar police who want me to burn more time on writing. These days, with so much AI slop being written, genuine writing, with flaws, should be at a premium. Shakespeare made up words out of thin air, yet if he posted on the net today he would be kicked off the platforms for not following the grammar rules… 😆 And now autocorrect is annoying me by capitalising my “i”s, grrrrr
Lots of good points, but I think you’re overly cynical on the utility of older, cheaper models. For a lot of things, they’ll be really worthwhile, as you noted, the Haiku trick is making Claude Code a lot more cost efficient… not every task will need a GPT-5 level “worker”, just a GPT-5 level planner…
Devin was doing usage based pricing last time I checked. Those guys don’t shy away from swiping your card.
I think you’re right that burning cash to try and grow will blow up in most companies’ faces, but that’s the same as it ever was. Many in-betweeners will eventually raise their rates and, while the Claude Code 24/7 background-looper types who ruin the party for everyone will hem and haw, most business users especially will shrug and move on. I saw this happen with Docker. They were, and maybe still are, troubled, but they eventually clamped down hard on all the free stuff they were giving away, and everyone moved on.
fair, although for the consumer market of flat subscriptions, there seems to be no way through
Yes- I believe we're currently in a relatively egalitarian landscape for access to the tools. Once the belt has to tighten, I think we will see a landscape where a lot of people and businesses will get caught in having to pay ever-increasing rates for AIs to stay on top. It might create a weird stratification at a societal level as "I can afford to get my kids $20K a year GPT-7 subscriptions" becomes the new "I can afford to pay for Harvard".
The last point is really interesting. Let's hope access to intelligence isn't limited by a mile-wide stretch of capabilities between open source and closed source.
there's always DeepSeek...
How big is the market, really, for consumer paid AI services? I suspect only a minority of consumers would pay even a subscription for AI.
I prefer consistent results over “glimpses of genius” for many tasks. I did some promptfoo tuning for a medical video conference “write-up assistant” on AWS Bedrock running Claude. It would freak out the doctors if Sonnet suddenly spat out a PhD-level text from its training set 🤣 My recommendation was only to use Haiku 2.5; it was better at not being a professor of medicine at random.
So now I try to use the weaker models and to improve my prompts/workflows. In reality, if you do a time-and-motion study, LLMs feel good when they trick you with a great answer, only for you to find that the slot machine doesn't pay out 80% of the time. It is better to optimise for what saves your time. The older models are probably good enough for 90% of work, as long as that work isn't just “stealing IP” that the models memorised.
IMO he's not necessarily making a point based on the utility of those older models, but on folks' reluctance to use them when there's a (relatively) smarter one available. Intelligent down-switching to "dumber" models seems workable to me, but is high-risk in a commoditized product environment (especially when some of the players have wallets that'd make King Midas blush).
Perplexity seems to down-switch in some form now and it's painful, so I have some sympathy for the argument.
This problem only exists if the tool allows the user to select the model they're using. But who actually wants to go through the hassle of picking the right model for the task in the first place? I can see most tools moving more towards how Claude Code works where you don't select the model directly. You can still pay up to have the best model in the repertoire but it doesn't have to mean you can always force the tool to use that best model.
Another observation: I'm a ChatGPT Pro subscriber who used to spam Deep Research and o1 Pro, but honestly, recently I've found myself valuing speed of response and preferring cheaper but faster models like 4o.
I agree that people will use weaker models for simpler use cases. Cost will become a multivariate equation based on which models went into generating the output. Still, the ultimate end game will be the foundation model providers partnering with (e.g. being bought out by or merging with) the neoclouds. We saw this with all of the dark fiber during the Dot Com Bust. Vertical integration is the only way out other than the token-cost short squeeze. If the model providers decide to raise their prices, they're betting that their portfolio of models can make them a one-stop shop for developers. If they merge with the infrastructure providers, model-use discounts can be built into the Pro, Max and Enterprise contracts.
Just wanted to say that you are putting out some of the best writing on these topics I've seen. 10 out of 10.
After reading this, I think the one thing you've sold me on is that self-hosted/on-prem models are probably the future, especially for low-spend prosumers and independent users. With the first real wave of AI-focused hardware finally hitting shelves, and as more and more of these companies struggle to keep their subscriptions reasonably priced, it looks like there will be huge opportunities for those of us willing to drop $2,000+ on a dedicated desktop to host agents for ourselves.
How this interacts with the "people only want the bleeding edge models" is probably going to be the most interesting thing to watch. If powerful consumer hardware with ample shared memory can run models close enough to the frontier, this could be a major shift in the near future.
Doesn't Anthropic make most of its money on the API? I think they *are* doing metered pricing from day one, and it *is* getting good uptake; their flat tiers just get all the press.
I've been using them via the API (through OpenRouter) from the beginning, and evidently many others have as well.
Good article, but I think it's wrong. The mistake is overlooking that eventually a model does what you need; at that point the better model doesn't matter. No models are over the line yet, but once they all are, you won't care which model you're running, so everyone will have a router and will use the cheapest model that can handle the query. If Claude 6 can code your app fine, why use Claude 7? Claude 7 is for research projects, not your basic React CRUD app. Improvements past that point are not going to matter for people doing easy things.
Actually, I think in the long term, neither metered pricing nor flat rate is the correct answer. Instead, you will give Claude Code a task and a budget in dollars, and it will accept or decline, kind of like a traditional consultant might. This model works because it allows both sides to consent to an economically viable transaction: the customer proposes a budget that makes sense given the business value of the task, and the AI only accepts the task if it is profitable at the specified budget.
I think the reason this hasn't happened already is that the developer job needs to evolve into something else, with a business focus on how economically valuable various tasks are, and the AIs need to develop internal tools for estimating how expensive various queries will be to satisfy. But that is the essence of a solution: only use AI when the cost of using the AI is significantly less than the business value of the result it produces.
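That accept/decline mechanic fits in a few lines. Everything here is hypothetical (the `Task` shape, the blended token cost, the margin threshold); it's just a sketch of the consent-on-budget idea:

```python
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    budget_usd: float   # what the customer proposes to pay
    est_tokens: int     # the agent's own forecast of tokens required

# Assumed numbers, purely for illustration:
TOKEN_COST_USD = 15 / 1_000_000   # blended cost per token
MARGIN = 1.5                      # require headroom over raw cost

def accept(task: Task) -> bool:
    """Accept the task only if it looks profitable at the proposed budget."""
    est_cost = task.est_tokens * TOKEN_COST_USD
    return task.budget_usd >= est_cost * MARGIN

crud_app = Task("basic CRUD app", budget_usd=50.0, est_tokens=2_000_000)
research = Task("open-ended research", budget_usd=50.0, est_tokens=40_000_000)
print(accept(crud_app), accept(research))  # True False
```

The nice property is that a decline is cheap for both sides: a bad estimate costs the provider a refusal, not a loss.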
Great piece!
When I worked in telecoms back in the day, we used to just say "Unlimited!!*" and then in really small print "*fair usage applies", then we'd kick off the <1% of people who abused the limits. It worked reasonably well.
I'd also wonder how strong the prisoner's dilemma is here, when there aren't very strong usage/network effects. If Anthropic is confident that their model is the best, they'll get people paying to use it up to the limit, then going elsewhere until the quota resets. I don't know how much that hurts them. How much token-generation share do they need to capture?
very insightful - neocloud sounds interesting.
Been through this; users definitely favour fixed flat pricing, otherwise it is way too complicated to deal with.
Some solutions:
1. Charge a flat subscription with metered billing on top. I believe this is what Cursor pivoted to. Why do you think this does not work? Power users will not like it, so you can be unprofitable with them, but overall you would be profitable. However, I do agree this is not a good mantra on which to build a business model.
2. Charge on output. Users dislike this as well, but I found that as long as it is very clear, they don't care which model you use, letting you control your own costs. Example: outreach. You charge $1 per lead, including research, writing, sending emails, etc. As long as the leads are good and you can forecast how many tokens would be consumed per lead, you can still use the best models occasionally.
Wdyt?
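A back-of-the-envelope check of option 2. All the prices and token forecasts below are made-up assumptions, not figures from the thread:

```python
# Assumed prices per million tokens (hypothetical, for illustration only):
PRICE_PER_M_TOKENS = {"cheap": 0.5, "best": 15.0}

PRICE_PER_LEAD = 1.00      # what the customer is charged
TOKENS_PER_LEAD = 40_000   # forecast: research + writing + sending

def margin_per_lead(model: str) -> float:
    cost = TOKENS_PER_LEAD / 1_000_000 * PRICE_PER_M_TOKENS[model]
    return PRICE_PER_LEAD - cost

# Use the cheap model 90% of the time, the best one for the hard 10%:
blended = 0.9 * margin_per_lead("cheap") + 0.1 * margin_per_lead("best")
print(round(margin_per_lead("cheap"), 2),
      round(margin_per_lead("best"), 2),
      round(blended, 3))
```

Under these assumptions the per-lead margin stays healthy even when the best model is used occasionally, which is exactly what makes output pricing forecastable.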
hah, was just thinking about this. how much of this feels like a race to the bottom - economically, creatively, and strategically.
1. "Outcome-based" pricing feels like rebadged professional services (this is where margins are being promised)
2. AI isn’t lifting humanity, it’s extracting margin (SDR spam, AI waifus, AI rebrands)
3. infra and FM providers are extracting on the promise of democratizing upstream, everyone downstream gets squeezed
just a few disjointed thoughts (planning to frame up a post later).
Good read.
Maybe at some point we'll get to a level where version N+1 isn't that much better than version N? Look at the iPhone: at some point people were upgrading all the time, but now it's waiting a few years until your battery no longer holds a charge.
Also, if you're selling impact via wrappers, you do end up having a choice of model. We've done some work with customers, and they care about the quality of the work. If we can deliver it at a lower cost for them (which also comes with a faster response), they're happy. I suspect more and more use cases will be covered by older-generation models.
What if we gave flat rates but did model routing based on the difficulty level of the query (or several other factors), like query routing in RAG but applied to models? With this, businesses can commit to a flat rate and not dig their own grave either.
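A toy version of that router. The tier names and the difficulty heuristic are invented for illustration; a real system might use a small classifier model instead:

```python
# Toy flat-rate router: a cheap difficulty heuristic picks the model tier,
# so the provider controls blended cost while the user pays one price.

TIERS = ["small", "medium", "frontier"]

def difficulty(query: str) -> int:
    """Crude stand-in for a learned difficulty classifier (scores 0-2)."""
    score = 0
    if len(query) > 200:                       # long queries tend to be harder
        score += 1
    hard_words = ("prove", "refactor", "architecture", "debug")
    if any(w in query.lower() for w in hard_words):
        score += 1
    return score

def route(query: str) -> str:
    return TIERS[difficulty(query)]

print(route("capital of France?"))                 # small
print(route("please debug this stack trace ..."))  # medium
```

The flat-rate business then only has to keep the blended cost under the subscription price, not pay frontier cost on every query.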
It's not that metered pricing can't work as a model; I think most people just haven't realized the difference between AI consumption and traditional subscriptions. Netflix and Spotify both sell you a product of controlled quality: pay, and you get the service you expect. AI is a blind box. You need the AI to solve a solvable problem, spend a stretch of time on multiple rounds of conversation, waste time and energy, and end up with an unusable result. If you found a fly in your food at a restaurant, would you pay the full bill? The uncontrollability of AI output makes it unappealing to users as a metered product, while paying per billing period softens this, so users don't feel the "loss" of their money as directly. If you want metered billing, give every user a small free daily quota to attract light, low-frequency users and meter the heavy users, but also give users a number of disputes they can file, say for conversations under 10,000 tokens; if a dispute is upheld, refund the tokens to the user's balance. Once users' losses are controllable, metered billing is no longer a problem.
The biggest difference between AI consumption and traditional subscriptions is actually marginal cost.
Take your Netflix example: producing a show is a fixed cost, and one user more or less barely changes it. CDN bandwidth averages out to 1-2 RMB per user, negligible next to Netflix's roughly 70 RMB monthly subscription.
With large AI models, how much a user uses matters a lot. A single Deep Research task can consume $3, Claude Code reading a large codebase is $5, and a $20-a-month plan can't absorb much of that.
I don't get it. Just use the models that are a few months old?
Your math is way off; it would be $1 x 3 x 24 = $72 per day at the same cost as current Deep Research, not $4,320.
oh yea major brainfart moment LOL
I think I wrote $250 somewhere else, my bad, good catch
how is every single aspect of this post wrong? impressive