article makes very good points but it’s pretty disingenuous to frame an argument about coding agents with a chart that terminates in q1 of 2025 lol
oh i think that's just ai LOL
same trend line holds w/ axis extending to today - or at least i would stand by that
Definitely not going to assume that when coding agents only really took off 6 months after your chart ends.
Some of the top engineers in the world stopped writing code as of 4-5 months ago.
eh, doubt it
i know folks like karpathy say that but the majority of karpathy's projects don't involve a ton of horrible codebases
I think it will be difficult to measure productivity using coding agents in any sense until the industry starts standardizing around specific agentic engineering techniques.
Vibe-coding/prompting a feature will produce much more slopulence than writing a .md spec and delegating specific pieces of implementation to the agent. But then is that better than writing it by hand?
Surely there’s an optimal equilibrium, but I’m not convinced many engineers have found it yet.
How do I tell Substack I never want to see your content
What is the source of your first graph, the k-shaped productivity curve? Thanks.
Love the analogy, and as someone who has been coding for a very long time and also runs a startup (nonbios) in the same space as Cursor/Claude Code, I broadly agree.
However I have a different take on two counts:
Firstly, those who build Camrys couldn't do so without AI. But now that they can, they have the opportunity to learn how to build Ferraris. And some of them will push through and earn their place. We see it happening already at nonbios: non-engineers are 'learning' to build through AI, rather than simply delegating.
Secondly, those who build Ferraris would still prefer to use AI to do it. 'Taste' is still the limiting factor, as it is the slowest part to get right, but everything around it is better delegated to AI. I do it myself, and I'll grant your hot take: it might not have meaningfully increased the speed. However, despite that, building with AI has a lower 'cognitive' cost, as it can take care of the low-level stuff while I focus on the high-level design.
My engineers are hitting 2.5x their previous delivery velocity with agents while still passing all human code reviews. But this took a significant, ground-up restructure of our development process. If you're still running Agile and wasting time being hyper-prescriptive by typing out individual user stories, agentic dev isn't going to do much for you at all.
Is your revenue exploding / accelerating?
Can you tell us how your engineers are choosing what to deliver?
We're a product-first org, so whatever their product leadership (which is me for my team) is asking them for. The acceleration is consistent across product teams though.
Until now, I had to deal with narcissistic developers who created spaghetti code, making them untouchable in the eyes of management, even when the code was underperforming, slow and memory-intensive. Now, with AI, I can do it myself: I can iterate on domain-specific areas with ease, and companies don't depend on developer egos.
Hi Ethan, thank you for writing this article. I was very surprised to learn that "comma.ai’s software subsidiary famously had an alarm that triggered when the codebase exceeded a certain size", but neither Google nor Bing has any search results about this. Could you provide a source for this? Thanks again!
uhhhh i guess it wasn’t as famous as I thought
the guy who told me was an engineer on our team who went to their office - he said it was famous, i guess i never double-checked
The lore here is the “tinygrad” repo (https://github.com/tinygrad/tinygrad), it had a pre-commit check of whether the total library was under 1000 lines of code or not. Mostly as a dig at PyTorch being quite verbose. It’s since grown past that threshold but for the first 2-3 years it was there. You can watch George Hotz’s old livestreams on YouTube building the library from scratch and enforcing the limit on himself/others contributing.
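For anyone curious what a check like that might look like, here is a minimal sketch in Python. It is not tinygrad's actual script: the function names, the choice to skip blank and comment lines, and the default 1000-line budget (taken from the comment above) are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical budget, taken from the 1000-line figure mentioned above.
MAX_LINES = 1000


def count_code_lines(root: Path) -> int:
    """Count non-blank, non-comment lines across all .py files under root."""
    total = 0
    for path in sorted(root.rglob("*.py")):
        for line in path.read_text(encoding="utf-8").splitlines():
            stripped = line.strip()
            # Skip empty lines and full-line comments (an assumed convention).
            if stripped and not stripped.startswith("#"):
                total += 1
    return total


def check_line_budget(root: Path, max_lines: int = MAX_LINES) -> bool:
    """Return True when the code under root fits within the line budget."""
    return count_code_lines(root) <= max_lines
```

A pre-commit hook or CI step could call `check_line_budget(Path("yourlib"))` and exit non-zero when it returns False, which is roughly the kind of self-imposed limit described above.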
the empirical evidence used to be with you, but now goes against this claim! cf. METR’s update to their open-source dev productivity study. an interesting point nonetheless. catching up to the frontier has always been easier than pushing the frontier, and in the world of code, coding agents have made that even easier. I do think pushing the frontier has become easier, too, through enabling faster iteration, but much less so than it has accelerated catching up.
I think it's safe to say that we need more manual QA in today’s world, or at least a focus on SDET.
Many good points, but we've known since the Mythical Man-Month that adding more engineers does not always speed up large projects, and it's possible AI agents just have many of the same limitations. A different, more interesting measure would be if newer AI-integrated teams can ship at the same rate and quality but with much smaller teams.
"— the problem isn’t just financial, it’s conceptual"
Lol what's going on here
The Camry vs Ferrari analogy is perfect.
AI helps you build faster, but not *better* at the top end.
Taste still wins.
Interesting approach to this shift here: https://shorturl.at/cvsda
cool article! putting some numbers on the feeling I've had for a while
Nice article overall! I was going to write a snarky comment about the charts but then I saw your bio. Have a great day!