Before performing the study, the developers in question expected the AI tools would lead to a 24 percent reduction in the time needed for their assigned tasks. Even after completing those tasks, the developers believed that the AI tools had made them 20 percent faster, on average. In reality, though, the AI-aided tasks ended up being completed 19 percent slower than those completed without AI tools.
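To make the percentages concrete, here is a quick back-of-the-envelope comparison against a hypothetical 60-minute baseline task (the baseline is an assumption for illustration; only the percentages come from the study, and "20% faster" is read here as a 20% time reduction):

```python
# Hypothetical baseline: a task that takes 60 minutes without AI tools.
baseline = 60.0

expected = baseline * (1 - 0.24)   # devs predicted a 24% time reduction: ~45.6 min
perceived = baseline * (1 - 0.20)  # devs believed a 20% speedup afterward: ~48.0 min
actual = baseline * (1 + 0.19)     # measured outcome, 19% slower: ~71.4 min

print(f"expected:  {expected:.1f} min")
print(f"perceived: {perceived:.1f} min")
print(f"actual:    {actual:.1f} min")
```

The gap between the perceived ~48 minutes and the measured ~71.4 minutes is the perception error the study highlights.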

  • JordanZ@lemmy.world · 2 days ago

    Coders spent more time prompting and reviewing AI generations than they saved on coding.

    See that’s the problem right there. You’re just supposed to take its output as gospel and move on. Skip this “reviewing” step and massive productivity gains await!

    Obviously /s

  • hedgehog@ttrpg.network · 2 days ago

    Ars points out that these findings contradict those of other experiments and then goes on to postulate as to why. I clicked on the link to the other experiment:

    when data is combined across three experiments and 4,867 developers, our analysis reveals a 26.08% increase (SE: 10.3%) in completed tasks among developers using the AI tool

    By comparison, this experiment considered 16 developers. That’s 0.3% as many as the experiments its findings contradict. Fortunately, the authors don’t claim their findings are broadly applicable. They even have a table that reads:

    | We do not provide evidence that | Clarification |
    | --- | --- |
    | AI systems do not currently speed up many or most software developers | We do not claim that our developers or repositories represent a majority or plurality of software development work |
    | AI systems do not speed up individuals or groups in domains other than software development | We only study software development |
    | AI systems in the near future will not speed up developers in our exact setting | Progress is difficult to predict, and there has been substantial AI progress over the past five years [2] |
    | There are not ways of using existing AI systems more effectively to achieve positive speedup in our exact setting | Cursor does not sample many tokens from LLMs, it may not use optimal prompting/scaffolding, and domain/repository-specific training/finetuning/few-shot learning could yield positive speedup |
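    A quick sanity check on those numbers (this is my own arithmetic, including the rough ±1.96 normal interval on the reported estimate, which the quoted study does not itself state):

    ```python
    # Sample sizes quoted above.
    metr_n, other_n = 16, 4867
    print(f"sample ratio: {metr_n / other_n:.2%}")  # ~0.33%, i.e. the "0.3%" figure

    # Reported effect: 26.08% increase with SE 10.3%.
    # Rough 95% interval assuming approximate normality (my assumption).
    est, se = 26.08, 10.3
    lo, hi = est - 1.96 * se, est + 1.96 * se
    print(f"rough 95% interval: [{lo:.1f}%, {hi:.1f}%]")  # ~[5.9%, 46.3%]
    ```

    Even the low end of that rough interval is a speedup, which is part of why the 19% slowdown result stands out.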

    That said, the study has been an interesting read so far. I highly recommend reading it directly rather than just the news posts about it. Check out their own blog post: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

    I personally find the psychological effect - the devs thought they were 20% faster even afterward - to be pretty interesting, as it suggests that even if more time overall is spent, use of AI could reduce cognitive load and potentially side effects like burnout.

    I’d like to see much larger scale studies set up like this, as well as studies of other real world situations. For example, how does this affect the amount of time it takes 10,000 different developers to onboard onto an unfamiliar repository?

    • Catoblepas@piefed.blahaj.zone · 2 days ago

      I personally find the psychological effect - the devs thought they were 20% faster even afterward - to be pretty interesting, as it suggests that even if more time overall is spent, use of AI could reduce cognitive load and potentially side effects like burnout.

      This assumes that lower estimated time = lower stress levels, when other factors could easily be throwing off time estimation. Think the trope of someone very busy at work who realizes they’ve worked through lunch or dinner. I would have expected people who spend 20% less mental effort on something to be less engaged and more bored by the passage of time, not less.

      Also, importantly, improving worker conditions is something that can reduce burnout without the burden of massive data centers. We don’t have to make a machine that produces the illusion of speech to pay people better.

      • errer@lemmy.world · 2 days ago

        Also: you can multitask with these things! Prompt it and let it cook for several minutes while you do something else. I feel like the people in this study must have been blankly staring at the code generating to get an overall slowdown…

    • usernamesAreTricky@lemmy.ml · 2 days ago

      Though also, when that earlier metastudy is broken down to more experienced developers, a similar pattern appears: a within-margin-of-error change, or a decrease, in productivity

      They are also not comparing the same metrics. The earlier study uses the number of commits and pull requests as its productivity metric; the other looks at time per task

      Number of commits, PRs, and similar metrics like lines of code aren’t great for measuring productivity in general, and especially not here. Usage patterns with AI could very easily change your commit and PR patterns without changing how much you are actually getting done