# Developer Tools

## Why AI Isn’t Ready to Be a Real Coder – IEEE Spectrum

Embark on a journey into the dynamic world of AI development with our blog series, where we explore the latest and most innovative AI Developer Tools. As we delve into the tools and technologies shaping the future of artificial intelligence, discover how these resources empower developers to create intelligent, efficient, and scalable solutions. Whether you’re a seasoned AI professional or just stepping into the realm of machine learning, these articles aim to provide insights, tips, and practical guidance to navigate the diverse landscape of AI Developer Tools. Join us in unraveling the potential and staying at the forefront of the ever-evolving field of AI development.
AI’s coding evolution hinges on collaboration and trust
Rina Diane Caballar is a Contributing Editor covering tech and its intersections with science, society, and the environment.
How good can AI coding tools get?
Artificial intelligence (AI) has transformed the coding sphere, with AI coding tools completing source code, correcting syntax errors, creating inline documentation, and understanding and answering questions about a codebase. As the technology advances beyond automating programming tasks, the idea of full autonomy looms large. Is AI ready to be a real coder?
A new paper says not yet—and maps out exactly why. Researchers from Cornell University, MIT CSAIL, Stanford University, and UC Berkeley highlight key challenges that today’s AI models face and outline promising research directions to tackle them. They presented their work at the 2025 International Conference on Machine Learning.
The study offers a clear-eyed reality check amid all the hype. “At some level, the technology is powerful and useful already, and it has gotten to the point where programming without these tools just feels primitive,” says Armando Solar-Lezama, a co-author of the paper and an associate director at MIT CSAIL, where he leads the computer-aided programming group. He argues, however, that AI-powered software development has yet to reach “the point where you can really collaborate with these tools the way you can with a human programmer.”
According to the study, AI still struggles with several crucial facets of coding: sweeping scopes involving huge codebases, the extended context lengths of millions of lines of code, higher levels of logical complexity, and long-horizon or long-term planning about the structure and design of code to maintain code quality.
Koushik Sen, a professor of computer science at UC Berkeley and also a co-author of the paper, cites fixing a memory safety bug as an example. (Such bugs can cause crashes, corrupt data, and open security vulnerabilities.) Software engineers might approach debugging by first determining where the error originates, “which might be far away from where it’s crashing, especially in a large codebase,” Sen explains. They’ll also have to understand the semantics of the code and how it works, and make changes based on that understanding. “You might have to not only fix that bug but change the entire memory management,” he adds.
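To make the shape of that problem concrete, here is a minimal sketch, not taken from the paper. Sen's example is a memory-safety bug in native code; the Python analogue below (the `Cache` class and its names are hypothetical) substitutes a stale reference to shared state, but the debugging pattern is the same: the line that fails sits nowhere near the line that is actually wrong.

```python
# A hypothetical illustration of the "crash far from the root cause" pattern
# Sen describes. This is a stale-reference bug, a Python stand-in for the
# memory-safety bugs he cites in native code.

class Cache:
    def __init__(self):
        self.entries = {}

    def reset(self):
        # Root cause: rebinding the attribute orphans every reference that
        # other modules are still holding; clearing in place would not.
        self.entries = {}  # the semantics-aware fix is self.entries.clear()


cache = Cache()
view = cache.entries           # another module keeps a handle on the dict
cache.reset()                  # the root cause executes here, silently
view["user"] = "alice"         # this write lands in the orphaned dict

# Much later, possibly in a different file, the symptom finally surfaces:
print(cache.entries["user"])   # KeyError: 'user' -- nowhere near reset()
```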

These kinds of complex tasks can be difficult for AI development tools to navigate, resulting in hallucinations about where the bug is or its root cause, as well as irrelevant suggestions or code fixes with subtle problems. “There are many failure points, and I don’t think the current LLMs [large language models] are good at handling that,” says Sen.
Among the various paths suggested by the researchers toward solving these AI coding challenges—such as training code LLMs to better collaborate with humans and ensuring human oversight for machine-generated code—the human element endures.
“A big part of software development is building a shared vocabulary and a shared understanding of what the problem is and how we want to describe these features. It’s about coming up with the right metaphor for the architecture of our system,” Solar-Lezama says. “It’s something that can be difficult to replicate by a machine. Our interfaces with these tools are still quite narrow compared to all the things that we can do when interacting with real colleagues.”
Creating better interfaces, which today are driven by prompt engineering, is integral for developer productivity in the long run. “If it takes longer to explain to the system all the things you want to do and all the details of what you want to do, then all you have is just programming by another name,” says Solar-Lezama.
Shreya Kumar, a software engineer and an associate teaching professor in computer science at the University of Notre Dame who was not involved in the research, echoes the sentiment. “The reason we have a programming language is because we need to be unambiguous. But right now, we’re trying to adjust the prompt [in a way] that the tool will be able to understand,” she says. “We’re adapting to the tool, so instead of the tool serving us, we’re serving the tool. And it is sometimes more work than just writing the code.”
As the study notes, one way to address the dilemma of human-AI interaction is for AI systems to learn to quantify uncertainty and communicate proactively, asking for clarification or more information when faced with vague instructions or unclear scenarios. Sen adds that AI models might also be “missing context that I have in my mind as a developer—hidden concepts that are embedded in the code but hard to decipher from it. And if I give any hint to the LLM about what is happening, it might actually make better progress.”
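As a rough illustration of that interaction pattern, here is a minimal sketch, not drawn from the paper: an assistant that reports a confidence score with each answer and asks a clarifying question when the score falls below a threshold. The `ask_model` function and `CONFIDENCE_THRESHOLD` value are hypothetical stand-ins, not a real API.

```python
CONFIDENCE_THRESHOLD = 0.7


def ask_model(prompt: str) -> tuple[str, float]:
    """Stand-in for an LLM call that returns an answer plus a self-reported
    confidence score; here, confidence drops when the request names no file."""
    vague = "file" not in prompt.lower()
    return ("apply the null-check patch", 0.4 if vague else 0.9)


def assist(task: str) -> str:
    answer, confidence = ask_model(task)
    if confidence < CONFIDENCE_THRESHOLD:
        # Rather than emitting a plausible-but-wrong fix, surface the
        # ambiguity and fold the developer's reply back into the request.
        reply = input(f"Clarification needed for '{task}': ")
        answer, _ = ask_model(f"{task}\nDeveloper clarified: {reply}")
    return answer


if __name__ == "__main__":
    print(assist("fix the crash on startup"))
```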

For Abhik Roychoudhury, a professor of computer science at the National University of Singapore who was also not involved in the research, a crucial element missing from both the paper and most AI-backed software development tools is the ability to capture user intent.
“A software engineer is doing a lot of thinking in understanding the intent of the code. This intent inference—what the program is trying to do, what the program is supposed to do, and the deviation between the two—is what helps in a lot of software engineering tasks. If this outlook can be brought in future AI offerings for software engineering, then it will get closer to what the software engineer does.”
Roychoudhury also expects that many of the challenges identified in the paper are either being worked on now or “would be solved relatively quickly,” given the rapid pace of progress in AI for software engineering. He also believes an agentic AI approach can help, seeing significant promise in AI agents that process requirements specifications and ensure they are enforced at the code level.
“I feel the automation of software engineering via agents is probably irreversible. I would dare say that it is going to happen,” Roychoudhury says.
Sen is of the same view but looks beyond agentic AI initiatives. He pinpoints ideas such as evolutionary algorithms to enhance AI coding skills and projects like AlphaEvolve that employ genetic algorithms “to shuffle the solutions, pick the best ones, and then continue improving those solutions. We need to adopt a similar technology for coding agents, where the code is continuously improving in the background.”
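As a rough sketch of the loop Sen describes, the toy example below runs a generic evolutionary search: score a population of candidates, keep the best ones, and mutate them to produce the next generation. It is not AlphaEvolve’s actual method; a real coding agent would mutate patches and score them against a test suite rather than flipping bits in a string.

```python
import random

# Toy evolutionary loop: score candidates, keep the best, mutate, repeat.
GENOME_LEN = 32     # stand-in for a candidate program
POP_SIZE = 20
GENERATIONS = 200


def fitness(candidate: list[int]) -> int:
    return sum(candidate)  # placeholder for "number of tests passed"


def mutate(candidate: list[int]) -> list[int]:
    child = candidate[:]
    child[random.randrange(GENOME_LEN)] ^= 1  # one small random edit
    return child


population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    survivors = population[: POP_SIZE // 2]                    # pick the best ones
    children = [mutate(random.choice(survivors)) for _ in survivors]
    population = survivors + children                          # continue improving

print("best fitness:", max(map(fitness, population)))
```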
However, Roychoudhury cautions that the bigger question lies in “whether you can trust the agent, and this issue of trust will be further exacerbated as more and more of the coding gets automated.”
That’s why human supervision remains vital. “There should be a check and verify process. If you want a trustworthy system, you do need to have humans in the loop,” says Notre Dame’s Kumar.
Solar-Lezama agrees. “I think it’s always going to be the case that we’re ultimately going to want to build software for people, and that means we have to figure out what it is we want to write,” he says. “In some ways, achieving full automation really means that we get to now work at a different level of abstraction.”
So while AI may become a “real coder” in the near future, Roychoudhury acknowledges that it probably won’t gain software developers’ complete trust as a team member, and thus might not be allowed to do its tasks fully autonomously. “That team dynamics—when an AI agent can become a member of the team, what kind of tasks will it be doing, and how the rest of the team will be interacting with the agent—is essentially where the human-AI boundary lies,” he says.
Rina Diane Caballar is a writer covering tech and its intersections with science, society, and the environment. An IEEE Spectrum Contributing Editor, she's a former software engineer based in Wellington, New Zealand.
