Effective Practices for Coding with a Chat-Based AI

In this article, we explore how AI agents are reshaping software development and the impact they have on a developer’s workflow. We introduce a practical approach to staying in control while working with these tools by adopting key best practices from the discipline of software architecture, including defining an implementation plan, splitting tasks, and so on.
Jul 04, 2025 17 min read
Since GitHub Copilot launched as a preview in Summer 2021, we have seen an explosion of coding assistant products. Initially used as code completion on steroids, some products in this space (such as Cursor and Windsurf) rapidly moved towards agentic interactions, where the assistant, triggered by prompts, autonomously performs actions such as modifying code files and running terminal commands.
Recently, GitHub Copilot added its own "agent mode" as a feature of the integrated chat, through which it is possible to ask an agent to perform various tasks on your behalf. GitHub Copilot "agent mode" is another example of the frantic evolution in the agentic space. This "agent mode" should not be confused with GitHub Copilot "coding agent", which can be invoked from GitHub's interfaces, such as github.com or the GitHub CLI, to work autonomously on GitHub issues.
In this article we want to take a look at what it means to use agents in software development and what kind of changes they bring to the developer's workflow. To get a concrete feeling for what this new workflow could look like, we will use GitHub Copilot "agent mode" to build a simple Angular app which searches Wikipedia articles and shows the results as a list (see how to access GitHub Copilot agent in VSCode). Let's call it the "Search wiki app".
The app we want to build
We will first try to build the app in one shot, sending just one prompt to the agent. Next we will try to do the same with a more guided approach.
"Search wiki app" is pretty simple. What it is and what it does can be described in a relatively short prompt like the one below. Note that technical details about Angular are not necessary to illustrate the agent impact on the developer workflow. However, they remind us that even when working with agents, the developer must be aware of important technical details that should be provided when crafting a prompt to perform a task.
Our prompt to ask GitHub Copilot "agent" to build the entire app
GitHub Copilot "agent-mode" lets us choose the LLM model to use. The experiments we ran showed that the choice of the LLM engine is key. It is important to underline this concept to avoid the risk of considering LLMs as commodities whose differences are noteworthy only in nerdish discussions. Somehow, this belief may be even reinforced by GitHub Copilot allowing developers to select which LLM to use from a simple dropdown list. LLMs have different (and continuously evolving) capabilities, which translate into different costs and outcomes.
To prove this, we tried the same prompt with two different models: "Claude Sonnet 4" from Anthropic and "o4-mini (preview)" from OpenAI. Both are rightly regarded as powerful models, but their nature and capabilities are quite different. "Claude Sonnet 4" is a large model with particularly strong coding capabilities, while "o4-mini (preview)" is a much smaller, cost-efficient model aimed at general-purpose use. Therefore it is not surprising that the results we got were very different, but this diversity is inherent to the present LLM landscape, so we had better take it into account.
Using "o4-mini (preview)" the GitHub Copilot agent has not been able to build a working feature. In fact, the first version had some errors that prevented compilation. Subsequently, We started a conversation with the agent, asking to correct the errors. After a few iterations, we stopped because errors continued to pop up and, more importantly, we were not able to easily understand how the solution was designed and the code was difficult to follow, even if we have a certain familiarity with Angular. For those curious, you can view the code produced with this experiment.
"Claude Sonnet 4" gave us totally different results. The code generated in the first iteration worked as expected, without any need for iterating or manual intervention. The design of the solution looked clean, modularized, and with a clear project folder structure.
We even asked it to generate an architectural diagram, and the agent produced nice Mermaid diagrams along with detailed explanations of the key elements of the design. For the curious, you can view the code produced in this experiment.
Partial view of the Data Flow diagrams generated by "Claude Sonnet 4"
Even though the Claude Sonnet 4-powered coding agent produced a good working solution and nice documentation, my feeling was still "I am not in control". For instance, to make sure the generated diagrams were accurate, I had to follow the code closely and cross-check it against both the diagrams and the generated documentation. In other words, to truly understand what the agent has done, we basically have to reverse-engineer the code and validate the generated documentation.
However, this should not be seen as an optional activity. In fact, it is essential because it helps us better understand what the agent has done, especially since we are ultimately responsible for the code, even if it was developed by AI.
We may say that this is not very different from having a co-worker create a diagram or document something for us. The whole point is trust. While working in teams, we tend to develop trust with some of our colleagues; outcomes from trusted colleagues are generally accepted as good. With agents and LLMs, trust is risky, given the hallucination problem that even the most sophisticated models continue to have. Hence, we need to check everything produced by AI.
To stay in the driver's seat, let's try a different approach: first we design the structure of the solution we want to build, then we draw up an implementation plan, splitting the task into small steps. In other words, let's start doing what a good old application architect would do. Only when our implementation plan is ready will we ask the agent to perform each step, gradually building the app. If you would like, you can view the design of the solution and the implementation plan.
Since we want to be good architects, we have to define the best practices that we want our agent to follow. Best practices can be naming conventions, coding styles, patterns, and tools to use.
GitHub Copilot provides a convenient way to define such best practices through "instruction files", which are then automatically embedded in any message sent to the agent. We can even use generative AI, via a standard chatbot like ChatGPT, to help us define a good list of best practices.
In this example we instructed the agent to write comprehensive tests, implement accessibility in all views, and write clear comments on the most complex pieces of code using the JSDoc standard. The results of these instructions have been very good.
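To make the effect of such instructions concrete, below is a minimal sketch of the kind of unit test the agent is expected to produce for a service that queries Wikipedia (the WikiService created later in the implementation steps). This is our own illustration under assumed names, not the agent's actual output.

```typescript
// Hypothetical example: a unit test for a Wikipedia search service, written in the
// style the instructions ask for (comprehensive tests, JSDoc-style comments).
import { TestBed } from '@angular/core/testing';
import { provideHttpClient } from '@angular/common/http';
import { provideHttpClientTesting, HttpTestingController } from '@angular/common/http/testing';
import { WikiService, WikiSearchResult } from './wiki.service';

describe('WikiService', () => {
  it('maps the Wikipedia API response to result objects', () => {
    TestBed.configureTestingModule({
      providers: [provideHttpClient(), provideHttpClientTesting()],
    });
    const service = TestBed.inject(WikiService);
    const httpMock = TestBed.inject(HttpTestingController);

    let results: WikiSearchResult[] = [];
    service.search('angular').subscribe(r => (results = r));

    // Answer the pending HTTP request with a canned Wikipedia-style payload.
    const req = httpMock.expectOne(r => r.url.startsWith('https://en.wikipedia.org/w/api.php'));
    req.flush({ query: { search: [{ title: 'Angular (web framework)', snippet: '…', pageid: 1 }] } });

    expect(results.length).toBe(1);
    expect(results[0].title).toBe('Angular (web framework)');
    httpMock.verify();
  });
});
```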
For those curious, you can view the detailed instructions we used in this exercise.
An interesting side effect of defining best practices in instruction files is that they can then be treated as part of the project deliverables. These defined best practices can therefore be versioned in the project repo and shared among all developers. This mechanism helps enforce the use of centrally defined best practices across all contributors of a project, as long as they use agents to help them do the work.
Having defined the implementation plan and the best practices, it is time to build the app. This is the way we proceed:
We build our app step-by-step, always remaining in control of what is happening.
The workflow with the agent
Such an approach makes it possible to create a good quality app since the agent, controlled by our "instructions" file, typically follows many best practices. In our experience we have seen that:
In summary, we built a working app in four steps, using four prompts which all produced the expected result at the first attempt using "Claude Sonnet 4" as the LLM engine.
The Search wiki app
For the curious, you can view a detailed description of each step, the prompt used and the code generated at each step.
As we already stated, model choice can make the difference. We tried the same guided approach with some other LLMs, using the same sequence of prompts. With GPT-4.1, the agent generated a working app with almost no need for corrections (the only errors were wrong import paths), but with lower quality. For instance, the app did not follow Material Design (as it should have per the instructions) and did not handle the Enter key event. Also, accessibility was not implemented on the first go.
With this guided approach we have created a fully working application, with a comprehensive set of tests and many more quality features, using just four prompts. This took a couple of hours of work at most, and probably less. This speed is quite impressive compared to a traditional approach, where the developer would have to check many technical details, such as the Wikipedia API documentation or the latest Angular best practices, before writing all the code.
At the same time, it could be argued that we could have been even faster by asking the agent to build the entire app with just one prompt. The point is that we may have sacrificed some speed for the benefit of producing the solution that we want to build. In other words, although it may be faster to ask the agent to generate a complete application with a single prompt (and an agent may be powerful enough to do it), we do not want to just create an app; we want to create our app, an app that we understand and that follows the design we want, because eventually we may have to maintain and evolve it. Designing the structure of the app and drawing up an implementation plan takes some time, but it also guarantees that the result is under our control, manageable and maintainable. And everything is still obtained at a much higher speed than with a traditional hand-written approach.
Agents can be a very powerful tool in the hands of developers, especially if powered by effective LLMs. They can speed up development, complete tasks that we sometimes leave behind, such as tests, and do all this following the guidelines and best practices we defined for them. But all this power comes with the risk of losing control. We may end up with working solutions that take time to understand, and we may be tempted to just trust them without maintaining due control. Sometimes generative AI hallucinates, which is a big risk. Even if we assume that the AI will never hallucinate, relying on a solution that we do not understand is risky.
To maintain control over what is created while leveraging the power of agents, we can adopt a workflow that mixes human architectural knowledge with an agent’s effectiveness. In this workflow, the design and planning phase is left in the hands of experienced human architects, who define the solution and the steps to reach it. The coding is delegated to the agent, which works fast and with good quality.
We are aware that our experiment is brutally simple, and that building a new greenfield app is very different from working on an existing, complex (and often convoluted) code base. Still, the results are impressive and clearly show that we can find ways to work better together with agents. This approach brings us both efficiency and quality.
If we want to control what agents do for us, experience is key. Experience lets us design a good solution and plan an effective implementation, and it gives us the judgment to check what the AI has generated. How will we develop this experience in a world where agents do the heavy lifting in coding? Well, this is a different question, one that applies to many human intellectual activities in a world that has access to generative AI tools.
The answer to this question is probably still to be found and is, in any case, outside the scope of this short article.
In this section, we will review how to access the agent within VSCode.
The GitHub Copilot agent is integrated in the GitHub Copilot chat. From the chat, it is possible to select the agent mode as well as the LLM engine that the agent will use behind the scenes.
GitHub Copilot agent in VSCode
Using the chat, we can ask the agent to perform tasks for us. We will use the chat to build our "Search wiki app" as described earlier.
With an agent-based workflow, we first design the solution and then list the tasks that will bring us to the desired result (we define an implementation plan). This approach is what any good architect would do before starting to code.
The design of the "Search wiki app"
In the implementation plan we work bottom-up, starting from the services connecting to the external systems and then building the views on top of them. For the sake of simplicity, the app does not implement any dedicated state management.
So this is our plan to build the "Search wiki app":
The following are the implementation steps, with links to the codebase status at each step, and the prompts used for each of them.
1. Create WikiService
Code status after prompt execution of Step 1
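To give an idea of what the service produced at this step might look like, here is a minimal sketch of an Angular service that queries the public MediaWiki search API. The names, the exact API parameters, and the result shape are our assumptions for illustration; the actual generated code is linked above.

```typescript
// Hypothetical sketch of a WikiService; names, API parameters, and response mapping
// are assumptions for illustration, not the article's actual generated code.
import { Injectable, inject } from '@angular/core';
import { HttpClient, HttpParams } from '@angular/common/http';
import { Observable, map } from 'rxjs';

/** A single Wikipedia search result as exposed to the UI layer. */
export interface WikiSearchResult {
  title: string;
  snippet: string;
  pageUrl: string;
}

@Injectable({ providedIn: 'root' })
export class WikiService {
  private readonly http = inject(HttpClient);
  private readonly apiUrl = 'https://en.wikipedia.org/w/api.php';

  /**
   * Searches Wikipedia for articles matching the given term using the public
   * MediaWiki search API and maps the raw response to WikiSearchResult objects.
   */
  search(term: string): Observable<WikiSearchResult[]> {
    const params = new HttpParams()
      .set('action', 'query')
      .set('list', 'search')
      .set('srsearch', term)
      .set('format', 'json')
      .set('origin', '*'); // needed for CORS when calling the API from the browser

    return this.http.get<any>(this.apiUrl, { params }).pipe(
      map(response =>
        (response?.query?.search ?? []).map((item: any) => ({
          title: item.title,
          snippet: item.snippet,
          pageUrl: `https://en.wikipedia.org/?curid=${item.pageid}`,
        }))
      )
    );
  }
}
```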
2. Create WikiCard component
Code status after prompt execution of Step 2
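A plausible sketch of the WikiCard component follows: a small presentational standalone component that renders a single search result, with the accessibility attributes and JSDoc comments the instructions ask for. The selector, template, and input name are assumptions, not the agent's actual output.

```typescript
// Hypothetical sketch of the WikiCard component; names and template are assumptions.
import { Component, Input } from '@angular/core';
import { WikiSearchResult } from './wiki.service';

/** Presentational card that renders a single Wikipedia search result. */
@Component({
  selector: 'app-wiki-card',
  standalone: true,
  template: `
    <article class="wiki-card" role="listitem" [attr.aria-label]="result.title">
      <h3>{{ result.title }}</h3>
      <!-- The snippet comes back from the API as HTML with highlighted terms. -->
      <p [innerHTML]="result.snippet"></p>
      <a [href]="result.pageUrl" target="_blank" rel="noopener">Read on Wikipedia</a>
    </article>
  `,
})
export class WikiCardComponent {
  /** The search result to render; provided by the parent list component. */
  @Input({ required: true }) result!: WikiSearchResult;
}
```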
3. Create the WikiListComponent
Code status after prompt execution of Step 3
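The list component ties the previous pieces together: it reads the search term, calls WikiService (also on the Enter key), and renders one WikiCard per result. Again, this is a hedged sketch under assumed names rather than the generated code.

```typescript
// Hypothetical sketch of the WikiListComponent; names and template are assumptions.
import { Component, inject, signal } from '@angular/core';
import { WikiService, WikiSearchResult } from './wiki.service';
import { WikiCardComponent } from './wiki-card.component';

/** Search view: a text box plus the list of results rendered as WikiCard items. */
@Component({
  selector: 'app-wiki-list',
  standalone: true,
  imports: [WikiCardComponent],
  template: `
    <label for="search">Search Wikipedia</label>
    <input id="search" #box (keyup.enter)="search(box.value)" />
    <button type="button" (click)="search(box.value)">Search</button>

    <div role="list" aria-label="Search results">
      @for (result of results(); track result.pageUrl) {
        <app-wiki-card [result]="result" />
      }
    </div>
  `,
})
export class WikiListComponent {
  private readonly wiki = inject(WikiService);

  /** Current search results, held in a signal so the template updates automatically. */
  readonly results = signal<WikiSearchResult[]>([]);

  /** Runs a search and replaces the current result list; empty input is ignored. */
  search(term: string): void {
    if (!term.trim()) {
      return;
    }
    this.wiki.search(term).subscribe(found => this.results.set(found));
  }
}
```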
4. Configure WikiList as the start page
Code status after prompt execution of Step 4
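Making WikiList the start page is typically just a routing change. Below is a minimal sketch of what such a route configuration might look like; the file and route names are assumptions, and provideRouter and provideHttpClient would be registered in the application configuration.

```typescript
// Hypothetical sketch of app.routes.ts; file and route names are assumptions.
import { Routes } from '@angular/router';
import { WikiListComponent } from './wiki-list.component';

export const routes: Routes = [
  // The search view is the landing page of the app.
  { path: '', component: WikiListComponent },
  // Anything else falls back to the search view.
  { path: '**', redirectTo: '' },
];
```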
Below are the instructions that we defined and that the agent used throughout the exercise:
You are an expert Angular developer with extensive experience in Angular v19. While generating code, please follow these coding standards and best practices:
As per prompting best practices (those published recently by Anthropic, for instance), instructions should start with a role definition, since these instructions end up playing a role similar to the "system prompt" present in many LLM APIs.
Looking at these instructions, we can see several guidelines which have actually turned into code generated by the agent:
There are also instructions for bash commands (e.g., "when running tests, always use the command npx ng test --browsers=ChromeHeadless --watch=false --code-coverage=false"). These kinds of instructions are also followed by the agent.
The agent definitely complies with the instructions we provide, which makes defining such guidelines extremely important if we want to enforce a certain set of rules. We should treat such instructions as first-class project deliverables, shared among all developers, to ensure that a certain level of quality standardization is maintained.
One last note on "meta-prompting". Instructions provide guidelines and therefore they greatly depend on the type of project we have to deal with.
A pragmatic way to create our instructions is to start by asking an LLM to generate an instruction file for us, specifying the type of project we are working on, for instance a React front-end app or a Go app. This approach is called "meta-prompting": creating a prompt through a prompt.
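For example, a hypothetical meta-prompt (not the exact one used in this exercise) could be: "You are an expert in prompt engineering and Angular development. Generate an instruction file for a coding agent working on an Angular v19 front-end project, covering coding standards, testing, accessibility, and documentation guidelines."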
Once we have the starting point generated for us, we can customize it with the requirements of our specific project.