
# Build a contextual chatbot application using Amazon Bedrock Knowledge Bases

May 2024: This post was reviewed and updated to provide the chatbot application’s infrastructure as code using the AWS CDK.
Modern chatbots can serve as digital agents, providing a new avenue for delivering 24/7 customer service and support across many industries. Their popularity stems from the ability to respond to customer inquiries in real time and handle multiple queries simultaneously in different languages. Chatbots also offer valuable data-driven insights into customer behavior while scaling effortlessly as the user base grows; therefore, they present a cost-effective solution for engaging customers. Chatbots are able to use the advanced natural language capabilities of large language models (LLMs) to respond to customer questions. They can understand conversational language and respond naturally. However, chatbots that merely answer basic questions have limited utility. To become trusted advisors, chatbots need to provide thoughtful, tailored responses that can help the end-user fulfill a task.
One way to enable more contextual conversations is by linking the chatbot to internal knowledge bases and information systems. Integrating proprietary enterprise data from internal knowledge bases enables chatbots to contextualize their responses to each user’s individual needs and interests. For example, a chatbot could suggest products that match a shopper’s preferences and past purchases, explain details in language adapted to the user’s level of expertise, or provide account support by accessing the customer’s specific records. The ability to intelligently incorporate information, understand natural language, and provide customized replies in a conversational flow allows chatbots to deliver real business value across diverse use cases.
The popular architecture pattern of Retrieval Augmented Generation (RAG) is often used to augment user query context and responses. RAG combines the capabilities of LLMs with the grounding in facts and real-world knowledge that comes from retrieving relevant texts and passages from a corpus of data. These retrieved texts are then used to inform and ground the output, reducing hallucination and improving relevance.
In this post, we illustrate contextually enhancing a chatbot by using Amazon Bedrock Knowledge Bases, a fully managed serverless service. Amazon Bedrock Knowledge Bases allows your chatbot to provide more relevant, personalized responses by linking user queries to related information data points. Amazon Bedrock Knowledge Bases securely connects foundation models (FMs) to internal company data sources for RAG, to deliver more relevant and accurate responses. All the information retrieved from Amazon Bedrock Knowledge Bases is provided with citations to improve transparency and minimize hallucinations. For this post, we use the Amazon letters to shareholders dataset to develop this solution.
RAG is an approach to natural language generation that incorporates information retrieval into the generation process. RAG architecture involves two key workflows: data preprocessing through ingestion, and text generation using enhanced context.
The data ingestion workflow uses an embeddings model to create vectors that represent the semantic meaning of text. Documents are split into chunks, an embedding vector is created for each chunk, and the vectors are stored as an index in a vector database. At query time, an embedding is also created for the user’s question. The text generation workflow takes the question’s embedding vector and uses it to retrieve the most similar document chunks based on vector similarity. It then augments the prompt with these relevant chunks to generate an answer using the LLM. For more details, refer to the Primer on Retrieval Augmented Generation, Embeddings, and Vector Databases section in Preview – Connect Foundation Models to Your Company Data Sources with Agents for Amazon Bedrock.
The following diagram illustrates the high-level RAG architecture.
*High-level Retrieval Augmented Generation architecture*
Although the RAG architecture has many advantages, it involves multiple components, including a database, retrieval mechanism, prompt, and generative model. Managing these interdependent parts can introduce complexity in system development and deployment. The integration of retrieval and generation also requires additional engineering effort and computational resources. Some open source libraries provide wrappers to reduce this overhead; however, library changes can introduce errors and add the overhead of version management. Even with open source libraries, significant effort is required to write code, determine the optimal chunk size, generate embeddings, and more. This setup work alone can take weeks, depending on data volume.
Therefore, a managed solution that handles these undifferentiated tasks could streamline and accelerate the process of implementing and managing RAG applications.
Amazon Bedrock Knowledge Bases is a serverless option to build powerful conversational artificial intelligence (AI) systems using RAG. It offers fully managed data ingestion and text generation workflows.
For data ingestion, Amazon Bedrock provides the StartIngestionJob API to start an ingestion job. It handles creating, storing, managing, and updating text embeddings of document data in the vector database automatically. It splits the documents into manageable chunks for efficient retrieval. The chunks are then converted to embeddings and written to a vector index, while allowing you to see the source documents when answering a question.
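For illustration, the following is a minimal sketch of starting an ingestion job with the AWS CLI; the knowledge base and data source IDs are placeholders for your own resource IDs:

```bash
# Start an ingestion job that syncs the S3 data source into the knowledge
# base; documents are chunked, embedded, and written to the vector index
aws bedrock-agent start-ingestion-job \
  --knowledge-base-id <kb-id> \
  --data-source-id <ds-id>
```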
For text generation, Amazon Bedrock provides the RetrieveAndGenerate API, which creates an embedding of the user query, retrieves relevant chunks from the vector database, and generates an accurate response. It also supports the source attribution and short-term memory needed for RAG applications.
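As a minimal sketch, the same API can be exercised from the AWS CLI; the knowledge base ID is a placeholder, the model ARN is one example of a supported text model, and the question is illustrative:

```bash
# Query the knowledge base; the response contains the generated answer,
# source citations, and a sessionId for follow-up questions
aws bedrock-agent-runtime retrieve-and-generate \
  --input '{"text": "How has Amazon free cash flow changed over time?"}' \
  --retrieve-and-generate-configuration '{
    "type": "KNOWLEDGE_BASE",
    "knowledgeBaseConfiguration": {
      "knowledgeBaseId": "<kb-id>",
      "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2"
    }
  }'
```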
This enables you to focus on your core business applications and removes the undifferentiated heavy lifting.
The solution presented in this post is a chatbot application built on the following solution architecture.

This architecture workflow includes the following steps:
To set up this solution, complete the following prerequisites:
- An AWS Cloud9 integrated development environment (IDE), which comes pre-installed with the AWS CLI and AWS CDK tools.
The solution presented in this post is available in the following GitHub repo. You need to clone the GitHub repository to your local machine. Open a terminal window and run the following command (this is a single git clone command):
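The repository URL below is an assumption inferred from the amazon-bedrock-rag project folder referenced in the cleanup step; verify it against the repo link in the original post:

```bash
# Clone the sample repository to your local machine
git clone https://github.com/aws-samples/amazon-bedrock-rag.git
```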
Complete the following steps to deploy the solution:
Provide a chatbot client IP address that is allowed to access the API Gateway in CIDR format as part of the allowedip context variable.
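A sketch of the deployment, assuming a TypeScript CDK app in the backend folder; the IP address is a placeholder for your client IP:

```bash
# Install dependencies and deploy the backend stack, restricting
# API Gateway access to the given client IP in CIDR format
cd amazon-bedrock-rag/backend
npm install
cdk deploy --context allowedip="203.0.113.10/32"
```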
When the deployment is complete, note the API Gateway URL and DocsBucketName values from the stack outputs.

The chatbot application backend deploys a knowledge base and an S3 data source using resources from the AWS Generative AI Constructs Library for Amazon Bedrock. The AWS Generative AI Constructs Library is an open source extension of the AWS CDK that provides multi-service, well-architected patterns, called constructs, for quickly defining solutions in code and creating predictable, repeatable infrastructure. The goal of the AWS Generative AI CDK Constructs Library is to help developers build generative AI solutions using pattern-based definitions for their architecture.
We download the dataset for our knowledge base and upload it to an S3 bucket. This dataset will feed and power the knowledge base. Complete the following steps (the upload can also be scripted from the command line, as sketched below):
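As a command line sketch, assuming the shareholder letter PDFs were downloaded to a local folder, the upload uses the DocsBucketName value from the stack outputs:

```bash
# Upload the dataset into the bucket that backs the knowledge base
# data source; <DocsBucketName> is the stack output noted earlier
aws s3 cp ./shareholder-letters/ s3://<DocsBucketName>/ --recursive
```

After the upload, sync the data source (for example, with the StartIngestionJob call shown earlier) so the new documents are chunked, embedded, and indexed.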
To test your chatbot application, complete the following steps:

The following table includes some sample questions and related knowledge base responses. Try some of these questions as prompts in your chatbot.
During the first call to the Lambda function, the RetrieveAndGenerate API returns a sessionId, which is then passed by the React app along with the subsequent user prompt as an input to the RetrieveAndGenerate API to continue the conversation in the same session. The RetrieveAndGenerate API manages the short-term memory and uses the chat history as long as the same sessionId is passed as an input in the successive calls.
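Sketched again with the AWS CLI, a follow-up turn reuses the sessionId from the first response (the IDs, model ARN, and question are placeholders):

```bash
# Pass the sessionId from the previous response so the service can use
# the session's chat history when answering the follow-up question
aws bedrock-agent-runtime retrieve-and-generate \
  --session-id "<sessionId-from-previous-response>" \
  --input '{"text": "How did that metric change the following year?"}' \
  --retrieve-and-generate-configuration '{
    "type": "KNOWLEDGE_BASE",
    "knowledgeBaseConfiguration": {
      "knowledgeBaseId": "<kb-id>",
      "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2"
    }
  }'
```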
Congratulations, you have successfully created and tested a chatbot application using Amazon Bedrock Knowledge Bases.
Failing to delete resources such as the S3 bucket, OpenSearch Serverless collection, and knowledge base will incur charges. To clean up these resources, run the following command from the project’s amazon-bedrock-rag/backend folder:
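This is most likely the standard CDK teardown:

```bash
# Destroy the backend stack and the resources it created
cdk destroy
```

If the stack delete fails because the S3 bucket still contains objects, empty the bucket first and run the command again.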
In this post, we provided an overview of contextual chatbots and explained why they’re important. We described the complexities involved in data ingestion and text generation workflows for a RAG architecture. We then introduced how Amazon Bedrock Knowledge Bases creates a fully managed serverless RAG system, including a vector store. Finally, we provided a solution architecture and sample code in a GitHub repo to retrieve and generate contextual responses for a chatbot application using a knowledge base.
By explaining the value of contextual chatbots, the challenges of RAG systems, and how Amazon Bedrock Knowledge Bases addresses those challenges, this post aimed to showcase how Amazon Bedrock enables you to build sophisticated conversational AI applications with minimal effort.
For more information, see the Amazon Bedrock Developer Guide and Knowledge Base APIs.
Manish Chugh is a Principal Solutions Architect at AWS based in San Francisco, CA. He specializes in machine learning and generative AI. He works with organizations ranging from large enterprises to early-stage startups on problems related to machine learning. His role involves helping these organizations architect scalable, secure, and cost-effective workloads on AWS. He regularly presents at AWS conferences and other partner events. Outside of work, he enjoys hiking on East Bay trails, road biking, and watching (and playing) cricket.
Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.
Pallavi Nargund is a Principal Solutions Architect at AWS. In her role as a cloud technology enabler, she works with customers to understand their goals and challenges, and gives prescriptive guidance to achieve their objectives with AWS offerings. She is passionate about women in technology and is a core member of Women in AI/ML at Amazon. She speaks at internal and external conferences such as AWS re:Invent, AWS Summits, and webinars. Outside of work she enjoys volunteering, gardening, cycling, and hiking.
Anand Komandooru is a Principal Cloud Architect at AWS. He joined the AWS Professional Services organization in 2021 and helps customers build cloud-native applications on the AWS Cloud. He has over 20 years of experience building software, and his favorite Amazon Leadership Principle is “Leaders are right a lot.”
Fabiano Meneses is a Principal Cloud Application Architect with AWS Professional Services. He is a highly passionate IT professional with over 25 years of international experience in designing and implementing solutions to deliver business outcomes for customers. His current focus is building cloud-native distributed systems, with a keen interest in serverless technologies.