How to Build Your Own Google AI Chatbot Within 5 Minutes – Towards Data Science

Fully utilising the power of Google LLM and your private knowledge
Authors: Selina Li, Tianyi Li
· The Problem
· The Solution
· Use Case
· Step by Step Guide
  ∘ Step 1: Environment Setup
  ∘ Step 2: Prepare Private Knowledge and store them into Google Cloud Storage (low code)
  ∘ Step 3: Create Chatbot and the Data Store sitting behind the Chatbot (no code)
  ∘ Step 4: Test the Chatbot (no code)
  ∘ Step 5: Publish / Integrate your Chatbot (low code)
  ∘ Step 6 (Optional): Publish it through a Beautiful Application (low code)
· What makes this "magic"?
· Some Observations
· Wrap-up
· Enjoyed This Story?
You might already be familiar with AI chat tools powered by a Large Language Model (LLM), such as OpenAI ChatGPT or Google Bard. You might also have noticed one thing: these LLMs have extensive general knowledge about the world, but may not give you satisfactory answers when you ask about a very specific or professional area, especially when knowledge of that area is not publicly available or shareable.
Have you thought about "giving" your private knowledge to an LLM and creating your own Chatbot?
Did you know this can be done within 5 minutes with no code or low code?
The end product will be like this:
During the Asia Pacific Google Cloud Applied AI Summit, Alan Blount from Google shared an interesting idea of achieving this using Google Cloud Vertex AI Search and Conversation, which I found pretty attractive to try out.
The idea is simple. First, put a corpus of private knowledge documents onto Google Cloud Storage:
then create a Data Store, and import the documents from the Cloud Storage into the Data Store:
finally plug that Data Store into Dialogflow CX:
then we are done!
We can test the Chatbot like this:
And if we want to publish it through a beautiful application, Google provides a public git repo for a Chat App that we can utilise. With a bit of coding knowledge, we can plug the link of the Dialogflow Chatbot into the Chat App, and customize the interface like this:
OR this:
In this case, assume I am the owner of an ecommerce website. I would like to create a Chatbot so that my users can ask specific questions about anything on the website (price, product, service, shipping, etc.) as if they were in the store. The Chatbot will be supplied with the "private knowledge" and ground its answers in the contents of the website.
Given that I do not actually own an ecommerce website, I will work around this by crawling the contents of an existing website available on the Internet. This is tricky because most websites prohibit scraping in their terms of use, and it could be illegal to scrape ecommerce websites such as Amazon, eBay, Alibaba, etc.
ChatGPT provided me with a perfect option –
Books to Scrape (https://books.toscrape.com/). A simulated bookstore specifically designed for web scraping practice. It offers a straightforward structure for scraping book details like title, price, and rating.
In this use case, I would assume I am the owner of this Books to Scrape website, and create the Chatbot based on it.
This might look a bit lengthy at first glance because it covers every detailed step that you will need. Once you have run through it once, you can get the same done within 5 minutes.
The tool we are going to use sits on Google Vertex AI, so we will need a Google Cloud Platform (GCP) account.
Google has a free-tier program to provide new Google Cloud Platform (GCP) users with a 90-day trial period that includes $300 as free Cloud Billing credits.
Follow the tutorial here to set up the free Google Cloud account.
After you have set up your Google Cloud account and can access the console, create a storage bucket (step-by-step guide here) for use in the next step.
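If you prefer code over the console, the bucket can also be created with the google-cloud-storage Python client. The sketch below is only an illustration, assuming the library is installed and your credentials are already configured. The bucket name and location are placeholders, and the name check covers only the basic naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores and dots; starting and ending with a letter or digit).

```python
import re

# Simplified check of the basic Cloud Storage bucket naming rules.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    """Return True if name passes the simplified naming check above."""
    return _BUCKET_NAME.fullmatch(name) is not None


def create_bucket(name, location="US"):
    """Create a bucket in the default project of the active credentials."""
    if not is_valid_bucket_name(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    # Imported lazily: needs `pip install google-cloud-storage`.
    from google.cloud import storage

    return storage.Client().create_bucket(name, location=location)


# Example call (placeholder name; bucket names must be globally unique):
# bucket = create_bucket("my-books-to-scrape-demo")
```

Validating the name locally first gives a clearer error than waiting for the API to reject the request.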
As mentioned above, the private knowledge in this case will be the contents sitting on the book store website.
For owners of ecommerce websites, all you need to do is to provide the website URLs, and Google can automatically crawl website content from a list of domains you define.
Given I am not a real owner, I will work around this by crawling. Alan Blount from Google provided a very useful notebook to achieve this. All the code snippet does is crawl webpages from the website that you specify and store them in a Google Cloud Storage bucket that you specify.
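For intuition only, the crawling half of such a script can be sketched with the Python standard library. This is not the notebook's actual code: the class and function names are our own, the crawl is a plain breadth-first walk over same-host links, and a real crawl should respect the target site's terms of use.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html):
    """Return absolute, same-host links found in html, without #fragments."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


def fetch(url):
    """Download one page as text (network access required)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl; returns {url: html} for up to max_pages pages."""
    pages = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch_page(url)
        pages[url] = html
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing the download function in as `fetch_page` keeps the crawl logic testable without network access. Each entry of the returned dictionary would then be written to the storage bucket as one file.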
This is all you need to do:
2.1 Save a copy of the notebook in your own drive
Recall that in step 1 you created a new Google account when you registered for Google Cloud. Your Google account comes with Google Drive, and you can save a copy of this notebook to your drive.
Select "Save a copy in Drive" option from the dropdown menu of "File"
Then if you go to Google Drive, you will be able to see the notebook you created. Feel free to rename it according to your need.
2.2 In your own notebook, locate the section below and specify the following:
website_url refers to the website page URL that you would like to crawl.
storage_bucket refers to the Google Cloud Storage bucket that you created in step 1 above.
metadata_filename refers to a JSON file that will be created and stored together with the webpages. You might want to make it relevant to your website by changing applied_ai_summit_flutter_search to something that describes your use case.
This is my version:
2.3 Run all
2.4 When it prompts you to authenticate the Google Colab notebook to access your Google credentials, click "Allow" -> "Continue"
Then the script should run through and show the progress of the crawling at the bottom, just like this:
And if you check your Google Cloud Storage bucket, you will see that these HTML files have been crawled and stored properly within your bucket:
One thing to note is that the code snippet is not designed for every use case, and you might need to tune the code slightly to achieve your goal.
For example, in my case, I tuned the code a bit by changing the content type used for the upload. By default, the html_string will be uploaded as text/plain. By changing it to text/html, I enable the HTML content to display properly at a later stage.
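The content-type change described above can be sketched with the google-cloud-storage client as follows. This is a minimal illustration rather than the notebook's exact code: the helper names `upload_html` and `get_bucket` are our own, and the blob name is whatever file name you choose for the page.

```python
def upload_html(bucket, blob_name, html_string):
    """Upload html_string so that it is stored as HTML rather than plain text."""
    blob = bucket.blob(blob_name)
    # upload_from_string uses content_type="text/plain" unless told otherwise.
    blob.upload_from_string(html_string, content_type="text/html")
    return blob


def get_bucket(bucket_name):
    """Return a bucket handle; needs google-cloud-storage and GCP credentials."""
    # Imported lazily: third-party dependency.
    from google.cloud import storage

    return storage.Client().bucket(bucket_name)


# Example call (placeholder names):
# upload_html(get_bucket("my-books-to-scrape-demo"), "index.html", "<html>...</html>")
```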
You can tune the code as much as you like.
Go to Google Cloud Console (https://console.cloud.google.com/) and type "search and conversation" as the service:
Create "NEW APP":
Select "Chat":
Provide your "Company name" and "Agent name". Note that the "Agent name" here will be the name of the Chatbot, so you might want to pick a good name for your users.
On the "Data" page, select "CREATE NEW DATA STORE":
For owners of ecommerce websites, select "Website URLs" and provide your website URLs.
As I have crawled the website contents into Cloud Storage, we can select "Cloud Storage" here:
Specify the Cloud Storage bucket name, and select "Unstructured documents" below:
Give your data store a name, then click "CREATE".
You will see your data store listed; then click "CREATE".
Your data store will be created as below
If you click into it, you will see your data store is "processing data" by importing documents from the Cloud Storage bucket that we specified earlier:
If we click the "ACTIVITY" tab, we can see the import is in progress:
Import will take minutes to hours depending on the number of documents in your Cloud Storage bucket.
In my case, I had over 1,000 files and the import finished within minutes.
After import is completed, the status as highlighted has changed:
And if you switch back to the "DOCUMENTS" tab, you will see the list of files imported into the data store:
That means you’ve got all the materials and you are ready to cook!
In step 3 above, we have already created a Chatbot app as well as the data store sitting behind it.
Click "Apps" on the top:
You will see the Chatbot you created in step 3:
If you click the Chatbot name, you will be directed to the Dialogflow CX page like below:
To test the Chatbot, select "Test Agent" in the top-right corner:
And the dialogue box will pop up:
You can start the conversation by saying "hi" and start asking questions to the Chatbot:
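The same test can also be run from code through the Dialogflow CX API. The sketch below assumes the google-cloud-dialogflow-cx package is installed and credentials are configured. The project, location and agent IDs are placeholders that you would copy from your own agent, and the helper names are our own.

```python
import uuid


def build_session_path(project_id, location, agent_id, session_id=None):
    """Build the resource name that identifies one conversation with an agent."""
    session_id = session_id or uuid.uuid4().hex
    return (
        f"projects/{project_id}/locations/{location}"
        f"/agents/{agent_id}/sessions/{session_id}"
    )


def ask_agent(project_id, location, agent_id, text, language_code="en"):
    """Send one user utterance to the agent and return its text replies."""
    # Imported lazily: needs `pip install google-cloud-dialogflow-cx`.
    from google.cloud import dialogflowcx_v3 as cx

    # Agents outside the "global" location are served from a regional endpoint.
    options = None
    if location != "global":
        options = {"api_endpoint": f"{location}-dialogflow.googleapis.com"}
    client = cx.SessionsClient(client_options=options)

    request = cx.DetectIntentRequest(
        session=build_session_path(project_id, location, agent_id),
        query_input=cx.QueryInput(
            text=cx.TextInput(text=text),
            language_code=language_code,
        ),
    )
    response = client.detect_intent(request=request)

    replies = []
    for message in response.query_result.response_messages:
        if message.text.text:
            replies.extend(message.text.text)
    return replies


# Example call (placeholder IDs):
# print(ask_agent("my-project", "global", "my-agent-id", "hi"))
```

Each call here starts a fresh session. To keep conversational context across turns, you would generate one session ID and reuse it for every request.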
It works!
If you are happy with the Chatbot, it is easy to integrate it with your web application
Go to the left pane, select "Manage" -> "Integrations" -> "Dialogflow Messenger"
You can choose the type of API and UI style according to your needs
For demo purposes, I selected "Unauthenticated API" as API and "Pop-out" as UI style:
After selecting "Done", a code snippet in HTML will be generated on the next page as below:
You may copy the code snippet and easily paste it into your applications for integration.
For demo purposes, I copied and pasted this HTML snippet into JSFiddle and ran it, and got my little Chatbot working, as shown in the bottom-right corner!
In case you don’t have an application yet and you want to have one, Google provides a good starting point through a public git repository Chat App.
This is a Chatbot Application written in Node.js, and you can easily adapt it for your own use by changing the code snippets a bit within chat-app/src/routes/+page.svelte.
You will need to change the project-id, agent-id and chat-title to your own values.
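If you deploy the app for several agents, this substitution can be scripted. The helper below is a small sketch of that idea and not part of the Chat App itself. It assumes the three values appear in the file as ordinary name="value" attributes, and the sample markup is a hypothetical, abbreviated stand-in for the real contents of the file.

```python
import html
import re


def set_attribute(markup, name, value):
    """Set every name="..." attribute in markup to the given value."""
    pattern = re.compile(rf'(?<![\w-]){re.escape(name)}="[^"]*"')
    replacement = f'{name}="{html.escape(value, quote=True)}"'
    # A function is used so backslashes in the value are not treated as escapes.
    return pattern.sub(lambda _match: replacement, markup)


def configure_snippet(markup, project_id, agent_id, chat_title):
    """Fill in the three values that need to be changed."""
    for name, value in (
        ("project-id", project_id),
        ("agent-id", agent_id),
        ("chat-title", chat_title),
    ):
        markup = set_attribute(markup, name, value)
    return markup


# Hypothetical, abbreviated markup; use the real contents of your own file.
SAMPLE = '<df-messenger project-id="OLD" agent-id="OLD" chat-title="OLD"></df-messenger>'
```

Calling `configure_snippet(SAMPLE, "my-project", "1234", "Book Bot")` returns the same markup with the three attribute values replaced and everything else left untouched.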
And once you run/deploy the app, you will get the web UI like this:
Of course you can change the appearance of the UI as you like.
Now you can have your own application!
Recall the solution design that we mentioned at the beginning. It looks a bit like magic, as you can get your own LLM-powered Chatbot simply by supplying your private knowledge to a Google Cloud Storage bucket.
This is possible because Google has done quite a bit of integration behind the scenes, integrating the Vertex AI platform with the chatbot agent service Dialogflow CX and coming up with a new abstraction called Vertex AI Conversation (formerly Gen App Builder). This new abstraction also supports Search and Recommend, and the full name of this service is "Vertex AI Search and Conversation".
As we can see, this new abstraction of "Vertex AI Search and Conversation" sits on top of Vertex AI, which orchestrates a bunch of foundation models, and gets "augmented" by user-supplied, up-to-date real-world information, so it can contextualize its responses to this information.
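To make the "augmented" part more concrete, here is a toy sketch of the general idea: find the private documents most relevant to a question and place them in front of the model as context. It is not how Vertex AI Search and Conversation is implemented. Simple word overlap stands in for real retrieval, the prompt wording is our own assumption, and the sample documents used with it are made up.

```python
import re


def tokenize(text):
    """Lowercase text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Return up to top_k documents sharing at least one word with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return [doc for doc in ranked[:top_k] if question_words & tokenize(doc)]


def build_prompt(question, documents, top_k=2):
    """Assemble a prompt that asks the model to answer from the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
```

With made-up documents such as "Example Book A costs 10 pounds." and "Shipping takes 3 to 5 business days.", a question about the price of Example Book A retrieves the first document, and the prompt sent to the model carries that sentence as context.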
The integration is great as it can help at least two groups of people –
Imagine you are a traditional Chatbot builder using Dialogflow CX: you create pages, intents and routes to route customer intentions to the corresponding page. Basically, you are defining "if the customer says this, then I respond with this", which is a form of hard-coding. Now Google plugs in Vertex AI, which can use LLMs (e.g. text-bison, gemini) to generate agent responses and control the conversation flow in a much smarter way. This can significantly reduce agent design time and improve agent quality.
On the other hand, imagine you are exploring the power of LLMs and Generative AI but are not sure what to do with it. This Vertex AI Conversation feature lets you build and launch your own Chatbot applications quickly and make them available for real use cases. This can significantly shorten the go-to-market time of LLM and GenAI solutions.
Despite the "magic" as it appears, we observed several things worth sharing with developers who are considering using this "Vertex AI Search and Conversation" feature.
Our gut feeling is that this is a new product Google created by "integrating" several existing tools, and that Google is still working on making it better. It is not clear how the integration happens behind the scenes, or how developers can best understand and configure it.
I got our chatbot working very quickly, but once I started looking at how to fine-tune it, it took me quite a bit of time to figure out how Dialogflow CX works, and what a "generator" is and how it works. At this moment I am still confused about why this Chatbot works so well without me even configuring any "generator" as described in the Google docs, and whether and how we can make it better by using a "generator".
Some other observations during the development:
In the above use case, I assume I am an online book store owner and created a Chatbot based on my e-commerce website contents in HTML.
Similarly, you can supply "private knowledge" in the format of blogs, files (e.g. PDF, HTML, TXT) and all kinds of websites to the Google Cloud Storage, and create your own Chatbot.
This enables individuals and businesses to fully utilize the power of the Google LLMs (text-bison, gemini, etc.), augment them with private knowledge, and create their own Chatbots very quickly.
This marks the end of this article. Hope you find it helpful!
(PS: I am working on a video to make this step-by-step guide easier to follow. Will share if I get it done in near future.)
Selina Li (Selina Li, LinkedIn) is a Principal Data Engineer working at Officeworks in Melbourne Australia. Selina is passionate about AI/ML, data engineering and investment.
Jason Li (Tianyi Li, LinkedIn) is a Full-stack Developer working at Mindset Health in Melbourne Australia. Jason is passionate about AI, front-end development and space related technologies.
Selina and Jason would love to explore technologies to help people achieve their goals.
Unless otherwise noted, all images are by the authors.