Updated on Dec 7, 2024

Building AI Agents for Customer Support Using LangGraph, Llama 3.1, and ChromaDB

This project explores the steps involved in building an AI agent by leveraging LangGraph, Llama 3.1, Gemma 2 9B, and vector search.

Introduction

In an era where customer expectations are at an all-time high, top businesses are turning to AI to deliver instant, efficient, and personalized support. AI agents can transform customer support by resolving queries, providing recommendations, and improving overall customer satisfaction.

This project walks through the step-by-step process of building an AI agent with LangGraph, Llama 3.1, Gemma 2 9B, and vector search. We’ll use datasets derived from three PDFs containing detailed information about iPhone, Xiaomi, and Samsung models to show how agentic recommendation systems for customer support can be built. The AI agent we’ll create will analyze user queries, route them to the correct data source, and generate responses from the retrieved data.

By combining the power of natural language understanding with semantic search and a graph-like agentic framework, this system will offer a scalable, efficient, and user-friendly solution that meets a wide range of business use-cases.

What Are AI Agents?

Before we dive into the tutorial, let’s take a brief look at what AI agents are.

AI agents are intelligent software programs that autonomously perform tasks or solve problems on your behalf. A simple way to understand an AI agent is with an example related to what we are building.

Suppose a customer support team for a smartphone marketplace frequently receives queries about the specifications, compatibility, and features of iPhone, Xiaomi, and Samsung devices. Currently, these queries would be handled manually by customer support executives, which can lead to delays, inconsistencies, and high operational costs.

Here’s how an AI agent could handle the same problem:

1. Customer Query Input: A customer types a query such as, "Does the Xiaomi Redmi Note 12 support 5G networks?" into the live chat or email system.

2. Perception: The AI agent receives the query and breaks it down using LLMs. It identifies key entities: "Xiaomi Redmi Note 12" and "5G networks".

3. Reasoning: Once it understands the query, the agent connects to the relevant knowledge base stored in a vector search engine. The vector search system retrieves relevant data from a pre-indexed dataset containing PDFs of product specifications for iPhone, Xiaomi, and Samsung models.

4. Action: The AI agent processes the retrieved data, verifies compatibility, and crafts a response using an LLM: "Yes, the Xiaomi Redmi Note 12 supports 5G networks, with compatibility for both SA and NSA modes."

5. Interaction with CRM (optional): The agent also logs this interaction in the CRM system, updating the customer profile with their interest in 5G features and Xiaomi devices, which can help in building future personalized recommendations.

6. Continuous Learning: If the agent identifies any ambiguity in the query (e.g., "Does it support 5G?"), it asks clarifying questions or delegates it to a human operator to improve its understanding. The feedback received can be used to fine-tune the model.

The workflow outlined above could have been built as a conventional frontend with a search system, form fields, and buttons, where a human operator goes through the same steps that the agent took. However, that would be time-consuming and labor-intensive. With an AI agent, we avoid this complexity and reduce the workflow to a single natural-language interface. The agentic approach is also far more scalable.

How Are AI Agents Built?

AI agents leverage artificial intelligence techniques, such as large language models (LLMs) or vision language models (VLMs), and knowledge representation, to perceive their environment, reason about it, and take actions to achieve defined objectives.

Unlike traditional software programs that follow a rigid set of instructions, AI agents are adaptive, context-aware, and capable of making decisions based on their understanding of dynamic inputs. Therein lies their power: a rigid, UI-centric workflow that might take a year to build can often be replaced by an AI agent with a natural language interface, built in as little as a quarter of the time.

You can think of an AI agent as a digital concierge for your business processes. It doesn't just execute pre-defined tasks; it learns, adapts, and improves over time. For example, in customer support, an AI agent doesn’t merely respond to FAQs—it understands the customer’s intent, retrieves relevant information, provides precise, actionable answers, and remembers the customer’s request to personalize future interactions.

At a high level, an AI agent operates in a cycle of perception, reasoning, and action:

  1. Perception: The agent gathers input from its environment, which could be user queries, historical data, or external APIs. In this tutorial, your AI agent will perceive user input through queries related to smartphones.
  2. Reasoning: Once the input is gathered, the agent uses its reasoning capabilities to interpret the data. This step might involve semantic understanding, decision-making based on predefined rules, or knowledge retrieval from a database or vector search engine like ChromaDB.
  3. Action: Based on its reasoning, the agent takes an action—whether that’s generating a response, performing a task, or initiating further workflows.
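
The cycle above can be sketched in a few lines of plain Python. This is only an illustration of the perceive, reason, act loop: the KNOWLEDGE dictionary, the Observation class, and all function names are hypothetical stand-ins, and the keyword matching replaces what an LLM and a vector store would do in the real agent.

```python
from dataclasses import dataclass
from typing import Optional

# Toy knowledge base; the entries are illustrative placeholders, not real spec data.
KNOWLEDGE = {
    "iphone": "Placeholder notes about iPhone models.",
    "samsung": "Placeholder notes about Samsung models.",
    "xiaomi": "Placeholder notes about Xiaomi models.",
}


@dataclass
class Observation:
    query: str
    brand: Optional[str]  # None when no known brand is mentioned


def perceive(query: str) -> Observation:
    """Perception: normalize the input and pick out the brand, if any."""
    lowered = query.lower()
    brand = next((b for b in KNOWLEDGE if b in lowered), None)
    return Observation(query=query, brand=brand)


def reason(obs: Observation) -> Optional[str]:
    """Reasoning: look up the knowledge relevant to the observation."""
    return KNOWLEDGE.get(obs.brand) if obs.brand else None


def act(obs: Observation, context: Optional[str]) -> str:
    """Action: answer from the context, or ask a clarifying question."""
    if context is None:
        return "Which brand are you asking about: iPhone, Samsung, or Xiaomi?"
    return f"Regarding '{obs.query}': {context}"


def run_agent(query: str) -> str:
    """One pass through the perceive -> reason -> act cycle."""
    obs = perceive(query)
    return act(obs, reason(obs))
```

Calling run_agent("Does it support 5G?") takes the clarifying-question branch, because no brand can be perceived in the query.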

Characteristics of AI Agents

To build effective AI agents, you need to consider their core characteristics:

  1. Autonomy: The agent can operate independently without requiring constant human intervention.
  2. Proactiveness: It can anticipate needs and take initiative, such as suggesting solutions before the user explicitly asks for them.
  3. Learning Capability: The agent can be designed to refine its behavior over time, using machine learning or reinforcement learning techniques.
  4. Context Awareness: The agent interprets each query in light of the conversation and the data it retrieves. LLMs and graph-based agentic frameworks let you extend this context handling iteratively as the system grows more complex.
  5. Interactivity: The agent communicates naturally with users, often using conversational AI to simulate human-like interactions.

Types of AI Agents

AI agents come in various forms depending on their application. For instance:

  1. Reactive Agents: These respond to specific triggers without maintaining any memory or learning from past interactions. For example, a chatbot that answers FAQs.
  2. Deliberative Agents: These use internal models to plan actions and solve problems, often leveraging frameworks like LangGraph for multi-hop reasoning.
  3. Collaborative Agents: These interact with other agents or humans to achieve shared goals. In customer support, this might mean coordinating with CRM systems to fetch personalized customer data.

Relevance to Customer Support

AI agents are particularly transformative in customer support due to their ability to:

  1. Handle Large Volumes of Queries: AI agents can process and respond to thousands of queries simultaneously, ensuring faster resolution times.
  2. Personalize Interactions: By using contextual data, such as user purchase history or preferences, they tailor responses to individual needs.
  3. Provide 24/7 Support: They never rest, ensuring your customers receive assistance anytime, anywhere.
  4. Reduce Operational Costs: Automating routine queries allows human agents to focus on complex or high-value tasks.

The Architecture for Building an AI Agent

In order to build this agent, we’ll use LangGraph, Llama 3.1, and ChromaDB (a vector search engine). There are other frameworks available as well, such as CrewAI, n8n, DAGWorks and Haystack. You can also program an agent without using any framework as long as you are able to handle the state changes and routing well. We’ll use LangGraph for its simplicity and flexibility.

What Is LangGraph?

LangGraph is a framework for building stateful AI workflows as graphs of nodes and edges, which makes it possible to express complex, branching decision logic. With it, you can create agentic workflows that integrate AI technologies like LLMs and vector stores into a cohesive pipeline. LangGraph provides a programmatic interface for mapping user queries to predefined paths, which helps you keep routing and processing accurate.

The Dataset We’ll Use

Any AI agent you build will rely on comprehensive and well-structured datasets. Data quality is key, as the data forms the foundation upon which LLMs generate their responses.

For the purpose of this guide, we’ll use a dataset made up of three PDFs with information on:

  • iPhone Models: Detailed specifications and features of iPhones.
  • Xiaomi Models: Information about Xiaomi smartphones, including advanced features like Leica cameras.
  • Samsung Models: Data covering Samsung devices, such as their innovative display technologies.

Why Choose These Datasets?

The datasets will help us by providing quality data that we can retrieve at query time.

  • They will allow the system to provide brand-specific recommendations and comparisons.
  • By converting them into embeddings, we’ll ensure quick and accurate retrieval of relevant information.

Step-by-Step Guide to Create an AI Agent for Customer Support

Let’s start. We’ll first install the tools and libraries, prepare our dataset, insert it into ChromaDB, and then stitch together the agentic workflow.

Step 1 - Prerequisites: Tools and Libraries

The following tools power the AI agent:

  1. LangGraph: This framework is central to the workflow, and simplifies query classification and routing.
  2. Vector Store: ChromaDB is used to store and retrieve vector embeddings, enabling fast and precise similarity searches needed for retrieval.
  3. LLM: Llama 3.1, a large language model by Meta, handles natural language understanding and response generation.
! pip install streamlit chromadb langchain langchain-community langchain-core langchain-groq langgraph groq pypdf sentence-transformers

Step 2 - Preparing the Dataset

Structuring the Product List

  1. Vectorization: Each phone model (iPhone, Xiaomi, Samsung) dataset is broken down into chunks and then converted into embeddings, allowing similarity searches later.
    • Each PDF is split into chunks of 500 characters.
    • There is a chunk overlap of 60 characters, so that each chunk carries some of the text surrounding it and the LLM has that context during response generation.
    • The persist_directory variable holds the location of the ChromaDB vector store on disk.
    • We’ll use the all-MiniLM-L6-v2 embedding model through HuggingFaceEmbeddings.
from langchain.vectorstores import Chroma
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import HuggingFaceEmbeddings 


embedding_function = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
chunk_size = 500
chunk_overlap = 60
splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
persist_directory = "/home/ml/projects/langgraph/chroma"
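
To see what chunk_size and chunk_overlap do, here is a simplified sketch in plain Python. It cuts fixed-size windows, which is not how RecursiveCharacterTextSplitter works internally (that splitter prefers to break on separators such as paragraphs and sentences); the function name split_with_overlap and the sample text are made up for illustration.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size windows; each window repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Made-up sample: 1,200 characters of repeating digits.
sample = "".join(str(i % 10) for i in range(1200))
chunks = split_with_overlap(sample, chunk_size=500, chunk_overlap=60)

# The last 60 characters of one chunk reappear at the start of the next.
shared = chunks[0][-60:] == chunks[1][:60]
```

With these settings, each new chunk starts 440 characters after the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.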

  2. Storage: The chunks from each PDF are indexed in the vector store (ChromaDB) for retrieval. For each brand, we load the PDF, split it with the splitter defined above, and add the chunks to a separate collection named Samsung, iPhone, or Xiaomi.

Loading the Samsung Dataset

loader = PyPDFLoader("/home/ml/projects/langgraph/Samsung Galaxy Models Overview.pdf")
samsung_docs = loader.load()
samsung_doc_splits = splitter.split_documents(samsung_docs)


samsung_vector_store = Chroma(
   collection_name="Samsung",
   embedding_function=embedding_function,
   persist_directory=persist_directory
)


ids = [f"doc_{i}" for i in range(len(samsung_doc_splits))]
samsung_vector_store.add_texts(
   texts=[doc.page_content for doc in samsung_doc_splits],
   ids=ids
)

Loading the iPhone Dataset

iphone_loader = PyPDFLoader("/home/ml/projects/langgraph/iPhone Models  Overview.pdf")
iphone_docs = iphone_loader.load()


iphone_doc_splits = splitter.split_documents(iphone_docs)


iphone_vector_store = Chroma(
   collection_name="iPhone",
   embedding_function=embedding_function,
   persist_directory=persist_directory
)


iphone_ids = [f"iphone_doc_{i}" for i in range(len(iphone_doc_splits))]
iphone_vector_store.add_texts(
   texts=[doc.page_content for doc in iphone_doc_splits],
   ids=iphone_ids
)

Loading the Xiaomi Dataset

xiaomi_loader = PyPDFLoader("/home/ml/projects/langgraph/Xiaomi Mi Series  Overview.pdf")
xiaomi_docs = xiaomi_loader.load()
xiaomi_doc_splits = splitter.split_documents(xiaomi_docs)


xiaomi_vector_store = Chroma(
   collection_name="Xiaomi",
   embedding_function=embedding_function,
   persist_directory=persist_directory
)
xiaomi_ids = [f"xiaomi_doc_{i}" for i in range(len(xiaomi_doc_splits))]
xiaomi_vector_store.add_texts(
   texts=[doc.page_content for doc in xiaomi_doc_splits],
   ids=xiaomi_ids
)

Step 3 - Retrieving Relevant Chunks

ChromaDB stores these vectorized chunks for fast and efficient semantic search. A dedicated function processes user queries by performing a semantic search within the stored vectors, retrieving the most relevant chunks based on the input query.
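
The idea behind semantic search can be sketched with a few hand-made vectors. In this illustration, cosine similarity is used only as an example metric (the metric ChromaDB actually applies depends on how the collection is configured), and the three-dimensional "embeddings" and chunk labels are invented; real all-MiniLM-L6-v2 embeddings have many more dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    indexed: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Score every stored chunk against the query and keep the k best."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Invented toy index: (chunk text, embedding) pairs.
toy_index = [
    ("chunk about cameras", [0.9, 0.1, 0.0]),
    ("chunk about batteries", [0.1, 0.9, 0.0]),
    ("chunk about displays", [0.0, 0.2, 0.9]),
]

# A query vector that points mostly in the "camera" direction.
results = top_k([1.0, 0.2, 0.0], toy_index, k=2)
```

A vector store does the same ranking, but over precomputed embeddings and with index structures that avoid comparing the query against every stored chunk.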

Let’s write a function that would retrieve the chunks from the iPhone dataset.

from langchain.schema import Document


def iphone_retrieve(state):
    """Graph node: fetch the chunks most similar to the question from the iPhone collection."""
    iphone_retriever = iphone_vector_store.as_retriever()
    iphone_retriever_tool = iphone_retriever.as_tool(
        name="iPhone Retriever Tool",
        description="Fetch documents from iPhone collection",
    )

    question = state["question"]
    documents = iphone_retriever_tool.invoke(question)
    return {"documents": documents, "question": question}

Next, let’s write the function to retrieve the Samsung dataset.

def samsung_retrieve(state):
    """Graph node: fetch the chunks most similar to the question from the Samsung collection."""
    samsung_retriever = samsung_vector_store.as_retriever()
    samsung_retriever_tool = samsung_retriever.as_tool(
        name="Samsung Retriever Tool",
        description="Fetch documents from Samsung collection",
    )

    question = state["question"]
    documents = samsung_retriever_tool.invoke(question)
    return {"documents": documents, "question": question}

Finally, let’s write the function to retrieve the Xiaomi dataset.

def xiaomi_retrieve(state):
    """Graph node: fetch the chunks most similar to the question from the Xiaomi collection."""
    xiaomi_retriever = xiaomi_vector_store.as_retriever()
    xiaomi_retriever_tool = xiaomi_retriever.as_tool(
        name="Xiaomi Retriever Tool",
        description="Fetch documents from Xiaomi collection",
    )

    question = state["question"]
    documents = xiaomi_retriever_tool.invoke(question)
    return {"documents": documents, "question": question}

Designing the AI Agent Workflow

Query Analysis

When the user query is presented, we’ll use the LLM to decide which route the query should take. To create this routing, we’ll use LangGraph.

Let’s look at how the query routing is done.

from langchain_groq import ChatGroq
import os
from typing import Literal
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field


class RouteQuery(BaseModel):
   """Route a user query to the most relevant datasource."""


   datasource: Literal["iphone_retrieve", "samsung_retrieve","xiaomi_retrieve"] = Field(
       ...,
       description="Given a user question choose to route it to iphone_retrieve or samsung_retrieve or xiaomi_retrieve.",
   )
# The structured router is built in the next step, once the router LLM has been created.

Decision Routing

We’ll use a smaller LLM, Gemma 2 9B, to route the queries. Queries will be routed to one of the three nodes based on the topic:

  • iPhone Search Node: Fetches product specifications from the iPhone dataset.
  • Xiaomi Search Node: Fetches product specifications from the Xiaomi dataset.
  • Samsung Search Node: Fetches product specifications from the Samsung dataset.

For the code below to work, you need to sign up for Groq (or any other model provider) and supply your API key. We could also have used Ollama or vLLM, deployed on cloud infrastructure like AWS, GCP, or Azure, or on on-prem infrastructure (the preferred approach for sovereign AI applications). For the sake of simplicity, let’s use Groq.

# Assumption: the Groq API key is exported as the GROQ_API_KEY environment variable.
groq_api_key = os.environ["GROQ_API_KEY"]
llm = ChatGroq(groq_api_key=groq_api_key, model_name="gemma2-9b-it")
# Bind the RouteQuery schema so the router returns one of the three node names.
structured_llm_router = llm.with_structured_output(RouteQuery)
system = """You are an expert at routing a user question to the most relevant retrieval system.
The available retrieval systems are:
- `iphone_retrieve`: Contains documents about iPhone models and related information.
- `samsung_retrieve`: Contains documents about Samsung models and related information.
- `xiaomi_retrieve`: Contains documents about Xiaomi models and related information.


Route the question to the most relevant system based on the topic of the question. If the question does not clearly pertain to any of these, choose the closest match."""


route_prompt = ChatPromptTemplate.from_messages(
   [
       ("system", system),
       ("human", "{question}"),
   ]
)


question_router = route_prompt | structured_llm_router


def route_question(state):
    """Return the name of the retrieval node that should handle the question."""
    question = state["question"]
    source = question_router.invoke({"question": question})
    return source.datasource
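
Because the graph only knows three node names, it can help to validate whatever the router returns before using it as a route. The sketch below is one possible guard, not part of the original workflow: normalize_route, route_or_fallback, and the "clarify" fallback node are hypothetical names, and a fallback node would have to be added to the graph separately.

```python
from typing import Optional

# The three node names used in this tutorial's graph.
VALID_ROUTES = ("iphone_retrieve", "samsung_retrieve", "xiaomi_retrieve")


def normalize_route(raw: Optional[str]) -> Optional[str]:
    """Return a valid node name, or None when the router's answer is unusable."""
    if raw is None:
        return None
    # Tolerate stray whitespace, backticks, and capitalization in the model output.
    cleaned = raw.strip().strip("`").lower()
    return cleaned if cleaned in VALID_ROUTES else None


def route_or_fallback(raw: Optional[str], fallback: str = "clarify") -> str:
    """Pick the validated route, or a hypothetical fallback node name."""
    route = normalize_route(raw)
    return route if route is not None else fallback
```

A route_question variant could pass source.datasource through route_or_fallback, so that an off-topic question leads to a clarifying reply rather than an error.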

Creating the Agentic Workflow Graph

LangGraph ensures seamless integration of all the components, and provides a robust and scalable pipeline for query handling.

In LangGraph, you model your agent’s workflow as a graph made up of nodes and edges. Let’s see what they are in this case.

  • Nodes: In this workflow, we use five nodes. Three are retrieval nodes that we add ourselves, and two are LangGraph’s built-in entry and exit points:
  1. iphone_retrieve: Retrieves relevant data from the iPhone dataset.
  2. xiaomi_retrieve: Fetches information from the Xiaomi dataset.
  3. samsung_retrieve: Accesses the Samsung dataset for relevant details.
  4. START: Marks the beginning of the workflow, initiating the query routing process.
  5. END: Concludes the workflow after the relevant data has been retrieved.

These nodes work together within LangGraph to efficiently route queries and retrieve the necessary information.

  • Routing: The routing function directs user queries to the appropriate data retrieval node based on the query’s content.
    • The RouteQuery class defines the output format: a datasource field that names the retrieval function to use.
    • A smaller LLM (Gemma 2 9B), served through the Groq API, analyzes the query and routes it to the correct node (iPhone, Xiaomi, or Samsung).
    • LangGraph then passes the query to the chosen retrieval function, which fetches relevant chunks of data for response generation.
  • Edges

Edges define the flow of data between the nodes. Note that route_question is a routing function attached to START as a conditional edge, not a node of its own. The connections used in the code are:

  1. START → route_question: The conditional edge on START calls the routing function, which analyzes the user query and determines the appropriate retrieval node.
  2. route_question → iphone_retrieve: Directs the query to the iphone_retrieve node if the query pertains to iPhone models.
  3. route_question → xiaomi_retrieve: Routes the query to the xiaomi_retrieve node if the topic is related to Xiaomi smartphones.
  4. route_question → samsung_retrieve: Sends the query to the samsung_retrieve node for Samsung-related details.
  5. iphone_retrieve → END: Marks the completion of the workflow after retrieving iPhone data.
  6. xiaomi_retrieve → END and samsung_retrieve → END: Similarly, conclude the process once data retrieval from the Xiaomi or Samsung dataset is complete.
from typing import List
from typing_extensions import TypedDict
from langgraph.graph import END, StateGraph, START


class GraphState(TypedDict):
   question: str
   generation: str
   documents: List[str]


workflow = StateGraph(GraphState)


# Define the nodes
workflow.add_node("iphone_retrieve", iphone_retrieve)  # retrieve from the iPhone collection
workflow.add_node("samsung_retrieve", samsung_retrieve)  # retrieve from the Samsung collection
workflow.add_node("xiaomi_retrieve", xiaomi_retrieve)  # retrieve from the Xiaomi collection




# Build graph
workflow.add_conditional_edges(
   START,
   route_question,
   {
       "iphone_retrieve": "iphone_retrieve",
       "samsung_retrieve": "samsung_retrieve",
       "xiaomi_retrieve": "xiaomi_retrieve",
   },
)
workflow.add_edge("xiaomi_retrieve", END)
workflow.add_edge("samsung_retrieve", END)
workflow.add_edge("iphone_retrieve", END)


# Compile
app = workflow.compile()
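
To make the mechanics of nodes, edges, and conditional routing concrete, here is a toy executor in plain Python. It is not how LangGraph is implemented; TinyGraph, TINY_END, the keyword router, and the fake retrieval nodes are all hypothetical stand-ins that mirror the shape of the workflow above.

```python
from typing import Callable, Dict, Optional

State = Dict[str, object]
TINY_END = "__end__"  # hypothetical sentinel, standing in for LangGraph's END


class TinyGraph:
    """Illustrative executor: one conditional entry point, then fixed edges."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[State], State]] = {}
        self.edges: Dict[str, str] = {}
        self.router: Optional[Callable[[State], str]] = None
        self.route_map: Dict[str, str] = {}

    def add_node(self, name: str, fn: Callable[[State], State]) -> None:
        self.nodes[name] = fn

    def add_edge(self, source: str, target: str) -> None:
        self.edges[source] = target

    def add_conditional_entry(self, router: Callable[[State], str], route_map: Dict[str, str]) -> None:
        self.router = router
        self.route_map = route_map

    def invoke(self, state: State) -> State:
        if self.router is None:
            raise RuntimeError("call add_conditional_entry() first")
        current = self.route_map[self.router(state)]  # conditional edge from the start
        while current != TINY_END:
            state = {**state, **self.nodes[current](state)}  # merge the node's update
            current = self.edges[current]  # follow the fixed edge
        return state


def fake_retrieve(brand: str) -> Callable[[State], State]:
    """Build a stand-in retrieval node that returns a labeled dummy chunk."""
    def node(state: State) -> State:
        return {"documents": [f"{brand} chunk for: {state['question']}"]}
    return node


def keyword_router(state: State) -> str:
    """Stand-in for the LLM router: match on brand keywords."""
    question = str(state["question"]).lower()
    for brand in ("iphone", "samsung", "xiaomi"):
        if brand in question:
            return f"{brand}_retrieve"
    return "iphone_retrieve"  # arbitrary default for this sketch


graph = TinyGraph()
for brand in ("iphone", "samsung", "xiaomi"):
    graph.add_node(f"{brand}_retrieve", fake_retrieve(brand))
    graph.add_edge(f"{brand}_retrieve", TINY_END)
graph.add_conditional_entry(keyword_router, {name: name for name in graph.nodes})

result = graph.invoke({"question": "Tell me about Samsung displays"})
```

The route map plays the same role as the dictionary passed to add_conditional_edges: it translates the router's return value into the name of the next node.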

One powerful feature of LangGraph is that it allows you to visualize your agentic workflow graph. In our simple agent system, the graph branches from START to one of the three retrieval nodes, each of which then leads to END.

Dynamic Response Generation

Retrieving Relevant Chunks

The make_context function sends a user question through the query workflow and returns the relevant chunks of context retrieved by whichever node handled it.

  • These chunks ensure that the system focuses only on the most contextually relevant parts of the PDFs.
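def make_context(question):
    """Run the workflow for a question and return the retrieved chunks."""
    inputs = {"question": question}
    page_content = []  # stays empty if no node returns documents
    for output in app.stream(inputs):
        for key, value in output.items():
            if "documents" in value and value["documents"]:
                page_content = value["documents"]
    return page_content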
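
The loop in make_context relies on the shape of the streamed output: in LangGraph’s default streaming mode, each item is expected to be a dictionary keyed by the name of the node that just ran, with that node’s state update as the value. The sketch below imitates that shape with a hypothetical fake_stream generator so the extraction logic can be tried without a model or vector store.

```python
from typing import Dict, Iterable, Iterator, List


def fake_stream(inputs: Dict[str, str]) -> Iterator[Dict[str, dict]]:
    """Stand-in for app.stream(): yields one {node_name: state_update} dict per executed node."""
    yield {
        "xiaomi_retrieve": {
            "documents": ["chunk A", "chunk B"],  # made-up chunks
            "question": inputs["question"],
        }
    }


def collect_documents(stream: Iterable[Dict[str, dict]]) -> List[str]:
    """Keep the documents from the last node update that returned any."""
    documents: List[str] = []
    for output in stream:
        for node_name, update in output.items():
            if update.get("documents"):
                documents = update["documents"]
    return documents


found = collect_documents(fake_stream({"question": "Tell me about xiaomi 13"}))
```

Starting from an empty list means the function returns a well-defined value even when no node produces documents.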

Generating Human-Readable Responses

To generate responses, we’ll use the Llama 3.1 LLM, which processes the retrieved chunks to generate detailed and human-readable responses based on the data.

For example, when asked, “What makes the Xiaomi 13 camera unique?”, the system retrieves relevant chunks from the Xiaomi dataset and generates a clear explanation. It highlights key features such as the Leica-branded cameras, known for their professional-grade quality. This approach ensures that users receive concise, accurate, and easy-to-understand answers tailored to their queries.

from groq import Groq


groq_api = Groq(api_key=groq_api_key)


def respond(question):
    """Answer the question with Llama 3.1, grounded in the retrieved context."""
    context = make_context(question)
    chat_completion = groq_api.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": f"This is the question asked by the user: {question}\nThe context given is: {context}\nAnswer this question based on the context provided.",
            }
        ],
        model="llama-3.1-70b-versatile",
    )
    return chat_completion.choices[0].message.content
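
The prompt sent to the LLM combines the question with the retrieved chunks. As a sketch of how that assembly could be made more explicit, the hypothetical build_prompt helper below numbers each chunk and caps the total context length; the 4,000-character limit and the instruction wording are assumptions, not values from the original workflow.

```python
from typing import List


def build_prompt(question: str, chunks: List[str], max_chars: int = 4000) -> str:
    """Assemble a grounded prompt: numbered context chunks first, then the question."""
    lines = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break  # stop before the context grows past the budget
        lines.append(entry)
        used += len(entry)
    context = "\n".join(lines) if lines else "(no context retrieved)"
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt("What makes the camera unique?", ["First chunk. ", "Second chunk."])
```

Numbering the chunks makes it easier to ask the model to cite which chunk supports each part of its answer.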

Looking at the Results

Our agent is now complete. Let’s test it with a query about the Xiaomi 13.

Question: Tell me about xiaomi 13

You can test the agent with a number of queries around different phone models, and see how it responds.

In an actual production use-case, you might have data stored in SQL databases, key-value stores, or NoSQL databases. By adding nodes to the graph, you can easily construct more complex retrieval workflows, and use LLMs along the way to translate natural language queries into structured queries that work on various kinds of data.
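
As a sketch of what such an additional node might look like, the example below queries an in-memory SQLite table using a parameterized statement. Everything here is hypothetical: the inventory table, the model names, and the "model" key in the state, which in a real workflow an LLM would extract from the user’s question upstream.

```python
import sqlite3
from typing import Callable, Dict


def make_inventory_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with made-up inventory rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inventory (model TEXT PRIMARY KEY, in_stock INTEGER)")
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?)",
        [("Example Phone A", 12), ("Example Phone B", 0)],
    )
    conn.commit()
    return conn


def make_stock_node(conn: sqlite3.Connection) -> Callable[[Dict], Dict]:
    """Build a graph-style node that answers from SQL instead of a vector store."""
    def stock_lookup(state: Dict) -> Dict:
        # Parameterized query: the model name is never interpolated into the SQL text.
        row = conn.execute(
            "SELECT in_stock FROM inventory WHERE model = ?",
            (state["model"],),
        ).fetchone()
        if row is None:
            docs = [f"No inventory record for {state['model']}."]
        else:
            docs = [f"{state['model']}: {row[0]} units in stock."]
        return {"documents": docs, "question": state["question"]}
    return stock_lookup


stock_node = make_stock_node(make_inventory_db())
update = stock_node({"question": "Is it in stock?", "model": "Example Phone A"})
```

The node returns the same documents and question keys as the retrieval nodes, so the downstream response generation would not need to change.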

Additionally, you could also use a vision language model (VLM), like Llama 3.2 or Amazon’s recently released Nova family of models, and allow the user to query with images. We’ll explain this approach in a future tutorial.

Deploying the AI Agent

How do you deploy AI agents? Depending on your use-case, you can consider multiple deployment options.

Deployment Options

  1. Cloud-Based: This is the most common scenario, and will work for most companies. Ideally, choose the cloud infrastructure your company already uses, be it AWS, Azure, GCP, or another provider. Depending on your privacy requirements and per-token costs, you can then decide between an AI API from platforms like OpenAI, Cohere, or Anthropic, self-hosted open models, or newer hosted models like Amazon’s Nova family on AWS.
  2. On-Premises: If you operate in an industry where you have to adhere to strict data privacy laws and compliance requirements, and need to airgap your workflow, you should consider the on-prem approach. In such a scenario, you should consider deploying the LLM locally, and creating a system where you do not share sensitive customer data with AI platforms. The AI models chosen in such scenarios will typically be open models.

Monitoring and Maintenance

There are various ways to monitor and measure the performance of your agent. You can use a combination of evaluation frameworks like Ragas, and monitoring systems like Prometheus and Grafana.

Whichever system you choose, make sure that you:

  • Track key metrics: query response time, accuracy, and user satisfaction.
  • Regularly update the vector store data so that the latest phone models and features are incorporated.

You should also consider logging systems, like Datadog or Graylog, to log the agentic workflow steps. This will help you debug your system if the agent responses are not up to the mark.
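
Before adopting a full monitoring stack, per-node response time can be captured with the standard library alone. The decorator below is a minimal sketch: timed_node, the "agent.metrics" logger name, and echo_node are hypothetical, and in production the log line would typically be shipped to whichever logging system you use.

```python
import functools
import logging
import time

logger = logging.getLogger("agent.metrics")  # hypothetical logger name


def timed_node(fn):
    """Wrap a graph node so every call logs how long it took."""
    @functools.wraps(fn)
    def wrapper(state):
        start = time.perf_counter()
        try:
            return fn(state)
        finally:
            # Runs on success and on failure, so slow errors are recorded too.
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("node=%s elapsed_ms=%.1f", fn.__name__, elapsed_ms)
    return wrapper


@timed_node
def echo_node(state):
    """Dummy node used only to demonstrate the decorator."""
    return {"documents": [], "question": state["question"]}


timed_result = echo_node({"question": "hi"})
```

A retrieval node could be registered in wrapped form, for example workflow.add_node("iphone_retrieve", timed_node(iphone_retrieve)), without changing the node itself.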

Conclusion and Next Steps

In this guide, we explained what AI agents are, and demonstrated how they work by building a simple AI agent in the customer support domain. We’ve used LangGraph for workflow management, and ChromaDB for semantic search, and streamlined the tasks of processing datasets, routing queries, and generating detailed, human-readable responses.

At Superteams.ai, we work with a network of AI engineers to build agentic AI workflows for businesses that want to scale. If you’d like to build an AI agent for your product stack, feel free to connect with us.
