In an era where customer expectations are at an all-time high, top businesses are turning to AI to deliver instant, efficient, and personalized support. AI agents can transform customer support by resolving queries, providing recommendations, and improving overall customer satisfaction.
This project explores the step-by-step process of building an AI agent with LangGraph, Llama 3.1, Gemma 2 9B, and Vector Search. We’ll use datasets derived from three PDFs containing detailed information about iPhone, Xiaomi, and Samsung models to showcase how agentic recommendation systems can be built for customer support. The AI agent we create will be capable of analyzing user queries, routing them to the correct data source, and delivering precise, dynamic responses.
By combining the power of natural language understanding with semantic search and a graph-like agentic framework, this system will offer a scalable, efficient, and user-friendly solution that meets a wide range of business use-cases.
Before we dive into the tutorial, let’s take a brief look at what AI agents are.
AI agents are intelligent software programs that autonomously perform tasks or solve problems on your behalf. A simple way to understand an AI agent is with an example related to what we are building.
Suppose a customer support team for a smartphone marketplace frequently receives queries about the specifications, compatibility, and features of iPhone, Xiaomi, and Samsung devices. Currently, these queries would be handled manually by customer support executives, which can lead to delays, inconsistencies, and high operational costs.
Now, if we were to tackle this problem with an AI agent, here’s how it could work:
1. Customer Query Input: A customer types a query such as, "Does the Xiaomi Redmi Note 12 support 5G networks?" into the live chat or email system.
2. Perception: The AI agent receives the query and breaks it down using an LLM. It identifies the key entities: "Xiaomi Redmi Note 12" and "5G networks".
3. Reasoning: Once it understands the query, the agent connects to the relevant knowledge base stored in a vector search engine. The vector search system retrieves relevant data from a pre-indexed dataset containing PDFs of product specifications for iPhone, Xiaomi, and Samsung models.
4. Action: The AI agent processes the retrieved data, verifies compatibility, and crafts a response using an LLM: "Yes, the Xiaomi Redmi Note 12 supports 5G networks, with compatibility for both SA and NSA modes."
5. Interaction with CRM (optional): The agent also logs this interaction in the CRM system, updating the customer profile with their interest in 5G features and Xiaomi devices, which can help in building future personalized recommendations.
6. Continuous Learning: If the agent identifies any ambiguity in the query (e.g., "Does it support 5G?"), it asks clarifying questions or delegates it to a human operator to improve its understanding. The feedback received can be used to fine-tune the model.
The workflow outlined above could also have been built as a frontend with a search system, form fields, and buttons, where a human operator would go through the exact steps the agent took. However, that would have been time-consuming and labor-intensive. With an AI agent, we avoid these complexities and reduce the workflow to a natural language-powered AI system. The agentic approach is also far more scalable.
AI agents leverage artificial intelligence techniques, such as large language models (LLMs) or vision language models (VLMs), and knowledge representation, to perceive their environment, reason about it, and take actions to achieve defined objectives.
Unlike traditional software programs that follow a rigid set of instructions, AI agents are adaptive, context-aware, and capable of making decisions based on their understanding of dynamic inputs. Therein lies their power: instead of spending months building a rigid, UI-centric workflow, you can often deliver the same capability as an AI agent with a natural language interface in a fraction of the time.
You can think of an AI agent as a digital concierge for your business processes. It doesn't just execute pre-defined tasks; it learns, adapts, and improves over time. For example, in customer support, an AI agent doesn’t merely respond to FAQs—it understands the customer’s intent, retrieves relevant information, provides precise, actionable answers, and remembers the customer’s request to personalize future interactions.
At a high level, an AI agent operates in a cycle of perception, reasoning, and action:
To build effective AI agents, you need to consider their core characteristics:
AI agents come in various forms depending on their application. For instance:
AI agents are particularly transformative in customer support due to their ability to:
To build this agent, we’ll use LangGraph, Llama 3.1, Gemma 2 9B, and ChromaDB (a vector database). Other frameworks are available as well, such as CrewAI, n8n, DAGWorks, and Haystack. You can also program an agent without any framework, as long as you handle the state changes and routing well. We’ll use LangGraph for its simplicity and flexibility.
LangGraph is a robust framework designed to streamline AI workflows. It lets you model an agent’s behavior as a graph of nodes and edges, integrating AI components like LLMs and vector stores into a cohesive pipeline. LangGraph provides an intuitive programmatic interface for routing user queries along predefined paths, so you can ensure accurate routing and processing.
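To make this concrete, here is a minimal sketch of a LangGraph workflow with a single node; the state fields and node name are illustrative, but the pattern (define a state, add nodes and edges, compile, invoke) is exactly what we’ll follow when building the agent later.

from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END

class DemoState(TypedDict):
    question: str
    answer: str

def answer_node(state: DemoState):
    # A real node would call an LLM or a retriever here
    return {"answer": f"You asked: {state['question']}"}

demo_graph = StateGraph(DemoState)
demo_graph.add_node("answer", answer_node)
demo_graph.add_edge(START, "answer")
demo_graph.add_edge("answer", END)
demo_app = demo_graph.compile()

print(demo_app.invoke({"question": "Hello"})["answer"])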
Any AI agent you build will rely on comprehensive, well-structured datasets. Data quality is key, as the data forms the foundation on which the LLM generates its responses.
For the purpose of this guide, we’ll use a dataset made up of three PDFs, covering Samsung Galaxy models, iPhone models, and the Xiaomi Mi series. These documents provide the quality data we’ll retrieve at query time.
Let’s start. We’ll first install the tools and libraries, prepare our datasets, load them into ChromaDB, and then stitch together the agentic workflow.
The following tools power the AI agent: LangGraph for the workflow graph, ChromaDB for vector storage and semantic search, HuggingFace sentence-transformer embeddings, LangChain for document loading and text splitting, and Groq for serving Llama 3.1 and Gemma 2 9B. (Streamlit is included in the install in case you want to wrap the agent in a simple web UI.) Install the required libraries:
!pip install streamlit chromadb langchain langchain-community langchain-core langchain-groq langgraph groq pypdf sentence-transformers
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Embedding model used to vectorize the document chunks
embedding_function = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Chunking configuration: ~500-character chunks with a 60-character overlap
chunk_size = 500
chunk_overlap = 60
splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)

# Directory where ChromaDB persists the collections
persist_directory = "/home/ml/projects/langgraph/chroma"
Loading the Samsung Dataset
# Load and chunk the Samsung PDF
loader = PyPDFLoader("/home/ml/projects/langgraph/Samsung Galaxy Models Overview.pdf")
samsung_docs = loader.load()
samsung_doc_splits = splitter.split_documents(samsung_docs)

# Store the chunks in a dedicated Chroma collection
samsung_vector_store = Chroma(
    collection_name="Samsung",
    embedding_function=embedding_function,
    persist_directory=persist_directory,
)

samsung_ids = [f"samsung_doc_{i}" for i in range(len(samsung_doc_splits))]
samsung_vector_store.add_texts(
    texts=[doc.page_content for doc in samsung_doc_splits],
    ids=samsung_ids,
)
Loading the iPhone Dataset
# Load and chunk the iPhone PDF
iphone_loader = PyPDFLoader("/home/ml/projects/langgraph/iPhone Models Overview.pdf")
iphone_docs = iphone_loader.load()
iphone_doc_splits = splitter.split_documents(iphone_docs)

# Store the chunks in a dedicated Chroma collection
iphone_vector_store = Chroma(
    collection_name="iPhone",
    embedding_function=embedding_function,
    persist_directory=persist_directory,
)

iphone_ids = [f"iphone_doc_{i}" for i in range(len(iphone_doc_splits))]
iphone_vector_store.add_texts(
    texts=[doc.page_content for doc in iphone_doc_splits],
    ids=iphone_ids,
)
Loading the Xiaomi Dataset
# Load and chunk the Xiaomi PDF
xiaomi_loader = PyPDFLoader("/home/ml/projects/langgraph/Xiaomi Mi Series Overview.pdf")
xiaomi_docs = xiaomi_loader.load()
xiaomi_doc_splits = splitter.split_documents(xiaomi_docs)

# Store the chunks in a dedicated Chroma collection
xiaomi_vector_store = Chroma(
    collection_name="Xiaomi",
    embedding_function=embedding_function,
    persist_directory=persist_directory,
)

xiaomi_ids = [f"xiaomi_doc_{i}" for i in range(len(xiaomi_doc_splits))]
xiaomi_vector_store.add_texts(
    texts=[doc.page_content for doc in xiaomi_doc_splits],
    ids=xiaomi_ids,
)
ChromaDB stores these vectorized chunks for fast and efficient semantic search. A dedicated function processes user queries by performing a semantic search within the stored vectors, retrieving the most relevant chunks based on the input query.
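Before wiring these collections into the agent, it’s worth a quick sanity check that ingestion worked. A minimal check (the query string below is just an example):

results = xiaomi_vector_store.similarity_search("Xiaomi 13 camera specifications", k=2)
for doc in results:
    print(doc.page_content[:200])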
Let’s write a function that retrieves chunks from the iPhone dataset.
def iphone_retrieve(state):
    # Semantic search over the iPhone collection for the user's question
    question = state["question"]
    iphone_retriever = iphone_vector_store.as_retriever()
    documents = iphone_retriever.invoke(question)
    return {"documents": documents, "question": question}
Next, let’s write the function to retrieve the Samsung dataset.
def samsung_retrieve(state):
    # Semantic search over the Samsung collection for the user's question
    question = state["question"]
    samsung_retriever = samsung_vector_store.as_retriever()
    documents = samsung_retriever.invoke(question)
    return {"documents": documents, "question": question}
Finally, let’s write the function to retrieve the Xiaomi dataset.
def xiaomi_retrieve(state):
    # Semantic search over the Xiaomi collection for the user's question
    question = state["question"]
    xiaomi_retriever = xiaomi_vector_store.as_retriever()
    documents = xiaomi_retriever.invoke(question)
    return {"documents": documents, "question": question}
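Each retrieval function takes the shared graph state as input and returns the retrieved documents along with the original question, so you can also call one directly to verify it (the question below is illustrative):

state = {"question": "What chipset does the Samsung Galaxy S23 use?"}
result = samsung_retrieve(state)
print(result["documents"])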
When the user query is presented, we’ll use the LLM to decide which route the query should take. To create this routing, we’ll use LangGraph.
Let’s look at how the query routing is done.
from langchain_groq import ChatGroq
import os
from typing import Literal
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

class RouteQuery(BaseModel):
    """Route a user query to the most relevant datasource."""
    datasource: Literal["iphone_retrieve", "samsung_retrieve", "xiaomi_retrieve"] = Field(
        ...,
        description="Given a user question, choose whether to route it to iphone_retrieve, samsung_retrieve, or xiaomi_retrieve.",
    )
We’ll use a smaller LLM, Gemma 2 9B, to route the queries. Based on the topic, each query will be routed to one of the three nodes: iphone_retrieve, samsung_retrieve, or xiaomi_retrieve.
For the code below to work, you’ll need to sign up for Groq (or any other model provider) and add your API key. We could also have used Ollama or vLLM, deployed on cloud infrastructure such as AWS, GCP, or Azure, or on-prem infrastructure (the preferred approach for sovereign AI applications). For the sake of simplicity, let’s use Groq.
# Read the Groq API key from the environment (set GROQ_API_KEY after signing up)
groq_api_key = os.getenv("GROQ_API_KEY")

llm = ChatGroq(groq_api_key=groq_api_key, model_name="Gemma2-9b-It")
structured_llm_router = llm.with_structured_output(RouteQuery)
system = """You are an expert at routing a user question to the most relevant retrieval system.
The available retrieval systems are:
- `iphone_retrieve`: Contains documents about iPhone models and related information.
- `samsung_retrieve`: Contains documents about Samsung models and related information.
- `xiaomi_retrieve`: Contains documents about Xiaomi models and related information.
Route the question to the most relevant system based on the topic of the question. If the question does not pertain to any of these, respond with `None`."""
route_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}"),
    ]
)
question_router = route_prompt | structured_llm_router
def route_question(state):
    # Ask the router LLM which retrieval node should handle this question
    question = state["question"]
    source = question_router.invoke({"question": question})
    return source.datasource
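With the router in place, you can check it directly; given a phone-related question, it should return the name of the matching retrieval node (the question below is illustrative):

print(route_question({"question": "Does the iPhone 15 Pro support USB-C?"}))
# Expected output: iphone_retrieve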
LangGraph ensures seamless integration of all the components, and provides a robust and scalable pipeline for query handling.
The way LangGraph works is that you model your agent’s workflow as a graph of nodes and edges. In our case, the nodes are the three retrieval functions defined above: iphone_retrieve, samsung_retrieve, and xiaomi_retrieve. They work together within LangGraph to route queries and fetch the necessary information.
Edges define the flow of data between the nodes, enabling seamless query routing and processing. Six edges are used in the code: three conditional edges from START to the retrieval nodes (chosen by route_question), and three edges from each retrieval node to END.
from typing import List
from typing_extensions import TypedDict
from langgraph.graph import END, StateGraph, START

class GraphState(TypedDict):
    question: str
    generation: str
    documents: List[str]

workflow = StateGraph(GraphState)

# Define the retrieval nodes
workflow.add_node("iphone_retrieve", iphone_retrieve)    # iPhone retrieval
workflow.add_node("samsung_retrieve", samsung_retrieve)  # Samsung retrieval
workflow.add_node("xiaomi_retrieve", xiaomi_retrieve)    # Xiaomi retrieval

# Build the graph: route from START to the right node, then end
workflow.add_conditional_edges(
    START,
    route_question,
    {
        "iphone_retrieve": "iphone_retrieve",
        "samsung_retrieve": "samsung_retrieve",
        "xiaomi_retrieve": "xiaomi_retrieve",
    },
)
workflow.add_edge("xiaomi_retrieve", END)
workflow.add_edge("samsung_retrieve", END)
workflow.add_edge("iphone_retrieve", END)

# Compile
app = workflow.compile()
One powerful feature of LangGraph is that it allows you to visualize your agentic workflow graph. Here’s what’s going on in our simple agent system.
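For example, in a notebook you can render the compiled graph as a Mermaid diagram (this assumes the optional drawing dependencies LangGraph relies on for PNG rendering are available):

from IPython.display import Image, display

# Render the compiled workflow: START -> router -> retrieval node -> END
display(Image(app.get_graph().draw_mermaid_png()))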
Next, we define a make_context function that sends the user’s question through the workflow and returns the relevant chunks of context produced by whichever node handled it.
def make_context(question):
    # Run the question through the compiled workflow and return the
    # documents produced by whichever retrieval node handled it
    inputs = {"question": question}
    for output in app.stream(inputs):
        for key, value in output.items():
            if "documents" in value and value["documents"]:
                return value["documents"]
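Calling it with a sample question returns the retrieved chunks that will later be passed to the generation step:

print(make_context("What makes the Xiaomi 13 camera unique?"))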
To generate responses, we’ll use the Llama 3.1 LLM, which processes the retrieved chunks to generate detailed and human-readable responses based on the data.
For example, when asked, “What makes the Xiaomi 13 camera unique?”, the system retrieves relevant chunks from the Xiaomi dataset and generates a clear explanation. It highlights key features such as the Leica-branded cameras, known for their professional-grade quality. This approach ensures that users receive concise, accurate, and easy-to-understand answers tailored to their queries.
from groq import Groq

groq_client = Groq(api_key=groq_api_key)

def respond(question):
    # Retrieve context via the workflow, then ask Llama 3.1 to answer from it
    chat_completion = groq_client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": f"Question: {question}\n\nContext: {make_context(question)}\n\n"
                           "Answer the question based on the context provided.",
            }
        ],
        model="llama-3.1-70b-versatile",
    )
    return chat_completion.choices[0].message.content
Our agent is now complete. Let’s look at the response to a query about the Xiaomi 13.
Question: Tell me about Xiaomi 13
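You can reproduce this by calling the respond function directly:

print(respond("Tell me about Xiaomi 13"))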
You can test the agent with a number of queries around different phone models, and see how it responds.
In an actual production use-case, you might have data stored in SQL databases, key-value stores, or NoSQL databases. By adding nodes to the graph, you can easily construct more complex retrieval workflows, and use LLMs along the way to translate natural language queries into structured queries that work on various kinds of data.
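As a rough sketch of that idea, a new node could use the LLM to translate the question into SQL and run it against a product database; the table name, schema, and database file below are hypothetical:

import sqlite3

def inventory_retrieve(state):
    # Hypothetical node: translate the question into SQL with the LLM,
    # then run it against a product inventory database
    question = state["question"]
    sql_prompt = (
        "Translate this question into a single SQLite query against the table "
        "products(model TEXT, brand TEXT, price REAL, stock INTEGER). "
        f"Return only the SQL.\nQuestion: {question}"
    )
    sql = llm.invoke(sql_prompt).content
    conn = sqlite3.connect("inventory.db")  # hypothetical database file
    rows = conn.execute(sql).fetchall()     # in production, validate the generated SQL first
    conn.close()
    return {"documents": str(rows), "question": question}

# The node is then registered on the graph like the retrieval nodes:
# workflow.add_node("inventory_retrieve", inventory_retrieve)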
Additionally, you could also use a vision language model (VLM), like Llama 3.2 or Amazon’s recently released Nova family of models, and allow the user to query with images. We’ll explain this approach in a future tutorial.
How do you deploy AI agents? Depending on your use-case, you can consider multiple deployment options.
There are various ways to monitor and measure the performance of your agent. You can use a combination of evaluation frameworks like Ragas and monitoring systems like Prometheus and Grafana.
Whichever system you choose, make sure that you measure the following:
You should also consider logging systems, like Datadog or Graylog, to log the agentic workflow steps. This will help you debug your system if the agent responses are not up to the mark.
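As a minimal sketch using Python’s standard logging module (a log shipper such as Datadog or Graylog would then collect these records), here is a variant of make_context that records which node handled each question:

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent")

def make_context_logged(question):
    # Same as make_context, but logs which node produced the documents
    inputs = {"question": question}
    for output in app.stream(inputs):
        for node_name, value in output.items():
            logger.info("node=%s question=%r has_documents=%s",
                        node_name, question, bool(value.get("documents")))
            if value.get("documents"):
                return value["documents"]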
In this guide, we explained what AI agents are and demonstrated how they work by building a simple AI agent for the customer support domain. We used LangGraph for workflow management and ChromaDB for semantic search, streamlining the tasks of processing datasets, routing queries, and generating detailed, human-readable responses.
At Superteams.ai, we work with a network of AI engineers to build agentic AI workflows for businesses that want to scale. If you’d like to build an AI agent for your product stack, feel free to connect with us.