Updated on Nov 25, 2024

Step by Step Guide to Building RAG Applications Using DSPy and Llama3

Explore how to create advanced Retrieval-Augmented Generation (RAG) applications with DSPy, Qdrant and Llama 3


Imagine a world where applications understand your queries and provide the most relevant information instantly. This article explores creating sophisticated Retrieval-Augmented Generation (RAG) applications using DSPy and Llama 3. Discover how these cutting-edge technologies can change the way you interact with data, enabling seamless and highly accurate information retrieval.

Introduction

In this article, we will explore how to develop a Retrieval-Augmented Generation (RAG) chatbot application using DSPy, a framework for programming (rather than prompting) language models, and Llama 3, an open-source language model.

Let’s first look at why you might use DSPy instead of another framework.

Why Should You Use DSPy?

DSPy is an open-source Python framework that aims to prioritize programming over prompting during the development of Large Language Model applications.

Frameworks like LangChain, on the other hand, rely on hand-written prompts that you have to tune yourself. These prompts may need to change whenever you switch language models, and getting them right usually takes a lot of trial and error.

To address these issues, researchers at Stanford developed DSPy. It provides signatures, modules, metrics, and optimizers that let you declare what the model should do and leave the framework to build and tune the prompts, removing hand-crafted prompts from the code.

In this article, we will build a pipeline that uses DSPy as the framework, Qdrant as the vector database, and Llama 3 as the LLM to create a RAG application. Along the way, we will also look at how DSPy works.

GitHub

Access the full code and implementation on GitHub.

Let’s Code

Setting Up the Environment

Let's start by installing the necessary libraries that we’ll be using in the project. 

pip install langchain
pip install langchain_community
pip install "dspy-ai[qdrant]"
pip install sentence_transformers
pip install pypdf
pip install streamlit

Data Preparation

We’ll use a dataset consisting of a PDF that briefly describes the important events of World War I and World War II.

You can access the dataset here.

Before we can run vector operations over the data, we need to split it into smaller chunks; for that, we use LangChain's RecursiveCharacterTextSplitter.

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Chunks of at most 512 characters, with up to 20 characters of overlap between neighbours
text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=20)
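With the default length function, chunk_size and chunk_overlap are measured in characters. As a rough picture of what they mean, the sketch below cuts text into fixed windows where each window repeats the last few characters of the previous one, so text near a boundary appears in both neighbouring chunks. It is a simplified sliding-window splitter of our own (sliding_window_chunks is a hypothetical name); LangChain's RecursiveCharacterTextSplitter additionally prefers to break on paragraph, line, and word boundaries, so its chunks will differ.

```python
def sliding_window_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows overlapping by chunk_overlap characters.

    Simplified illustration only: not LangChain's recursive splitting algorithm.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "abcdefghij" * 5  # 50 characters
    for chunk in sliding_window_chunks(sample, chunk_size=20, chunk_overlap=5):
        print(len(chunk), chunk)
```

With these toy sizes, the 50-character sample yields three 20-character chunks, each sharing 5 characters with the next.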

Next, we load the PDF, split it with this splitter, and keep the text of each chunk. We also declare a variable doc_ids that assigns a unique ID, starting from 1, to every chunk.

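from langchain_community.document_loaders import PyPDFLoader

# Load the PDF and split it into chunks with the splitter defined above
document = PyPDFLoader("wordWarData.pdf")
loaded_doc = document.load_and_split(text_splitter=text_splitter)
splitted_doc = [doc.page_content for doc in loaded_doc]  # plain text of each chunk

# One unique integer ID per chunk, starting at 1
doc_ids = list(range(1, len(loaded_doc) + 1))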

Our next step is to convert these chunks into vector embeddings. For that, we use all-mpnet-base-v2 as the embedding model, which maps each chunk to a 768-dimensional vector.

Note: If you are running on a CPU, change device='cuda' to device='cpu'.

from sentence_transformers import SentenceTransformer

# Load the embedding model and encode every chunk into a 768-dimensional vector
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2", device="cuda")
vectors = model.encode(splitted_doc)

Now, since we have all the vectors of the data chunks, we can proceed to store them in the database.

Storing Vectors in the Database

It's time to store all our data embeddings in the database. We are using Qdrant as the vector DB in order to have fast and efficient retrieval of the data when needed.

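Let's start by initializing the database. Here we use an in-memory Qdrant instance, which is convenient for experiments but is discarded when the program exits.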

from qdrant_client import QdrantClient
client = QdrantClient(":memory:")

Next, we create a collection in the database. If a collection with the same name already exists, we delete it first and then create a fresh one, so that no stale data gets mixed in.

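from qdrant_client.models import Distance, VectorParams

# Drop any existing collection with this name so we start from a clean state
if client.collection_exists(collection_name="cf_data"):
    client.delete_collection(collection_name="cf_data")

# size=768 matches all-mpnet-base-v2's output; vectors are ranked by cosine similarity
client.create_collection(
    collection_name="cf_data",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)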
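Because the collection is created with Distance.COSINE, Qdrant ranks stored vectors by their cosine similarity to the query vector, which depends only on the angle between vectors and not on their length. A small pure-Python sketch of the measure itself (cosine_similarity is our own helper name, not part of qdrant_client):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


# Same direction gives 1.0, orthogonal gives 0.0, opposite gives -1.0 (up to rounding)
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

The dimension check mirrors why size=768 must match the embedding model: vectors of different sizes cannot be compared.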

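Now we upload all the vectors, together with their IDs, into the collection. We also attach each chunk's text as a payload, so every point carries the text it was computed from.

client.upload_collection(
    collection_name="cf_data",
    ids=doc_ids,
    vectors=vectors,
    # Store the chunk text alongside each vector
    payload=[{"document": text} for text in splitted_doc],
)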

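Next, we initialize DSPy's Qdrant retriever model, QdrantRM, pointing it at the cf_data collection and asking for the k=3 best matches per query. (In this tutorial, the RAG module's forward() fetches its context through the get_context helper defined below, so this retriever is registered with DSPy but not called directly.)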

from dspy.retrieve.qdrant_rm import QdrantRM
qdrant_retriever_model = QdrantRM("cf_data", client, k=3)

Initializing the Language Model

We need a large language model to generate natural-language answers from the retrieved context. Here, we use Llama 3, running locally through Ollama.

To learn more about the language models supported by DSPy, refer here.

import dspy

# Llama 3 served locally by Ollama (run `ollama pull llama3` first); allow up to 180 s per request
lm = dspy.OllamaLocal(model="llama3", timeout_s=180)

Configuring the DSPy Module

Now we configure DSPy to use our retriever and language model by default, so the modules we define next need no hand-written prompts.

dspy.settings.configure(rm=qdrant_retriever_model, lm=lm)

Next, we create a helper function, get_context, that embeds the question and retrieves the three best-matching chunks from the database; these chunks are the context the LLM uses to answer.

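def get_context(text):
    # Embed the question with the same model used for the chunks
    query_vector = model.encode(text).tolist()

    # Find the 3 most similar chunks by cosine similarity
    hits = client.search(
        collection_name="cf_data",
        query_vector=query_vector,
        limit=3,
    )

    # doc_ids start at 1, so point ID n refers to splitted_doc[n - 1]
    return [splitted_doc[hit.id - 1] for hit in hits]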
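get_context hands the nearest-neighbour search to Qdrant. For a handful of vectors, the same ranking can be computed by brute force: score every stored vector against the query and keep the k best, which is what limit=3 requests. The sketch below is a hypothetical stand-in (top_k_chunks is our own name, and stored is a plain dict instead of a Qdrant collection); it also shows how the 1-based doc_ids map back to the 0-based splitted_doc list.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_chunks(query_vector: list[float], stored: dict[int, list[float]],
                 chunks: list[str], k: int = 3) -> list[str]:
    """Brute-force stand-in for client.search(..., limit=k) followed by an ID lookup.

    stored maps 1-based point IDs (like doc_ids) to vectors; chunks is the
    0-based list of chunk texts (like splitted_doc), so ID n is chunks[n - 1].
    """
    scored = ((cosine(query_vector, vec), point_id) for point_id, vec in stored.items())
    best = heapq.nlargest(k, scored)  # k highest similarities, best first
    return [chunks[point_id - 1] for _, point_id in best]


chunks = ["about tanks", "about ships", "about planes"]
stored = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.7, 0.7]}
print(top_k_chunks([1.0, 0.1], stored, chunks, k=2))  # ['about tanks', 'about planes']
```

A vector database avoids scanning every vector on large collections, but for a few hundred chunks the brute-force result is the same ranking.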

It's time to create our first signature in DSPy but, first, we should understand “What are signatures in DSPy?” 

Signatures in DSPy

A DSPy signature is a declarative description of a task: it specifies the input and output fields and, optionally, a short description of the task and of each field.

input -> output

There are, generally, two types of signatures in DSPy: inline signatures and class-based signatures.

Inline signatures work well for simple tasks that need little reasoning. Class-based signatures let us add a task description and per-field hints, which makes them a better fit for a RAG application.
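In the inline form, a signature is just a string such as "question -> answer": field names before the arrow are inputs, names after it are outputs, and commas separate multiple fields. The toy parser below (a hypothetical helper, not DSPy's own parser) shows how that shorthand breaks down. The class-based GenerateAnswer signature that follows has the same fields as "context, question -> answer", plus a task description and per-field hints.

```python
def parse_inline_signature(spec: str) -> tuple[list[str], list[str]]:
    """Split an inline signature like "context, question -> answer" into
    (input field names, output field names). Toy parser for illustration only."""
    if spec.count("->") != 1:
        raise ValueError("expected exactly one '->' between inputs and outputs")
    inputs, outputs = spec.split("->")

    def names(part: str) -> list[str]:
        # Comma-separated field names, ignoring surrounding whitespace
        return [name.strip() for name in part.split(",") if name.strip()]

    return names(inputs), names(outputs)


print(parse_inline_signature("context, question -> answer"))
# (['context', 'question'], ['answer'])
```

In DSPy itself, an inline signature is passed straight to a module, for example dspy.ChainOfThought("context, question -> answer").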

Let's try to build a simple class-based signature for our application.

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often 1 to 5 word answer")
  • The docstring describes the task.
  • The desc argument of an InputField hints at what that input contains.
  • The desc argument of the OutputField describes the expected form of the answer.

Next, we need to build a DSPy module, so let's look at what modules are in DSPy.

Modules in DSPy

A module in DSPy is a building block for programs that use large language models. Built-in modules such as ChainOfThought take any DSPy signature and handle how the LLM is prompted for it, and modules are structured much like modules in PyTorch.

Similar to signatures, DSPy modules can be categorized into two types: built-in modules and user-defined modules.

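We are going to create a user-defined module built on top of built-in modules: it declares dspy.Retrieve and dspy.ChainOfThought, and returns its result as a dspy.Prediction object. Note that forward() obtains its context from get_context rather than from self.retrieve.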

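class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        # Built-in retriever (uses the configured rm); kept for reference,
        # since forward() below retrieves context through get_context()
        self.retrieve = dspy.Retrieve(k=num_passages)
        # Built-in module that has the LLM reason step by step before answering
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = get_context(question)  # top-3 chunks from Qdrant
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)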
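The RAG class follows a pattern PyTorch users will recognize: sub-modules are declared in __init__, wired together in forward(), and calling the module runs forward(). The plain-Python sketch below reproduces only that pattern with hypothetical toy classes (keyword-overlap retrieval and no language model); it is not DSPy's implementation.

```python
class Module:
    """Minimal stand-in for the PyTorch-style pattern: calling a module runs forward()."""

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError


class KeywordRetrieve(Module):
    """Toy retriever: ranks passages by how many words they share with the question."""

    def __init__(self, corpus: list[str], k: int = 1):
        self.corpus = corpus
        self.k = k

    def forward(self, question: str) -> list[str]:
        words = set(question.lower().split())
        ranked = sorted(self.corpus, key=lambda p: len(words & set(p.lower().split())), reverse=True)
        return ranked[: self.k]


class ToyRAG(Module):
    def __init__(self, corpus: list[str]):
        self.retrieve = KeywordRetrieve(corpus, k=1)  # sub-module declared in __init__

    def forward(self, question: str) -> dict:
        context = self.retrieve(question)  # sub-module called in forward()
        # A real RAG module would call the LLM here; this toy just echoes the top passage
        return {"context": context, "answer": context[0]}


corpus = ["the armistice ended world war one", "the blitz bombed london"]
print(ToyRAG(corpus)("when did world war one end"))
```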

Our basic program is now ready, but before putting it to work we should measure its accuracy. For that, we need another DSPy concept: metrics.

Metrics in DSPy

A metric is a function that scores how well the program's output matches the expected result; it tells us how well the model is doing and whether further optimization is needed.

During evaluation, DSPy runs the program on each sample question (retrieving context and asking the LLM for an answer), and the metric then compares the predicted answer with the reference answer. Here we use exact match, which counts an answer as correct only if it is identical to the reference after normalization.
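To make "identical after normalization" concrete, here is a sketch of an exact-match check. We assume a SQuAD-style normalization (lowercasing, removing punctuation and the articles a/an/the, collapsing whitespace); dspy.evaluate.answer_exact_match may differ in detail, and the function names below are our own.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(reference: str, predicted: str) -> bool:
    """True when the two answers are identical after normalization."""
    return normalize_answer(reference) == normalize_answer(predicted)


print(exact_match("The Treaty of Versailles", "treaty of versailles."))  # True
print(exact_match("1918", "1919"))  # False
```

Exact match is strict: a correct but longer answer such as "in 1918" would score False, which is one reason the signature asks for short factoid answers.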

To do that, we need some sample questions with reference answers. For this dataset, you can get the sample questions and answers here.

DSPy's evaluator expects these samples as dspy.Example objects, so let's convert them first.

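import json

# Load the test questions and wrap each one in a dspy.Example;
# with_inputs('question') marks question as the input and answer as the label
with open("/content/testingData.json", "r") as f:
    testdata = json.load(f)["examples"]
testset = [dspy.Example(question=e["question"], answer=e["answer"]).with_inputs("question") for e in testdata]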

Each entry in testset is now a dspy.Example holding a question and its reference answer.
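Judging from the loading code, each JSON file holds an examples list of objects with question and answer keys. The sketch below parses a hypothetical file in that shape and checks the keys before the entries are wrapped in dspy.Example; load_examples is our own helper, and the two entries are illustrative, not taken from the actual dataset.

```python
import json

# Hypothetical sample in the shape the loading code expects:
# a top-level "examples" list whose items have "question" and "answer" keys.
SAMPLE = """
{
  "examples": [
    {"question": "In which year did World War I end?", "answer": "1918"},
    {"question": "Which country did Germany invade on 1 September 1939?", "answer": "Poland"}
  ]
}
"""


def load_examples(raw: str) -> list[dict]:
    """Parse the JSON and check every entry has the two fields used to build dspy.Example."""
    examples = json.loads(raw)["examples"]
    for i, e in enumerate(examples):
        missing = {"question", "answer"} - set(e)
        if missing:
            raise ValueError(f"example {i} is missing {sorted(missing)}")
    return examples


print(len(load_examples(SAMPLE)))  # 2
```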

Now let's evaluate our RAG program on this test set to get its accuracy score.

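from dspy.evaluate.evaluate import Evaluate

rag = RAG()

# Run the program on every test question (one thread), showing up to 17 result rows
evaluate_on_qa = Evaluate(devset=testset, num_threads=1, display_progress=True, display_table=17)

# Score = exact match between predicted and reference answers
metric = dspy.evaluate.answer_exact_match
evaluate_on_qa(rag, metric=metric)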

This prints a table comparing each answer predicted by the LLM with the reference answer, followed by the overall accuracy score.

The model answers about 41.2% of the test questions correctly.

But, what if we wanted a higher accuracy score? For that, we need to use the Optimizers (formerly, Teleprompters) in DSPy.

Optimizers in DSPy

A DSPy optimizer (also known as a teleprompter) is an algorithm that tunes the parameters of a DSPy program, such as the prompts and few-shot demonstrations it uses or the language model's weights, to maximize a chosen metric.

There are various types of DSPy optimizers available depending on the use case and performance.

Here we use BootstrapFewShot, which improves the program by adding few-shot demonstrations to its prompts rather than by fine-tuning the model, with the aim of achieving a higher accuracy score.
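Roughly, BootstrapFewShot runs the program on training examples, keeps the runs whose outputs pass the metric, and inserts those successful input/output pairs into the prompts as few-shot demonstrations. The toy function below illustrates only that selection loop; bootstrap_demos and its max_demos cap are hypothetical, and DSPy's real demonstrations may also record intermediate fields such as the chain-of-thought reasoning.

```python
from typing import Callable


def bootstrap_demos(program: Callable[[str], str], trainset: list[dict],
                    metric: Callable[[dict, str], bool], max_demos: int = 4) -> list[dict]:
    """Toy illustration of bootstrapping: keep the training examples the current
    program answers correctly (per the metric) and reuse them as few-shot demos."""
    demos = []
    for example in trainset:
        prediction = program(example["question"])
        if metric(example, prediction):  # only successful runs become demonstrations
            demos.append({"question": example["question"], "answer": prediction})
        if len(demos) >= max_demos:  # stop once enough demos are collected
            break
    return demos


train = [{"question": "q1", "answer": "a"}, {"question": "q2", "answer": "b"},
         {"question": "q3", "answer": "c"}]
toy_answers = {"q1": "a", "q2": "wrong", "q3": "c"}  # a fake program's outputs
demos = bootstrap_demos(lambda q: toy_answers[q], train, lambda ex, pred: ex["answer"] == pred)
print([d["question"] for d in demos])  # ['q1', 'q3']
```

Because only prompts change, the quality of the bootstrapped demos depends directly on how strict the metric is.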

For this, we need training data in the same format as the test data above, which can be downloaded from here. Let's load and format the training data.

import json

# Load the training questions in the same format as the test set
with open("trainingData.json", "r") as f:
    traindata = json.load(f)["examples"]
trainset = [dspy.Example(question=e["question"], answer=e["answer"]).with_inputs("question") for e in traindata]

Now let's define a validation metric, initialize the optimizer, and compile our RAG program with the training set.

from dspy import teleprompt

def validate_context_and_answer(example, pred, trace=None):
    if pred.context is None:
        return False
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

teleprompter = teleprompt.BootstrapFewShot(metric=validate_context_and_answer)

compiled_rag = teleprompter.compile(RAG(), trainset=trainset)
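validate_context_and_answer lets an example become a demonstration only when the prediction has context, its answer exactly matches the reference, and the reference answer also appears in at least one retrieved passage (the check performed by dspy.evaluate.answer_passage_match). A plain-Python sketch of that gate, with simplified lowercase-and-whitespace normalization and hypothetical helper names:

```python
def _norm(text: str) -> str:
    """Lowercase and collapse whitespace (simpler than DSPy's own normalization)."""
    return " ".join(text.lower().split())


def passage_match(answer: str, passages: list[str]) -> bool:
    """True if the reference answer occurs inside at least one retrieved passage."""
    return any(_norm(answer) in _norm(p) for p in passages)


def validate(reference: str, predicted: str, passages) -> bool:
    """Mirror of validate_context_and_answer's logic on plain strings."""
    if passages is None:  # no context retrieved
        return False
    exact = _norm(reference) == _norm(predicted)
    return exact and passage_match(reference, passages)


ctx = ["The armistice was signed on 11 November 1918.", "Paris was occupied in 1940."]
print(validate("1918", "1918", ctx))  # True
```

Requiring the passage match keeps demonstrations where the answer is actually grounded in the retrieved text, not ones the LLM guessed correctly from its own knowledge.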

Once optimization is done, we have a new program, compiled_rag, whose prompts now include the bootstrapped few-shot demonstrations; the Llama 3 weights themselves are unchanged.

Let’s check the accuracy again with the same evaluation setup to confirm that it has improved. Repeat the earlier steps, changing evaluate_on_qa(rag, metric=metric) to evaluate_on_qa(compiled_rag, metric=metric).

After optimization, the accuracy rises to about 52.9%. Adding more training data may improve it further.
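Both scores are consistent with a test set of 17 questions (the evaluator was created with display_table=17): 41.2% corresponds to 7/17 and 52.9% to 9/17. If that assumption holds, the optimizer gained two correct answers, and each question moves the score by about 5.9 percentage points, so results on a set this small should be read with caution. A quick check, assuming 17 questions:

```python
TEST_SIZE = 17  # assumption: number of questions in the test set


def accuracy_pct(correct: int, total: int = TEST_SIZE) -> float:
    """Percentage of correct answers, rounded to one decimal place."""
    return round(100 * correct / total, 1)


print(accuracy_pct(7), accuracy_pct(9))  # 41.2 52.9
print(round(100 / TEST_SIZE, 1))  # 5.9, the weight of a single question
```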

Testing the Model

Now that the model is ready, let's test it manually on a few questions. Calling compiled_rag with a question returns a prediction, and its answer attribute holds the generated answer.

compiled_rag("Who was the leader of France during its occupation by Germany in World War 2?").answer

Creating the Frontend

Since the model handles this sample question, we can move on to the frontend of the app.

We will use Streamlit to build the frontend, which gives us a clean, interactive web page with very little code. Let’s begin by defining the title and icon of the web page.

import streamlit as st

# Browser tab title and icon
st.set_page_config(page_title="DSPy RAG Chatbot", page_icon=":robot_face:")

Let's add a couple of images, a heading, and a sub-heading to give the web app a polished look.

st.markdown("""
<div style="text-align: center;">
            <img src="https://dspy-docs.vercel.app/img/logo.png" alt="Chatbot Logo" width="100"/>
    <img src="https://img.freepik.com/premium-vector/robot-icon-chat-bot-sign-support-service-concept-chatbot-character-flat-style_41737-796.jpg?" alt="Chatbot Logo" width="200"/>
    <h1 style="color: #0078D7;">DSPy based RAG Chatbot</h1>
</div>
""", unsafe_allow_html=True)


st.markdown("""
<p style="text-align: center; font-size: 18px; color: #555;">
    Hello! Just ask me anything from the dataset.
</p>
""", unsafe_allow_html=True)

Let's create a divider to separate the header section from the input-output section.

st.markdown("<hr/>", unsafe_allow_html=True)

Now we'll create an input text box for our web app, which takes the queries from the user.

user_query = st.text_input("Enter your question:", placeholder="E.g., Who led France during its occupation by Germany in World War II?")

Finally, we add an Answer button: when it is clicked, the app passes the query to our compiled model and displays the answer it returns.

import html

if st.button("Answer") and user_query:
    # Run the optimized DSPy program; escape the text before embedding it in HTML
    bot_response = html.escape(compiled_rag(user_query).answer)

    st.markdown(f"""
    <div style="background-color: #f9f9f9; padding: 10px; border-radius: 5px; margin-top: 20px;">
        <h4 style="color: #0078D7;">Bot's Response:</h4>
        <p style="color: #333;">{bot_response}</p>
    </div>
    """, unsafe_allow_html=True)

Final Output

When a user enters a query, the app converts it into a vector embedding, matches it against the vectors stored in Qdrant to find the best-matching chunks, and passes those chunks to Llama 3 through the DSPy program, using the prompts DSPy generated and optimized, to produce the answer.

You can see the video demo here: 

https://youtu.be/uGZ5AiyemJ8?si=A72UmU19gLUKf3d2

Conclusion

In summary, this project shows how to build a RAG application with DSPy and Llama 3: Qdrant handles retrieval, DSPy signatures and modules replace hand-written prompts, and the BootstrapFewShot optimizer raised exact-match accuracy on our test questions from about 41.2% to 52.9%.

References

https://dspy-docs.vercel.app/api/local_language_model_clients/Ollama

https://qdrant.tech/documentation/frameworks/dspy/

https://dspy-docs.vercel.app/docs/building-blocks/optimizers

https://dspy-docs.vercel.app/docs/building-blocks/signatures

https://dspy-docs.vercel.app/docs/tutorials/rag
