Learn how to choose between vector search and knowledge graphs for your AI application
Extracting valuable insights from large volumes of unstructured data has long been a challenging task. As the MIT Sloan article ‘Tapping the power of unstructured data’ explains, roughly 80%-90% of the world's data is unstructured - PDF documents, websites and blogs, video, audio, web server logs, customer reviews and support requests, social media, and more. Yet less than 18% of enterprises globally have been able to make use of this data; the rest remains largely untapped.
With the emergence of large language models (LLMs) and vision language models (VLMs), we now have a breakthrough to help us build solutions that reveal insights hidden in this data. This is particularly powerful for SaaS companies in a wide range of industries, such as Fintech, Healthtech, Insurtech, ClimateTech, and others. A powerful architectural pattern that is used to build such data-centric applications is Retrieval-Augmented Generation (RAG), or its special form, GraphRAG.
Suppose you want to build an RAG application for, say, the healthcare, finance, or ESG domain, where your documents carry domain-specific jargon and an underlying knowledge structure. How should you architect your RAG system? Should you use similarity search? Or should you leverage graph databases? What is the difference between the two techniques? Can you combine them?
In this article, we will explain how RAG systems work and discuss different approaches to RAG, comparing RAG systems powered by Vector Stores and Knowledge Graphs (GraphRAG). We will explain the pros and cons of both approaches and also showcase the outcome when you combine the two. This can help you decide which approach is useful for your use case.
Note: the article is technically deep and you should be familiar with programming. If you are looking to build such a system and are looking for guidance or teams, feel free to reach out to us.
LLMs have limitations on how many tokens they can handle in one go (otherwise known as token limit). They also suffer from the ‘lost in the middle’ problem, where information hidden in the middle of a long context is lost during generation. Additionally, the larger the context, the higher your token cost, so you should only provide the essential context needed for the generation step. To work around these limitations, architectural patterns like Retrieval-Augmented Generation (RAG) emerged, which use a retrieval module, like a Vector Store or Knowledge Graph (GraphRAG) to help provide accurate contextual information to the LLM.
RAG systems are typically made of three key phases: ingestion, retrieval, and generation. During the ingestion phase, data is converted into either vector embeddings or a knowledge graph and stored in a Vector Search Engine or a Graph Database. When the system is queried by a user, or through an API, the data is retrieved using semantic search (in the case of Vector Stores) or graph queries such as Cypher (in the case of Knowledge Graphs). Finally, this data is presented to the LLM as context for the final generation step.
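To make the three phases concrete, here is a minimal sketch of the retrieve-then-generate loop in Python. The retriever and llm objects are hypothetical stand-ins for whatever vector store client and LLM API you use; only the pattern matters here.

# Minimal RAG loop - `retriever` and `llm` are hypothetical stand-ins
def answer_question(query: str, retriever, llm, top_k: int = 3) -> str:
    # 1) Retrieval: fetch the chunks most relevant to the query
    chunks = retriever.search(query, top_k=top_k)
    # 2) Augmentation: pack only the essential context into the prompt
    context = "\n---\n".join(chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    # 3) Generation: the LLM produces the final, grounded answer
    return llm.generate(prompt)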
However, not all RAG systems are created equal. The quality of your RAG system will depend heavily on the data at hand, and the retrieval module you use. To understand this better, let's look at how these two techniques differ.
Vector similarity search works by breaking documents into smaller chunks, each of which is converted into a high-dimensional vector representation using a pre-trained embedding model, such as Sentence Transformers, Cohere, or OpenAI embeddings. These embeddings capture the semantic meaning of the text, so that similar chunks cluster close to each other in the vector space. This enables you to search for and retrieve data that is semantically similar.
Once the embeddings are generated, they are stored in a Vector Search Engine (e.g., Qdrant, Pinecone, Weaviate, or PostgreSQL with pgvector), along with the text chunk they represent and any associated metadata or payload. This engine indexes the embeddings and allows you to efficiently retrieve the most semantically similar vectors to a given query. The retrieval phase involves generating an embedding for the user's query and performing a nearest-neighbor search in the vector database to find relevant chunks.
Let’s look at a simple example. Suppose we have a dataset of four football players - Lionel Messi, Cristiano Ronaldo, Kevin De Bruyne, and Virgil van Dijk - each described by attributes such as position, goal-scoring, passing, and defensive ability.
The embeddings generated from the dataset would be numeric vectors, one per player (illustrative values appear in the sketch below).
These embeddings try to capture the features of the data.
Now, suppose you want to perform the following query:
Query: Find players similar to Kevin De Bruyne.
The query vector embedding: [0.85, 0.80, 0.90, 0.65, 0.60]
The vector search engine would compare the query vector against the stored embeddings using a similarity metric like Cosine Similarity. The engine would, essentially, compute how "close" Kevin De Bruyne's vector is to each stored player's vector.
The search engine will then retrieve the top matches based on similarity:
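Since the illustrative vectors and results table are not reproduced here, the following minimal sketch fills in made-up five-dimensional vectors (purely illustrative numbers, not real embeddings) and ranks the players by cosine similarity:

import numpy as np

# Purely illustrative vectors - real embeddings have hundreds of dimensions
players = {
    "Lionel Messi":      [0.90, 0.85, 0.95, 0.40, 0.30],
    "Cristiano Ronaldo": [0.95, 0.70, 0.90, 0.45, 0.40],
    "Kevin De Bruyne":   [0.85, 0.80, 0.90, 0.65, 0.60],
    "Virgil van Dijk":   [0.30, 0.40, 0.35, 0.90, 0.95],
}

query = np.array([0.85, 0.80, 0.90, 0.65, 0.60])  # the query vector from above

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank players by how "close" they are to the query vector
ranked = sorted(players.items(),
                key=lambda kv: cosine_similarity(query, np.array(kv[1])),
                reverse=True)
for name, vec in ranked:
    print(f"{name}: {cosine_similarity(query, np.array(vec)):.3f}")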
As you can see, this sort of scoring would have been difficult to achieve using keyword search or other naive search methodologies.
Since vector search works on similarity, it is excellent in unearthing pieces of information from your dataset that are semantically similar. For instance, if a 400-page annual report document contains information about the company’s ESG practices or plans, vector search is the technology to use.
However, in some scenarios where structured knowledge is important, a knowledge graph may be the right technique to use. Let’s see why.
A knowledge graph works differently from a vector search. In a knowledge graph, you organize information by linking things (called entities) through meaningful connections (called relationships). Entities can be anything, like concepts, ideas, events, or objects. The relationships act as a bridge between these entities, showing how they are connected. Additionally, you can attach properties (or metadata) with each node or relationship.
Knowledge graphs turn messy, unstructured data into a clear and structured format, using nodes (entities) and edges (relationships) to represent the data. When you want to retrieve information, you query a knowledge graph to find nodes and their relationships, and other interconnected nodes. You can even query and discover how far apart two nodes are.
Let’s explore this with an example. Suppose you have a paragraph of text from a sports magazine:
“Lionel Messi, a forward from Argentina, currently plays for Paris Saint-Germain (PSG) and has won multiple Ballon d'Or awards. Time and again he has showcased his exceptional skill as a playmaker and goal-scorer. Cristiano Ronaldo, a Portuguese forward, is also known for his incredible goal-scoring abilities and now represents Al Nassr after successful stints at clubs like Manchester United and Real Madrid. And let's not forget Kevin De Bruyne, the Belgian midfielder, who is celebrated for his vision and passing accuracy and plays for Manchester City in the English Premier League. Or, Virgil van Dijk, the Dutch defender, who is a key player for Liverpool and is known for his strength in tackling and aerial duels. Both Messi and Ronaldo have won the UEFA Champions League multiple times.”
Now, if you were to create a knowledge graph from this text, you would end up with nodes and edges like the following:
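The original diagram is not reproduced here, but one reasonable reading of the paragraph yields triples like the following sketch (the entity and relationship names are our own choices, for illustration):

# Nodes and edges from the paragraph, as (start, RELATIONSHIP, end) triples
triples = [
    ("Lionel Messi",      "HAS_POSITION", "Forward"),
    ("Lionel Messi",      "FROM_COUNTRY", "Argentina"),
    ("Lionel Messi",      "PLAYS_FOR",    "Paris Saint-Germain"),
    ("Lionel Messi",      "HAS_WON",      "Ballon d'Or"),
    ("Cristiano Ronaldo", "HAS_POSITION", "Forward"),
    ("Cristiano Ronaldo", "FROM_COUNTRY", "Portugal"),
    ("Cristiano Ronaldo", "PLAYS_FOR",    "Al Nassr"),
    ("Cristiano Ronaldo", "PLAYED_FOR",   "Manchester United"),
    ("Cristiano Ronaldo", "PLAYED_FOR",   "Real Madrid"),
    ("Kevin De Bruyne",   "HAS_POSITION", "Midfielder"),
    ("Kevin De Bruyne",   "FROM_COUNTRY", "Belgium"),
    ("Kevin De Bruyne",   "PLAYS_FOR",    "Manchester City"),
    ("Virgil van Dijk",   "HAS_POSITION", "Defender"),
    ("Virgil van Dijk",   "FROM_COUNTRY", "Netherlands"),
    ("Virgil van Dijk",   "PLAYS_FOR",    "Liverpool"),
    ("Lionel Messi",      "HAS_WON",      "UEFA Champions League"),
    ("Cristiano Ronaldo", "HAS_WON",      "UEFA Champions League"),
]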
Now, if we were to query the graph to answer questions, those queries would be converted into a format appropriate for graph retrieval, generating highly accurate responses:
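For instance, using the hypothetical labels above, a question like “Which clubs has Cristiano Ronaldo played for?” might be translated into a Cypher query such as this sketch:

# Hypothetical Cypher for: "Which clubs has Cristiano Ronaldo played for?"
cypher = """
MATCH (p {name: 'Cristiano Ronaldo'})-[:PLAYS_FOR|PLAYED_FOR]->(club)
RETURN club.name AS club
"""
# Against the graph above, this would return Al Nassr, Manchester United, and Real Madrid.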
As you can see, the knowledge graph approach uses relationships within data to unearth results. In domains where data is highly structured, knowledge graphs are extremely useful.
If you are building an RAG system where you need the system to reason over your data accurately, you should choose a knowledge graph as your retrieval module. These systems are known as GraphRAG systems.
When powered by vector search, for instance, the quality of your RAG system depends heavily on how you chunk your information. When using knowledge graphs, you need to know how to convert your unstructured data into a graph of entities and relationships, and your questions into Cypher queries.
When building data-centric solutions using LLMs, we have learned that getting accurate results from your RAG system requires understanding your data and architecting the system to make the best use of it. You should also choose the retrieval technique that works best for the data at hand.
To explain this further, we will use a medical domain dataset and build an RAG system using both the above paradigms. We will be using Neo4j as it supports both knowledge graph and vector similarity search. In the future, you can even use a similar approach to build agentic applications where your application chooses the right retrieval technique depending on the user query.
Let’s start. We will first create a knowledge graph and vector embeddings, and then demonstrate how to query both for results.
First, you need to log in and create an instance in Neo4j AuraDB to get credentials to connect with the database. Alternatively, you can install it locally using Docker. Here’s how to use the Docker approach (we set the initial credentials via NEO4J_AUTH and enable the APOC plugin, which the Cypher queries later in this article rely on):
docker run \
    --publish=7474:7474 --publish=7687:7687 \
    --volume=$HOME/neo4j/data:/data \
    --env NEO4J_AUTH=neo4j/your_password \
    --env NEO4J_PLUGINS='["apoc"]' \
    neo4j
Next, create a Python virtual environment, and then install and launch Jupyter Lab.
$ pip install jupyterlab
$ jupyter lab
Now, in your notebook, set your Neo4j credentials and the OpenAI API key:
import os
NEO4J_URI = 'your_neo4j_uri'
NEO4J_USERNAME = 'your_neo4j_username'
NEO4J_PASSWORD = 'your_neo4j_password'
os.environ['OPENAI_API_KEY']='your_openai_secret_api_key'
Also, install the necessary libraries (we add the openai client, which the OpenAILLM and OpenAIEmbeddings classes below depend on):
pip install neo4j-graphrag openai
Now, we need to import the GraphDatabase module from the Neo4j package and create a driver object with GraphDatabase.driver(), to connect to the Neo4j database.
import neo4j
driver = neo4j.GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))
In this example, we’ll use OpenAI’s GPT-4o-mini for convenience. It’s a fast and cost-effective model. The Neo4j-GraphRAG Python package is versatile and supports various LLMs, including models from OpenAI, Google VertexAI, Anthropic, Cohere, Azure OpenAI, local Ollama models, or any chat model compatible with LangChain. You can even create a custom interface for other LLMs if needed.
For the embedding model, we’ll use OpenAI’s default text-embedding-ada-002, but you can choose from other embedding options offered by different providers.
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.llm.openai_llm import OpenAILLM
llm = OpenAILLM(
    model_name="gpt-4o-mini",
    model_params={
        "response_format": {"type": "json_object"},  # use json_object formatting for best results
        "temperature": 0,  # turn temperature down for more deterministic results
    },
)

# create the text embedder
embedder = OpenAIEmbeddings()
There are many ways to create knowledge graphs, ranging from basic implementations using NLP modules like NLTK and NetworkX, to LLM-based techniques where an LLM is used to unearth structured graphs from chunks of text (for example, LLMGraphTransformer from LangChain).
In this article, we will use the SimpleKGPipeline from Neo4j. The SimpleKGPipeline class makes it easy to build a knowledge graph automatically with just a few essential inputs.
from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import FixedSizeSplitter
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
kg_builder_pdf = SimpleKGPipeline(
    llm=llm,
    driver=driver,
    text_splitter=FixedSizeSplitter(chunk_size=500, chunk_overlap=100),
    embedder=embedder,
    entities=node_labels,
    relations=rel_types,
    prompt_template=prompt_template,
    from_pdf=True,
)
In this setup, the kg_builder_pdf instance is configured to process PDF documents (from_pdf=True). It uses a FixedSizeSplitter to divide the text into manageable chunks of 500 characters, with an overlap of 100 for context preservation. By combining entity labels (node_labels), relationship types (rel_types), and a prompt template (prompt_template), this pipeline structures unstructured data into a knowledge graph efficiently and seamlessly.
For the text splitter, you can also write a custom component; refer to the Neo4j GraphRAG documentation for more information.
Although optional, adding a graph schema is strongly recommended to enhance the quality of your knowledge graph. It serves as a blueprint, defining the types of nodes and relationships to generate during entity extraction.
Here’s how you can do it (for a medical dataset):
# define node labels
basic_node_labels = ["Object", "Entity", "Group", "Person", "Organization", "Place", "Protein", "Symptoms", "Disease"]

academic_node_labels = ["Article", "Publication", "Paper"]

node_labels = basic_node_labels + academic_node_labels
# define relationship types
rel_types = ["ACTIVATES", "AFFECTS", "ASSESSES", "ASSOCIATED_WITH", "AUTHORED",
             "CAUSES", "CITES", "CONTRIBUTES_TO", "DESCRIBES", "EXPRESSES",
             "HAS_REACTION", "HAS_SYMPTOM", "INCLUDES", "INTERACTS_WITH",
             "PRODUCES", "RECEIVED", "RESULTS_IN", "TREATS", "USED_FOR"]
We will include a custom prompt for entity extraction. Although the GraphRAG Python package provides a default internal prompt, tailoring a prompt to suit your specific use case can significantly improve the relevance and usability of the resulting knowledge graph. The prompt below was refined through some experimentation.
prompt_template = '''
You are a medical researcher tasked with extracting information from papers
and structuring it in a property graph to inform further medical and research Q&A.
Extract the entities (nodes) and specify their type from the following Input text.
Also extract the relationships between these nodes. The relationship direction goes from the start node to the end node.
Return result as JSON using the following format:
{{"nodes": [ {{"id": "0", "label": "the type of entity", "properties": {{"name": "name of entity" }} }}],
"relationships": [{{"type": "TYPE_OF_RELATIONSHIP", "start_node_id": "0", "end_node_id": "1", "properties": {{"details": "Description of the relationship"}} }}] }}
- Use only the information from the Input text. Do not add any additional information.
- If the input text is empty, return an empty JSON object.
- Make sure to create as many nodes and relationships as needed to offer rich medical context for further research.
- An AI knowledge assistant must be able to read this graph and immediately understand the context to inform detailed research questions.
- Multiple documents will be ingested from different sources and we are using this property graph to connect information, so make sure entity types are fairly general.
Use only the following nodes and relationships (if provided):
{schema}
Assign a unique ID (string) to each node, and reuse it to define relationships.
Do respect the source and target node types for relationships and
the relationship direction.
Do not return any additional information other than the JSON.
Examples:
{examples}
Input text:
{text}
'''
We can now run the kg_builder_pdf pipeline to process our PDF.
pdf_file_paths = ['medical_doc.pdf']
for path in pdf_file_paths:
    print(f"Processing : {path}")
    pdf_result = await kg_builder_pdf.run_async(file_path=path)
    print(f"Result: {pdf_result}")
The result will display the number of nodes that were discovered and created - like this:
Processing : medical_doc.pdf
Result: run_id='9c003652-3088-4726-9600-890c49133f44' result={'resolver': {'number_of_nodes_to_resolve': 915, 'number_of_created_nodes': 568}}
Under the hood, the KG Writer component of the pipeline first generates a new node for each identified entity, without attempting to determine similarity between them. The Entity Resolver component then enhances the knowledge graph by merging nodes that correspond to the same real-world entity.
This package implements a single resolver that merges nodes with the same label and identical “name” property.
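To sanity-check the resolution step, you can list any label/name combinations that still map to more than one node - a quick diagnostic sketch using standard Cypher:

# List label/name combinations that still map to more than one node
dedup_query = """
MATCH (n)
WHERE n.name IS NOT NULL
WITH labels(n) AS nodeLabels, n.name AS name, count(*) AS occurrences
WHERE occurrences > 1
RETURN nodeLabels, name, occurrences
ORDER BY occurrences DESC
"""
with driver.session() as session:
    for record in session.run(dedup_query):
        print(record["nodeLabels"], record["name"], record["occurrences"])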
We can also retrieve the schema of the knowledge graph that was created.
# Schema retrieval
schema = ""
with driver.session() as session:
    # Query for node properties
    node_query = """
    CALL db.schema.nodeTypeProperties() YIELD nodeType, propertyName, propertyTypes
    RETURN nodeType AS label, collect(propertyName + ": " + apoc.text.join(propertyTypes, "|")) AS properties
    """
    nodes = session.run(node_query)
    node_properties = []
    for record in nodes:
        node_properties.append(f"{record['label']} {{{', '.join(record['properties'])}}}")

    # Query for relationship properties
    rel_query = """
    CALL db.schema.relTypeProperties() YIELD relType, propertyName, propertyTypes
    RETURN relType AS type, collect(propertyName + ": " + apoc.text.join(propertyTypes, "|")) AS properties
    """
    relationships = session.run(rel_query)
    relationship_properties = []
    for record in relationships:
        if record['properties']:
            relationship_properties.append(f"{record['type']} {{{', '.join(record['properties'])}}}")

    # Query for the relationships between node labels
    connection_query = """
    MATCH (a)-[r]->(b)
    RETURN DISTINCT labels(a)[0] AS source, type(r) AS relationship, labels(b)[0] AS target
    """
    connections = session.run(connection_query)
    relationship_connections = []
    for record in connections:
        relationship_connections.append(f"(:{record['source']})-[:{record['relationship']}]->(:{record['target']})")

# Format the final schema string
schema = "Node properties:\n" + "\n".join(node_properties) + "\n\n"
schema += "Relationship properties:\n" + "\n".join(relationship_properties) + "\n\n"
schema += "The relationships:\n" + "\n".join(relationship_connections)

# Note: do not close the driver here - the retrievers below still need it
print(schema)
After the knowledge graph is built, you can visualize it in your workspace. Here are some useful Cypher queries to display the graph:
1. To view the entire graph -
MATCH p=()-->() RETURN p;
2. To visualize the Schema -
CALL db.schema.visualization();
3. To get properties of each node label -
MATCH (n) RETURN DISTINCT labels(n) AS NodeLabels, keys(n) AS Properties
4. To get properties for each relationship type -
MATCH ()-[r]->() RETURN DISTINCT type(r) AS RelationshipType, keys(r) AS Properties
We will now create the vector search index as well. We will set up the index with the Cosine similarity metric.
from neo4j_graphrag.indexes import create_vector_index
create_vector_index(driver, name="text_embeddings", label="Chunk",
                    embedding_property="embedding", dimensions=1536, similarity_fn="cosine")
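To confirm the index was created, you can list the database indexes with standard Cypher - a quick sanity check:

# Quick check that the vector index exists
with driver.session() as session:
    for record in session.run("SHOW INDEXES YIELD name, type, state"):
        print(record["name"], record["type"], record["state"])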
We now have both the knowledge graph and the vector search index.
The GraphRAG Python package offers several powerful retriever classes for fetching data from your knowledge graph, including VectorRetriever, VectorCypherRetriever, HybridRetriever, HybridCypherRetriever, and Text2CypherRetriever.
These retrievers empower you to implement various data retrieval strategies, enhancing the relevance and precision of your RAG pipelines.
In this article, we will focus on three strategies: VectorRetriever, VectorCypherRetriever, and Text2CypherRetriever.
To perform a vector search, you have to first set up the vector_retriever.
from neo4j_graphrag.retrievers import VectorRetriever
vector_retriever = VectorRetriever(
    driver,
    index_name="text_embeddings",
    embedder=embedder,
    return_properties=["text"],
)
Now you can query it like this:
import json

vector_res = vector_retriever.get_search_results(
    query_text="What was XTT used for?",
    top_k=3,
)
for i in vector_res.records:
    print("====\n" + json.dumps(i.data(), indent=4))
Alternatively, you can query the knowledge graph using natural-language text with Text2CypherRetriever. Here’s how you can do it. Set up the retriever first:
from neo4j_graphrag.retrievers import Text2CypherRetriever
from neo4j_graphrag.generation.graphrag import GraphRAG

kg_llm = OpenAILLM(model_name="gpt-4o-mini", model_params={"temperature": 0})

graph_retriever = Text2CypherRetriever(
    driver=driver,
    llm=kg_llm,
    neo4j_schema=schema,
    # optionally, you can also provide your own prompt
    # for the text2Cypher generation step
    # custom_prompt="",
)

rag = GraphRAG(retriever=graph_retriever, llm=kg_llm)
Now, you can query like this:
query_text = "What was the Sabouraud medium used for T asahii as given in the context?"
print(graph_retriever.search(query_text=query_text))
This methodology combines both the vector search and knowledge graph retrieval tactics. You have to use the VectorCypherRetriever module.
from neo4j_graphrag.retrievers import VectorCypherRetriever
vc_retriever = VectorCypherRetriever(
    driver,
    index_name="text_embeddings",
    embedder=embedder,
    retrieval_query="""
    //1) Go out 2-3 hops in the entity graph and get relationships
    WITH node AS chunk
    MATCH (chunk)<-[:FROM_CHUNK]-()-[relList:!FROM_CHUNK]-{1,2}()
    UNWIND relList AS rel

    //2) collect relationships and text chunks
    WITH collect(DISTINCT chunk) AS chunks,
         collect(DISTINCT rel) AS rels

    //3) format and return context
    RETURN '=== text ===\n' + apoc.text.join([c in chunks | c.text], '\n---\n') + '\n\n=== kg_rels ===\n' +
           apoc.text.join([r in rels | startNode(r).name + ' - ' + type(r) + '(' + coalesce(r.details, '') + ')' + ' -> ' + endNode(r).name ], '\n---\n') AS info
    """
)
You can now use the retriever like this:
vc_res = vc_retriever.get_search_results(query_text = "What was XTT used for?", top_k=3)
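Each returned record carries a single info string (per the RETURN clause above) that interleaves the matched text chunks with the knowledge graph relationships around them; you can inspect it like this:

# Peek at the combined chunk text and KG relationships for the query
for record in vc_res.records:
    print(record["info"][:500])  # first 500 characters of the retrieved context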
You can now write a simple script to compare the results on your dataset. Here’s how:
from neo4j_graphrag.llm import OpenAILLM as LLM
from neo4j_graphrag.generation import RagTemplate
from neo4j_graphrag.generation.graphrag import GraphRAG
llm = LLM(model_name="gpt-4o", model_params={"temperature": 0.0})
rag_template = RagTemplate(template='''Answer the Question using the following Context. Only respond with information mentioned in the Context. Do not inject any speculative information not mentioned.
# Question:
{query_text}
# Context:
{context}
# Answer:
''', expected_inputs=['query_text', 'context'])
vector_rag = GraphRAG(llm=llm, retriever=vector_retriever, prompt_template=rag_template)
vector_graph_rag = GraphRAG(llm=llm, retriever=vc_retriever, prompt_template=rag_template)
graph_rag = GraphRAG(retriever=graph_retriever, llm=llm)
We have used the GPT-4o model and created RAG systems out of all three retrieval tactics. Depending on the dataset you use, you can compare and contrast the results and decide which methodology works best for your use case.
Here’s one of the queries we used to test:
q = "Briefly describe about T. asahii in bullet points"
Here’s the output on our dataset, with each retriever’s response to the query shown in its respective column:
In this article, we explored and implemented an end-to-end RAG (Retrieval-Augmented Generation) system using three distinct retrieval approaches. With our dataset, you likely noticed that the Vector + Cypher query combination retrieved richer context compared to other methods. However, there’s no one-size-fits-all strategy for every use case. Your choice of retrieval method will depend on the dataset at hand.
This workflow also leads to an exciting idea that we are currently working on: an agentic retrieval strategy, which uses LLMs to intelligently analyze the complexity of a query and route it to the most effective retrieval approach—be it Cypher, Vectors, or a Hybrid methodology. By tailoring the retrieval strategy dynamically, we can unlock even more optimized and efficient results.
At Superteams.ai, we build vetted AI teams to help businesses incorporate advanced AI into their product stack or workflow. Our teams have helped a range of innovative technology companies push the frontier in their domain, from early-stage startups to publicly listed companies.
To schedule a free consultation with us, click here.