When we think of knowledge graphs, we might simplify it to the equation: Knowledge + Graphs. But there's so much more to it!
This hands-on blog will cover the basics of knowledge graphs, defining key terms and exploring the related tech stack. We’ll then examine how these graphs can help solve crimes and unsolved mysteries, serving as a modern detective’s tool for investigative reports.
Key Learnings from the Blog:
1. Understanding what knowledge graphs are and how they work
2. Tools and technology stack for implementation
3. Introduction to Neo4j as a graph database
4. Implementing GraphRAG using Langchain and Neo4j
5. The role of knowledge graphs and RAG in investigation reports
If you want to check out the code and examples used in this post, you can find it here.
What is a Knowledge Graph?
A knowledge graph, also known as a semantic network, is essentially a way of organising information from different sources using a graph-like structure.
Often used in data science and data mining, it represents real-world entities and the relationships between them as a graph for better visualisation, helping us gain valuable insights.
Remember the family trees we used to draw as kids? Knowledge graphs are quite similar to that.
Let’s take an example. I have always wanted to understand the family hierarchy in the Harry Potter movies and how each character is related (though I’m clearly not a potterhead).
After some research, I came across a website that explains the Potter family hierarchy in detail. While they provided a drawn family tree, I wanted to create my own using Knowledge Graphs (KGs) along with the code we’ll explore later in this blog.
So, I input the 'Family Members' section into the model, instructing it to highlight all the entities, along with their properties and the relationships each entity has with one another, in order to gain insights.
Below is the output visualisation.
Well, this looks stunning, doesn't it? That’s the magic of Knowledge Graphs.
Whether dealing with structured or unstructured data, once we pass it to the model, we can easily obtain some pretty cool insights at a glance.
Mind you, the more data we feed in, the more detailed the generated graph becomes. The quality of the graph also depends on the language model used; in this case, we utilised the GPT-4 model.
You might think, all this is good, but how does it work? Get to the technical aspect, Shivam! I hear you. Let’s dive deep into how this works in the next section.
How Do Knowledge Graphs Work?
Knowledge graphs typically consist of three main components. These are,
- Nodes: These represent entities, such as people, places, or products. Each node is assigned one or more labels to define its type and can possess attributes (properties) that provide additional details. For example, in a movie KG, nodes could represent actors, screenwriters, and movie titles.
- Relationships: These link two nodes and indicate how they are related. Like nodes, each relationship has a label to identify its type and may also have properties. For instance, a relationship could capture which actor starred in a specific movie.
- Edges and Labels: Edges are the links that connect the nodes, while labels are attributes that define the relationships between the nodes and the reasoning rules associated with the edges.
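These components can be sketched as a toy in-memory graph in plain Python. The structure below mirrors the movie example above; the node ids, properties, and helper function are illustrative, not part of any real library:

```python
# A toy in-memory version of the movie example above; labels and
# properties mirror the node/relationship structure described.
nodes = {
    "uma_thurman": {"label": "Person", "name": "Uma Thurman"},
    "pulp_fiction": {"label": "Movie", "title": "Pulp Fiction", "year": 1994},
}

# Each relationship links two nodes and carries a label of its own.
relationships = [
    {"start": "uma_thurman", "type": "STARRED_IN", "end": "pulp_fiction"},
]

def neighbours(node_id, rel_type):
    """Return node ids reached from node_id via relationships of rel_type."""
    return [r["end"] for r in relationships
            if r["start"] == node_id and r["type"] == rel_type]

print(neighbours("uma_thurman", "STARRED_IN"))  # -> ['pulp_fiction']
```

A graph database like Neo4j stores exactly this kind of structure natively, which is why traversing relationships is so cheap there.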
Semantic enrichment is the process of identifying and emphasising these main components, i.e., nodes, edges, and labels, during the creation of a knowledge graph.
When we input data (structured or unstructured) semantic enrichment highlights these key components within the text. This process enables knowledge graphs to identify individual objects and understand the relationships between them.
Once a knowledge graph is complete, it empowers question-answering and search systems to retrieve and provide comprehensive answers to queries.
What is Neo4j, and Why Use It?
Neo4j is a leading open-source graph database management system used to store, manage, and query data represented as graphs.
Unlike traditional relational databases that use tables and rows, Neo4j utilizes a graph model composed of nodes, relationships, and properties. This structure is ideal for applications where relationships and connections between data points are essential, making it particularly useful for various domains, including social networks, recommendation systems, and, importantly, knowledge graphs.
We will be using Neo4j for this blog.
To follow along with this guide, we recommend creating an account here if you haven't done so already.
Once your account is created, click on Create Instance and select the free package. Please be patient, as it may take some time to set up. You might see a password on the screen; be sure to note it down along with the connection URL.
Keep a record of the Neo4j URL, username, and password, as you will need them for future connections and visualisations of the knowledge graph.
What is GraphRAG?
Before setting up the environment and getting our hands dirty with some code, let’s first touch on what GraphRAG is and what makes it different from conventional RAG.
GraphRAG is an advanced extension of Retrieval Augmented Generation (RAG) that incorporates knowledge graphs (KGs) to enhance the effectiveness and precision of LLM-based (Large Language Model) applications.
While traditional RAG uses a vector database to retrieve information semantically similar to a query, it faces challenges with complex questions that require multi-step reasoning or connecting different data points. GraphRAG was developed to solve these issues by using the strengths of knowledge graphs, which structure information based on entity relationships.
Traditional RAG vs Graph RAG
In a baseline RAG system:
- A vector database retrieves semantically similar documents or chunks based on a user query.
- An LLM uses this retrieved information to generate an answer.
This setup generally works well for straightforward queries but has limitations:
- It struggles with multi-hop reasoning (questions requiring multiple inference steps).
- It is less effective with queries that require a deep understanding of relationships between disparate data pieces.
For example, answering "What was the first movie directed by the student of the person who directed Pulp Fiction?" is difficult with standard RAG, as it only retrieves context based on similarity without understanding the complex relationships within the query.
GraphRAG, by contrast, builds on the RAG framework by adding knowledge graphs that model complex relationships. Here’s how it works:
1. Indexing: The input text corpus is divided into small segments, and the system identifies entities, relationships, and claims within each segment.
2. Querying: Two distinct workflows handle global and local queries.
a. Global Search uses summaries of relationships to answer broad questions related to the whole dataset.
b. Local Search zeroes in on specific entities, retrieving their relationships and relevant contextual information, making it more precise for questions about a particular topic or person.
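To see why graph structure helps with multi-hop questions like the Pulp Fiction example above, here is a pure-Python sketch of the traversal. Apart from Pulp Fiction and its director, the names are hypothetical placeholders:

```python
# Hypothetical mini-graph for the multi-hop query:
# "first movie directed by the student of the person who directed Pulp Fiction"
edges = {
    ("Quentin Tarantino", "DIRECTED"): ["Pulp Fiction"],
    ("Quentin Tarantino", "MENTORED"): ["Student X"],              # hypothetical
    ("Student X", "DIRECTED"): ["Debut Film A", "Second Film B"],  # hypothetical
}

def follow(start, rel):
    """Return the nodes reached from `start` via relationship `rel`."""
    return edges.get((start, rel), [])

# Hop 1: who directed Pulp Fiction? (reverse lookup over the edges)
director = next(s for (s, r), targets in edges.items()
                if r == "DIRECTED" and "Pulp Fiction" in targets)
# Hop 2: who is that person's student?
student = follow(director, "MENTORED")[0]
# Hop 3: that student's first movie (assuming the list is chronological)
answer = follow(student, "DIRECTED")[0]
print(answer)  # -> Debut Film A
```

A vector search over text chunks has no explicit hops to follow, which is exactly where it struggles; the graph makes each inference step a simple edge traversal.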
We will rely on LangChain in this guide, which handles all the implementation for us behind the scenes.
Setting Up the Development Environment
Since we have our Neo4j setup ready, let’s move on to connecting this instance with the knowledge graph. But first, let’s learn how to build one.
Knowledge graphs are straightforward: they help reveal relationships among different nodes, providing useful insights. We’ll use the Langchain framework here, which provides wrapper functions around Neo4j for creating knowledge graphs. I’ll be running all code in Google Colab.
Ensure you have a running Neo4j instance and that it’s active on the Neo4j site.
Once initialized, we’ll use Langchain’s Neo4j wrapper to set up the graph, passing in the URL, username, and password.
Setting the enhanced_schema attribute to True allows for a clearer display of the graph schema after adding the knowledge graph.
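Putting this together, a minimal connection sketch might look like the following; the URL and password are placeholders for the credentials you noted down earlier:

```python
# Connect LangChain's Neo4j wrapper to the instance created earlier.
# The URL and password below are placeholders; substitute your own.
from langchain_community.graphs import Neo4jGraph

graph = Neo4jGraph(
    url="neo4j+s://<your-instance-id>.databases.neo4j.io",
    username="neo4j",
    password="<your-password>",
    enhanced_schema=True,  # richer schema descriptions when inspecting graph.schema
)
```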
Now, decide on a language model. I chose Claude 3.5 Sonnet for its good reasoning and intelligence.
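A basic initialisation sketch, assuming the langchain_anthropic package is installed and an Anthropic API key is set in the environment (the model name reflects the version available at the time of writing and may need updating):

```python
# Initialise the chat model used for graph extraction and querying.
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(
    model="claude-3-5-sonnet-20240620",
    temperature=0,  # deterministic output suits entity extraction
)
```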
With the large language model initialised, let’s now proceed to the next section, which deals with building the knowledge graph.
Building a Knowledge Graph for Investigative Analysis
One of the most impactful uses of a Knowledge Graph (KG) is in investigative work, where it can serve as a powerful tool for detectives and analysts to uncover connections and gain insights that might otherwise be hidden in complex, interconnected data.
Whether it’s in crime-solving, fraud detection, network analysis, or even cybersecurity, KGs offer a unique way of visually representing relationships between entities, making it easier to see patterns, link evidence, and uncover leads.
- In investigations, especially those involving large datasets or complex criminal networks, the connections between people, locations, events, or assets may be vast.
- A KG visually maps out these relationships, enabling detectives to easily follow paths between different nodes (entities) and discover links that were previously unrecognized.
- Often, investigators have isolated pieces of information like names or transaction records. On their own, these records might not mean much.
- However, when connected in a knowledge graph, they create a bigger picture, unraveling more insights that lead to better decision-making.
- By combining KGs with tools like GraphRAG, investigators can ask more complex questions and get answers quickly.
- Instead of sifting through reports manually, a detective might ask, “Who are all the people connected to suspect X in the last six months?” GraphRAG can pull up those connections, providing valuable leads and insights in a fraction of the time.
By organizing and visualizing relationships, KGs provide a clearer view of the information at hand, making it easier to solve cases and uncover crucial insights. Keeping this in mind, we dedicated this implementation entirely to discovering insights from previous unsolved cases, hoping to gain a different perspective on the situation.
ALRIGHT!
For this demonstration, I chose the unsolved Sheena Bora murder case as our example. However, feel free to experiment with other cases. I took the input from the Wikipedia page on the case. Start by creating a new crime.txt file and pasting the content there. Then, read the content from the file and convert it into a Document format.
This conversion is essential to break down the total content into manageable chunks, ensuring it fits within the context window of the language model (LLM). We achieve this using LangChain's RecursiveCharacterTextSplitter function.
This function allows us to specify two key parameters:
1. chunk_size: The maximum number of characters (in this case, 2000) that can be included in a single chunk. This limit ensures we stay within the bounds of what the language model can handle effectively.
2. chunk_overlap: Setting this to 200 means that each chunk will overlap with the next by 200 characters. When splitting text, important information can often lie at the boundaries. By overlapping chunks, we ensure that the language model doesn’t miss out on relevant details that might bridge the gap between two chunks.
We can confirm each chunk by printing it out and reviewing the output.
Integration with Perplexity AI
What sets our approach apart is using Perplexity AI, which has internet access and excels at retrieving relevant information. Here, it enhances the content by identifying and adding any critical missing details that might bring more clarity to our report.
We begin by initialising the chat model from Perplexity AI.
The system prompt guides the model’s behaviour. We instruct it to review the investigation report and fill in any gaps with relevant information from the internet. The goal here is to enhance the report with only essential details, avoiding unnecessary information that could disturb the focus.
Next, we iterate through each page (chunk) of the original report, sending it to the model. The response from the model is stored and later compiled into an enriched report, which we can save for comparison.
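A sketch of this enrichment loop, assuming the langchain_community Perplexity integration and a PPLX_API_KEY in the environment; the model name and output file name are placeholders that may need updating against Perplexity's current model list:

```python
# Enrich each chunk with web-sourced context via Perplexity AI.
# The model name is a placeholder; check Perplexity's docs for current models.
from langchain_community.chat_models import ChatPerplexity
from langchain_core.messages import HumanMessage, SystemMessage

chat = ChatPerplexity(model="llama-3.1-sonar-small-128k-online", temperature=0)

system = SystemMessage(content=(
    "Review this investigation report excerpt and fill in any gaps with "
    "relevant, essential information from the internet. Avoid unnecessary "
    "details that could disturb the focus of the report."
))

enriched_parts = []
for page in pages:  # `pages` are the chunks produced earlier
    response = chat.invoke([system, HumanMessage(content=page.page_content)])
    enriched_parts.append(response.content)

# Save the enriched report for comparison with the original.
with open("enriched_report.txt", "w") as f:
    f.write("\n\n".join(enriched_parts))
```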
After creating the enhanced report, we need to structure it similarly to how we processed the original document. This allows us to seamlessly combine both sets of chunks, original and enhanced, into a unified dataset for further processing.
Transforming to a Graph Format
The next crucial step is converting these chunks into a graph format, where nodes represent entities (like people, places, and events), and edges depict relationships between these entities (like "is a suspect in," "was seen at," etc.). This representation is fundamental for any Knowledge Graph as it provides a visual and structured way to analyze complex relationships.
LangChain provides the LLMGraphTransformer, which streamlines the process of converting unstructured text into a graph format.
At its core, the LLMGraphTransformer utilises the capabilities of Large Language Models (LLMs) to extract relevant information from textual inputs. The transformer operates in a multi-step process, as mentioned below,
1. The LLMGraphTransformer begins by taking raw textual data as input. This data can come from various sources, such as articles, reports, or any other narrative text.
2. The transformer then identifies and extracts entities from the input text.
3. Alongside entity recognition, the transformer analyses the relationships among these entities.
4. After identifying nodes (entities) and edges (relationships), the transformer constructs a GraphDocument.
5. The final output comprises a collection of GraphDocument objects that represent the original textual data in a graph format. Each graph document systematically lays out the entities and their relationships.
The key functionality of the LLMGraphTransformer lies in its convert_to_graph_documents method.
The convert_to_graph_documents function processes the input pages, extracting relevant entities and their interrelations to construct a graph representation. It synchronously converts a sequence of documents into graph documents.
The output depends on the chosen BaseLanguageModel (LLM), and while processing can take some time due to document complexity, it results in GraphDocument objects that facilitate easier analysis.
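A minimal sketch of the conversion, assuming the langchain_experimental package and the `llm` and `pages` objects from the earlier steps:

```python
# Convert the text chunks into graph documents with LLMGraphTransformer.
from langchain_experimental.graph_transformers import LLMGraphTransformer

transformer = LLMGraphTransformer(llm=llm)  # `llm` was initialised earlier

# Synchronously converts the sequence of documents into graph documents;
# this can take a while depending on the number and size of chunks.
graph_documents = transformer.convert_to_graph_documents(pages)

print("Nodes:", graph_documents[0].nodes)
print("Relationships:", graph_documents[0].relationships)
```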
The output from these print statements provides a clear view of the nodes and relationships constructed.
Now, you might be wondering, what about our Neo4j instance? And where are the visualizations?
I hear you. Let’s bring our Knowledge Graph (KG) into Neo4j.
To load the knowledge graph into our Neo4j instance, simply run:
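(A sketch, assuming the `graph` connection and `graph_documents` objects from the earlier steps.)

```python
# Write the extracted graph documents into the Neo4j instance.
graph.add_graph_documents(
    graph_documents,
    baseEntityLabel=True,   # adds a generic __Entity__ label to every node
    include_source=True,    # links each node back to its source document chunk
)
```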
Now, open your Neo4j instance, refresh the page, and check the left sidebar. You’ll see plenty of new nodes and relationships added, click on them to explore the graph visually!
I’m sure it’s all coming together now. Take a look at this masterpiece, notice how cleverly it links all the nodes, transforming dense text information into an easily analysable visual format.
GraphRAG in Action
The core of this blog centers on implementing GraphRAG within our existing Knowledge Graphs (KGs). Why use GraphRAG? It allows us to query our KGs and extract deeper, more interconnected insights. Unlike traditional RAG, GraphRAG is especially effective here, as it not only retrieves information but also preserves semantic relationships and the connections between nodes.
Our example is the Sheena Bora murder case, a complex web of events, relationships, and timelines involving multiple people. Each person has connections that need to be understood in terms of their role, relationships to others, and involvement in events related to the case. Here’s how GraphRAG works:
1. To start, I built a knowledge graph for the Sheena Bora murder case, gathering important details like the people involved, timelines, and relationships.
2. Now, let’s say I ask, “Who are the main suspects, and how are they connected to Sheena Bora?” LangChain’s GraphCypherQAChain adeptly converts this natural language query into Cypher code for execution against the graph database.
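A sketch of this querying step, assuming the `graph` and `llm` objects from earlier; the allow_dangerous_requests flag acknowledges that the chain executes model-generated Cypher against your database:

```python
# Natural-language querying of the KG via GraphCypherQAChain.
from langchain.chains import GraphCypherQAChain

chain = GraphCypherQAChain.from_llm(
    llm=llm,
    graph=graph,
    verbose=True,                   # prints the generated Cypher query
    allow_dangerous_requests=True,  # chain runs arbitrary generated Cypher
)

result = chain.invoke({
    "query": "Who are the main suspects, and how are they connected to Sheena Bora?"
})
print(result["result"])
```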
This generates Cypher that accurately reflects the query's intent, and the subsequent output from the query against our KG provides the corresponding insights.
The context of the KG, i.e., the semantic relationships among the different nodes, is also taken into consideration.
3. If I inquire further with, “Is this a conspiracy plan?”, the model once again translates the question into Cypher,
The system processes this query against the context stored in the KG and returns its answer.
4. In each case, LangChain’s GraphCypherQAChain helps by converting the question into a Cypher query, retrieving the relevant nodes and relationships from the graph, and letting the model compose an answer from that context.
5. The queries ultimately return a comprehensive view of the case: GraphRAG pulls everything together, surfacing the people, events, and connections most relevant to each question.
As we go through these questions, each one helps bring the case details together visually, connecting people, events, and testimonies in a clear, organised way.
Using GraphRAG in this way, we’re able to take vast amounts of text-based information and transform it into a structured, organised format that’s straightforward to analyse and understand.
Real-World Use Cases and Applications of Knowledge Graphs and GraphRAG in Investigations
These tools have practical applications in real world scenarios where uncovering relationships within massive data sets is key to solving cases, understanding networks, and finding hidden insights. Let’s dive into a few scenarios where KGs and GraphRAG play an instrumental role,
- Tracking Criminal Networks: Imagine detectives are investigating a series of crimes suspected to be linked to a large criminal organisation. By building a KG that includes suspects, their associates, locations, and relevant incidents, investigators can begin to see how certain individuals are connected.
- Fraud Detection in Financial Networks: KGs are highly effective for financial institutions looking to detect fraud by revealing the flow of money across different entities, including accounts, businesses, and transaction locations. With GraphRAG, they can directly query patterns, like searching for transactions over a certain amount flowing through multiple accounts in quick succession.
- Tracing Missing Persons: When someone goes missing or is at risk, time is of the essence, and every clue matters. In these cases, a KG can be built to include known connections, last known locations, call records, social media posts, and relevant events.
These scenarios show just a glimpse of how Knowledge Graphs and GraphRAG empower investigators, analysts, and security professionals.
Conclusion and Future Directions
In this blog, we looked at how Knowledge Graphs (KGs) and Graph Retrieval-Augmented Generation (GraphRAG) can help in investigations, from building a graph to putting it to the test with GraphRAG.
Looking ahead, we can expect KGs and GraphRAG to become even more powerful. Imagine KGs that not only store information but also learn from it; can we all agree how cool that would be? Another fascinating direction is the capability for real-time updates.
This means investigators could have immediate access to the latest information, helping them make quicker, more informed decisions. For instance, if new evidence comes to light during an investigation, a KG could instantly update its structure and provide insights based on this new data, enabling a more agile response.
As we embrace these innovations, it’s important to stay curious and keep exploring how KGs can evolve. By understanding and adapting to these changes, we can ensure that we’re using the best available tools to navigate the ever-complex world of investigations. The future looks bright, and the potential for KGs in this space is immense!
If you're looking to build custom AI solutions for your organization or want to transform your AI product ideas into reality, we can help. At Ionio, we specialise in taking your concepts from idea to product. Contact us to start your AI journey today.
Happy coding!