Diffbot
Diffbot is a suite of products that make it easy to integrate and research data on the web.
The Diffbot Knowledge Graph is a self-updating graph database of the public web.
Use case
Text data often contains rich relationships and insights that are useful for analytics, recommendation engines, or knowledge management applications.
Diffbot's NLP API allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.
By coupling Diffbot's NLP API with Neo4j, a graph database, you can create powerful, dynamic graph structures based on the information extracted from text. These graph structures are fully queryable and can be integrated into various applications.
This combination allows for use cases such as:
- Building knowledge graphs from textual documents, websites, or social media feeds.
- Generating recommendations based on semantic relationships in the data.
- Creating advanced search features that understand the relationships between entities.
- Building analytics dashboards that allow users to explore the hidden relationships in data.
Overview
LangChain provides tools to interact with graph databases:
- Construct knowledge graphs from text using graph transformer and store integrations.
- Query a graph database using chains for query creation and execution.
- Interact with a graph database using agents for robust and flexible querying.
Setting up
First, get required packages and set environment variables:
%pip install --upgrade --quiet langchain langchain-experimental langchain-openai neo4j wikipedia
Diffbot NLP Service
Diffbot's NLP service is a tool for extracting entities, relationships, and semantic context from unstructured text data. This extracted information can be used to construct a knowledge graph. To use the service, you'll need to obtain an API key from Diffbot.
from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer
diffbot_api_key = "DIFFBOT_API_KEY"
diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)
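Hardcoding the key is fine for a quick experiment, but it is easy to leak that way. As a minimal sketch, assuming the key has been exported in an environment variable named DIFFBOT_API_KEY (a name chosen here for illustration, not one the library requires), it could be read at runtime instead:

```python
import os


def load_diffbot_api_key(env_var: str = "DIFFBOT_API_KEY") -> str:
    """Return the Diffbot API key stored in the given environment variable.

    The variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Diffbot API key."
        )
    return key
```

The returned string can then be passed as the diffbot_api_key argument of DiffbotGraphTransformer.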
This code fetches Wikipedia articles about "Warren Buffett" and then uses DiffbotGraphTransformer to extract entities and relationships. The DiffbotGraphTransformer outputs a structured GraphDocument, which can be used to populate a graph database. Note that the documents are passed without text chunking, and that Diffbot enforces a character limit per API request.
from langchain_community.document_loaders import WikipediaLoader
query = "Warren Buffett"
raw_documents = WikipediaLoader(query=query).load()
graph_documents = diffbot_nlp.convert_to_graph_documents(raw_documents)
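Since the note above mentions a per-request character limit, it may be worth checking document sizes before calling the transformer. The helper below is a sketch only: its name is hypothetical, and the limit is left as a required parameter because the exact figure should come from Diffbot's documentation rather than from this example.

```python
from typing import Iterable, List, Tuple


def partition_by_length(
    texts: Iterable[str], max_chars: int
) -> Tuple[List[str], List[str]]:
    """Split texts into (within_limit, too_long) based on character count.

    max_chars is supplied by the caller; this example does not assume
    any particular value for Diffbot's limit.
    """
    within_limit: List[str] = []
    too_long: List[str] = []
    for text in texts:
        if len(text) <= max_chars:
            within_limit.append(text)
        else:
            too_long.append(text)
    return within_limit, too_long
```

With LangChain documents, the input could be built from the page_content attribute of each loaded document; anything that lands in the second list would need to be shortened or split before conversion.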
Loading the data into a knowledge graph
You will need a running Neo4j instance. One option is to create a free Neo4j database instance in their Aura cloud service. You can also run the database locally using the Neo4j Desktop application or a Docker container. To start a local Docker container, run the following script:
docker run \
--name neo4j \
-p 7474:7474 -p 7687:7687 \
-d \
-e NEO4J_AUTH=neo4j/pleaseletmein \
-e NEO4J_PLUGINS=\[\"apoc\"\] \
neo4j:latest
If you are using the Docker container, you need to wait a couple of seconds for the database to start.
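Instead of sleeping for a fixed time, one option is to poll the Bolt port until it accepts TCP connections. The sketch below uses only the standard library; the function name and default timings are illustrative, and an open port does not strictly guarantee that Neo4j is ready to serve queries, so treat it as a first check rather than a health probe.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout: float = 30.0, interval: float = 0.5
) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not accepting connections yet; pause briefly and retry.
            time.sleep(interval)
    return False
```

For the container started above, a call such as wait_for_port("localhost", 7687) would block until the Bolt port published by the docker command is reachable or the timeout expires.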
from langchain_community.graphs import Neo4jGraph
url = "bolt://localhost:7687"
username = "neo4j"
password = "pleaseletmein"
graph = Neo4jGraph(url=url, username=username, password=password)
The GraphDocuments can be loaded into a knowledge graph using the add_graph_documents method.
graph.add_graph_documents(graph_documents)
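To get a feel for what is being loaded, it can help to count node and relationship types across the graph documents. The sketch below assumes each graph document exposes nodes and relationships lists whose items carry a type attribute; the helper name is hypothetical, and it relies on duck typing so it has no LangChain import of its own.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize_graph_documents(graph_documents: Iterable[Any]) -> Dict[str, Counter]:
    """Count node types and relationship types across graph documents.

    Assumes each document has `nodes` and `relationships` attributes whose
    items expose a `type` attribute.
    """
    node_types: Counter = Counter()
    relationship_types: Counter = Counter()
    for doc in graph_documents:
        node_types.update(node.type for node in doc.nodes)
        relationship_types.update(rel.type for rel in doc.relationships)
    return {"nodes": node_types, "relationships": relationship_types}
```

Calling summarize_graph_documents(graph_documents) returns two counters; their most_common() method lists the dominant labels, which is a quick way to spot unexpected types before writing queries against them.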
Refresh graph schema information
If the schema of the database changes, you can refresh the schema information needed to generate Cypher statements.
graph.refresh_schema()
Querying the graph
We can now use the graph Cypher QA chain to ask questions of the graph. It is advisable to use gpt-4 to construct Cypher queries to get the best experience.
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI
chain = GraphCypherQAChain.from_llm(
    cypher_llm=ChatOpenAI(temperature=0, model_name="gpt-4"),
    qa_llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
    graph=graph,
    verbose=True,
)
chain.run("Which university did Warren Buffett attend?")
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: "Warren Buffett"})-[:EDUCATED_AT]->(o:Organization)
RETURN o.name
Full Context:
[{'o.name': 'New York Institute of Finance'}, {'o.name': 'Alice Deal Junior High School'}, {'o.name': 'Woodrow Wilson High School'}, {'o.name': 'University of Nebraska'}]
> Finished chain.
'Warren Buffett attended the University of Nebraska.'
chain.run("Who is or was working at Berkshire Hathaway?")
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[r:EMPLOYEE_OR_MEMBER_OF]->(o:Organization) WHERE o.name = 'Berkshire Hathaway' RETURN p.name
Full Context:
[{'p.name': 'Charlie Munger'}, {'p.name': 'Oliver Chace'}, {'p.name': 'Howard Buffett'}, {'p.name': 'Howard'}, {'p.name': 'Susan Buffett'}, {'p.name': 'Warren Buffett'}]
> Finished chain.
'Charlie Munger, Oliver Chace, Howard Buffett, Susan Buffett, and Warren Buffett are or were working at Berkshire Hathaway.'
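The verbose output above suggests a three-step flow: one model turns the question into a Cypher statement, the statement is run against the graph, and a second model phrases an answer from the returned rows. The sketch below mirrors that flow with plain callables. It is a conceptual illustration only, with hypothetical names, and is not how GraphCypherQAChain is implemented.

```python
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


def answer_from_graph(
    question: str,
    generate_cypher: Callable[[str], str],
    run_query: Callable[[str], Rows],
    generate_answer: Callable[[str, Rows], str],
) -> str:
    """Answer a question by generating a query, running it, and summarizing rows."""
    cypher = generate_cypher(question)  # step 1: question -> Cypher text
    context = run_query(cypher)  # step 2: Cypher -> result rows
    return generate_answer(question, context)  # step 3: rows -> answer


# Stub callables stand in for the two models and the database.
def demo_generate_cypher(question: str) -> str:
    return "MATCH (p:Person)-[:EDUCATED_AT]->(o:Organization) RETURN o.name"


def demo_run_query(cypher: str) -> Rows:
    return [{"o.name": "University of Nebraska"}]


def demo_generate_answer(question: str, rows: Rows) -> str:
    return "Attended: " + ", ".join(row["o.name"] for row in rows)
```

Separating the steps this way also shows why two different models can be used: the query-writing step benefits most from a stronger model, while the final step only has to rephrase the rows it is given.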