Reliable Graph RAG with Neo4j and Diffbot
- Added on 28 June 2024
- We're developing a GraphRAG system using Diffbot's APIs to construct reliable knowledge graphs, which are then stored in a Neo4j graph database for efficient querying and information retrieval.
0:00 intro
0:22 brief overview of Graph RAG and knowledge graphs
0:50 potential pitfalls of vector-based RAG
1:29 Graph RAG research by Microsoft
2:09 potential pitfalls of LLMs constructing knowledge graphs
2:20 brief intro to entity resolution
3:03 entity resolution problem with GPT-4
3:41 entity resolution handled by Diffbot
3:51 Graph RAG demo + article importer
4:37 web scraping without worrying about hallucinated sources
5:24 KG construction from news article
5:37 enrich the knowledge graph with Enhance API
5:57 final network graph
6:09 question answering (vector vs. vector+KG)
7:26 more examples (vector vs. vector+KG)
7:35 skip the outro and have fun with the repo!
7:52 attribution to Tomaž Bratanic and Anej Gorkič
Get your free Diffbot token to start building Graph RAG at:
app.diffbot.com/get-started
Github repo for this Graph RAG project:
github.com/tomasonjo/diffbot-...
#graphrag #knowledgegraphs #llms - Science & Technology
Excellent presentation. Neo4j has done a great job of integrating vector embeddings with knowledge graphs. The Neo4j LLM Knowledge Graph Builder, which extracts entities and relationships from video transcripts, Wikipedia articles, and PDF files, is impressive. It also merges the extracted knowledge into a graph structure and then provides an interface to query the LLM about knowledge in the graph.
Amazing video, really. And jokes are definitely necessary. 🤓
Thanks for the knowledge (graph) sharing :)
Awesome indeed 😍 Thank you!
omg I was trying to hold it together at the introduction when I saw the large soup bowl but I broke down when it said "whatever, y'all get replaced by AI in 5 yrs" 😂 Great video overall. I hope I am right about the size of the soup bowl and you are not 3 feet tall.😅
Awesome!
Hey. Thx for the awesome content. Would it be possible to actually show a fully working large-scale graph (like a proper prod-scale thing), and also discuss the pros and cons of the approach, and when you found KGs working well and not so well? The reason I ask is that my own KG experiments worked perfectly for me at smaller scale, but then the speed was so slow that it was killing my M2 Mac altogether.
Once the entities are extracted, can the model then be prompted to write a graph query (that could then be executed)? I’m thinking in particular of the “knowing A is B but not B is A” problem, such as when you ask an LLM “who is Mary Pfeiffer’s son?” and it does not say “Tom Cruise” but can answer “who is Tom Cruise’s mother?” just fine?
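A minimal sketch of the point behind this question, assuming a toy in-memory graph rather than Neo4j or the project's actual code. The `TinyGraph` class and the `HAS_MOTHER` relation name are hypothetical, and the names are taken from the comment as written. Once a fact is stored as a single directed edge, it can be looked up from either end, so "A is B" and "B is A" come from the same stored fact:

```python
class TinyGraph:
    """Hypothetical toy graph: stores directed, typed edges and indexes both ends."""

    def __init__(self):
        self._outgoing = {}  # (source, relation) -> list of targets
        self._incoming = {}  # (target, relation) -> list of sources

    def add_edge(self, source, relation, target):
        # One stored fact is indexed in both directions.
        self._outgoing.setdefault((source, relation), []).append(target)
        self._incoming.setdefault((target, relation), []).append(source)

    def targets(self, source, relation):
        # Follow the edge forward: source -[relation]-> ?
        return list(self._outgoing.get((source, relation), []))

    def sources(self, target, relation):
        # Follow the edge backward: ? -[relation]-> target
        return list(self._incoming.get((target, relation), []))


graph = TinyGraph()
graph.add_edge("Tom Cruise", "HAS_MOTHER", "Mary Pfeiffer")

# "Who is Tom Cruise's mother?" (forward lookup)
mother = graph.targets("Tom Cruise", "HAS_MOTHER")

# "Who is Mary Pfeiffer's son?" (reverse lookup over the same edge)
child = graph.sources("Mary Pfeiffer", "HAS_MOTHER")

print(mother, child)
```

In a real graph database the same idea applies: the query matches the stored relationship from whichever end is known, so the model only has to produce the query, not recall the fact in reverse.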
Hi, it would be great if you could please make longer videos explaining how you did each of these transitions, for example entity extraction, relationship extraction and so on, and then how you did the Neo4j integration. Maybe you can make a short video like this out of the original video to attract customers, while the long video would still serve as a promising direction for the developer/researcher. I love the output produced by your system, but there's no way to reproduce what you are doing. Reproducibility is a major concern in KGC.
How does this compare to the LlamaIndex property graph?
Diffbot doesn't sound so good for privacy.
What is the role of Diffbot? Entity extraction?
Are there any self-hosted replacements?
Hi, could you convert complex PDF documents (with graphics and tables) into an easily readable text format, such as Markdown? The input file would be a PDF and the output file would be a text file (.txt).
Maybe the pages have to be turned into images using Poppler, and you could use an LLM that allows image inputs, like GPT-4 Vision or Claude 3, along with function calling to get the entities, objects and relations.
Be my teacher and advisor, please connect.
Too much information on the screen at once ... really painful to follow ... be more sober and straight to the point. Jokes are not necessary.
Thanks for the feedback!
Why so serious? 😅