Denys on Data
Germany
Joined 27. 04. 2015
I share my thoughts on data engineering, architecture, analytics, and machine learning
Query structured data with LLM: LlamaIndex with RAG
Are you ready to take your data querying skills to the next level 🤩🤩🤩? In this video, we dive deep into the powerful combination of LlamaIndex and Retrieval-Augmented Generation (RAG) techniques to revolutionize how you interact with structured data. Discover how Large Language Models (LLMs) can transform your data analysis and querying processes.
Here is a detailed walkthrough of how exactly querying a structured data source works with LLMs and LlamaIndex.
🔍 In This Video, You'll Discover:
LlamaIndex Uncovered: Understand how this innovative tool can streamline and enhance your data querying process.
The Magic of RAG: Learn how Retrieval-Augmented Generation can supercharge your data analysis and improve accuracy.
Step-by-Step Tutorials: Watch real-time demonstrations showing how to leverage these technologies for practical, real-world applications.
👉🏻👉🏻👉🏻Don’t forget to like, comment, subscribe and hit the bell icon for more insightful content on data analytics and advanced querying techniques!
🤝🤝🤝Join us and unlock the full potential of your data with LlamaIndex and RAG! Here is the link to the code repo that has the notebook used in the tutorial: github.com/denysthegitmenace/aws-bedrock.
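The flow the video walks through can be sketched end to end in a few lines. This is a minimal stand-in, not the tutorial's actual code: sqlite3 replaces the real database, `fake_llm` replaces the Bedrock/LlamaIndex model call, and the `sales` table and the generated SQL are made up for illustration.

```python
import sqlite3

# Build a small demo database (hypothetical schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 100.0), ("US", 250.0), ("EU", 50.0)])

def get_schema(conn):
    """Step 1: retrieve table metadata to ground the LLM prompt."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'").fetchall()
    return "\n".join(sql for _, sql in rows)

def fake_llm(prompt):
    """Step 2: stand-in for the LLM call that turns the question into SQL.
    A real pipeline would send `prompt` to a model via LlamaIndex/Bedrock."""
    return "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"

def query_structured_data(conn, question):
    prompt = f"Schema:\n{get_schema(conn)}\n\nQuestion: {question}\nSQL:"
    sql = fake_llm(prompt)               # Step 3: model generates SQL
    return conn.execute(sql).fetchall()  # Step 4: execute and return rows

print(query_structured_data(conn, "Total sales per region?"))
```

The point of the sketch is the shape of the pipeline: schema retrieval, prompt assembly, SQL generation, execution.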
Views: 356
Amazon Bedrock Agent: LLM-Powered Text-to-SQL for Data Analytics
702 views, a month ago
Amazon Bedrock Agent: LLM-Powered Text-to-SQL for Data Analytics Ready to supercharge your data analytics with AI 🚀 ? In this video, we dive deep into Amazon Bedrock Agent and show how its LLM-powered Text-to-SQL feature simplifies complex data queries in just a few clicks. Say goodbye to manual SQL coding and hello to automated insights! Text2SQL has a long way to go to replace data analyst...
LLM for data analytics: text-to-sql 3 architecture patterns
2.2K views, a month ago
This is the first video in a series exploring how to work with structured data using Large Language Models (LLMs). In this video, I explain the three main architectural patterns for building Text-to-SQL pipelines: 1. Prompt engineering & manual metadata retrieval (BASE) 2. BASE RAG for metadata retrieval 3. 1 or 2 using the fine-tuned model Stay tuned for more videos in this series on leveragin...
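Pattern 2 above (RAG for metadata retrieval) can be sketched without any LLM at all. This is a toy illustration, not the video's code: plain word overlap stands in for the embedding similarity a real pipeline would use, and the table names and descriptions are hypothetical.

```python
# Minimal sketch of pattern 2 (RAG for metadata retrieval): instead of
# stuffing every table definition into the prompt, score each table's
# description against the user question and keep only the best matches.
TABLES = {  # hypothetical table descriptions
    "orders":    "order id, customer id, order date, total amount",
    "customers": "customer id, name, country, signup date",
    "web_logs":  "page url, visitor ip, timestamp, user agent",
}

def score(question, description):
    # Word-overlap similarity; a real setup would compare embeddings.
    q = set(question.lower().split())
    d = set(description.lower().replace(",", " ").split())
    return len(q & d)

def retrieve_tables(question, k=2):
    ranked = sorted(TABLES, key=lambda t: score(question, TABLES[t]), reverse=True)
    return ranked[:k]

print(retrieve_tables("total amount per customer country"))
```

Only the top-k schemas then go into the prompt, which is what keeps token usage flat as the number of tables grows.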
End-to-end ML pipeline with SageMaker pipelines | Quick walkthrough
1K views, a month ago
Quick walkthrough of building an ML pipeline with SageMaker pipelines. Here is the link to the original tutorial from AWS sagemaker-examples.readthedocs.io/en/latest/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform.html #aws #sagemaker #mlops #ml
AWS Bedrock Tutorial: chat with your files in 10 min with AWS Bedrock, Streamlit, and knowledge base
2.8K views, 2 months ago
I show how to build an AI Agent with a knowledge base using AWS Bedrock and Streamlit in 10 minutes. The idea is to build the skeleton for the app as quickly as possible. Accuracy and deployment are not of concern in this video. Here is the repo for the Streamlit UI github.com/acwwat/amazon-bedrock-agent-test-ui #genai #ai #aiagents #awsbedrock #aws #streamlit
Langchain tutorial cite sources
3.1K views, 11 months ago
Here we look at what's available on the surface when it comes to citing sources with Langchain and OpenAI. 00:07 Intro to citing sources 00:49 OpenAI playground 02:48 Langchain cite sources fuzzy match chain 08:25 Langchain multiple sources #llm #langchain #openai #chatgpt #dataengineering #dataarchitecture
Querying a database with OpenAI's ChatGPT and Langchain
1.5K views, 11 months ago
This video is a technical deep dive into how Langchain interacts with the OpenAI API to answer questions about relational data. If you simply want to see Langchain and OpenAI working together on top of a Postgres database to answer user questions, make sure to watch my previous video (it appears as a card at the very beginning). #llm #langchain #openai #chatgpt #dataengineering #dataarchitecture
Langchain tutorial. Query a database with OpenAI's ChatGPT
7K views, 11 months ago
Here is a quick overview of how to query data in a relational database (Postgres in our case) with the help of OpenAI's ChatGPT and Langchain. I double-check the results provided by the LLM and stress multiple times that outputs are non-deterministic and that you should be using all sorts of safeguards when relying on the results generated by LLMs :) #llm #langchain #openai #chatgpt #dataenginee...
Langchain chatbot
221 views, a year ago
00:32 Langchain Chatbot in Action 01:13 Getting started without memory 04:00 LangChain chains brief intro; adding memory 09:41 Adding a knowledge base 17:20 LangChain with knowledge base without losing general knowledge 19:26 Wrap-up into a Streamlit app In this tutorial, we'll walk through the process of building a LangChain chatbot using OpenAI's ChatGPT. Starting with a simple chatbot, we'll...
Digital twin on top of AWS IoT TwinMaker-Dashboard overview
1.1K views, a year ago
In this YouTube video, we get an overview of a dashboard built on top of a digital twin using AWS IoT TwinMaker. The dashboard has several components, including an assets list with associated alerts, data feeds from the assets, a hierarchy of assets, a 3D view, and a live video stream. I will continue learning about the world of digital twins and share my insights with you. If you're interes...
Index-free adjacency in graph databases explained
634 views, a year ago
Short explanation of what an index-free adjacency is and why it makes certain graph traversals much more efficient compared to index-based lookups. Link to the Miro board used for the video: miro.com/app/board/uXjVOTZper0=/?share_link_id=139126415302
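A minimal sketch of the idea, not tied to any particular graph database: the first traversal hops direct object references (index-free adjacency), while the second simulates the index-based alternative that scans a shared edge table on every hop.

```python
# Sketch of index-free adjacency: each node stores direct references to its
# neighbours, so a traversal is a pointer hop, not an index lookup.
class Node:
    def __init__(self, name):
        self.name = name
        self.follows = []  # direct references to neighbour Node objects

    def follow(self, other):
        self.follows.append(other)

alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
alice.follow(bob)
bob.follow(carol)

# Two-hop traversal: no global edge index consulted, just pointer chasing.
two_hops = [n.name for f in alice.follows for n in f.follows]
print(two_hops)

# The index-based alternative a relational store would use: every hop is a
# lookup over a shared edge table, whose cost grows with the total edge count.
edges = [("alice", "bob"), ("bob", "carol")]
two_hops_indexed = [dst2 for src, dst in edges if src == "alice"
                    for src2, dst2 in edges if src2 == dst]
print(two_hops_indexed)
```

Both produce the same answer; the difference is that the pointer-chasing cost depends only on the local neighbourhood, not on the size of the whole graph.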
Graph Databases-Chapter 1. Introduction
71 views, a year ago
1. A graph is a structure that represents entities as nodes and relationships between the entities as edges. 2. The Twitter data model maps nicely to a graph. Users and posts are represented as nodes. The actions of following a user and publishing a post are represented as edges. 3. Labeled property graph: it is the most popular graph data model. Its main features are: 1. it has nodes and...
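The labeled property graph from point 3 can be sketched with plain dictionaries. This illustrates the model only; the node IDs, labels, and properties are made up.

```python
# Sketch of the labeled property graph model: nodes and relationships both
# carry a label/type and arbitrary key-value properties.
nodes = {
    1: {"labels": ["User"], "props": {"handle": "@denys"}},
    2: {"labels": ["Post"], "props": {"text": "Graphs are neat"}},
}
edges = [
    {"type": "PUBLISHED", "from": 1, "to": 2, "props": {"at": "2024-05-01"}},
]

def outgoing(node_id, rel_type):
    """Return the target nodes reachable via relationships of `rel_type`."""
    return [nodes[e["to"]] for e in edges
            if e["from"] == node_id and e["type"] == rel_type]

posts = outgoing(1, "PUBLISHED")
print(posts[0]["props"]["text"])
```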
AWS Certified Solutions Architect Associate - Thoughts, impressions, tips
9K views, 5 years ago
I recently passed the AWS Certified Solutions Architect Associate certification. Here is a link to the so-called "badge" www.certmetrics.com/amazon/public/badge.aspx?i=1&t=c&d=2018-12-28&ci=AWS00648127&dm=80 In this video I share my thoughts on AWS certification and certifications in general, and I also share preparation strategies that worked successfully for me. If you have any questions or need an ad...
AWS Fargate tutorial - Running a Docker container with a Python Flask app
24K views, 5 years ago
In this AWS Fargate tutorial we package a Flask app into a Docker container and run it on top of AWS Fargate, a new compute engine of Elastic Container Service (ECS). A thorough walkthrough of building the mentioned Python Flask app is in this video czcams.com/video/UNrr8MneoJo/video.html A short explanation of what a Docker container is can be found here czcams.com/video/qgWLcywSsjY/video.ht...
Getting started with Docker - What is a Docker Container?
347 views, 5 years ago
In this video I explain what a Docker container is.
Flask Tutorial - Building a simple web app with Flask and Python
2.6K views, 5 years ago
Flask Tutorial - Building a simple web app with Flask and Python
AWS Lambda Python triggered by API Gateway
1.1K views, 5 years ago
AWS Lambda Python triggered by API Gateway
Create an RDS Postgres instance and connect with pgAdmin
36K views, 5 years ago
Create an RDS Postgres instance and connect with pgAdmin
S3 AWS - Upload local folder to AWS S3 bucket
784 views, 5 years ago
S3 AWS - Upload local folder to AWS S3 bucket
S3 AWS - Load files from and to AWS S3 bucket
163 views, 5 years ago
S3 AWS - Load files from and to AWS S3 bucket
S3 AWS - Downloading an entire AWS S3 bucket
12K views, 5 years ago
S3 AWS - Downloading an entire AWS S3 bucket
AWS CLI Tutorial - Setting up AWS Command Line Interface (AWS CLI) on your laptop
580 views, 5 years ago
AWS CLI Tutorial - Setting up AWS Command Line Interface (AWS CLI) on your laptop
Creating PostgreSQL tables with pgAdmin
43K views, 7 years ago
Creating PostgreSQL tables with pgAdmin
Populating PostgreSQL tables using pgAdmin
20K views, 7 years ago
Populating PostgreSQL tables using pgAdmin
Creating a PostgreSQL database with pgAdmin and logging into it
256K views, 7 years ago
Creating a PostgreSQL database with pgAdmin and logging into it
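The three pgAdmin walkthroughs above come down to three SQL statements: CREATE TABLE, INSERT, and SELECT. Here is the same sequence against an in-memory SQLite database so it runs anywhere; the statements work essentially unchanged in pgAdmin's Query Tool, and the table and data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the Postgres database

# Creating a table (what the pgAdmin dialog generates under the hood).
conn.execute("""
    CREATE TABLE employees (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        role TEXT
    )
""")

# Populating it.
conn.executemany(
    "INSERT INTO employees (id, name, role) VALUES (?, ?, ?)",
    [(1, "Ada", "engineer"), (2, "Grace", "analyst")],
)

# Reading it back.
rows = conn.execute("SELECT name FROM employees ORDER BY id").fetchall()
print(rows)
```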
Can we not use RAG within Bedrock and use the default OpenSearch vector DB for this? Does that also do chunking and create a vector store similar to LlamaIndex?
Thanks man
Thank you for this tutorial!
You're using a ServiceContext in the VectorStoreIndex in the Lambda function but not in the notebook. Why is that? Will the output in the two cases be different? Sorry for asking many questions, but the video is really interesting, and I am trying to learn llama_index
Really good illustration, Denys! Just one question: will this architecture still function well when you have too many tables with bad naming? I only see some products like AskYourDatabase work well in this situation. How should the solution fit into this architecture?
I guess the easiest/cleanest/cheapest is getting the names right. Or creating a layer of views on top. In my last video I provide an extra explanation for each table, which could also help. But if you are looking for a hands-off solution that should work "out of the box" on top of lots of tables, I guess having tables named nicely goes a long way. Let me know if I misunderstood the question.
Just a question about the action group: Did you build a simple lambda function from scratch or create one from a container? I am asking this question to understand how you installed the dependencies specified in the requirements file
It's a container-based Lambda. The requirements are installed during the Docker build stage. The whole thing was deployed/managed with Terraform.
Thank you. It seems like you're using Terraform to push the image to ECR, right? I am wondering if it is possible to create a video about creating the action group step by step.
I don't think an OpenAI key is required since you're using AWS Bedrock models, right?
Correct. In the tutorial, credentials are pulled from env vars.
thank you just discovered your channel
really good video. thank you!
Glad you liked it!
Thanks Denys for putting this together - can you elaborate on what goes into the "prompt template"?
Sure. Here is a link to the file with the prompt template I am covering in my last video: github.com/denysthegitmenace/aws-bedrock/blob/main/query_structured_data_lambda/prompt_templates.py SQL_TEMPLATE_STR is a good example
Great video. Many thanks for sharing this
Hello Denys. Thanks for the video. I am wondering if it's possible to add implementation details on the tech stack and tools for the RAG type of architecture. What framework was used to load the DB schema: if Langchain, which loader, and how was it vectorized? Which vector DB is good for this type of case, and which foundation models, from your experience, for both vectors and generation? Maybe some examples of code for the loader, retrieval, and connectors if possible. I have a case in mind to implement and am puzzling over how to load structured data into a vector DB as well as retrieve it for generation. Thank you in advance. ❤
Yep. Planning to publish this exact walk-through this weekend. No Langchain, though. It was done with LlamaIndex. Also, I am not using any external vector storage for this tutorial here, it's all in-memory. But I know that my colleagues (and we are working primarily on AWS) started using Aurora PostgreSQL with pgvector instead of OpenSearch Serverless for cost-efficiency reasons. Hope that helps and stay tuned :)
Just uploaded the video. Curious to learn what you think
Many thanks for your prompt answers. Can't wait to see the next video
Just uploaded the video. Curious to learn what you think
It seems like RAG (knowledge base) has not been used in the architecture; LlamaIndex is used instead. So the LLM (foundation model) is building the query with the help of the user's NLP input + few-shot examples and table metadata, right?
Correct. There is no knowledge base, but the approach for pulling table metadata and for identifying the most relevant queries is exactly the same as used in RAG: identifying the similarity between the user input and various elements.
I would also be grateful if you could share a walkthrough of how to create few-shot examples
Yep. Will make sure to include it.
Great video. Could you please explain, here or in a separate video, the Glue and metadata extraction part?
Thank you. Sure. Will do so during the coming weekend :)
Good day! Thank you for all the kind words 💞 Unfortunately, I don’t have the capacity to answer specific technical questions here. If you need support with a specific problem, please consider joining my Patreon private chat (link in bio). There, I can help you with your issues and we can also schedule private sessions to address more complex problems.
InvokeAgent operation: Failed to retrieve resource because it doesn't exist. Retry the request with a different resource identifier. This is the error I am getting; any thoughts on this?
I recently created a Patreon (link in bio) for providing any sort of guidance. Feel free to join for help with this and any other question
Hello sir! I have an issue creating the knowledge base. When I create it, it shows "failed to create OpenSearch Serverless collection". Even though I gave the user full access to Bedrock and the OpenSearch service and made the S3 bucket accessible to the OpenSearch service, the issue is not fixed. Can you help me clear that issue? I'm struggling with it, please help!
Hi Sanjaimi! I just created a patreon for exactly such questions (link in bio). Feel free to join and get some help with this and any other questions you might have in the future.
@@DenysonData will do Sir !!!
I've done all 3 of the architectures you've mentioned, but still am not getting ideal results. The main issues I've encountered: 1. Lack of text2sql pairs. I've collected all of the SQL queries that succeeded in our database, but it's incredibly hard to infer back to the original query in human language. 2. It's almost impossible to help the LLM understand the relation between business info (usually expressed in human language) and the actual data structure. 3. The information density is quite low when exporting the database schema and table structure; we used lots of nested JSON stored in single columns, plus enums with no detailed description. That was done months ago; today I might have some new ideas on issues 1 & 3, but 2 remains seemingly impossible.
RE: "I've collected all of the SQL queries that succeeded in our database, but it's incredibly hard to infer back to the original query in human language": Good approach! However, I guess with txt2sql, more than ever, you need to start with the end-user questions, and from my experience there is usually a VERY limited set. RE: "It's almost impossible to help the LLM understand the relation between business info (usually expressed in human language) and the actual data structure": 100%, that's also my main argument against the hype around "genbi" and "AI will replace data analysts". RE: "We used lots of nested JSON stored in single columns, plus enums with no detailed description": as with efficient data analytics, pre-processing according to the END business needs is your best friend here. Point the smartest person at a complex schema with dozens of caveats and they would throw their hands up sooner rather than later.
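The pre-processing argument above can be made concrete: flatten the nested JSON column into plain columns (or a view) before any LLM or analyst touches it. A toy sketch, with a hypothetical `payload` column:

```python
import json

# Raw rows as they might come out of the database, with the interesting
# fields buried in a JSON column (rows and field names are made up).
rows = [
    {"id": 1, "payload": json.dumps({"customer": {"country": "DE"}, "total": 40})},
    {"id": 2, "payload": json.dumps({"customer": {"country": "US"}, "total": 75})},
]

def flatten(row):
    """Pull nested JSON fields up into self-describing flat columns."""
    p = json.loads(row["payload"])
    return {"id": row["id"],
            "customer_country": p["customer"]["country"],
            "total": p["total"]}

flat = [flatten(r) for r in rows]
print(flat[0])
```

A schema export of the flat shape carries far more signal per token than "payload: TEXT", which is exactly the information-density problem described in point 3.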
Oh... I was hoping you would cover a bit more, like if I have the source in the Document's metadata, how can I get that to be used for citations?
Sorry for the late reply. Figuring out the "unanswered comments" functionality just now. Let me know if you still have questions in this direction. Also, if you need support with a specific problem, please consider joining my Patreon private chat (link in bio). There, I can help you with your issues and we can also schedule private sessions to address more complex problems.
I agree with what you were saying about langchain hiding things lol. Even with Debug and verbose on, with everything set to "with_config(run_name="blah blah")", while also reading through the source code, it's hard to really trace what's going on in langchain :')
How do we get to this page in the first place? How do we register to get a user and password?
Hi! I just created a patreon for exactly such questions (link in bio). Feel free to join and get some help with this and any other questions you might have in the future.
Hey Denys, great video! Is it possible to query a postgreSQL database?
Thank you. Yes!
Just one question: how about a database which has around 1000 tables? How will it handle the prompt and tokens? Will it send 1000 table schemas each time a query is passed by the user? Would appreciate your prompt reply. Thank you.
Good question, Mihir! I covered this in my latest video: you would vectorize your tables' schemas so that the LLM can decide on the go which tables are most appropriate for answering the user's question. I am planning to publish another video on this topic this week as well. Also, you could subscribe to the Patreon I just created (link in bio), where you could get my personal take on any future questions you might have.
Excellent. Thank you for this tutorial.
I would never trust the LLM to write the SQL query. I would create all the SQL queries behind a JDBC layer and write a RESTful API with search parameters. Can Langchain do that?
Providing examples of the queries is one of the best practices these days. They are then used for few-shot prompt creation. I explain it in my latest video. If this is of interest to you, I just created a Patreon account (link in bio), where you could get my personal take on any future questions you might have.
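The few-shot practice mentioned above can be sketched as follows. This is a simplification, not the video's code: word overlap stands in for the embedding-based similarity search a real setup would use, and the example question/SQL pairs are invented.

```python
# Sketch of few-shot prompting for text-to-SQL: keep vetted
# question -> SQL pairs and splice the closest ones into the prompt.
EXAMPLES = [  # hypothetical vetted pairs
    ("how many customers do we have",
     "SELECT COUNT(*) FROM customers"),
    ("total revenue per month",
     "SELECT month, SUM(amount) FROM orders GROUP BY month"),
]

def closest_examples(question, k=1):
    # Rank stored examples by word overlap with the new question.
    q = set(question.lower().split())
    return sorted(EXAMPLES,
                  key=lambda ex: len(q & set(ex[0].split())),
                  reverse=True)[:k]

def few_shot_prompt(question):
    shots = "\n".join(f"Q: {q}\nSQL: {sql}"
                      for q, sql in closest_examples(question))
    return f"{shots}\nQ: {question}\nSQL:"

print(few_shot_prompt("how many orders do we have"))
```

The resulting prompt shows the model worked examples in the house SQL dialect before it sees the new question, which is where most of the accuracy gain comes from.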
thank you
thank you
How about context memory? The user asks question A and then a follow-up question that needs question A to answer?
Context memory can be implemented using ConversationBufferMemory... please understand LLMs are stateless by design
Good point.
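Since LLMs are stateless, buffer memory just replays the running transcript with every call. A minimal sketch of the idea behind ConversationBufferMemory, with a stubbed model standing in for the real LLM:

```python
# Buffer-style memory: the client keeps the transcript and prepends it to
# every prompt, which is how a stateless model can answer follow-ups.
class BufferMemory:
    def __init__(self):
        self.turns = []

    def add(self, who, text):
        self.turns.append((who, text))

    def render(self):
        return "\n".join(f"{who}: {text}" for who, text in self.turns)

def ask(memory, question, llm):
    prompt = f"{memory.render()}\nHuman: {question}\nAI:"
    answer = llm(prompt)
    memory.add("Human", question)
    memory.add("AI", answer)
    return answer

def stub_llm(prompt):
    # A real model sees the prior turns inside `prompt` and can resolve a
    # follow-up like "their names" against the earlier salesmen question.
    return "3 salesmen" if "salesmen" in prompt else "(needs context)"

mem = BufferMemory()
ask(mem, "How many salesmen do we have?", stub_llm)
follow_up = ask(mem, "Can I get their names?", stub_llm)
print(follow_up)
```

The second question only works because the first exchange is replayed in the prompt; drop the memory and the stub (like a real model) has nothing to resolve "their" against.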
thank you !!
How do I add memory? For example: Question 1: How many salesmen do we have? Answer: 3. Question 2: Can I get their names? As you can see, the second question relies on the first. How do we achieve that? Thanks
Hi! I just created a patreon for exactly such questions (link in bio). Feel free to join and get some help with this and any other questions you might have in the future.
have you been able to get this working with Llama through langchain?
Never got to playing with open source models. But many people do that!
Finally someone who is not indian, thank you Denys!!!
Is there anything similar for NoSQL?
For sure. This approach would work for any DB engine. The LLM only needs to generate the correct syntax. Sorry for the delayed reply. Figuring out the "Unanswered comments" functionality only now -_-
does it still work with the current updates? been having trouble downloading mine, this tutorial just made it all possible.
At 1:57 I try to save but I get an error: "Unable to connect to server: connection failed: FATAL: password authentication failed for user "postgres""
very good video, will watch the next one about a deeper sql context.
This is so helpful thanks! I myself am struggling with the /files endpoint. I want to upload a pdf but it only accepts jsonl? Any advice
Sounds like you want to implement a RAG use-case, but the OpenAI /files endpoint for now is intended to upload data for fine-tuning, which is a completely different use-case.
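For context on the JSONL requirement: the fine-tuning endpoint expects one JSON object per line, each object a complete training example. A minimal sketch of producing such a file, written to an in-memory buffer here; the chat-style shape and the message content are illustrative, not taken from the video:

```python
import io
import json

# One training example per line; chat-style messages shown here.
examples = [
    {"messages": [{"role": "user", "content": "ping"},
                  {"role": "assistant", "content": "pong"}]},
]

buf = io.StringIO()  # stands in for the .jsonl file you would upload
for ex in examples:
    buf.write(json.dumps(ex) + "\n")  # JSONL = one JSON object per line

lines = buf.getvalue().splitlines()
print(len(lines), json.loads(lines[0])["messages"][0]["content"])
```

A PDF, by contrast, has no such line-per-example structure, which is why the endpoint rejects it for this purpose.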
how can this work with FAISS vector databases?
This use-case is starting to be implemented in a variety of products these days. For example, Microsoft allows you to search your SharePoint documents (which of course are vectorized under the hood). Here is an example repo: github.com/Azure-Samples/azure-search-openai-demo. We played around with it in our company and it works smoothly.
Can you make a video using a local LLM (open source instead of OpenAI) to do the same? TIA
Good point. Thank you for the suggestion. Will do!
Sorry. Never got to this one. The world of data is so unpredictable 🙈
Is it possible to cite the document sources?
Yes. Will make a short video on this soon 🙂
Here we go czcams.com/video/MOawB4k9-jk/video.html
very short video. please upload with full detail
Sure. Let me know what exactly you would like to learn about.
thank you so much....I was struggling for hours ....yet you made it very simple.
Glad I was able to help :)
these two graph videos were some of the best I've seen yet, keep going sir, amazing channel
Thank you! Let me know what topics you would be interested to dive into and I will look into it.
Denys, I have an issue: my task definition keeps stopping with "exec /usr/local/bin/flask: exec format error". I need help, man. Is it possible to check it via Discord?
Sorry for the late reply and thank you for the great suggestion! I just added a link to a Telegram public group where you can ask questions like the one above to the channel description.
THANKYOU SO MUCH! I CAN FINALLY STOP CRYING OVER THIS SHIT NOW. 😭😭
Ayyyeee! a well deserved like! Congrats for the great job! It helped me! It took weeks for me to get to deploy my flask app, the issue was the security group as well. Thank you so much again!
Noticed your GitHub link isn't working anymore in the description
Sorry about that. Had to do a major GitHub cleanup. Let me know if there is anything I could help you with.
Thanks bro