This is what I was searching for. Keep it up. Very informative, no bullshit.
Great to hear! Thanks for watching
I just started a master's in data analytics (I'm actually a teacher, though). I'm so glad I found this channel. So effing interesting. Seems like a hell of a time to get into this space.
Found a gem of a channel; I'll learn so many new things now.
Thanks for sharing. Awesome!
Fantastic stuff! It can be applied to so many things. Thanks for enlightening us with such fantastic content; it's a lightning-fast-growing technology and there's not a lot of information on the subject. What I'd like to see is proper fine-tuning via conversation history that gets saved and referenced in a separate vector database from the document analysis. Reminds me of the early web! Everything was still to be done.
Very well explained. Very compact tutorial. Keep going!
Appreciate the support! Thanks for watching
Please upload more videos about LangChain, please ❤
I came across your channel and it is exactly what I have been searching for. Keep up the great work. Small request: can we get a similar video, but for PDFs?
Nice work.
However, for newbies like me, please explain how you got to the .env file section where you entered your API key.
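For anyone stuck at the same step, here's a minimal sketch of the usual python-dotenv setup (the file and variable names follow OpenAI's common convention, so treat them as assumptions rather than exactly what's in the video):

```python
# Contents of a file named ".env" placed next to the notebook:
# OPENAI_API_KEY=sk-...

import os
from dotenv import load_dotenv  # pip install python-dotenv

# Reads .env and exports its entries as environment variables.
# Returns True if a .env file was found and loaded.
load_dotenv()

# LangChain and the OpenAI client look this variable up by default.
print(os.getenv("OPENAI_API_KEY") is not None)
```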
That is crazy good, thanks for the video. New sub here!
Does LangChain send this entire CSV file to OpenAI?
Great video! Can you make one that uses an open-source LLM instead of GPT-4 for handling larger pandas datasets with hundreds of thousands of records, as in actual production scenarios for orders? Thanks!
I personally feel like ChatGPT is not the best AI tool for data analysis work. Writing documentation for the code and then having Copilot write the actual code goes like a million mph, and you don't pay per token.
I agree copilot is superior right now, but things are moving fast
Isn't Copilot powered by OpenAI Codex?
We need to consider the application of these tools by analysts who may not possess programming skills. This is where their usefulness truly shines.
ChatGPT ≠ GPT-4.
If you'd studied, you'd understand this.
@rabbitmetrics I'm a little disappointed you didn't point out here how ChatGPT is a demo implementation of GPT-4 and not the same as the OpenAI APIs for it, where you set your own temperatures.
It is an interesting concept, and I hope it improves with time. Currently it just doesn't work for so many examples: a lot of parsing errors, long chains of retries, plain wrong answers.
Great video, really informative! I have a question regarding the dataframe: does OpenAI have access to the data? I'm curious, if a company has data and wants to use this kind of process, does OpenAI have access to that data? And does this process adhere to GDPR regulations?
A random anecdote: to move yourself up the waitlist for access to Bing Chat (GPT-4), you should set Microsoft as your default for everything, starting with your browser, then Microsoft wallpapers, then the app on your phone, etc. What would a pesky GDPR regulation do once the AI has root access to all machines because it's gatekept otherwise?
Same question. But what if all the data is stored in the Azure cloud? In a way, Microsoft has access to all our data.
These are really excellent videos, thank you. It's just a shame you aren't sharing the workbooks; it really helps to learn when you can run and adjust the code as you go!
Great video. Will this also work with the GPT-3.5 API, or does it need 4? Thanks.
Great video.
I believe the output parser error is related to the format of the output it's attempting to parse. Unless you have set up the proper tools to handle specific formats (like graphs), it might fail.
Thing is, these are still very basic queries that any human can quickly write pandas code for. For complex queries it gets lost. Moreover, both GPT-3 and GPT-4 are prone to basic math mistakes.
But of course the overall direction is pretty awesome; I'd love an agent that reliably writes a bunch of pandas and SQL boilerplate code for me on a daily basis.
Agreed, but I expect LLMs to improve to the point where they write accurate queries consistently.
That was great!
If the DataFrame is too long for the ChatGPT UI prompt, does that mean you can bypass this limit by using LangChain?
Very interesting. Does giving it a specific file to analyze solve the hallucination problem?
Insane!
How does an organization share proprietary data with OpenAI and have the LLM do the work? We need middleware obfuscating the data with some kind of distribution-preserving normalization, so that OpenAI can't reverse-engineer the context over time or take the secret and top-secret data; otherwise none of this is scalable.
It's not. Your best bet would be a local implementation of an LLM like Alpaca or something, and using that.
Even then, I don't think this is the best approach. Maybe it could take the column names and data types (plus metadata) and spit out a formula, with the operation performed on the local machine rather than by OpenAI, for both data security and answer integrity. Also, what if the file is extremely large, like a Parquet file that even GPT-4 can't process? In that case something like Spark could do the transformation or calculation for us. It'd be a great product, tbh.
And yes, pricing for data operations on OpenAI's servers is definitely not sustainable.
A company called Palantir does this
@mattforsythe5037 Palantir created a data-security middleware to communicate with external LLMs using NLP APIs?? Whoa!
Hi, in this approach is the data being shared with OpenAI? My understanding is that we're using a pretrained model and creating an agent for the environment.
Thank you for the excellent video. Doing analytics on a dataframe of my own, with 3,000 columns, I came across the token limit for the model I used (GPT-3.5). Is there any way to overcome it?
What are the advantages of using this method over using OpenAI's advanced analytics plugin?
Currently not much. Today I would look into using AutoGen for automating data analysis with OpenAI
There was one catch while using GPT-4: if we pass multiple dataframes, it only considers the headers in the prompt and treats those as the rows in all the dataframes. Could you please do a video on how to pass multiple dataframes to the GPT-4 pandas dataframe agent?
I'm exploring different ways to work with pandas efficiently at the moment; I'll make a video about this at some point.
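In the meantime, a minimal sketch of one approach: newer LangChain versions accept a list of dataframes, which the agent then refers to as df1, df2, and so on (version-dependent behavior, and the file names here are hypothetical):

```python
import pandas as pd
from langchain.chat_models import ChatOpenAI
from langchain.agents import create_pandas_dataframe_agent

# Hypothetical files standing in for your own data.
orders = pd.read_csv("orders.csv")
customers = pd.read_csv("customers.csv")

llm = ChatOpenAI(model_name="gpt-4", temperature=0)

# Passing a list exposes both frames to the agent as df1 and df2.
agent = create_pandas_dataframe_agent(llm, [orders, customers], verbose=True)
agent.run("How many orders does each customer have?")
```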
So does LangChain use GPT to write a SQL query, query the database, and then output the result? That's pretty impressive.
Could you please tell me how much the GPT-4 API cost for this task? I have only used 3.5 before and have heard that GPT-4 is much more expensive.
It's like 30x more expensive than the 3.5 Turbo model... curious how many tokens these requests soak up!
$20/month for ChatGPT Plus
Thanks so much! Do you have a GitHub or Colab link for the file?
You're welcome! I don't have a repo yet but will post a link.
Hello! Would love that!
Looks interesting. One question I have is whether there will be substantial costs for using OpenAI's models on large data sets.
I'd err on the side of caution when using a service with this pricing model. It wasn't a problem here, but using OpenAI embeddings can get pricey if you're processing large amounts of textual data.
Amazing! Now that OpenAI has just included this in their "code interpreter", is there any way to use a pandas DataFrame with a local model, like StableVicuna, RedPajama, or MPT-7B? Thank you. Liked and subscribed.
Will your dataset be uploaded to OpenAI if you do this? If so, how do I keep my dataset private?
I have a table with hundreds of rows and 20 columns. I even created a smaller table with only the first 5 rows for testing, and I still get this annoying error:
InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 12432 tokens. Please reduce the length of the messages.
It's impossible for me to work with any CSV file like this. What can I do?
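A sketch of two common mitigations, since the agent embeds a preview of the dataframe in every prompt, and wide tables with long text cells can blow past the context window even with few rows (the file and column names here are made up):

```python
import pandas as pd
from langchain.chat_models import ChatOpenAI
from langchain.agents import create_pandas_dataframe_agent

df = pd.read_csv("data.csv")  # hypothetical file

# 1) Only hand the agent the columns the question actually needs,
#    so the preview included in the prompt stays small.
slim = df[["order_id", "amount", "date"]]  # hypothetical column names

# 2) Use a model variant with a larger context window.
llm = ChatOpenAI(model_name="gpt-3.5-turbo-16k", temperature=0)

agent = create_pandas_dataframe_agent(llm, slim, verbose=True)
agent.run("What is the total amount per month?")
```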
Hey man, thanks for your video! I'm getting an error saying
AuthenticationError:
Output is truncated
Do you know how to fix it?
Can we use this to get answers to a set of questions if we have customer reviews instead of sales data?
Like, could we ask any question related to a product, or get a summary of thousands of review comments?
Indeed, have a look at this video czcams.com/video/UO699Szp82M/video.html
Does this order data get sent to ChatGPT? Is there any way to keep it local? Vicuna?
Hey, your video is very informative and a great tutorial. I have a question: if I use Visual Studio, will the code work inside VS as it does in Jupyter? Or should I write the commands in the CLI, since I have Python on PATH? I'm very new to coding and Python; I hope my question makes sense. Anyway, thank you for the great videos!
VS Code supports Jupyter, so you can run the notebook directly in VS Code. I do it all the time.
@devinwalker9202 Thanks so much, dude. I literally found out about that an hour ago, and then I saw your comment. I wish you the best. Thank you.
Does Pinecone or any other service store and have access to your data? This would be important to know for enterprise applications.
Yes, they have access to the embedding vectors and the metadata about each embedding
Curious, does it also give graphs if you ask it?
Do we run into problems with the token limit?
How can we save the df to Pinecone and query it?
Can we give multiple dataframes as input?
Thanks for the video. Based on my understanding, the OpenAI GPT is able to do the task solely based on the file name and informative column names, because, as you might know, these models are constrained by the context length, so they aren't able to parse the whole file and really analyse the data. In my opinion, we aren't doing anything magical here yet. We can get most of these results using some basic pandas functions like df.describe() or df["Column"].value_counts(). What do you think?
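For comparison, the two plain-pandas calls mentioned above, which do cover a surprising share of these queries (the file and column names are hypothetical):

```python
import pandas as pd

df = pd.read_csv("sales.csv")  # hypothetical file

# Summary statistics for every numeric column:
# count, mean, std, min, quartiles, max.
print(df.describe())

# Frequency table for one column, most common value first.
print(df["Product"].value_counts())  # hypothetical column name
```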
I think you can combine his video with this one, czcams.com/video/6WE09Ihdn9M/video.html, to get around the plugin waiting-list problem.
Great stuff! Have you had any success using sklearn with this methodology?
Just wondering, why can you use gpt-4 as the model name?
Has anyone tried getting the agent to create graphs in, say, matplotlib? I'm getting an 'OutputParserException: Could not parse LLM output' error. I can do it using exec on Python code generated via normal chat completion, but not this way. Good vid, though.
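One workaround that's often suggested for this, assuming a LangChain version whose create_pandas_dataframe_agent accepts agent_executor_kwargs, is to let the executor feed parse failures back to the model instead of raising (file name is hypothetical):

```python
import pandas as pd
from langchain.chat_models import ChatOpenAI
from langchain.agents import create_pandas_dataframe_agent

df = pd.read_csv("sales.csv")  # hypothetical file
llm = ChatOpenAI(model_name="gpt-4", temperature=0)

# handle_parsing_errors returns the malformed output to the model so it
# can retry, instead of raising OutputParserException immediately.
agent = create_pandas_dataframe_agent(
    llm,
    df,
    verbose=True,
    agent_executor_kwargs={"handle_parsing_errors": True},
)
agent.run("Plot monthly revenue with matplotlib")  # chart renders as a side effect
```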
Have you found anything similar but using SQL yet?
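Not covered on the channel as far as I know, but LangChain ships a SQL counterpart; a minimal sketch, with a placeholder SQLite URI (import paths vary by version):

```python
from langchain.chat_models import ChatOpenAI
from langchain.sql_database import SQLDatabase
from langchain.chains import SQLDatabaseChain

# Any SQLAlchemy-compatible connection string works; this one is a placeholder.
db = SQLDatabase.from_uri("sqlite:///orders.db")
llm = ChatOpenAI(model_name="gpt-4", temperature=0)

# The chain writes a SQL query, runs it, and phrases the result in English.
chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
chain.run("Which product had the highest revenue last month?")
```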
Great video. But no one is going to work with this workflow.
I see an error in the last step: Must provide an 'engine' or 'deployment_id' parameter to create a
How do I do this with nested JSON instead of CSV?
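One common route is to flatten the nesting with pandas first and then hand the resulting dataframe to the agent exactly as in the video; a sketch with a made-up JSON shape:

```python
import json
import pandas as pd

# Hypothetical nested structure: one record per order, items nested inside.
raw = json.loads("""
[{"order_id": 1, "customer": {"name": "Ann"},
  "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]}]
""")

# json_normalize explodes the record path into one row per item and keeps
# the listed metadata, turning nested dicts into dotted column names.
df = pd.json_normalize(raw, record_path="items",
                       meta=["order_id", ["customer", "name"]])
print(df)  # columns: sku, qty, order_id, customer.name
```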
There is a new package called PandasAI which does effectively the same thing in fewer lines of code. Under the hood, it is probably taking the same approach.
Nice, thanks. Will check it out.
@rabbitmetrics This dude is right. In my opinion this seems to be better than what LangChain currently offers through pandas_dataframe_agent. The behavior of the pandas dataframe agent is very inconsistent, especially when Action: print(python_repl_ast(...)) is called (I often get "is not a valid tool"). I imagine both are doing the same thing, with recursive calls to refine the dataframe operations being run and passed to the Python REPL. I am going to investigate the PandasAI documentation, as it seems much more straightforward and tractable for a non-contributor.
Where do I get the file with the code?
Does it work with any model other than OpenAI's models?
Yes. LangChain provides wrappers around various models; see python.langchain.com/en/latest/modules/models/llms/integrations.html
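A sketch of what that swap can look like with the Hugging Face Hub wrapper (the repo ID is just an example, not a recommendation, and it requires HUGGINGFACEHUB_API_TOKEN in the environment):

```python
import pandas as pd
from langchain.llms import HuggingFaceHub
from langchain.agents import create_pandas_dataframe_agent

df = pd.read_csv("sales.csv")  # hypothetical file

# Example model; any text-generation repo on the Hub can stand in.
llm = HuggingFaceHub(repo_id="google/flan-t5-xxl",
                     model_kwargs={"temperature": 0.1, "max_length": 512})

agent = create_pandas_dataframe_agent(llm, df, verbose=True)
agent.run("How many rows are in the dataframe?")
```

Fair warning: smaller open models often struggle to follow the agent's ReAct-style prompt, so expect more parsing errors than with GPT-4.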
Can you display the results in HTML tags?
Link to the notebook? Thanks
Can you please post the dataset?
LangChain charged me $7 in API calls in 30 minutes of testing because I forgot to specify a stop string :(
I got ‘False’ at the very beginning
Check if the keys are loaded using os.getenv('API_KEY')
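A False from load_dotenv() usually means no .env file was found; a quick way to debug the lookup (the key name mirrors the earlier sketch and is an assumption):

```python
import os
from dotenv import load_dotenv, find_dotenv

# find_dotenv returns the path it would load, or '' if nothing was found.
print(find_dotenv())

load_dotenv(find_dotenv())
print(os.getenv("OPENAI_API_KEY"))  # None means the variable still isn't set
```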
ImportError: cannot import name 'create_pandas_dataframe_agent' from 'langchain.agents'
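That error typically means a newer LangChain version: the agent was split out into the langchain-experimental package, so the import changes while the rest of the code stays the same (assuming a recent release):

```python
# pip install langchain-experimental
from langchain_experimental.agents import create_pandas_dataframe_agent
```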