LangChain & GPT 4 For Data Analysis: The Pandas Dataframe Agent

  • Uploaded 29. 08. 2024

Comments • 102

  • @vineetbabhouria6504
    @vineetbabhouria6504 1 year ago +12

    This is what I was searching for. Keep it up. Very informative, no bullshit.

  • @mikehynz
    @mikehynz 1 year ago +5

    I just started a master's in data analytics (I'm actually a teacher, though). I'm so glad I found this channel. So effing interesting. Seems like a hell of a time to get into this space.

  • @bibhutibaibhavbora8770

    Found a gem of a channel; I'll learn so many new things now.

  • @rajuchoudhari2409
    @rajuchoudhari2409 1 month ago

    Thanks for sharing. Awesome!

  • @avidlearner8117
    @avidlearner8117 1 year ago +1

    Fantastic stuff! It can be applied to so many things. Thanks for enlightening us with such fantastic content; this is a technology growing at lightning speed and there's not a lot of information on the subject. What I'd like to see is proper fine-tuning via conversation history that gets saved and referenced in a separate vector database from the document analysis. Reminds me of the early web! Everything was still to be done.

  • @AdrienSales
    @AdrienSales 1 year ago +1

    Very well explained. Very compact tutorial. Keep going!

  • @deekshitht786
    @deekshitht786 2 months ago

    Please upload more videos about LangChain, please ❤

  • @thedonflo
    @thedonflo 1 year ago

    I came across your channel and it is exactly what I have been searching for. Keep up the great work. Small request: can we get a similar video but for PDFs?

  • @HoGSwain
    @HoGSwain 1 year ago +1

    Nice work.
    However, for newbies like me, please explain how you got to that .env file section where you entered your API key.
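
A note for readers stuck at the same step: a `.env` file is just a plain text file named `.env` in the project folder, usually loaded with the `python-dotenv` package's `load_dotenv()`. The minimal stand-in parser below (the function name and sample key are made up here) sketches what that loading does:

```python
import os

def load_env_text(text):
    """Parse KEY=VALUE lines (skipping blanks and # comments) into os.environ,
    mimicking what python-dotenv's load_dotenv() does with a .env file."""
    loaded = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded

# A .env file for this kind of tutorial would contain a single line like:
sample = 'OPENAI_API_KEY="sk-not-a-real-key"'
load_env_text(sample)
print(os.getenv("OPENAI_API_KEY"))  # -> sk-not-a-real-key
```

In practice you would install `python-dotenv` and call `load_dotenv()` at the top of the notebook instead of rolling your own parser.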

  • @johnpoc6594
    @johnpoc6594 1 year ago

    That is crazy good, thanks for the video. New sub here!

  • @DeepakSingh-ji3zo
    @DeepakSingh-ji3zo 1 year ago +2

    Does LangChain send this entire CSV file to OpenAI?

  • @rajupresingu2805
    @rajupresingu2805 1 year ago +5

    Great video! Can you make one that uses an open-source LLM instead of GPT-4 for handling larger pandas datasets with hundreds of thousands of records, as in actual production scenarios for orders? Thanks!

  • @screweddevelopment12
    @screweddevelopment12 1 year ago +17

    I personally feel like ChatGPT is not the best AI tool for data analysis work. Writing documentation for code and then having copilot write the actual code goes like a million mph, and you don’t pay per token.

    • @rabbitmetrics
      @rabbitmetrics 1 year ago

      I agree Copilot is superior right now, but things are moving fast.

    • @Jesse-rm4xo
      @Jesse-rm4xo 1 year ago +4

      Isn't Copilot powered by OpenAI Codex?

    • @urvog
      @urvog 1 year ago

      We need to consider the application of these tools by analysts who may not possess programming skills. This is where their usefulness truly shines.

    • @samueltallman7317
      @samueltallman7317 1 year ago

      ChatGPT ≠ GPT-4.
      If you studied, you'd understand this.

    • @samueltallman7317
      @samueltallman7317 1 year ago

      @@rabbitmetrics I'm a little disappointed you couldn't point out here that ChatGPT is a demo implementation of GPT-4 and not the same as the OpenAI APIs for it, where you set your own temperature.

  • @rafaeldelrey9239
    @rafaeldelrey9239 1 year ago

    It is an interesting concept and I hope it improves with time. Currently it just doesn't work for many examples: lots of parsing errors, long chains of retries, plain wrong answers.

  • @tonymusk
    @tonymusk 1 year ago +3

    Great video, really informative! I have a question regarding the dataframe: if a company wants to use this kind of process on its own data, does OpenAI get access to that data? And does the process adhere to GDPR regulations?

    • @memesofproduction27
      @memesofproduction27 1 year ago

      A random anecdote: to move yourself up the waitlist for access to Bing Chat (GPT-4), you should set Microsoft as your default for everything, starting with your browser, then Microsoft Wallpapers, then the app on your phone, etc. What would a pesky GDPR regulation do once the AI has root access to all machines because it's gatekept otherwise?

    • @armaanchawdhary9427
      @armaanchawdhary9427 1 year ago

      Same question. But what if all the data is stored on the Azure cloud? In a way, Microsoft has access to all our data.

  • @bwilliams060
    @bwilliams060 1 year ago

    These are really excellent videos, thank you. It's just a shame you aren't sharing the workbooks. It really helps to learn when you can process and adjust the code as you go!

  • @ramp2011
    @ramp2011 1 year ago +2

    Great video. Will this also work with the GPT-3.5 API, or does it need 4? Thanks

  • @helllton
    @helllton 1 year ago

    Great video.

  • @usoppgostoso
    @usoppgostoso 1 year ago

    I believe the output parser error is related to the format of the output it's attempting to parse. Unless you have set up the proper tools to handle specific formats (like graphs), it might fail.

  • @paaabl0.
    @paaabl0. 1 year ago +1

    The thing is, these are still very basic queries that any human can quickly write pandas code for. For complex queries it gets lost. Moreover, both GPT-3 and GPT-4 are prone to basic math mistakes.
    But of course the overall direction is pretty awesome; I'd love an agent that reliably writes a bunch of pandas and SQL boilerplate code for me on a daily basis.

    • @rabbitmetrics
      @rabbitmetrics 1 year ago +1

      Agreed, but I expect the LLMs to improve to the point where they write accurate queries consistently.

  • @cristian15154
    @cristian15154 1 year ago

    That was great!

  • @Mactuarchitect
    @Mactuarchitect 1 year ago +1

    If the DataFrame is too long for the ChatGPT UI prompt, does that mean using LangChain lets you bypass this limit?

  • @Mrlemar1
    @Mrlemar1 1 year ago +1

    Very interesting. Does giving it a specific file to analyze solve the hallucination problem?

  • @TheMagmarunning
    @TheMagmarunning 1 year ago

    Insane!

  • @MogulSuccess
    @MogulSuccess 1 year ago +2

    How does an organization share proprietary data with OpenAI and have the LLM do work? We need middleware obfuscating the data through some distributed normalization so that OpenAI can't reverse-engineer the context over time or take the secret and top-secret data; otherwise none of this is scalable.

    • @RutvikPatel2611
      @RutvikPatel2611 1 year ago +1

      It's not. Your best bet would be a local implementation of an Alpaca-style LLM or something similar.
      Even then, I don't think this is the best approach. Maybe it could take the column names and datatypes (+ metadata) and emit a formula, with the operation performed on the local machine rather than at OpenAI, for both data security and answer integrity. Also, what if the file is extremely large, like a Parquet file that even GPT-4 can't process? In that case something like Spark could do the transformation or calculation for us. It would be a great product, tbh.

    • @RutvikPatel2611
      @RutvikPatel2611 1 year ago +2

      And yes, pricing data operations on OpenAI's servers is definitely not sustainable.

    • @mattforsythe5037
      @mattforsythe5037 1 year ago

      A company called Palantir does this.

    • @MogulSuccess
      @MogulSuccess 1 year ago

      @@mattforsythe5037 Palantir created data-security middleware to communicate with external LLMs using NLP APIs?? Whoa!

  • @bharadwazsripada5843
    @bharadwazsripada5843 1 year ago

    Hi, in this approach is the data being shared with OpenAI? My understanding is that we are using a pretrained model and creating an agent for the environment.

  • @joseluisbeltramone599

    Thank you for the excellent video. Doing analytics on a dataframe of my own, with 3,000 columns, I came across the token limit for the model I used (GPT-3.5). Is there any way to overcome it?

  • @pauldriessens715
    @pauldriessens715 10 months ago

    What are the advantages of using this method over using OpenAI's advanced analytics plugin?

    • @rabbitmetrics
      @rabbitmetrics 8 months ago +1

      Currently not much. Today I would look into using AutoGen for automating data analysis with OpenAI.

  • @kingmouli
    @kingmouli 6 months ago

    There was one catch while using GPT-4: if we pass multiple dataframes, it only considers the headers in the prompt and treats those as the rows in all the dataframes. Could you please do a video on how to pass multiple dataframes to the GPT-4 pandas dataframe agent?

    • @rabbitmetrics
      @rabbitmetrics 4 months ago

      I'm exploring different ways to work with pandas efficiently at the moment; I will make a video about this at some point.

  • @4p4k
    @4p4k 1 year ago

    So LangChain uses GPT to write a SQL query, queries the database, then outputs the result? That's pretty impressive.

  • @yookoT
    @yookoT 1 year ago +1

    Could you please tell me how much the GPT-4 API cost for this task? I have only used 3.5 before and heard that GPT-4 is much more expensive.

    • @Fordtruck4sale
      @Fordtruck4sale 1 year ago

      It's about 30x more expensive than the 3.5-turbo model... curious how many tokens these requests soak up!

    • @eddyvu8109
      @eddyvu8109 1 year ago

      $20/month for ChatGPT Plus.

  • @Fordtruck4sale
    @Fordtruck4sale 1 year ago +3

    Thanks so much! Do you have a GitHub or Colab link for the file?

  • @davidmichaelcomfort
    @davidmichaelcomfort 1 year ago

    Looks interesting. One question I have is whether there will be substantial costs for using OpenAI's models on large datasets?

    • @rabbitmetrics
      @rabbitmetrics 1 year ago

      I'd err on the side of caution when using a service with this pricing model. It wasn't a problem here, but using OpenAI embeddings can get pricey if you're processing large amounts of textual data.

  • @Maisonier
    @Maisonier 1 year ago

    Amazing! Now that OpenAI has included this in their Code Interpreter, is there any way to use the pandas dataframe agent with a local model, like StableVicuna, RedPajama, or MPT-7B? Thank you. Liked and subscribed.

  • @method341
    @method341 1 year ago

    Will your dataset be uploaded to OpenAI if you do this? If so, how do I keep my dataset private?

  • @user-yg6fr6jy3d
    @user-yg6fr6jy3d 1 year ago

    I have a table with hundreds of rows and 20 columns. I even created a smaller table with only the first 5 rows for testing, and I still get this annoying error:
    InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 12432 tokens. Please reduce the length of the messages.
    It's impossible for me to work with any CSV file like this. What can I do?
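
A hedged note on that error: the dataframe agent injects the head of the frame into every prompt, so a frame with many or very wide columns can exceed a 4k-token context even with five rows. One workaround is to select only the columns the question actually needs before creating the agent (toy data below; the column names are made up):

```python
import pandas as pd

# Toy stand-in for a wide table: 30 columns, 3 rows.
df = pd.DataFrame({f"col_{i}": range(3) for i in range(30)})

# The prompt includes df.head(), so trimming columns shrinks every request.
slim = df[["col_0", "col_1", "col_2"]]
print(slim.shape)  # (3, 3)
```

Switching to a model with a larger context window is the other obvious lever.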

  • @ShaharDS
    @ShaharDS 1 year ago

    Hey man, thanks for your video!
    I'm getting an error saying
    AuthenticationError:
    (output is truncated)
    Do you know how to fix it?

  • @yatinahuja802
    @yatinahuja802 1 year ago

    Can we use this to get answers to a set of questions if we have customer reviews instead of sales data?
    For example, could we ask any question related to a product, or get a summary of thousands of review comments?

    • @rabbitmetrics
      @rabbitmetrics 1 year ago

      Indeed, have a look at this video: czcams.com/video/UO699Szp82M/video.html

  • @sodasundae9009
    @sodasundae9009 1 year ago

    Does this order data get sent to ChatGPT? Is there any way to keep it local? Vicuna?

  • @dimitriosmolfetas4711

    Hey, your video is very informative and a great tutorial. I have a question: if I use Visual Studio, will the code work inside VS as it does in Jupyter? Or should I write the commands in the CLI, since I have Python on PATH? I'm very new to coding and Python; I hope my question makes sense. Anyway, thank you for the great videos!!

    • @devinwalker9202
      @devinwalker9202 1 year ago +1

      VS Code supports Jupyter, so you can run the notebook directly in VS Code. I do it all the time.

    • @dimitriosmolfetas4711
      @dimitriosmolfetas4711 1 year ago +1

      @@devinwalker9202 Thanks so much, dude. I literally found out about that an hour ago and then I saw your comment. I wish you the best, thank you.

  • @theguildedcage
    @theguildedcage 1 year ago

    Does Pinecone or any other service store and have access to your data? This would be important to know for enterprise applications.

    • @rabbitmetrics
      @rabbitmetrics 1 year ago

      Yes, they have access to the embedding vectors and the metadata about each embedding.

  • @ronakdinesh
    @ronakdinesh 1 year ago

    Curious: does it also produce graphs if you ask it?

  • @ambrosionguema9200
    @ambrosionguema9200 1 year ago

    Do we have problems with the token limit?

  • @surajkhan5834
    @surajkhan5834 1 year ago

    How can we save the df to Pinecone and query it?

  • @pulkitkp
    @pulkitkp 1 year ago

    Can we give multiple dataframes as input?
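
For what it's worth, newer LangChain releases appear to accept a list of dataframes, exposed to the agent as df1, df2, and so on. A sketch under that assumption (untested here; the dataframes are made up, and the API call only runs if a key is set):

```python
import os
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2], "amount": [50.0, 75.0]})
customers = pd.DataFrame({"customer_id": [1, 2], "country": ["US", "DE"]})

if os.getenv("OPENAI_API_KEY"):
    from langchain_experimental.agents import create_pandas_dataframe_agent
    from langchain_openai import ChatOpenAI

    # Passing a list instead of a single frame; the agent sees df1 and df2.
    agent = create_pandas_dataframe_agent(
        ChatOpenAI(model="gpt-4", temperature=0),
        [orders, customers],
        verbose=True,
        allow_dangerous_code=True,  # required by recent versions
    )
    agent.invoke("Which country placed the larger total order amount?")
```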

  • @FREELEARNING
    @FREELEARNING 1 year ago

    Thanks for the video. Based on my understanding, the OpenAI GPT is able to do the task solely from the file name and informative column names, because, as you might know, these models are constrained by the context length and so they can't parse the whole file and really analyze the data. In my opinion, we aren't doing anything magical here yet. We can get most of these results using basic pandas functions like df.describe() or df["Column"].value_counts(). What do you think?

    • @startlingbird
      @startlingbird 1 year ago

      I think you can combine his video with this one: czcams.com/video/6WE09Ihdn9M/video.html to get around the plugin waitlist problem.
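
The plain-pandas baseline mentioned above is worth seeing concretely; with toy order data (made up here), two one-liners already answer most of the demo-style questions:

```python
import pandas as pd

# Toy stand-in for the tutorial's order data.
df = pd.DataFrame({
    "country": ["US", "US", "DE", "FR", "DE", "US"],
    "revenue": [120.0, 80.0, 95.0, 60.0, 110.0, 75.0],
})

# Summary statistics: count, mean, std, min, quartiles, max.
print(df["revenue"].describe())

# Orders per country, the kind of answer the agent often produces.
print(df["country"].value_counts())
```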

  • @kentml6856
    @kentml6856 1 year ago

    Great stuff, have you been successful with using sklearn with this methodology?

  • @youwang9156
    @youwang9156 1 year ago

    Just wondering: why can you use gpt-4 as the model name?

  • @johnwallis1626
    @johnwallis1626 1 year ago

    Has anyone tried getting the agent to create graphs in, say, matplotlib? I'm getting an 'OutputParserException: Could not parse LLM output' error. I can do it using exec on Python code generated with normal chat completion, but not this way. Good vid, though.
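
One hedged workaround for the OutputParserException mentioned above: let the executor feed parse failures back to the model, and ask for the figure to be saved to disk rather than returned. This is a sketch only; the `agent_executor_kwargs` route for `handle_parsing_errors` is an assumption that may vary across LangChain versions, and the API call runs only if a key is present:

```python
import os
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [2, 4, 6]})

if os.getenv("OPENAI_API_KEY"):
    from langchain_experimental.agents import create_pandas_dataframe_agent
    from langchain_openai import ChatOpenAI

    agent = create_pandas_dataframe_agent(
        ChatOpenAI(model="gpt-4", temperature=0),
        df,
        agent_executor_kwargs={"handle_parsing_errors": True},
        allow_dangerous_code=True,
        verbose=True,
    )
    # Asking the model to save the figure avoids returning an unparseable
    # matplotlib object as the final answer.
    agent.invoke("Plot y against x with matplotlib and save it as plot.png")
```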

  • @StephenRayner
    @StephenRayner 1 year ago

    Have you found anything similar but using SQL yet?

  • @marcomaiocchi5808
    @marcomaiocchi5808 1 year ago

    Great video, but no one is going to work with this workflow.

  • @prasanthkumar7328
    @prasanthkumar7328 10 months ago

    I see an error in the last step: Must provide an 'engine' or 'deployment_id' parameter to create a

  • @Ramipineappl3
    @Ramipineappl3 1 year ago

    How do I do this with nested JSON instead of CSV?

  • @sanesanyo
    @sanesanyo 1 year ago

    There is a new package called PandasAI which does effectively the same thing but in fewer lines of code. Under the hood, it is probably doing the same thing.

    • @rabbitmetrics
      @rabbitmetrics 1 year ago

      Nice, thanks. Will check it out.

    • @rolandheinze7182
      @rolandheinze7182 1 year ago

      @@rabbitmetrics This dude is right; in my opinion this seems to be better than what LangChain currently offers through pandas_dataframe_agent. The behavior of the pandas dataframe agent is very inconsistent, especially when Action: print(python_repl_ast(...)) is called (I often get "is not a valid tool"). I imagine both are doing the same thing, with recursive calls to refine the dataframe operations being generated and passed to the Python REPL. I am going to investigate the PandasAI documentation, as it seems much more straightforward and tractable for a non-contributor.

  • @noktuz
    @noktuz 1 year ago

    Where do I get the file with the code?

  • @SusobhanDas
    @SusobhanDas 1 year ago

    Does it work with any model other than OpenAI models?

    • @rabbitmetrics
      @rabbitmetrics 1 year ago

      Yes. LangChain provides wrappers around various models; see python.langchain.com/en/latest/modules/models/llms/integrations.html

  • @doords
    @doords 1 year ago

    Can you display the results in HTML tags?

  • @ramp2011
    @ramp2011 1 year ago

    Link to the notebook? Thanks

  • @geekyprogrammer4831
    @geekyprogrammer4831 1 year ago

    Can you please post the dataset?

  • @robbieturtle6218
    @robbieturtle6218 1 year ago

    LangChain charged me $7 in API calls in 30 minutes of testing because I forgot to specify a stop string :(

  • @xubruce
    @xubruce 1 year ago +1

    I got 'False' at the very beginning.

    • @rabbitmetrics
      @rabbitmetrics 1 year ago +1

      Check whether the keys are loaded using os.getenv('API_KEY').
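
A tiny sketch of that check (the variable name OPENAI_API_KEY is the usual convention for the OpenAI client; the video may use a different one):

```python
import os

# 'False' at the start of the notebook usually means load_dotenv() did not
# find a .env file; confirm the key actually reached the environment:
key = os.getenv("OPENAI_API_KEY")
print(bool(key))  # False means the .env file was missing or not loaded
```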

  • @rajivraghu9857
    @rajivraghu9857 1 year ago

    Excellent!

  • @vijaysurya6696
    @vijaysurya6696 1 year ago

    ImportError: cannot import name 'create_pandas_dataframe_agent' from 'langchain.agents'
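
That ImportError usually appears on newer LangChain releases, where the agent moved out of the core package into langchain-experimental (`pip install langchain-experimental`). A version-tolerant import, as a sketch:

```python
def locate_agent_factory():
    """Return create_pandas_dataframe_agent from whichever package has it."""
    try:
        # Newer releases: pip install langchain-experimental
        from langchain_experimental.agents import create_pandas_dataframe_agent
    except ImportError:
        # Older releases kept it in the core agents module
        from langchain.agents import create_pandas_dataframe_agent
    return create_pandas_dataframe_agent
```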

  • @ericbroun4657
    @ericbroun4657 1 year ago