Pandas DataFrame Agent... the future of data analysis?

  • Added 21. 06. 2023
  • 👉🏻 Kick-start your freelance career in data: www.datalumina.io/data-freela...
    Let's dive into the Pandas DataFrame Agent from the LangChain library to see how we can integrate analytical capabilities into LLM apps. We use the OpenAI API to ask questions about an Excel/CSV dataset and experiment with the possibilities and limitations of this LangChain Toolkit.
    🔗 Links
    github.com/daveebbelaar/langc...
    ⚙️ Copy my VS Code Setup • How to Set up VS Code ...
    👋🏻 About Me
    Hey there, my name is @daveebbelaar and I work as a freelance data scientist and run a company called Datalumina. You've stumbled upon my CZcams channel, where I give away all my secrets when it comes to working with data. I'm not here to sell you any data course - everything you need is right here on CZcams. Making videos is my passion, and I've been doing it for 18 years.
    While I don't sell any data courses, I do offer a coaching program for data professionals looking to start their own freelance business. If that sounds like you, head over to www.datalumina.io/ to learn more about working with me and kick-starting your freelance career.
  • Science & Technology

Comments • 45

  • @daveebbelaar
    @daveebbelaar  1 year ago +3

    👉🏻 Kick-start your freelance career in data: www.datalumina.io/data-freelancer
    👉🏻 Learn more about data science and AI: www.datalumina.io/newsletter

    • @igoweiqibaduk8283
      @igoweiqibaduk8283 1 year ago

      Hi Dave, I could not find your email. The call-booking tool on the /data-freelancer page (step 2, after the video) is not working: it just says that July is unavailable, and the month switch does not work. Regards, George.

    • @daveebbelaar
      @daveebbelaar  1 year ago +1

      @igoweiqibaduk8283 Hey George, thanks for your message. It is correct that the calendar is fully booked right now. I am expecting to take some more calls in 2-3 weeks. You are welcome to subscribe to our newsletter to stay updated on availability.

    • @rajatkumarsinha2159
      @rajatkumarsinha2159 9 months ago

      Hi Dave,
      Can you explain how to pass my CSV file to Dolly 2.0 with LangChain to get question answering like above?

  • @RanaGustico
    @RanaGustico 1 year ago +2

    About the calculations: have you tried this prompt?
    - "Act as an expert mathematician. ... Explain this step by step" (those last words are sometimes required)
    I've read about this workaround to make the AI self-correct before responding. Happy to watch you update and review with the new stuff. Nice content, sir!

  • @camilocampos5900
    @camilocampos5900 1 year ago +3

    Every day I am more impressed by the potential of LLMs with LangChain. I am a fan of knowledge; thank you for your content.

    • @wongyithong9838
      @wongyithong9838 11 months ago

      Exactly the same feeling. Every time I see the titles of these videos, I wonder what apps I can build to solve real-world problems.

  • @irvinJoelBanta
    @irvinJoelBanta 1 year ago

    Love your videos, keep it up

  • @joseluisbeltramone599
    @joseluisbeltramone599 10 months ago +2

    Hi Dave, thank you very much for the excellent explanation. Would you please do a video where you run into the token limit of the LLM? I would like to see how to overcome it. Thanks in advance!

  • @shikharvarshney7010
    @shikharvarshney7010 1 year ago

    Awesome Explanation !!

  • @DK-dp3kk
    @DK-dp3kk 6 months ago

    Thank you. Nice video. Do you know if you can summarize the text within a cell of the DataFrame? Say you have a dataset that includes blog posts and you want a new column with a two-line summary of each. Ideas?

  • @streetcodenate
    @streetcodenate 6 months ago

    Perfect, my dawg!

  • @micbab-vg2mu
    @micbab-vg2mu 1 year ago

    Great - Thank you

  • @AwB
    @AwB 1 year ago +1

    Great video. The part with two DataFrames was interesting. I was hoping I could pass in a summary DataFrame and a raw DataFrame, tell the LLM what is in each, and then ask it to write an article using both: "Write an article on this month's results (which are in the summary DataFrame), and don't forget to mention some interesting related facts from the raw DataFrame." This would require it to join the DataFrames together.
    Do you think this is possible yet? I see lots of "ChatGPT with your database" examples, but I'm curious how it can work with multiple tables of data.
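
    One way to approach the join described above is to do it in code first and hand the model only the compact, joined result. The sketch below uses plain Python with hypothetical data; the `month` key, the column names, and the `join_on_month` helper are all assumptions for illustration, not anything shown in the video.

```python
from collections import defaultdict

# Hypothetical monthly summary and raw records sharing a `month` key.
summary = [
    {"month": "2023-05", "revenue": 1200},
    {"month": "2023-06", "revenue": 1500},
]
raw = [
    {"month": "2023-05", "customer": "A", "amount": 700},
    {"month": "2023-05", "customer": "B", "amount": 500},
    {"month": "2023-06", "customer": "A", "amount": 1500},
]


def join_on_month(summary, raw):
    """Left-join raw records onto summary rows by month (illustrative helper)."""
    by_month = defaultdict(list)
    for record in raw:
        by_month[record["month"]].append(record)

    joined = []
    for row in summary:
        details = by_month.get(row["month"], [])
        # Pick one "interesting fact" per month: the largest single record.
        top = max(details, key=lambda r: r["amount"], default=None)
        joined.append({
            **row,
            "n_records": len(details),
            "top_customer": top["customer"] if top else None,
        })
    return joined


context = join_on_month(summary, raw)
```

    The resulting `context` rows are small enough to paste into a prompt as background for the article. With pandas, the same idea would be an aggregation of the raw frame followed by a left merge on the shared key.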

  • @kumargaurav2170
    @kumargaurav2170 11 months ago

    I think using the memory component from LangChain will help overcome the memory-management bottleneck for operations requiring more than one step.

  • @MikeRhodesIdeas
    @MikeRhodesIdeas 6 months ago +1

    @daveebbelaar any plans to update this for langchain 0.1.0 ?? Maybe in the members' area??

  • @tommyharlim276
    @tommyharlim276 9 months ago

    How do I put this sort of application on a website, so that I can upload my own data, enter a prompt, and have the result displayed on the website?

  • @prateekkeshari
    @prateekkeshari 1 year ago

    It's interesting to play with. I have tried it out multiple times, but I do see its limitations. Sometimes it also outputs wrong answers. What, in your opinion, would it take for it to be production-ready?

  • @onangarodney7746
    @onangarodney7746 1 year ago

    Would it be more accurate if you added the Wolfram OpenAI plugin to the mix?

  • @nerding_io
    @nerding_io 1 year ago

    Very awesome!

  • @Canna_Science_and_Technology

    Just an idea, a video using the new function feature would be great. ;-)

  • @JT-Works
    @JT-Works 10 months ago +1

    I am building a Streamlit app with the Pandas DataFrame Agent, and for the life of me, I cannot get the chatbot to keep any memory context in the chat. Is there a tutorial where you cover this?
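
    A minimal sketch of one workaround, assuming the agent itself keeps no conversation state: wrap whatever function sends a prompt to the agent and replay the last few turns in front of each new question. `ChatWithHistory` and `fake_agent` are hypothetical names, and `fake_agent` is only a stand-in for a real agent call.

```python
from typing import Callable, List, Tuple


class ChatWithHistory:
    """Wraps any prompt -> answer callable and replays recent turns."""

    def __init__(self, ask: Callable[[str], str], max_turns: int = 3):
        self.ask = ask
        self.max_turns = max_turns
        self.history: List[Tuple[str, str]] = []

    def build_prompt(self, question: str) -> str:
        # Keep only the most recent turns so the prompt stays bounded.
        recent = self.history[-self.max_turns:] if self.max_turns > 0 else []
        lines = [f"User: {q}\nAssistant: {a}" for q, a in recent]
        lines.append(f"User: {question}")
        return "\n".join(lines)

    def __call__(self, question: str) -> str:
        answer = self.ask(self.build_prompt(question))
        self.history.append((question, answer))
        return answer


def fake_agent(prompt: str) -> str:
    """Stand-in for a real agent call; reports how many user turns it saw."""
    return f"(answer based on {prompt.count('User:')} user turn(s))"


chat = ChatWithHistory(fake_agent, max_turns=3)
first = chat("What is the average salary?")
second = chat("And for the role you just mentioned?")
```

    In a Streamlit app the script reruns on every interaction, so an object like this would have to be stored in `st.session_state` to survive between messages. Whether the agent makes good use of the replayed turns is a separate question that this sketch does not answer.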

  • @xanderklein3356
    @xanderklein3356 11 months ago

    Awesome video. Can you do this with Node.js?

  • @quickandsmart6298
    @quickandsmart6298 1 year ago

    I've actually looked at this dataset before, and one thing I noticed is that the agent made another error at 11:30: it computed the median salary from the salary column rather than the salary_in_usd column. For example, the Head of Machine Learning role has only a single person, who lives in India; converting their 6,000,000 Indian rupees gives only about 76k USD, far from what the results show. While the agent is very powerful, it's clearly not perfect: you have to make sure your questions are specific enough and double-check the code it generates. Regardless, great video, and it's definitely a tool I'll be using in later projects!

    • @daveebbelaar
      @daveebbelaar  1 year ago

      Ahh, good one! And thanks! Definitely something I missed
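
    The column mix-up described in this thread is easy to reproduce by hand. The sketch below uses plain Python and a few hypothetical rows: only the 6,000,000 INR and roughly 76k USD figures come from the comment above, and every other value, along with the `median_by_title` helper, is made up for illustration.

```python
from statistics import median

# Hypothetical rows: a local-currency `salary` column next to a converted
# `salary_in_usd` column. Only the first row reflects figures from the thread.
rows = [
    {"job_title": "Head of Machine Learning", "salary": 6_000_000,
     "salary_currency": "INR", "salary_in_usd": 76_000},
    {"job_title": "Data Scientist", "salary": 120_000,
     "salary_currency": "USD", "salary_in_usd": 120_000},
    {"job_title": "Data Scientist", "salary": 90_000,
     "salary_currency": "EUR", "salary_in_usd": 98_000},
]


def median_by_title(rows, column):
    """Group rows by job title and take the median of one numeric column."""
    groups = {}
    for row in rows:
        groups.setdefault(row["job_title"], []).append(row[column])
    return {title: median(values) for title, values in groups.items()}


wrong = median_by_title(rows, "salary")         # mixes INR, EUR and USD
right = median_by_title(rows, "salary_in_usd")  # one currency throughout
```

    Here `wrong` ranks Head of Machine Learning at 6,000,000 while `right` puts it at 76,000. Naming the column explicitly in the question ("median of salary_in_usd per job title") and reading the generated code are the two checks the comment recommends.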

  • @waddaa
    @waddaa 11 months ago

    I have been looking for a chain or agent that can work with tools and with your own files as well, but I couldn't find one. Is this even possible?

  • @HazemAzim
    @HazemAzim 9 months ago

    Nice, but did you try this with chat models, i.e. ChatOpenAI with gpt-3.5-turbo, which is much cheaper? I suspect the Pandas DataFrame agent will not work properly with it, though!

  • @user-ib8qm8eh3q
    @user-ib8qm8eh3q 8 months ago +1

    Hi Dave, can I please use an open-source model for this instead of OpenAI?

  • @gamerwager5317
    @gamerwager5317 1 year ago

    My suggestion, as a CZcams user: make the videos shorter. Your voice is great as a background track, but pack more info into the video, which adds value to the viewing time. 😊

  • @nanto88
    @nanto88 1 year ago

    awesome

  • @RyanScottForReal
    @RyanScottForReal 1 year ago +1

    You need to add memory to the agent.

  • @madhu1987ful
    @madhu1987ful 2 months ago

    Can this work on big DataFrames, say 1 million rows of data?
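
    As far as I understand the agent's design, the model is shown only a preview of the frame (the first few rows) and the generated pandas code runs locally, so the prompt size should not grow with the row count. The sketch below illustrates that idea in plain Python; `preview_prompt`, `make_rows`, and the prompt wording are hypothetical and are not LangChain's actual prompt.

```python
import csv
import io
from itertools import islice


def preview_prompt(rows, columns, question, n_preview=5):
    """Build a prompt from the column names and the first few rows only."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for row in islice(rows, n_preview):
        writer.writerow(row)
    return (
        "You are working with a table. Its first rows are:\n"
        f"{buffer.getvalue()}"
        f"Question: {question}"
    )


def make_rows(n):
    """Hypothetical data: a lazy stream of n rows of (id, salary_in_usd)."""
    return ((i, 50_000 + (i % 100) * 1_000) for i in range(n))


columns = ["id", "salary_in_usd"]
question = "What is the max salary?"
small = preview_prompt(make_rows(1_000), columns, question)
large = preview_prompt(make_rows(1_000_000), columns, question)
```

    Both prompts come out identical, because only the first five rows are ever read. The practical limit for a million rows is then local memory and pandas run time rather than tokens, although a wide frame with many columns would still inflate the preview.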

  • @johnbrisbin3626
    @johnbrisbin3626 1 year ago

    I note that you again use text-davinci, which OpenAI claims is just a slower and more expensive way of getting what GPT-3.5 gives you for a fraction of the price.
    Have you found differently in real use?

    • @daveebbelaar
      @daveebbelaar  1 year ago

      You're right, for a real use case I would use gpt-3.5 or 4. These are configured a little differently because they are chat-based models, but they would indeed be the preferred option.

  • @justinchung982
    @justinchung982 8 months ago

    Please show doing this with Llama2!

  • @SMCGPRA
    @SMCGPRA 3 months ago

    Can we use an open-source LLM?

    • @girishnaik6433
      @girishnaik6433 2 months ago

      Did you get an answer? I'd really like to know.

  • @alchemication
    @alchemication 1 year ago

    Thanks for sharing. The reason this can fail in the real world is that business is way more complex and a ton of jargon is used. After spending hundreds of hours on this topic, I can conclude it's a good start, but for real-world scenarios on complex data we need to be way more creative. Best!

  • @temp911Luke
    @temp911Luke 1 month ago

    I would be more interested if you could use the REAL open AI models (open-source models) instead of GPT-4.

  • @ajaypranav1390
    @ajaypranav1390 3 months ago

    Why not use PandasAI?

  • @klammer75
    @klammer75 1 year ago +1

    Does the pandas agent take a memory parameter? I really like these agents when they can hold a little chat history. I had issues getting their CSV agent to hold onto the current conversation, as it wouldn't take a 'working memory' parameter like some of the other agents would. Great video 🥳🦾🤓