"Apache Arrow and the Future of Data Frames" with Wes McKinney

  • Added on 16 July 2024
  • Title: Apache Arrow and the Future of Data Frames
    Speaker: Wes McKinney, Director, Ursa Labs
    Date: July 8, 2020
    ABSTRACT
    In this talk I will discuss the background and motivation for the Apache Arrow project, which contains a columnar in-memory data standard and an expanding set of supporting libraries for a variety of programming languages. We will look at the relationship between data frame libraries and database systems and explore the ways in which analytics systems are likely to evolve to be more "Arrow-native" over the coming years.
    SPEAKER
    Wes McKinney
    Director, Ursa Labs
    Wes McKinney is an open source software developer focusing on analytical computing. He created the Python pandas project and is a co-creator of Apache Arrow, his current focus. He authored two editions of the reference book Python for Data Analysis. Wes is a Member of The Apache Software Foundation and also a PMC member for Apache Parquet. He is the director of Ursa Labs, a not-for-profit development group focused on data science tools for Python and R powered by Apache Arrow, built in partnership with RStudio. Previously, he worked for Two Sigma, Cloudera, and AQR Capital Management, and he was co-founder and CEO of the startup DataPad.
    MODERATOR
    Larisa Sawyer
    Two Sigma Investments; ACM Practitioner Board
    Larisa Sawyer is a software engineering manager and Vice President at Two Sigma Investments. Her educational background is in Computer Science and Applied Mathematics. The opportunity to blend math and CS drew her to the realm of finance. Her career began at investment banks, building algorithmic trading platforms. Larisa has been at Two Sigma for the past seven years, where she has worked on distributed time series analysis and platform technologies to increase research productivity and collaboration. Larisa also serves on the ACM Practitioner Board, as well as the advisory board for Data Clinic, Two Sigma’s data and tech for good program that leverages employees’ data science skills and technological know-how to support charities and non-profits.
  • Science and technology

Comments • 9

  • @danielcomptonnz 2 years ago +6

    Talk begins at 5:22.

  • @BillyClever 2 years ago +2

    Thanks, Wes McKinney, for your effort to build pandas! You are a hero!

  • @rodolforochaurrutia4127

    Thanks a lot, master Wes!

  • @mikesopko7374 2 years ago

    Hey, quick question: with Arrow and pyarrow, does this pretty much mean that as long as the disk (HDD) has room for a dataset, RAM will not be an issue? For example, with current Python/pandas, say I have a 20 GB file and only 16 GB of RAM. I think the recommendation is to have 40 GB of RAM to load it in memory (without something like PySpark).
    But once Arrow is widely adopted and part of pandas (or especially if Arrow is used directly), does this mean that if you have the disk (HDD) space, you should be fine?
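
One way to picture what the question above is getting at: when a columnar file is memory-mapped, the operating system pages in only the bytes a computation actually touches. The sketch below uses only Python's standard library (`mmap`, `array`), not Arrow or pyarrow. The two-column file layout and the helper names `write_two_columns` and `sum_column` are hypothetical, made up for illustration, so this shows the general idea under those assumptions rather than how Arrow itself is implemented.

```python
import mmap
import os
import tempfile
from array import array

ITEM = 8  # bytes per int64 value (array typecode "q")


def write_two_columns(path, n_rows):
    """Write column 'a', then column 'b', as contiguous int64 runs (toy layout)."""
    with open(path, "wb") as f:
        array("q", range(n_rows)).tofile(f)                   # column a: 0, 1, 2, ...
        array("q", (2 * i for i in range(n_rows))).tofile(f)  # column b: 0, 2, 4, ...


def sum_column(path, n_rows, col_index, chunk_rows=4096):
    """Sum one column by walking the memory-mapped file chunk by chunk."""
    total = 0
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = col_index * n_rows * ITEM   # byte offset where this column begins
        end = start + n_rows * ITEM         # byte offset where this column ends
        for off in range(start, end, chunk_rows * ITEM):
            stop = min(off + chunk_rows * ITEM, end)
            chunk = array("q")
            chunk.frombytes(mm[off:stop])   # copies only this chunk out of the map
            total += sum(chunk)
    return total


if __name__ == "__main__":
    n = 100_000
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_columns.bin")
        write_two_columns(path, n)
        # Only the bytes of the requested column are read through the map.
        print("sum of column a:", sum_column(path, n, 0))
        print("sum of column b:", sum_column(path, n, 1))
```

Because each chunk is copied out and then discarded, the Python-side working set stays near one chunk regardless of file size. Any operation that has to materialize a full result in memory would still need RAM for that result, so disk space alone is not a blanket guarantee.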

  • @higheringai68 2 years ago

    The arrow in the FedEx logo comes to one's mind...

  • @haythamal-dokanji9547 3 years ago +18

    Feedback: never start a video with a promotional ramble. People click away.