Doing More with Data: An Introduction to Arrow for R Users

  • Added: 7 July 2024
  • Speaker: Danielle Navarro, Developer Advocate at Voltron Data
    As datasets become larger and more complex, the boundaries between data engineering and data science are becoming blurred. Data analysis pipelines with larger-than-memory data are becoming commonplace, creating a gap that needs to be bridged: between engineering tools designed to work with very large datasets on the one hand, and data science tools that provide the analysis capabilities used in data workflows on the other.
    One way to build this bridge is with Apache Arrow, a multi-language toolbox for working with larger-than-memory tabular data. Arrow is designed to improve performance and efficiency, and places emphasis on standardization and interoperability among workflow components, programming languages, and systems.
    This talk gives an introduction to the Arrow package in R, a mature interface to Apache Arrow that provides an appealing solution for data scientists working with large data in R. It introduces the core concepts behind Apache Arrow and the Arrow package in R, walks through a sample data analysis of a large tabular dataset (about 1.7 billion rows), and highlights possible pain points for an R user new to the Arrow ecosystem.

Comments • 8

  • @tamararodrigues3471, 3 months ago

    Greaaaat video, thanks!!

  • @nndegwa1, 4 months ago

    Love it!

  • @user-gg5fc6yg9f, 1 year ago

    Thank you, Danielle Navarro!

  • @dasrotrad, 1 year ago

    Super tutorial Danielle. Thank you.

  • @jorgenengmann4856, 1 year ago

    super! thanks for this very useful tutorial.

  • @arturocdb, 11 months ago

    Incredibly useful, thank you so much!…

  • @robinkohrs8097, 2 years ago

    That looks fantastic! But what if I do not have my data as cleanly organized in many "smaller" files, but rather in one giant CSV? Does Arrow still have benefits? :)

  • @tarasst6887, 10 months ago

    🎉🎉🎉😊