Voltron Data
Querying Sales Data With Ibis
Patrick provides a quick overview of a small project he worked on to familiarize himself with Ibis. Using mock sales data, he constructs a sales data query from scratch using f-strings and Ibis expressions, and then discusses how Ibis can make parameterizing queries much easier.
More information: github.com/p-a-a-a-trick/ibis-sales-query
716 views
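To show what the hand-built f-string approach looks like, here is a minimal sketch using only Python's standard library (`sqlite3`). It is not the code from the linked repository and does not use Ibis. The table layout, the column names, and the `build_sales_query` and `run` helpers are hypothetical. The sketch illustrates why parameterizing such a query by hand is awkward: column names cannot be bound as SQL parameters, so they must be spliced into the SQL text and checked against an allow-list, while plain values such as a minimum quantity can go through bound `?` placeholders.

```python
import sqlite3

# Hypothetical mock sales rows: (region, product, quantity, unit_price).
# These columns are assumptions, not taken from the linked repository.
ROWS = [
    ("north", "widget", 3, 2.50),
    ("north", "gadget", 1, 10.00),
    ("south", "widget", 5, 2.50),
    ("south", "gadget", 2, 10.00),
]

# Only these identifiers may be spliced into the SQL text.
ALLOWED_GROUP_COLUMNS = {"region", "product"}


def build_sales_query(group_by: str, min_quantity: int) -> tuple[str, tuple]:
    """Assemble a revenue-per-group query with an f-string.

    The grouping column is an identifier, so it is inserted with an
    f-string after an allow-list check; the numeric filter is passed
    separately as a bound parameter.
    """
    if group_by not in ALLOWED_GROUP_COLUMNS:
        raise ValueError(f"unsupported grouping column: {group_by!r}")
    sql = (
        f"SELECT {group_by}, SUM(quantity * unit_price) AS revenue "
        f"FROM sales WHERE quantity >= ? "
        f"GROUP BY {group_by} ORDER BY {group_by}"
    )
    return sql, (min_quantity,)


def run(group_by: str, min_quantity: int = 0) -> list[tuple]:
    """Load the mock rows into an in-memory database and run the query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sales (region TEXT, product TEXT, "
        "quantity INTEGER, unit_price REAL)"
    )
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", ROWS)
    sql, params = build_sales_query(group_by, min_quantity)
    result = con.execute(sql, params).fetchall()
    con.close()
    return result


print(run("region"))        # revenue per region
print(run("product", 2))    # revenue per product, only rows with quantity >= 2
```

Every new grouping option or filter means editing the string template and the allow-list. According to the video description, this kind of bookkeeping is what Ibis expressions make easier. See the linked repository for how the talk actually builds the Ibis version.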

Videos

Powering Data-Centric AI with Arrow
594 views · 2 years ago
Speaker: Henry Ehrenberg, Co-Founder at Snorkel AI
Snorkel AI recently adopted Arrow to help power Snorkel Flow, their data-centric development platform, which helps enterprise data science teams build high-quality training datasets and ML models quickly. In this talk, Henry will cover how the Snorkel AI team evolved their data and compute architecture to leverage Arrow. With Arrow under the hoo...
How to Use the New Contributor’s Guide to Start Contributing to Apache Arrow (Part 2 - Demo)
246 views · 2 years ago
Speaker: Alenka Frim, Open Source Apprentice at Voltron Data
This is the demo component of another talk, which can be found here: czcams.com/video/a-nhqPhYWGE/video.html
Original contribution on GitHub: github.com/apache/arrow/pull/13329
Resources:
Apache Arrow: arrow.apache.org/
Project Documentation: arrow.apache.org/docs/
The New Contributor's Guide: arrow.apache.org/docs/developers/guide/in...
All in on Apache Arrow
2.3K views · 2 years ago
Speaker: Randy Zwitch, Head of Developer Relations, Streamlit
The Data Thread Conference: Live Broadcast Sessions (Previously Recorded)
1K views · 2 years ago
Recorded on June 23, 2022
Agenda:
Welcome - Marlene Mhangami, Developer Advocate, Voltron Data
Keynote - Apache Arrow co-creators Wes McKinney & Jacques Nadeau
High Performance Computing Panel Discussion with moderator Jing Brewer, VP of Product Strategy, Voltron Data
Fireside Chat with Peter Wang, CEO, Anaconda & Josh Patterson, CEO, Voltron Data
DataStax Featured Talk with Sebastián Estévez Q...
Building the First GPU Visual Graph AI Platform with End to End Apache Arrow
461 views · 2 years ago
Speaker: Leo Meyerovich, Founder, Graphistry, Inc.
When Data Engineering Meets Security Analytics
204 views · 2 years ago
Speaker: Matthias Vallentin, CEO and Co-Founder, Tenzir
In this talk, Matthias presents his group's highly pluggable C++ engine for security telemetry data, built on top of Arrow. He shows their wins where they can leverage drop-in functionality, as well as where they face challenges.
Time Series Data Transformation with Arrow Compute Engine
702 views · 2 years ago
Speaker: Li Jin, Software Developer, Two Sigma
Using Arrow, with Numba Kernels, to Generate AI Workflows
486 views · 2 years ago
Speaker: John Murray, Director, Fusion Data Science and Visiting Professor, Data Science Lab, University of Liverpool
In this session, John demonstrates using the Numba Python compiler to create custom kernel functions on top of Arrow tables, generating end-to-end AI workflows with TensorFlow (training), TensorRT (inference), and RAPIDS cuML (clustering). The talk is based on his group'...
Velox: An Open-Source Unified Execution Engine
3.1K views · 2 years ago
Speaker: Pedro Pedreira, Software Engineer at Meta
Everyone Should Use Apache Arrow for Data Systems Research
989 views · 2 years ago
Speaker: Andrew Crotty, Assistant Professor at Northwestern University
The conventional wisdom is that it takes about a decade of effort to build a stable, full-featured data analytics system. Unfortunately, this type of systems work does not translate well to academic environments, where resources are more constrained and research progress (e.g., tenure review, funding duration) is typically m...
Why Apache Arrow is Important for Ruby
308 views · 2 years ago
Speaker: Sutou Kouhei, Co-Founder & President at ClearCode
It is widely known in the data processing world that Apache Arrow is important. Sutou shares why Apache Arrow is especially important for the Ruby community. He also introduces Apache Arrow features that the Ruby community is working on.
Arrow and Substrait: Better Together
3K views · 2 years ago
Speaker: Ian Cook, Product Manager at Voltron Data
Links from the slides:
Substrait project: github.com/substrait-io/substrait
DSLs producing Substrait:
Python Ibis Substrait compiler: github.com/ibis-project/ibis-substrait
R dplyr Substrait compiler: github.com/voltrondata/substrait-r
SQL Substrait compiler (Isthmus): github.com/substrait-io/substrait-java/tree/main/isthmus
Engines consuming S...
Accelerating Geospatial Computing in R and Python Using Apache Arrow
1.6K views · 2 years ago
Speakers: Dewey Dunnington, Senior R Developer at Voltron Data, and Joris Van den Bossche, Software Engineer at Voltron Data
The Apache Arrow and Apache Parquet ecosystems provide a flexible and efficient in-memory and on-disk format for tabular data. With implementations in most languages, Apache Arrow supports a growing set of tools and analytical workflows. In this talk, Dewey and Joris intro...
Ibis and Substrait: Standardized Analytics
769 views · 2 years ago
Speaker: Hussain Sultan, Field Engineering Director at Voltron Data
The code used in this demonstration can be found at gist.github.com/gforsyth/496d680e1e29f0876df937ee5091e1b8
Apache Arrow on the Web and Beyond
1.3K views · 2 years ago
Mainlining Databases: Supporting Fast Transactional Workloads on Apache Arrow
242 views · 2 years ago
An Introduction to Arrow for Python Programmers
3.9K views · 2 years ago
Put Your Cassandra Python Driver On Steroids With Apache Arrow
348 views · 2 years ago
How to Use the New Contributor’s Guide to Start Contributing to Apache Arrow (Part 1)
216 views · 2 years ago
Navigating the San Francisco Art Scene with Ibis
370 views · 2 years ago
A New Hope For The Big Data Divergence
337 views · 2 years ago
Microkernel Notebooks
174 views · 2 years ago
Maximizing the Performance of DNA Analysis Using Apache Arrow
173 views · 2 years ago
GraphQL and Apache Arrow: A Match Made in Data
933 views · 2 years ago
A Developers' Journey Using Arrow with Tableau
226 views · 2 years ago
Apache Arrow and DataFusion: Changing the Game for Implementing Database Systems
2.3K views · 2 years ago
Torch Arrow Performant ML Preprocessing
998 views · 2 years ago
PyFroid: Scaling Data Preparation Using Database
290 views · 2 years ago
What Is Ibis + Simple Demo
2.7K views · 2 years ago

Comments

  • @multitaskprueba1
    2 months ago

    You are a genius! Fantastic video! Thanks!

  • @gloriamacia1120
    2 months ago

    amazing video!!

  • @pablomoretto8443
    3 months ago

    great explanation, thank you 👍🙏

  • @tamararodrigues3471
    4 months ago

    Greaaaat video, thanks!!

  • @vikramsinhshinde8789
    4 months ago

    This looks really promising!

  • @nndegwa1
    4 months ago

    Love it!

  • @pookiepats
    7 months ago

    What is so funny?

  • @xkarika
    7 months ago

    Hi Matt - I think what you've done here is absolutely brilliant!!!! I think you solved the one major limitation of GraphQL that's been preventing it from taking over the data world. Have you thought about open-sourcing this?

  • @carvalhoribeiro
    8 months ago

    Awesome presentation. Thanks for sharing this

  • @javierparra3234
    8 months ago

    This was really helpful, quite a full and easy to understand introduction, thanks!

  • @haraldurkarlsson1147
    9 months ago

    Tom, I am not clear on what specific data (from nflverse) you chose to work with. In your code, what exactly is "data"?

  • @haraldurkarlsson1147
    9 months ago

    A brief discussion of what the parquet file format is and what the advantages are over regular flat files.

  • @haraldurkarlsson1147
    9 months ago

    An excellent case for Arrow and DuckDB

  • @AshishSharma-pm1dc
    10 months ago

    Thank you for the session. Is there detailed documentation for JS support? The Apache Arrow site points to a blank page.

  • @tarasst6887
    10 months ago

    🎉🎉🎉😊

  • @zhitaoli4702
    11 months ago

    Hi, is it possible to share a link to the slides used in the presentation? Thanks

  • @user-vp7wp7dt7m
    11 months ago

    Great video - thanks Tom!

  • @arturocdb
    1 year ago

    Incredibly useful, thank you so much!…

  • @matattz
    1 year ago

    So to summarize it, using SQL is getting old very fast! So would you suggest that a beginner should learn the very basics in SQL and focus more on ibis or now ponder for example? I don’t see why you would need SQL on a very high level when the playground is changing so rapidly nowadays. I get that many users with 7+ years of SQL knowledge are very frustrated that you basically could just use something like ibis but it is what it is

  • @pparsons12
    1 year ago

    Thank you! I enjoyed your presentation and learned a few important “missing pieces” in my understanding of how these tools can work together.

  • @tomanizer
    1 year ago

    Great talk and great initiative. Could you point out how and where to find out when the arrow timeseries compute functions come online?

  • @jorgenengmann4856
    1 year ago

    super! thanks for this very useful tutorial.

  • @kamicheung4021
    1 year ago

    great video, thank you for such a detailed comparison

  • @umitekmekci503
    1 year ago

    I don't know much about Arrow, but if it is lazy, the first timing can be wrong, because Arrow may not do the calculation until you need to use the output.

  • @coolsameer9661
    1 year ago

    Huge thanks for this talk! And for uploading it :)

  • @user-gg5fc6yg9f
    1 year ago

    Thank you Danielle Navarro !

  • @dasrotrad
    2 years ago

    Super tutorial Danielle. Thank you.

  • @ibananti
    2 years ago

    Very insightful talk, thank you!

  • @robinkohrs8097
    2 years ago

    That looks fantastic! But what if I do not have my data as cleanly organized in many "smaller" files, but rather in one giant CSV? Does Arrow still have benefits? :)

  • @jayjeetchakraborty9542

    Can we replace Arrow Compute Engine (Acero) with Velox ?

    • @iancook1361
      2 years ago

      The goal of the Velox and Acero open source development teams is to enable users to choose whichever execution engine offers the functionality and performance they need. Our shared vision is that these two engines (and other tooling around them, including user APIs and storage systems) should be highly modular, so that users can pick and choose components and have them all interoperate. We aim to achieve this interoperability through the Arrow and Substrait standards. There is work underway now to achieve this, by building more Arrow and Substrait integrations and by evolving the Arrow and Substrait standards to accommodate more needs.

    • @haishais5867
      1 year ago

      @iancook1361 Where can I find those slides? Thanks

  • @vishcanaran5139
    2 years ago

    Thanks Tianyu, we are focused on a similar scaled down approach regarding transactional Arrow for production workloads and your talk and reference papers put it all together. Thank you again.

  • @twainyoung
    2 years ago

    thanks for sharing this video.😁

  • @koutousu
    2 years ago

    If you have any questions, please ask me on Twitter! twitter.com/ktou (English is OK!)