In-Process Analytical Data Management with DuckDB - posit::conf(2023)

  • Added Dec 14, 2023
  • Presented by Hannes Mühleisen
    This talk introduces DuckDB, an in-process analytical data management system that is deeply integrated into the R ecosystem.
    DuckDB is an in-process analytical data management system. DuckDB supports complex SQL queries, has no external dependencies, and is deeply integrated into the R ecosystem. For example, DuckDB can run SQL queries directly on R data frames without any data transfer. DuckDB uses state-of-the-art query processing techniques like vectorised execution and automatic parallelism. DuckDB is out-of-core capable, meaning that it is possible to process datasets far bigger than main memory. DuckDB is free and open source software under the MIT license.

    In this talk, we will describe what DuckDB offers its users and how it can improve their day-to-day work through automatic parallelisation, efficient operators, and out-of-core operations.
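The vectorised execution mentioned in the abstract can be pictured with a small, purely conceptual sketch. It is plain Python, not DuckDB code and not DuckDB's actual implementation: the column names, the `VECTOR_SIZE` value, and the helper functions are all hypothetical, chosen only to show the idea of processing column-wise data a chunk ("vector") at a time instead of one row at a time.

```python
from itertools import islice

# Hypothetical chunk size for illustration only; real engines use much larger vectors.
VECTOR_SIZE = 4


def chunks(column, size):
    """Yield successive fixed-size slices (vectors) of a column."""
    it = iter(column)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def vectorised_filtered_sum(amounts, regions, wanted_region, vector_size=VECTOR_SIZE):
    """Sum `amounts` where `regions == wanted_region`, one vector at a time.

    Roughly the shape of: SELECT sum(amount) FROM t WHERE region = ?
    """
    total = 0
    for amount_vec, region_vec in zip(chunks(amounts, vector_size),
                                      chunks(regions, vector_size)):
        # Filter step: build a selection vector of matching positions in this chunk.
        selected = [i for i, region in enumerate(region_vec) if region == wanted_region]
        # Aggregate step: touch only the selected positions of the amount vector.
        total += sum(amount_vec[i] for i in selected)
    return total


# Column-wise storage: each column is its own list rather than a list of rows.
amounts = [10, 20, 30, 40, 50, 60]
regions = ["eu", "us", "eu", "eu", "us", "eu"]

result = vectorised_filtered_sum(amounts, regions, "eu")
print(result)  # 140
```

Because each operator works on a whole chunk of one column, only the columns a query needs are read, and independent chunks are a natural unit for parallel work; the sketch itself runs sequentially.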
    Materials:
    - duckdb.org
    - duckdb.org/docs/api/r.html
    - github.com/duckdb/duckdb-r
    Presented at Posit Conference, Sept 19-20, 2023.
    Learn more at posit.co/conference.
    --------------------------
    Talk Track: Databases for data science with duckdb and dbt.
    Session Code: TALK-1099

Comments • 6

  • @caty863 6 months ago +1

    I find it a bit dishonest that you intentionally chose to use *RPostgreSQL* in your demo instead of the faster *RPostgres.*
    In addition, what's wrong with plain old SQLite? You should have elaborated a little more on exactly what problem DuckDB is solving here.

    • @HarmonicaTool 6 months ago +1

      There are many talks on DuckDB. As far as I could take away: they claim DuckDB is faster than SQLite for typical analytical purposes because it stores data in columns, not in rows. The two are optimized for different purposes. DuckDB can also read SQLite data directly.

    • @ravishmahajan9314 6 months ago +1

      @HarmonicaTool It's basically an analytical SQLite. SQLite is a lightweight database for applications, from an OLTP perspective,
      whereas DuckDB takes an OLAP perspective. That means if you have lots and lots of application data and you want to analyze it in seconds, you can do it in DuckDB, as it is an in-process analytics database.

    • @chalimsupa6603 29 days ago

      DuckDB is designed for analytical workloads, whereas SQLite is a transactional database.

    • @chuckbecker4983 22 days ago +1

      Accusing someone of dishonesty should be the last resort, used only after every other reasonable explanation has been identified and investigated. Going to dishonesty first exposes as much about the accuser as the accused.

    • @chalimsupa6603 22 days ago

      @chuckbecker4983 You are right. As you said, it reveals more about the accuser. I think he could not find a better word to express his point. "Dishonest" is a very strong word and is not suitable in this context.