Spark, Dask, DuckDB, Polars: TPC-H Benchmarks at Scale

  • Published 12 Sep 2024

Comments • 16

  • @randywilliams7696 • 8 months ago +3

    Great video! I recently switched from Dask to DuckDB for my ~1 TB workloads, so it's interesting to see some of the same issues I ran into brought up here. One gotcha I've found is that it is REALLY easy to blunder your way into non-performant queries in Dask (things that end up shuffling, repartitioning, etc. a lot behind the scenes; see the sketch after this thread). For my use case it was more straightforward to write performant SQL queries for DuckDB, since that is much more of a common, solved problem. The scale-out capability of Dask and Spark is interesting too, as we are weighing the merits of a natively clustered solution against just breaking our queries into chunks that fit on multiple single instances for DuckDB.

    • @MatthewRocklin • 8 months ago +1

      Yup, totally agreed. The query optimization in Dask DataFrame should now handle what you ran into historically. The problem wasn't unique to you :)

    • @ravishmahajan9314 • 7 months ago

      But what about distributed databases? Is DuckDB able to query them?
      Is this technology replacing the Spark framework?
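
A minimal sketch of the gotcha described above, assuming the `dask.dataframe` and `duckdb` Python APIs; the file path and column names are hypothetical:

```python
import dask.dataframe as dd
import duckdb

# Dask: an innocent-looking groupby on a non-index column can trigger
# expensive shuffles or repartitions behind the scenes, depending on
# how the data is laid out across partitions.
df = dd.read_parquet("events/*.parquet")  # hypothetical dataset
top_dask = (
    df.groupby("customer_id")["amount"]
    .sum()          # may move a lot of data between partitions
    .nlargest(10)
    .compute()
)

# DuckDB: the same question as plain SQL; the optimizer owns the plan.
con = duckdb.connect()
top_duckdb = con.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM read_parquet('events/*.parquet')
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 10
""").df()
```

Whether the Dask version actually shuffles depends on the partitioning; the point is just that the cost model is implicit in the dataframe code, while the SQL engine plans the whole query itself.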

  • @andrewm4894 • 10 months ago +2

    Great talk, thanks

  • @FabioRBelotto • 2 months ago +1

    My main issue with Dask is the lack of community support (very different from pandas!)

  • @rjv • 8 months ago

    Such a good video! So many good insights, clearly communicated, with proper data. I also love the interfaces you've built: meaningful, clean, and minimalistic.
    Have you got comparison benchmarks where cloud cost is the only constraint, and the number of machines or their size and type (e.g. GPU machines with cuDF) is not restricted?

  • @mooncop • 10 months ago

    you are most welcome (suffered well)
    worth it for the duck

  • @richerite • 2 months ago

    Great talk! What would you recommend for ingesting about 100-200 GB of geospatial data on-premises?

  • @o0o0oo00oo00 • 10 months ago +2

    I don't see DuckDB and Polars kicking Spark's and Dask's asses at the 10 GB level in my practical usage. 😅 We can't always trust TPC-H benchmarks.

  • @taylorpaskett3703 • 8 months ago

    What software did you use for generating / displaying your plots? They looked really nice.

    • @taylorpaskett3703 • 8 months ago +1

      Never mind: if I had just kept watching, you show the GitHub repo, where it says Ibis and Altair. Thanks!
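
For reference, a minimal sketch of that combination, assuming the `ibis` and `altair` Python packages; the result file and column names are hypothetical, not the ones from the talk:

```python
import ibis
import altair as alt

# Ibis: express the query; here the DuckDB backend executes it.
con = ibis.duckdb.connect()
results = con.read_parquet("results.parquet")  # hypothetical benchmark results
summary = results.group_by("library").aggregate(
    seconds=results.runtime.mean()
)

# Altair: declarative plotting from the materialized dataframe.
chart = (
    alt.Chart(summary.to_pandas())
    .mark_bar()
    .encode(x="library", y="seconds")
)
chart.save("benchmark.html")
```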

  • @ravishmahajan9314 • 7 months ago

    DuckDB is good if your data fits on a single machine, but the benchmarks show a different story when the data is distributed. What about that?

  • @kokizzu • 6 months ago

    ClickHouse ftw

  • @bbbbbbao • 10 months ago

    It's not clear to me whether you can use autoscaling with Coiled.

    • @Coiled • 10 months ago +2

      You can use autoscaling with Coiled. See the `coiled.Cluster.adapt` method.
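
A minimal sketch of that, assuming the `coiled` Python package and Dask's adaptive-scaling API (the worker bounds here are arbitrary):

```python
import coiled
from dask.distributed import Client

# Start a Coiled cluster, then let it scale between bounds based on load.
cluster = coiled.Cluster(n_workers=4)
cluster.adapt(minimum=2, maximum=20)  # autoscaling via the adapt method
client = Client(cluster)
```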

  • @maksimhajiyev7857 • 5 months ago

    The problem is that Rust-based tooling actually wins, and all the paid promotions just suck. The reason Rust-based tooling gets somewhat suppressed is simple: hyperscalers (big cloud tech) earn a lot of money, and if things run faster there are no huge bills for your Spark clusters 😊)). I've been playing with Rust and huge datasets myself, without external benchmarks, because I don't trust all this marketing stuff. Rust-based EDA may be witchcraft, but it runs like a beast. Try it yourselves with huge datasets, guys.