Dask Demo Day 2024-03-21

Today's Talks:
00:00 Intro
00:38 Dask DataFrame is Fast - @fjetter
14:15 Large scale population of vector databases for RAG - @mrocklin
26:36 Easy GPU access with Coiled - @jrbourbeau
Next Demo Day is April 18th, sign up here: github.com/dask/community/issues/307
---
What is Dask Demo Day?
Each month we solicit 5-10 minute demos that show off ongoing and/or lesser-known work. Meetings will be recorded and advertised on social media. Hopefully, this helps educate folks on some of the great work people do.
If you're interested, please reply to this issue with a brief description (a couple of sentences). If you have colleagues who you think might be interested, please let them know. If you would like to present but not this month, check out the dates and sign up for an upcoming one:
coiled.io/dask-demo-days
----
What is Dask?
Dask is a free and open-source library for parallel computing in Python. Dask is a community project maintained by developers and organizations.
Share your feedback on this video in the comments and let us know:
- Did you find this video helpful?
- Have you used Dask before?
Learn more at dask.org

Views: 740

Video

57:20

Dask Demo Day - 2024-02-15

578 views · 4 months ago

Today's Talks:
00:00 Intro
01:18 One trillion row challenge - @mrocklin
06:20 Deploying Dask on Databricks - @jacobtomlinson
15:09 Deploying Prefect workflows on the cloud with Coiled - @jrbourbeau
29:22 Scaling embedding pipelines (LlamaIndex Dask) - @quasiben
46:45 Using AWS Cost Explorer to see the cost of public IPv4 addresses - @ntabris
Next Demo Day is March 21st, sign up here: github.com...

45:19

Dask Demo Day - 2024-01-18

646 views · 5 months ago

Today's Talks:
00:00 Intro
00:47 Apache Beam DaskRunner - @cisaacstern
15:45 Array expressions - @mrocklin
26:27 One billion row challenge - @scharlottej13

47:16

Dask Demo Day - 2023.10.19

777 views · 8 months ago

October 19th, 2023
Today's Talks:
00:00 Intro
00:31 @jacobtomlinson - "Who uses RAPIDS?"
10:51 @mrocklin - TPC-H benchmarks for Spark, Dask, Polars, DuckDB
24:27 @jhamman Dask - Arraylake integration
37:24 @mrchtr - Fondant
We'd like to solicit 5-10 minute demos that show off ongoing or lesser-known work. I hope to have 3-5 of these during the meeting. Meetings will be recorded and advertised o...

37:19

Dask Demo Day - 2023-09-21

394 views · 9 months ago

Today's Talks:
00:00 Intro
00:21 @fjetter - Performance with P2P array rechunking
14:04 @phofl - Dask expressions
27:07 @sjcharlotte13 @dcherian - Processing a quarter petabyte geospatial dataset in the cloud

28:45

Dask Demo Day - 2023-08-17

355 views · 10 months ago

Last Dask Demo Day of the summer!
Today's Talks:
@fjetter - Memray Integration for Memory Management
@mrocklin - Some new updates and news
@jrbourbeau - Analyzing Sea Levels in the Cloud with Earthaccess and Coiled

5:39

How to Install Dask

856 views · 10 months ago

Learn how to install Dask and the Dask JupyterLab extension with either conda or pip. This video goes through how to set up a clean working environment with Dask.
00:00 Introduction
00:51 Pip install Dask
02:21 Create LocalCluster
03:27 Use Dashboard in JupyterLab

39:18

Dask Demo Day - 2023-07-20

549 views · 11 months ago

Today's Talks:
@hendrikmakait - Shuffle resilience
@Matt711 - Dask-Kubernetes update
@GueroudjiAmal - External tasks in Dask distributed (github.com/GueroudjiAmal/distributed)
@skrawcz Dask - Hamilton integration

48:59

Dask Demo Day - 2023-06-15

517 views · 1 year ago

Today's Talks:
dask-geopandas demo by @martinfleis
Fine performance dask metrics and spans @crusaderky (10-15 min)
GIL monitoring on dask @milesgranger

45:26

Dask Demo Day 2023-05-18

443 views · 1 year ago

These are 5-10 minute demos that show off ongoing or lesser-known work. We hope to have 3-5 of these during the meeting. Meetings will be recorded and advertised on social media. Hopefully, this helps to educate folks on some of the great work people are up to. Meetings are on the 3rd Thursday of every month at 11am EDT on Zoom. Zoom link: us06web.zoom.us/j/89383035703?pwd=WkRJSzNnRTh4T2R1ZjJuVVdJWlMxQT09 W...

48:46

Dask Demo Day 2023-04-20

483 views · 1 year ago

Talks:
Lindsey Gray - dask-awkward and dask-histogram for high energy physics analysis
Amine Diro - daskqueue: a dask-based distributed task queue
James Bourbeau - Pyarrow strings in Dask DataFrames
Jacob Tomlinson - Launching a Jupyter/Dask cluster on NVIDIA Base Command Platform
Want to present in one of the upcoming Dask Demo Days? Sign up here: github.com/dask/community/issues/307
Key Mome...

52:13

Dask Demo Day - 2023-03-16

443 views · 1 year ago

Dask Demo Days Talks:
Analyzing Terabytes of Ocean Simulation model output with Xarray, xgcm and xhistogram - Tom Nicholas
P2P shuffling - Hendrik Makait
Scaling weather radar data analysis with Dask - Max Grover
Automatic package synchronization in Coiled Dask Clusters - David Chudzicki
Graph Neural Networks training with Dask - Vibhu Jawa
Want to present at one of the upcoming Dask Demo Days?...

56:49

Dask Demo Day - 2023-02-16

482 views · 1 year ago

Monthly Dask Demo Day: February 2023
Talks:
00:00 Intro
00:28 New Dask integration in Flyte - Bernhard Stadlbauer
11:37 Parallelizing FTP downloads from a janky government server - Paul Hobson
22:45 Configurable Dataframe backends - Rick Zamora
34:36 Parallelize HPO of XGBoost with Optuna and Dask (multi-cluster) - Guido Imperiale
43:20 Accelerated Jaccard similarity using RAPIDS and Dask - Jiw...

1:04:10

Dask Demo Day - 2022-11-16

1K views · 1 year ago

Monthly demo day for Dask for November 2022
GitHub Issue: github.com/dask/community/issues/286
Talks:
00:00 Intro
03:05 2,000,000,000 lightning flashes - @ktyle
14:44 Dask CLI - @douglasdavis
21:44 Optuna - @jrbourbeau
32:00 Community Interlude - @mrocklin
34:02 Dask Awkward - @douglasdavis
46:02 Dask PySpy - @gjoseph92
01:03:30 Closing
Follow us on Twitter @dask_dev or sign up for the newslett...

57:46

Dask Demo Day - 2022-10-27

1.5K views · 1 year ago

Dask Demo Days - October 2022. Five quick talks using and developing Dask.
Talks:
00:00 Intro
01:43 Scraping arXiv to determine Matplotlib popularity - Matthew Rocklin
08:36 Reducing memory use with task queuing - Florian Jetter
20:54 Kubernetes Operator and KubeFlow - Jacob Tomlinson
33:23 Prometheus - Nat Tabris
42:46 Apache Beam on Dask - Alex Merose
54:52 Conclusion
github.com/dask/communit...

0:59

Dask in Production | How Dask Can Help in Production

523 views · 1 year ago

0:35

Dask Use Case | Who Uses Dask: GrubHub

281 views · 1 year ago

1:55

Dask Use Case | Who Uses Dask: CapitalOne

289 views · 2 years ago

0:59

Dask Use Case | Who Uses Dask: Geophysical Sciences Studying Ocean Currents

297 views · 2 years ago

2:01

Dask Use Case | Who Uses Dask: UK Meteorology Office

183 views · 2 years ago

1:28

Dask Use Case | Who Uses Dask: WalMart

300 views · 2 years ago

2:10

Dask Use Case | CapitalOne: Adding Dask to Your Existing Pipeline

313 views · 2 years ago

5:00

Dask Scientific Libraries | Scaling Science | Genevieve Buckley

332 views · 2 years ago

2:01

New Dask Branding | Dask Gets an Upgrade

1.1K views · 2 years ago

1:25

Dask Use Case | Who Uses Dask: Financial Institutions

496 views · 2 years ago

2:31

Dask Best Practices | Scaling Up Science | Genevieve Buckley

3K views · 2 years ago

1:08

Dask for Science | Dask Example | Genevieve Buckley

324 views · 2 years ago

26:54

Scientific Computing & Dask | Leveraging Dask for Life Sciences | Genevieve Buckley

632 views · 2 years ago

1:13

What is Dask? A Brief Introduction

1.8K views · 2 years ago

59:07

Scalable Machine Learning with Data Scientist Eric Ma

263 views · 2 years ago

Comments

@gemini_537 1 month ago
Gemini 1.5 Pro: The video mentions that groupby operations can fail on large datasets and on unsorted data. Here are the reasons for failure and how to compensate for them:
* **Large datasets:** When dealing with large datasets, it is recommended to tune the `split_out` parameter. This parameter sets the number of output partitions, and therefore their size; a good starting point is to target 100 megabyte partitions. You can estimate the `split_out` value by considering the number of groups in your data and the size of each group.
* **Unsorted data:** Dask performs better when the data is sorted by the groupby fields. If your data is not sorted, Dask will shuffle the data to group it, which can be expensive. There are two ways to address this:
  * Sort your data before performing the groupby operation.
  * Use `map_partitions`. It can be used when your data is already sorted by an index matching one of your groupby fields. In this case, Dask can perform the groupby operation on each partition without shuffling the data.
Here are additional tips to improve the performance of groupby operations in Dask:
* **Optimize memory usage:**
  * Use the pandas string dtype instead of object dtype for strings.
  * Use categorical data types when applicable. Categoricals are efficient when you have a small number of unique strings and the strings are large.
  * Drop unnecessary columns before performing the groupby operation.
* **Repartition your data:** Repartitioning your data ensures that the partitions are uniform in size. This can improve the performance of groupby operations by avoiding situations where some partitions are significantly larger than others.
* **Prioritize reductions before groupby:** Perform any filtering or data reduction operations before the groupby operation. This will reduce the amount of data that needs to be shuffled or grouped.
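The partition-count estimate mentioned in the comment above can be sketched as plain arithmetic. This is only an illustration: the helper name `estimate_split_out`, the group count, and the per-group size are assumptions made up for the example; only the 100 MB target comes from the comment.

```python
import math

# ~100 MB per output partition, the starting point suggested in the comment.
TARGET_PARTITION_BYTES = 100 * 1024**2


def estimate_split_out(n_groups: int, bytes_per_group: int,
                       target_bytes: int = TARGET_PARTITION_BYTES) -> int:
    """Hypothetical helper: number of output partitions that keeps each
    partition of the groupby result near target_bytes."""
    if n_groups < 0 or bytes_per_group < 0 or target_bytes <= 0:
        raise ValueError("sizes must be non-negative and target positive")
    result_bytes = n_groups * bytes_per_group
    # Always produce at least one output partition.
    return max(1, math.ceil(result_bytes / target_bytes))


# Assumed workload: 50 million groups, ~64 bytes of aggregated output each.
split_out = estimate_split_out(50_000_000, 64)
print(split_out)
```

With Dask installed, a value like this would typically be passed to a groupby aggregation, for example `df.groupby("key").agg({"value": "sum"}, split_out=split_out)`, where `df`, `"key"`, and `"value"` are placeholder names.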
@gemini_537 1 month ago
Gemini 1.5 Pro: This video is about Dask Bag, a library for processing large datasets in parallel.
The video starts with a basic introduction to Dask Bag. It explains that Dask Bag is useful for embarrassingly parallel analyses and for pre-processing, especially of text, JSON, or Avro data.
Then the video dives into details with an example. The speaker constructs a bag with ten elements separated into four partitions to demonstrate what a bag is. A bag is like a bunch of lists. Users can perform map, filter, and reduce functions on the bag. For instance, the speaker uses the map function to square every element in the bag, and the filter function to keep only the even elements.
Next, the video shows how to use Dask Bag on real data. The data used in the example is a bunch of JSON files from a web service called MyBinder. The speaker reads the data using the read_text function from Dask Bag, then uses the map function to convert the JSON-encoded text into Python dictionaries. After that, the speaker uses the frequencies function to count how many times each GitHub repository shows up. The result shows that ipython is the most common repository in the data.
The video then talks about how to use Dask Bag to pre-process data. The speaker filters out data that does not have "task" in the "spec" field and converts the data back into JSON format. Finally, the speaker writes the data to a text file.
The last part of the video talks about data frames. The speaker mentions that Dask Bag may not be the right choice for complex analyses, and that Dask DataFrame might be a better option for such cases. The speaker also mentions that a Dask Bag can be converted to a Dask DataFrame using the to_dataframe function.
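The map, filter, and frequencies steps summarized in the comment above can be sketched roughly as follows. This is not the notebook from the video: the three JSON lines are made-up stand-ins for the MyBinder logs, and the single-threaded scheduler is chosen here only to keep the sketch simple.

```python
import json

import dask.bag as db

# A bag of ten elements split across four partitions, like the toy example.
numbers = db.from_sequence(range(10), npartitions=4)

# Square every element, then keep only the even results.
even_squares = (
    numbers.map(lambda x: x ** 2)
    .filter(lambda x: x % 2 == 0)
    .compute(scheduler="synchronous")
)
print(even_squares)

# Hypothetical JSON-lines records standing in for the MyBinder data.
lines = [
    '{"spec": "ipython/ipython-in-depth"}',
    '{"spec": "dask/dask-examples"}',
    '{"spec": "ipython/ipython-in-depth"}',
]

# Parse each line into a dict, as the video does after reading the text files.
records = db.from_sequence(lines, npartitions=2).map(json.loads)

# Count how often each repository appears; frequencies() yields (item, count) pairs.
counts = dict(records.pluck("spec").frequencies().compute(scheduler="synchronous"))
print(counts)
```

For files on disk, `db.read_text` would replace `db.from_sequence`, and `to_textfiles` or `to_dataframe` would follow once the records are cleaned.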
@miriamdixon1870 1 month ago
The resolution is very bad.
@JohnMatthew-dt1vq 2 months ago
Excellent video, I wish all tech videos were this good.
@apachaves 3 months ago
Very interesting. Thank you for this view on the new dask_databricks functionalities.
@mwd6478 4 months ago
Dask on Databricks is really cool. There's so many times you're on Databricks doing Python data science and don't want to use Spark.
@DanielJahn-fu2ev 5 months ago
Question regarding Array Expressions: how do they play together with the Dask (high-level) graph? A concrete xarray example: a problem with very large arrays is that even just their computational graph is too large to be materialized. A strategy is to read them without Dask (chunks=None), slice, and then again turn them into a dask-backed array by chunking. Would Array Expression simplify this, pushing the slicing before the graph materialization, or are those operating at different levels?
@Coiled 5 months ago
Expressions will eventually replace high-level graphs. They generate low-level task graphs directly. Slicing is definitely pushed through before graph generation, which will likely help reduce overall graph generation overhead. It's still possible to create large graphs though, just less likely. We're also shipping the expressions directly to the scheduler, so there will be less pain to large graphs (they won't have to travel over a wire).
@DanielJahn-fu2ev 5 months ago
@Coiled Thanks for the answer! That actually sounds great, would help our workflows quite a bit.
@lalitchoudhary1095 5 months ago
Show its use with xarray
@rodrigoluca6296 6 months ago
Thank you for having subtitles in Portuguese.
@pingzhong-pl5sb 7 months ago
Where can I get Paul Hobson's source code?
@sagniksarkar506 7 months ago
Awesome video Trevor. Do you have any idea about the resources that I can use to learn more about Zarr and its built-in configurations? I have seen the documentation, but it seems a little overwhelming to me.
@DrTallin 8 months ago
Nice video. Is there a detailed review of how your colleagues analyze billions of records? You mentioned it here: czcams.com/video/8aQ3xcX8e9Y/video.htmlsi=0FRQOT9TEnDz9FUs&t=1621
@dogosousa 9 months ago
@martinfleis can we access your notebook?
@AlverGant 10 months ago
Had some issues with Ray, but Dask worked out of the box! Congratulations to the developers!
@cleitonluiz7136 10 months ago
What is the name of this environment where you are running these commands?
@djstacktrace 7 months ago
It's a Jupyter notebook
@aria_nukil 10 months ago
Great intro. Also, how do I show those additional panes on the right, shown at 2:05, that display memory usage, progress, etc.? That is pretty awesome. Thanks so much
@be12 1 year ago
Great work you guys
@habruti7215 1 year ago
1:08:00
@loveyou-pi5gj 1 year ago
Could I use async/await with dask?
@loveyou-pi5gj 1 year ago
55:15
@Kai-iy7pe 1 year ago
Is there an official Dask community channel?
@kristiantorres1080 1 year ago
Hi Matt. Amazing stuff as always. Do you know if there is something similar for VScode? Thank you!
@Dynamitegaming125 1 year ago
pin me
@jacobgomez_ 1 year ago
Kept checking my Slack because I didn't realize it was coming from the video...
@billyblackburn864 1 year ago
Is the notebook for the local GPU available?
@jijie133 1 year ago
Thank you!
@RobertAlbrecht-mw7er 1 year ago
Dask is the bomb.
@RoguesAndEvolution 1 year ago
Hiya, you mentioned Xarray in passing. Is there a multi-dimensional equivalent to cudf?
@alexanderlyapin8057 1 year ago
Please correct me if I am wrong, but maybe it is better to open the file for writing at 4:48 in 'a' mode; otherwise every worker will overwrite the data inside and you will have only the result of the last worker to run.
@parikannappan1580 1 year ago
Where can we download the CSV files?
@jylpah 1 year ago
Highest resolution available is 360p. It’s hard to read the code
@samsammurphy 1 year ago
These videos are fantastic but sometimes difficult to hear (even with my volume set to max)
@paveevad 1 year ago
Hi, since this video was posted, the dask-report.html page has an extra tab called "Summary" - is there a doc where I can read what the various stats in that summary mean?
@iamworstgamer 1 year ago
Can't even create a dataframe from a Python list. You need to create a pandas dataframe first, which kinda defeats the whole purpose.
@shpundk 1 year ago
Thank you for this recorded Dask Demo Day! Are these Jupyter notebooks available for users?
@fjetter4295 1 year ago
We don't have a single repo for this yet. My notebooks are available here: github.com/fjetter/dask-demo
1 year ago
Hi! Thank you for this! Regards
@k1zmt 1 year ago
Does Dask have some kind of linter?
@annawilson3824 1 year ago
Really a great talk!
@PalataoArmy 1 year ago
Thank you for the explanation. Now it clears up my confusion on compute() vs persist()
@carlosmateosamudiolezcano2463 1 year ago
Where can I get access to the notebooks used here?
@Dask-dev 1 year ago
Hi Carlos, you can find them here: github.com/quasiben/rapids-dask-summit-2021
@carlosmateosamudiolezcano2463 1 year ago
The quality of this video makes it impossible to read the code
@585ghz 1 year ago
This lib is awesome!!! Thanks a lot 😍😍
@hamidrezahosseinkhani5980 1 year ago
what a boring speaker, such a disgusting english!
@MsStoCa 1 year ago
Thanks for the great explanation!
@Queeno11 2 years ago
This is without doubt the best short guide on dask futures. Been reading lots of documentation but this video makes it so simple yet so powerful. Thanks a lot!
@MatthewRocklin 2 years ago
Thanks!
@arkadipbasu828 2 years ago
Thank you Dask Team, will explore this and join the community
@sebbie2e 2 years ago
That's a quality video, well done
@danielabatalha5434 2 years ago
Hello. Are the notebooks available somewhere?
@Amapramaadhy 2 years ago
Dask and all the Python magic aside, Matt should hold master classes in delivering public lectures ♥ Also +100 on the "mature deployment" issue.
@antribera2138 2 years ago
💔 🄿🅁🄾🄼🄾🅂🄼