Speed Up Data Processing with Apache Parquet in Python

  • Published 7 Sep 2024

Comments • 18

  • @islam9212
    @islam9212 10 months ago +8

    It hurt my eyes when I saw the calculator even though a Python console exists. For a future video, it would be interesting to include a comparison with the pickle, feather, and jay formats.
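
A minimal benchmark sketch along those lines, covering pickle, feather, and Parquet (jay would additionally need the datatable library; feather and Parquet support in pandas assumes pyarrow is installed). The data is synthetic, not the video's dataset:

```python
import time

import numpy as np
import pandas as pd

# Synthetic numeric data; swap in your own DataFrame.
df = pd.DataFrame(np.random.rand(1_000_000, 10),
                  columns=[f"c{i}" for i in range(10)])

formats = {
    "pickle":  (lambda: df.to_pickle("data.pkl"),      lambda: pd.read_pickle("data.pkl")),
    "feather": (lambda: df.to_feather("data.feather"), lambda: pd.read_feather("data.feather")),
    "parquet": (lambda: df.to_parquet("data.parquet"), lambda: pd.read_parquet("data.parquet")),
}

for name, (write, read) in formats.items():
    t0 = time.perf_counter()
    write()
    t1 = time.perf_counter()
    read()
    t2 = time.perf_counter()
    print(f"{name}: write {t1 - t0:.2f}s, read {t2 - t1:.2f}s")
```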

  • @chndrl5649
    @chndrl5649 10 months ago +2

    The reason the two dataframes take different amounts of memory is the datatypes. CSV stores everything as text, so many columns end up loaded as strings, which take much more memory than numeric datatypes.
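
A quick way to see the dtype effect on memory (synthetic data, not the video's):

```python
import pandas as pd

# One million values stored as strings vs. as numbers.
as_str = pd.Series(["3.14159"] * 1_000_000)  # object dtype: one Python str per cell
as_num = pd.to_numeric(as_str)               # float64: 8 bytes per cell

print(as_str.memory_usage(deep=True))  # reports tens of megabytes
print(as_num.memory_usage(deep=True))  # ~8 MB
```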

  • @tb9359
    @tb9359 10 months ago +2

    Had never heard of Parquet. Thank you. It looks very useful.

  • @jeremiahhauser7148
    @jeremiahhauser7148 10 months ago +2

    Interesting, but I am not convinced. If I got it correctly, when selecting columns the time went down by a factor of 3 for both methods (4 s → 1.3 s and 0.24 s → 0.08 s). So Parquet is better anyway, but whether it is specifically better for column-wise access still needs to be demonstrated.
    Like the other commenter, I would also be interested in a broader comparison with other formats.
    Great channel, keep up the good work.
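
One way to isolate column-wise access: both readers can be restricted to a column subset, but CSV parsing still scans every byte of every row, while Parquet's columnar layout lets the reader skip the other columns on disk entirely. A sketch, with placeholder file and column names:

```python
import pandas as pd

# Placeholder file names; assume both hold the same wide table.
full = pd.read_parquet("data.parquet")                  # reads every column
one = pd.read_parquet("data.parquet", columns=["c0"])   # reads only c0's column chunks

# usecols limits the result, but the parser still scans the whole file.
one_csv = pd.read_csv("data.csv", usecols=["c0"])
```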

  • @multitaskprueba1
    @multitaskprueba1 4 months ago

    You are a genius! Fantastic video! Thanks!

  • @dana-pw3us
    @dana-pw3us 6 months ago +1

    Why not compare the sizes of the files on disk? Are they different?
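
They usually are, often dramatically: Parquet compresses columns of like-typed values, while CSV stores every value as text. Easy to check, assuming the two files exist locally:

```python
import os

for path in ("data.csv", "data.parquet"):
    print(f"{path}: {os.path.getsize(path) / 1e6:.1f} MB")
```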

  • @JeremyLangdon1
    @JeremyLangdon1 9 months ago

    I think pandas tries to infer data types from CSV and often defaults to string. This takes much more space and CPU. Parquet has data types built into the file, so pandas does not need to infer anything. What would be more interesting is to specify the data types when reading the CSV, to make it a more “even” comparison.
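
What that more even comparison might look like; the column names and dtypes here are hypothetical stand-ins for the video's dataset:

```python
import pandas as pd

# Hypothetical columns: tell pandas the types up front instead of letting it infer.
dtypes = {"product_id": "int64", "price": "float64", "description": "string"}
df = pd.read_csv("data.csv", dtype=dtypes, parse_dates=["created_at"])
```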

  • @Gabriel-cf3bw
    @Gabriel-cf3bw 9 months ago

    Nice tutorial! Very introductory!

  • @slothner943
    @slothner943 10 months ago

    I usually go for the feather format. I've never understood the difference, just that for me and the data I'm handling (few columns), feather seems to be quicker.
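
The usual rule of thumb: feather is the Arrow IPC format written to disk with little encoding work, so it is very fast for local round trips, while Parquet spends more CPU on compression and suits long-term storage and interchange. A round-trip sketch (both need pyarrow in pandas):

```python
import pandas as pd

df = pd.DataFrame({"a": range(1_000), "b": [1.5] * 1_000})

df.to_feather("data.feather")  # fast, lightly encoded
df.to_parquet("data.parquet")  # smaller on disk, more CPU to encode

back = pd.read_feather("data.feather")
```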

  • @JLSXMK8
    @JLSXMK8 10 months ago +1

    I have a related question: since Parquet files are "column-oriented", do you think they would be a good way to store database backups?
    Example scenario: say you want to back up a database whose data is in a stable state. It contains a large number of product records: their IDs, descriptions, purchase counts per product, prices, etc. Would it be a good idea to store a backup of this database as a Parquet file, since the backup would be faster to load if the data later became unstable via a transaction? You could roll back the transactions too; but what if too many of them fail and all of them need to be rolled back?

    • @KingOfAllJackals
      @KingOfAllJackals 10 months ago

      Parquet isn't a generic file format. It IS a table, so you don't "store backups" in a Parquet file. I guess you could back up each table independently, but nearly every real DB has much more efficient and powerful native backup infrastructure.
      Parquet, however, is where a lot of transactional data ends up for analytics. Columnar storage is better suited to large analytic workloads; row stores are better suited to OLTP workloads. You would never want to use Parquet for things like "deduct $7.83 from customer 1234's checking account".

    • @JLSXMK8
      @JLSXMK8 10 months ago

      @KingOfAllJackals That is exactly what I thought of possibly using it for; I could use it to back up tables in the database. You did interpret that correctly. I would NOT edit the contents of the Parquet backups.
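
A per-table snapshot along the lines @JLSXMK8 describes is easy to sketch; the SQLite database and table names below are hypothetical, and, as @KingOfAllJackals notes, restoring replaces the table wholesale rather than replaying transactions:

```python
import sqlite3

import pandas as pd

# Hypothetical database and table: snapshot one table to Parquet.
conn = sqlite3.connect("shop.db")
products = pd.read_sql_query("SELECT * FROM products", conn)
products.to_parquet("products_backup.parquet")

# Restoring overwrites the table with the snapshot, wholesale.
restored = pd.read_parquet("products_backup.parquet")
restored.to_sql("products", conn, if_exists="replace", index=False)
conn.close()
```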

  • @N0rberK
    @N0rberK 10 months ago

    Thanks, Capt.

  • @farshidzamanirad9691
    @farshidzamanirad9691 10 months ago

    Awesome!

  • @julianreichelt1719
    @julianreichelt1719 10 months ago

    nice

  • @codewithmajid4841
    @codewithmajid4841 10 months ago

    ok Boss

  • @codewithmajid4841
    @codewithmajid4841 10 months ago +1

    I am a junior data scientist from Pakistan.