Doug Mercer
How Fast can Python Parse 1 Billion Rows of Data?
To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/DougMercer.
You’ll also get 20% off an annual premium subscription.
-------------------------------
Sign up for 1-on-1 coaching at dougmercer.dev
-------------------------------
The 1 billion row challenge is a fun exploration of how quickly we can parse a large text file and compute some summary statistics. The coding community has created some amazingly clever solutions.
In this video, I walk through some of the top strategies for writing highly performant code in Python. I start with the simplest possible approach, and work my way through JIT compilation, multiprocessing, and memory mapping. By the end, I have a pure Python implementation that is only one order of magnitude slower than the highly optimized Java challenge winner.
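The core trick behind the fast pure-Python versions is splitting the file into newline-aligned byte ranges and farming them out to worker processes. Here is a minimal sketch of that idea (my own simplified stand-in, not the actual challenge code; the tiny demo file stands in for the real ~13 GB measurements.txt of `station;temperature` lines):

```python
# Sketch of the chunked-multiprocessing idea. Each worker gets a byte
# range aligned to newline boundaries, so no line is split across
# workers; per-worker stats are merged at the end.
import multiprocessing as mp
import os

def find_chunks(path, n_chunks):
    """Split the file into n byte ranges that end on newline boundaries."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(size * i // n_chunks)
            f.readline()                  # advance to the next full line
            bounds.append(f.tell())
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))

def process_chunk(args):
    path, start, end = args
    stats = {}                            # station -> [min, max, total, count]
    with open(path, "rb") as f:
        f.seek(start)
        for line in f.read(end - start).splitlines():
            station, temp = line.split(b";")
            t = float(temp)
            s = stats.get(station)
            if s is None:
                stats[station] = [t, t, t, 1]
            else:
                if t < s[0]: s[0] = t
                if t > s[1]: s[1] = t
                s[2] += t
                s[3] += 1
    return stats

def merge(results):
    out = {}
    for stats in results:
        for k, s in stats.items():
            o = out.setdefault(k, [s[0], s[1], 0.0, 0])
            o[0] = min(o[0], s[0]); o[1] = max(o[1], s[1])
            o[2] += s[2]; o[3] += s[3]
    return {k.decode(): (s[0], s[2] / s[3], s[1]) for k, s in out.items()}

if __name__ == "__main__":
    # Tiny demo file standing in for the real measurements.txt
    with open("demo.txt", "wb") as f:
        f.write(b"Oslo;-1.0\nOslo;3.0\nParis;10.0\nParis;20.0\n" * 100)
    chunks = find_chunks("demo.txt", 4)
    with mp.Pool(4) as pool:
        results = pool.map(process_chunk, [("demo.txt", a, b) for a, b in chunks])
    print(merge(results))  # {'Oslo': (-1.0, 1.0, 3.0), 'Paris': (10.0, 15.0, 20.0)}
```

Memory mapping and PyPy push this same structure further; the chunk-boundary search is what lets workers start mid-file without ever splitting a line.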
On top of that, I show two much simpler but just as performant solutions that use the Polars dataframe library and DuckDB (an in-memory SQL database). In practice, you should use these, because they are incredibly fast and easy to use.
If you want to take a stab at speeding things up further, you can find the code here github.com/dougmercer-yt/1brc.
References
------------------
Main challenge - github.com/gunnarmorling/1brc
Ifnesi - github.com/ifnesi/1brc/tree/main
Booty - github.com/booty/ruby-1-billion/
Danny van Kooten C solution blog post - www.dannyvankooten.com/blog/2024/1brc/
Awesome duckdb blog post - rmoff.net/2024/01/03/1%EF%B8%8F%E2%83%A3%EF%B8%8F-1brc-in-sql-with-duckdb/
pypy vs Cpython duel blog post - jszafran.dev/posts/how-pypy-impacts-the-performance-1br-challenge/
Chapters
----------------
0:00 Intro
1:09 Let's start simple
2:55 Let's make it fast
10:48 Third party libraries
13:17 But what about Java or C?
14:17 Sponsor
16:04 Outro
Music
----------
"4" by HOME, released under CC BY 3.0 DEED, home96.bandcamp.com/album/resting-state
Go buy their music!
Disclosure
-----------------
This video was sponsored by Brilliant.
#python #datascience #pypy #polars #duckdb #1brc
163,938 views

Videos

STOP Using Plain Python Scripts! Do this instead (5 reasons)
13K views · 5 months ago
Sign up for the totally free tier of Prefect Cloud here: prefec.tv/doug-mercer Sign up for 1-on-1 coaching at dougmercer.dev One of the most frustrating parts of the workday is doing something that you know could be automated, but just… isn’t yet. In this video, we use Prefect to schedule a Python script to run every week. After, we find that scheduling was really only the first of five problem...
Write Python code people actually want to use
12K views · 9 months ago
Sign up for 1-on-1 coaching at dougmercer.dev Writing code that "works" shouldn't be your end goal. It's really just the start... In this video, we adapt a clumsy, non-Pythonic API into an easy to use, easy to understand Pythonic one. We use magic methods such as __getitem__, __len__, __enter__, and __exit__ to make our objects a context manager and support the len() function and square bracket indexing. An...
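As a rough illustration of those magic methods (a toy class of my own, not the API from the video):

```python
# A Dataset that supports len(), square-bracket indexing, and `with`
# via __len__, __getitem__, __enter__, and __exit__.
class Dataset:
    def __init__(self, rows):
        self._rows = list(rows)
    def __len__(self):
        return len(self._rows)
    def __getitem__(self, i):
        return self._rows[i]
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc, tb):
        self._rows = None          # "close" the resource
        return False               # don't swallow exceptions

with Dataset([10, 20, 30]) as ds:
    print(len(ds), ds[1])          # 3 20
```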
Compiled Python is FAST
96K views · 1 year ago
Sign up for 1-on-1 coaching at dougmercer.dev Python has a bit of a reputation: fast to write, but slow to run. In this video, we focus on a simple-to-understand dynamic programming problem that would be terribly slow in native Python or numpy. We show that Python can achieve (and actually exceed) C-level performance with the help of just-in-time and ahead-of-time compilers such as mypyc, Cython...
ChatGPT made an iPhone app for me
22K views · 1 year ago
Sign up for 1-on-1 coaching at dougmercer.dev ChatGPT and I create an iOS SwiftUI app, but I'm just the idea guy. Since I have no idea how to write Swift, ChatGPT helps me write a basic fitness app with pretty interesting animations without my even knowing the language. #chatgpt #ios #swift Chapters 0:00 Intro 0:17 Hello World 1:04 Buttons and Counters 2:25 Title and Success Indicator 3:26 Fixing...
Never use PowerPoint again
286K views · 1 year ago
Sign up for 1-on-1 coaching at dougmercer.dev In this video, we use Marp to create a presentation from our text editor using Markdown syntax. Marp is an excellent way of quickly creating beautiful presentations with syntax highlighted code blocks, TeX typeset math, and stylized images without having to worry about a finicky WYSIWYG editor like PowerPoint. #markdown #marp #marpit I use Marp a lo...
What's new in Python 3.11?
1.4K views · 1 year ago
Sign up for 1-on-1 coaching at dougmercer.dev Python 3.11 is out! In this video, I show you some of the cool features and improvements made in this release. Release notes - docs.python.org/3.11/whatsnew/3.11.html Other notes: - At the time of this video's release, MyPy doesn't support most of the new typing features. If you're eager to use them right away, consider using pyright. - pyperformanc...
Scary good voice cloning | Python Text to Speech
6K views · 1 year ago
Sign up for 1-on-1 coaching at dougmercer.dev In this video, Doug is lazy and lets TorToiSe text-to-speech do his voice-over for him. Specifically, we create 11 custom voices and illustrate how they sound while teaching you how to do the same. We also discuss some of the ethical considerations of such a powerful model, and highlight the precautions taken by the TorToiSe author. Thanks (and sorr...
OpenAI Whisper: Incredible Automatic Speech to Text!
10K views · 1 year ago
Sign up for 1-on-1 coaching at dougmercer.dev Whisper, OpenAI's new automatic speech recognition model, is *awesome*. In this video, I show you how to use it and present a few interesting examples of transcribing English and translating spoken German and Japanese to English. github.com/openai/whisper #machinelearning #ai #openai #nlp #python Chapters 00:00 Introduction 00:22 Installing Whisper ...
The Magic that Makes Python Tick
2.5K views · 1 year ago
Sign up for 1-on-1 coaching at dougmercer.dev In this video, we implement Python's range object in pure Python. We use this as a case study to explore Python's data model and learn how to create rich objects that interact naturally with Python's built-in operators and functions. #python #objectorientedprogramming Chapters 0:00 Introduction 1:02 What does range() do? 2:17 How will we test our im...
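A heavily stripped-down sketch of the idea (the video builds a fuller version, with start/step handling and more of the data model):

```python
# A minimal pure-Python stand-in for range(stop), illustrating how the
# data model hooks (__len__, __iter__, __getitem__) make an object
# interact naturally with built-ins.
class Range:
    def __init__(self, stop):
        self.stop = stop
    def __len__(self):
        return max(0, self.stop)
    def __iter__(self):
        i = 0
        while i < self.stop:
            yield i
            i += 1
    def __getitem__(self, i):
        if not 0 <= i < self.stop:
            raise IndexError(i)
        return i

print(list(Range(4)))  # [0, 1, 2, 3]
```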
Your Python code is almost entirely untested.
1.8K views · 1 year ago
Sign up for 1-on-1 coaching at dougmercer.dev Property-based testing is an alternative to example-based testing where tests verify that properties of the output are satisfied given randomly generated input. This randomized approach allows us to more thoroughly test against a broad range of input and can detect hard-to-find bugs better than example-based testing. In this video, we discuss the mo...

Comments

  • @testkj
    @testkj 12 hours ago

    What if you just read every x+n(cores)-th line according to the available cores and calc min/max and avg? On an 8-core CPU, core 1 calcs lines 1, 9, 17..., core 2 calcs lines 2, 10, 18..., and so on. Skipping lines won't cost that much processing power. No need to chunk it and calculate where the chunk ends and stuff. Also: is it allowed to create a DB out of the data?

    • @dougmercer
      @dougmercer 11 hours ago

      Give it a shot! Repo is in the description. As for a DB, I'd say you need to include creating the database and ingesting the data in your timing to be fair

  • @mircea-pircea
    @mircea-pircea 15 hours ago

    I do wonder if using normal C++ arrays rather than std::vector would have made the C++ approach faster. Also, I think it could be faster if dp were also passed by reference, just as a and b are.

  • @mircea-pircea
    @mircea-pircea 15 hours ago

    Considering the effort put in to make the Cython code run, I'd much rather just write C or C++ from the start

  • @lukaslarsson3136
    @lukaslarsson3136 1 day ago

    I find NumPy to be very fast and optimized. Much faster than the code I am able to write in R, at least

  • @elver5041
    @elver5041 1 day ago

    The song at 9:16 goes hard, does anyone know what it's called?

  • @robinpage2730
    @robinpage2730 2 days ago

    Match statements would be faster than all those if/else statements

    • @dougmercer
      @dougmercer 2 days ago

      Maybe... Clone the repo in the description and give it a shot!

  • @BrainStroming1789
    @BrainStroming1789 3 days ago

    One optimization: if a number is the max, it cannot be the min. Adding an else will skip that irrelevant test 😉

    • @dougmercer
      @dougmercer 2 days ago

      Clone the repo in the description and give it a shot!

  • @alexanderpotts8425
    @alexanderpotts8425 3 days ago

    I'm actually surprised the simple cpython script you started with was under 10 minutes

  • @ihebbendebba2978
    @ihebbendebba2978 3 days ago

    Jeffrey Dahmer explains why you suck at programming colorized circa 2023.

    • @dougmercer
      @dougmercer 3 days ago

      If by Dahmer you mean Evan Peters I guess I'll take it ¯\_(ツ)_/¯

  • @russianyoutube
    @russianyoutube 5 days ago

    If you want to write fast code, don't use python. It's great for short scripts that do something simple, but anything more serious than that will be better to do in some other programming language

  • @isaacking4555
    @isaacking4555 5 days ago

    “Just kidding, I beat that shit.”

  • @ihtime
    @ihtime 6 days ago

    pypy?

    • @dougmercer
      @dougmercer 5 days ago

      Check out my latest video on "How fast can Python parse 1 billion rows of data". I cover pypy in that

  • @Oaisus
    @Oaisus 6 days ago

    The only real answer to any question that starts with "How fast can Python ..." is "Slower than C"

  • @hugoturbill6067
    @hugoturbill6067 6 days ago

    Surely there's a service that just translates python to c++

  • @abdimajid684
    @abdimajid684 6 days ago

    As someone who did a huge DP project in Python: one common problem with Cython is that it does not support many functions from numpy (good luck using argmax)

  • @N____er
    @N____er 6 days ago

    How do you optimise python performance without any external libraries or programs? Just native python3 with the standard pre-installed libraries.

    • @dougmercer
      @dougmercer 6 days ago

      Hmm, I guess the only way would be to write efficient code. I'd profile the code to see what functions are taking the most time, and then focus on improving the slow or frequently called ones. Use the right data structures and algorithms, consider using functools.cache to memoize anything that would benefit from caching, and re-profile your code after each change to quantify which changes were helpful. You can technically write your own C extensions if your system has a C compiler, but that's probably not what you want.
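A minimal sketch of that workflow (profile first, then memoize a hot pure function; `fib` here is just a stand-in, not code from the video):

```python
# Profile to find the hot spot, then cache it with stdlib functools.
import cProfile
import functools

@functools.cache              # stdlib memoization (Python 3.9+)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# Profile first: the report shows where the time actually goes.
cProfile.runctx("fib(30)", globals(), globals())

print(fib(300))               # instant thanks to the cache
```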

  • @joaovitormeyer7817
    @joaovitormeyer7817 7 days ago

    for anyone interested, I copied his C++ code and his example with 30000 elements in each vector, and on my computer it ran in ~25 seconds (my PC is slow). By simply compiling with -Ofast, it got down to ~5 seconds, still without modifying the code at all. I'm not hating on the content of the video, which in fact is great

    • @dougmercer
      @dougmercer 7 days ago

      I compiled the C++ with -O3 for this video gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3

    • @joaovitormeyer7817
      @joaovitormeyer7817 6 days ago

      @@dougmercer yeah, I thought of that, but since (I think; it's been a while since I watched your video, and I really can't check now) you didn't mention anything, I assumed not. Well, this makes it even more impressive for Python, nice!

    • @dougmercer
      @dougmercer 6 days ago

      You're right, I didn't mention anything. Next time I'll be sure to show these sorts of details in the video. Was your 25s measurement with no optimization? I wonder what the difference between -O3 and -Ofast is for a problem like this. (I'm admittedly not a C++ guy, so -Ofast is new to me!)

    • @joaovitormeyer7817
      @joaovitormeyer7817 6 days ago

      @@dougmercer yeah, my 25s was with no optimization, and I guess -O3 and -Ofast shouldn't really make a difference here. As far as I know, -Ofast's main difference from -O3 is that it can change the order of float operations, which is kinda illegal since a + (b + c) != (a + b) + c for floats, but this code in particular does not have any float calculations, so it should be basically the same speed
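The associativity point is easy to demonstrate from Python itself:

```python
# Float addition is not associative, which is exactly the reordering
# freedom -Ofast exploits and -O3 does not.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)                 # 0.6000000000000001
print(a + (b + c))                 # 0.6
print((a + b) + c == a + (b + c))  # False
```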

  • @Saddl3r
    @Saddl3r 7 days ago

    Great editing. What programs do you use? A "making-of" video would be very interesting!

    • @dougmercer
      @dougmercer 7 days ago

      Thanks! For this video I used a mixture of Manim to render the syntax-highlighted code and DaVinci Resolve to do the call-outs. In later videos I use exclusively DaVinci Resolve (+ a Python script to turn my code into a syntax-highlighted Fusion composition). I'm actually working on writing my own animation library that's similar to Manim but more tailored for me and the sort of videos I make. I plan on making a video about it as one of my next two videos, so stick around for that =]

    • @Saddl3r
      @Saddl3r 7 days ago

      @@dougmercer Thanks!! 🌟

  • @Macatho
    @Macatho 9 days ago

    I'm curious how fast this would run with a GPU implementation. I loved this video, hope you'll extend it with a GPU implementation :)

    • @dougmercer
      @dougmercer 9 days ago

      Someone did a Dask + cuDF implementation. Seems super fast github.com/gunnarmorling/1brc/discussions/487

  • @Macatho
    @Macatho 9 days ago

    You can't possibly be such a purist that you don't use numpy or pandas 😅😅

    • @dougmercer
      @dougmercer 9 days ago

      I tried them! Way slower for this task. Using this pandas implementation github.com/Butch78/1BillionRowChallenge/blob/main/python_1brc%2Fmain.py takes about 150s, but polars takes 11-12s

  • @SimpaTheImba
    @SimpaTheImba 10 days ago

    14:10 my man, can't agree more

  • @arnabchatterjee2094
    @arnabchatterjee2094 11 days ago

    basically Prefect is Celery on steroids

    • @dougmercer
      @dougmercer 11 days ago

      That's typically how I use the open source library! I recently spoke with someone from Prefect at PyCon, and they said the automation features and more Celery-esque features are coming to the open source library soon. So keep an eye on that =]

  • @ryanshea5221
    @ryanshea5221 12 days ago

    How to write fast Python: Write C

    • @dougmercer
      @dougmercer 11 days ago

      Well... I write Python, and let PyPy compile itself down to C =]

  • @nathan22211
    @nathan22211 13 days ago

    I feel like you could get similar performance using lupa + lua_importer or nimporter/nython. Both lua and nim are similar in difficulty to python, though I think nim is somewhat like rust when it comes to how to code it.

    • @dougmercer
      @dougmercer 12 days ago

      This is my first time hearing about either of those. Very interesting 🤔

  • @AlexanderHyll
    @AlexanderHyll 14 days ago

    I mean, all this is saying is that if Python wraps a low-level language, it gets low-level performance. Not that huge of a revelation. That's how Python's entire data science world runs.

    • @dougmercer
      @dougmercer 13 days ago

      That, *and* users don't necessarily need to write their own low level code to get that speed up

  • @Big_bangx
    @Big_bangx 14 days ago

    What about optimizing your C++ implementation instead, to go faster?

  • @gawwad4073
    @gawwad4073 14 days ago

    Nice video. VERY good writing and editing. Smooth as hell, keep it up!

  • @tangerie1284
    @tangerie1284 14 days ago

    I wonder how fast an AOT compiler like nuitka would make it

    • @dougmercer
      @dougmercer 14 days ago

      Yeah, I wonder that too. I'm also curious if anyone has a Cython solution...

  • @affrokilla
    @affrokilla 15 days ago

    A 50x speedup between a for loop and a polars dataframe is really significant, great video!

    • @dougmercer
      @dougmercer 14 days ago

      Polars/Duckdb are crazy fast. Thanks for watching!

  • @user-uc6wo1lc7t
    @user-uc6wo1lc7t 16 days ago

    Well... your Python code is not well "optimized" either =)
    1. a and b are np.arrays, so you use numpy in your pure Python code (which is not fair). Moreover, switching to lists gives you a speedup (I'm shocked too). But maybe you did so... you did not show us *all* the code.
    2. max of two elements is not faster than a pure if/else. And if you use it, you need to extract dp[prev_i, j] and dp[i, prev_j] into variables to avoid duplicate element accesses.
    3. Your i-1 and j-1 can be computed in the corresponding loops and saved to variables. The same goes for len(a|b) + 1, which can be computed once.
    With all these, I could speed up pure Python by up to +70% on my Ryzen 5 3600X. And it was consistent across N = 3_000 and N = 10_000, so all my numpy tests used N = 3_000 to not waste too much time. The optimized Python code is therefore faster than the unoptimized numpy code by ~82.2%. If you apply optimizations 2) and 3) to the numpy version, the optimized Python code is still faster, by ~77.5%. I even tried different logic for the numpy and Python versions where dp is a 1D array, but the Python version became slower, while the numpy version made some small progress (optimized numpy with the different logic is ~5% faster than optimized numpy). Well, it was interesting. I've never experienced numpy code slower than pure Python before. Maybe the operations are so fast that switching to numpy types makes it worse...

    • @dougmercer
      @dougmercer 16 days ago

      I didn't use numpy in the pure Python implementation... see the code at 1:59. That's why I mentioned in the numpy section that numpy was surprisingly so much slower. As for the other stuff... you're right that there might be other small tweaks you could make to the various implementations. Thanks for pointing them out

    • @user-uc6wo1lc7t
      @user-uc6wo1lc7t 16 days ago

      @@dougmercer From the timestamp you gave I can't deduce whether a and b are lists or np.arrays, because they were created by numpy functions as you showed at 2:10. But I'm pretty sure you DID convert a and b to lists; it could actually be seen when you used mypy and created type hints. I just forgot about it because I was fully invested in pure Python and numpy. Sorry, I didn't mean to say your benchmarks are bad; they are actually pretty good. Your video is a life-changer for me. Previously, I used numpy without any hesitation, even in loops that don't leverage vectorization. And, strangely, numpy gave me a speedup. So I tried to code your example in disbelief. But, undoubtedly, your code is a good example that I was wrong. That leaves me wondering how I got a speedup in my previous code... maybe I was using vectorization. Can't remember. Seems like I'm gonna benchmark my old code and test it against a pure Python implementation. Recently I watched a guy who was benchmarking nested loops. He was trying to prove that smaller outer loops give +30% performance. But he didn't even realize what exactly he was benchmarking; he fooled a lot of people and ignored me even though I gave him a mathematical proof of the results he got. And the conditions to get a 30+% boost are extremely vague. So I lost my faith in that type of video. But yours... is really good. Thanks a lot!

    • @dougmercer
      @dougmercer 16 days ago

      @@user-uc6wo1lc7t Ah, I see what you're saying. I believe I coerced them to lists, but I'm not 100% sure; I'll have to double check later. And I definitely agree: I always reach for numpy, and it's surprising when it doesn't help! In this case, indexing into the arrays to do a lot of scalar operations is slower. We typically get the speedup when we can vectorize. Thanks for watching and the thoughtful comments =]
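A quick micro-benchmark of that point (a rough sketch; absolute numbers vary by machine, but the ordering is consistent):

```python
# Element-wise indexing into a NumPy array is slower than into a plain
# list, because each access boxes a numpy scalar. Vectorized whole-array
# operations are where numpy wins.
import timeit
import numpy as np

xs_list = list(range(10_000))
xs_np = np.arange(10_000)

def index_sum(xs):
    total = 0
    for i in range(len(xs)):
        total += xs[i]          # scalar access, one element at a time
    return total

t_list = timeit.timeit(lambda: index_sum(xs_list), number=50)
t_np = timeit.timeit(lambda: index_sum(xs_np), number=50)
print(f"list: {t_list:.4f}s  numpy: {t_np:.4f}s")  # numpy is slower here
```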

  • @UndyingEDM
    @UndyingEDM 16 days ago

    The video editing is top notch too!

  • @Slightlymisshapen
    @Slightlymisshapen 16 days ago

    Beautiful presentation. I love your descriptions of libraries. The illustrative code breakdown puts context and concrete examples on what would otherwise be yet more abstract documentation. It helps to grasp the utility of the libraries you describe, not just in this video but in all of them. Fantastic work.

    • @dougmercer
      @dougmercer 16 days ago

      Aw, thanks! glad you enjoyed it, and thanks for your super nice comment =]

  • @silloo2072
    @silloo2072 16 days ago

    Why no C++ optimization?

    • @dougmercer
      @dougmercer 16 days ago

      I did compile the C++ with -O3, but this channel is mostly focused on Python devs, so the idea is: what would someone unfamiliar with C++ write? That said, restructuring as a 1D array does achieve some speedup, but it's still only slightly faster than Numba/Taichi, so it doesn't really change the story

  • @ogrp5777
    @ogrp5777 17 days ago

    What about pypy?

    • @dougmercer
      @dougmercer 17 days ago

      I covered PyPy in my latest video ("How fast can Python parse 1 billion rows of data")

    • @ogrp5777
      @ogrp5777 17 days ago

      @@dougmercer I just saw the video, that was brilliant! Thanks

  • @ali.moumneh
    @ali.moumneh 19 days ago

    Your video quality is top notch; I'm sure it will soon translate into views if you keep this up. Good luck.

  • @DepressedMusicEnjoyer

    As much as I like the video, the "6 times slower isn't a problem" bit kinda drives me crazy as a person doing low-level coding 😭 Like, yeah, you're right that for a lot of stuff it doesn't matter, but then imagine 5 vs 30 fps, and if it's nested within something else, the slowdowns compound. I have been trying to make a 480 MHz MCU's CPU do rendering, and it's difficult to get ~30 fps; if using Python meant being 6 times slower, I wouldn't be happy no matter how much easier my code would be to understand

    • @dougmercer
      @dougmercer 20 days ago

      That's fair. Some things are worth squeezing all the performance out of. It will be nice when PyPy supports sub-interpreters or no-GIL... it will eventually be possible to close the gap more

  • @0MVR_0
    @0MVR_0 20 days ago

    I should not have watched this video, considering that I have been waiting two days for my implementation of Kendall's tau to finish

    • @dougmercer
      @dougmercer 20 days ago

      Oh no 💀 Numba + numpy or PyPy would probably help

  • @rverm1000
    @rverm1000 24 days ago

    That's nice of you to point these libraries out.

  • @bbajr
    @bbajr 24 days ago

    is pandas slow?

    • @dougmercer
      @dougmercer 24 days ago

      It would be for something like this! Using this pandas implementation github.com/Butch78/1BillionRowChallenge/blob/main/python_1brc%2Fmain.py takes about 150s, but polars takes 11-12s

  • @gencurrent
    @gencurrent 25 days ago

    It's hard to watch this with all the bells and whistles, the glitter and other hangers-on. Kindly stop using those! An interesting video turns into a torture session!!

    • @dougmercer
      @dougmercer 25 days ago

      Still trying to find my style/voice ¯\_(ツ)_/¯ Check out my two more recent videos. They have less intrusive editing

    • @gencurrent
      @gencurrent 24 days ago

      @@dougmercer Thank you Doug : )

  • @joaoguerreiro9403
    @joaoguerreiro9403 25 days ago

    Damn, this was an instant follow! Hope to see more computer science content 🙏🏼 Great video :)

  • @Jp-ue8xz
    @Jp-ue8xz 28 days ago

    c++ -O3 flag: am i a joke to you?

    • @dougmercer
      @dougmercer 28 days ago

      It was -O3 optimized gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3

  • @RiadAhmed-ce6qo
    @RiadAhmed-ce6qo 29 days ago

    Python uses third-party compilers as well to execute fast.

  • @MaxiveLegend
    @MaxiveLegend 1 month ago

    1. "Some of these solutions are gonna be as fast or even faster than my C++ implementation" just means your C++ is bad lol. The Python interpreter was built in C, so by definition a Python program can never be faster than, or even as fast as, C. C++ is SLIGHTLY slower than C, but Python would never fall in between them. 2. Making Python "faster" by transpiling it to C isn't making PYTHON faster, it's just converting it to C. It's basically like saying "I'm going to upgrade my Renault Twingo" and then getting a Bugatti Veyron

  • @smithdoesstuff
    @smithdoesstuff 1 month ago

    Kinda bummed I wasn’t sub 10k

    • @dougmercer
      @dougmercer 1 month ago

      Hah! I'm at 9,991 subs, so good news! Hopefully crossing 10k mark soon =] 🤞

  • @CottidaeSEA
    @CottidaeSEA 1 month ago

    I shit on Python a lot for being slow, but honestly, 8-10 seconds to read 1 billion rows is sufficient in most scenarios.

  • @joseduarte9823
    @joseduarte9823 1 month ago

    Depending on how large the total sum actually is, using an incremental mean may yield better performance since python won’t need to upgrade the number to a big int

    • @dougmercer
      @dougmercer 1 month ago

      Neat idea... It's worth a shot! Feel free to fork the repo and give it a try
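The incremental-mean idea, sketched (the `update` helper is hypothetical, not code from the video):

```python
# Keep a running mean instead of a giant running total, so the
# accumulator stays a plain float no matter how many rows stream by.
def update(mean, count, x):
    count += 1
    mean += (x - mean) / count     # incremental (Welford-style) mean
    return mean, count

mean, count = 0.0, 0
for x in [10.0, 20.0, 30.0]:
    mean, count = update(mean, count, x)
print(mean)  # 20.0
```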

  • @nikilragav
    @nikilragav 1 month ago

    Are you allowed to use numpy or the GPU (torch, cupy, etc.)?

    • @dougmercer
      @dougmercer 1 month ago

      You could use numpy, but I don't think it would help (the bottleneck is reading the data in). I did see some use Dask + cuDF (CUDA) and that was very fast. However, it wasn't allowed in the challenge because the evaluation system didn't have a GPU

    • @nikilragav
      @nikilragav 1 month ago

      @@dougmercer ah. Reminds me of another challenge I saw where IO is the bottleneck. Even there I'm wondering if writing the content to the GPU memory and back is too slow