The effect of Random UUID on database performance

Sdílet
Vložit
  • čas přidán 12. 06. 2024
  • In this video I whiteboard how UUIDs hurt write (and read performance) when used on secondary and primary indexes. UUIDv4 are the most popular but they are truly random. Compared to snowflakes, ULID or UUIDv7 or even UUIDv1.
    0:00 UUIDv4
    2:30 B+Tree Indexes and UUID
    5:30 Random UUIDv4 Insert Workload
    12:40 Ordered Insert Workload (UUID7/ULID, Sequence)
    14:00 Shared buffer pool flushes
    15:00 Shopify ULID use case
    17:00 URL shortner UUIDs?
    Discovering Backend Bottlenecks: Unlocking Peak Performance
    performance.husseinnasser.com
    Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon)
    backend.husseinnasser.com
    Fundamentals of Networking for Effective Backends udemy course (link redirects to udemy with coupon)
    network.husseinnasser.com
    Fundamentals of Database Engineering udemy course (link redirects to udemy with coupon)
    database.husseinnasser.com
    Follow me on Medium
    / membership
    Introduction to NGINX (link redirects to udemy with coupon)
    nginx.husseinnasser.com
    Python on the Backend (link redirects to udemy with coupon)
    python.husseinnasser.com
    Become a Member on CZcams
    / @hnasr
    Buy me a coffee if you liked this
    www.buymeacoffee.com/hnasr
    Arabic Software Engineering Channel
    / @husseinnasser
    🔥 Members Only Content
    • Members-only videos
    🏭 Backend Engineering Videos in Order
    backend.husseinnasser.com
    💾 Database Engineering Videos
    • Database Engineering
    🎙️Listen to the Backend Engineering Podcast
    husseinnasser.com/podcast
    Gears and tools used on the Channel (affiliates)
    🖼️ Slides and Thumbnail Design
    Canva
    partner.canva.com/c/2766475/6...
    Stay Awesome,
    Hussein
  • Věda a technologie

Komentáře • 157

  • @hnasr
    @hnasr  Před 9 měsíci +15

    database fundamentals course database.husseinnasser.com

    • @MrAtomUniverse
      @MrAtomUniverse Před 9 měsíci

      This is good , i'm on udemy business, do keep more courses open to business plan users! thank you! great videos you have there

  • @bishalkandel9976
    @bishalkandel9976 Před 9 měsíci +45

    Let's make it bigger.

  • @vasiliynet3425
    @vasiliynet3425 Před 9 měsíci +1

    Awesome! Thanks. Dialectic of randomness and order. Very beautiful.

  • @ahmedalaaeldin7362
    @ahmedalaaeldin7362 Před 9 měsíci +7

    Actually, I was thinking about that question a couple of days ago
    Thank you :)

  • @berndeckenfels
    @berndeckenfels Před 8 měsíci +5

    Yes when you age them out, they might leave gaps in the index tree but also reuse the gaps regularly. They might not “nowhere near each other” but always in between. Sp with larger pages the splits are less likely. (Of course IOT is to a good usecase)

  • @AffyisAffy
    @AffyisAffy Před 9 měsíci +12

    The whole time I was wonder why not just insert on a new index rather than pointing to a UUID, that way it's always ordered. But then you said MySQL defaults to this at the end of the video. That's critical for the 'why' of this video

    • @tirthwork9288
      @tirthwork9288 Před 9 měsíci

      He said that in the start of the video as well

    • @berndeckenfels
      @berndeckenfels Před 8 měsíci

      You can use a sequence but keeping them has its own scalability problem, plus the pk won’t be stable

  • @usamaabubakar3061
    @usamaabubakar3061 Před 8 měsíci +1

    Will this concept be applied to columns that are Varchar and indexed? Because strings are also random like username, email, url slugs, etc.. If so, what is the workaround for storing string values as indexed?

  • @consistentlearners
    @consistentlearners Před 9 měsíci

    Happy Teachers Day Husseinn

  • @user-fl6up9jo1w
    @user-fl6up9jo1w Před 9 měsíci +3

    so what about storing uuid in hashmap index?

  • @xD-saleem
    @xD-saleem Před 9 měsíci

    thanks so much its make sense now

  • @mikestaub
    @mikestaub Před 9 měsíci +3

    Please do a video on vector databases.

  • @tempaccount8256
    @tempaccount8256 Před 9 měsíci +3

    Never thought, Hussein will come with a pun in the video "that's what she said"

  • @therealtuyen
    @therealtuyen Před 9 měsíci +2

    I just randomly stumbled upon your video, and I've always had this itch about UUID performance. Your video really boosted my confidence in what I was thinking. Thanks a lot!
    Oh, and quick question: Do databases like MariaDB, MySQL, or PostgreSQL automatically play nice with ULIDs? And how do they know to sort 'em out for indexing?

    • @nirnullz
      @nirnullz Před 9 měsíci +3

      Postgresql, mysql 8++, mariadb 10.7++ support uuid internally. Postgresql have best implementation since it’s saved as 128bit binary but can be accessed as string without custom function on client.
      MariaDB/MySql client need to call certain function if you want to query a binary uuid field.

    • @therealtuyen
      @therealtuyen Před 9 měsíci +3

      @@nirnullz I asked about ULID, bro ?

  • @buddy.abc123
    @buddy.abc123 Před 9 měsíci +4

    He made it bigger.

  • @alegon2007
    @alegon2007 Před 9 měsíci +6

    What about distributed NO SQL DBs (document, key-value, etc) where the recommendation is to use partition keys with high cardinality and avoid sequential values because that will create “hot” partitions for inserts at least? Maybe its a different use case but it would be an interesting topic too

    •  Před 9 měsíci

      where did you read that? Could you share the article?

    • @mishikookropiridze
      @mishikookropiridze Před 9 měsíci +1

      Different scenario.

    • @ehfoss
      @ehfoss Před 9 měsíci +1

      There are 3 new uuids in rfc draft. ulid is equivalent to uuidv7. uuidv8 let's you define your own data. And uuidv6 has a machine key followed by a timestamp akin to ulid which can help with the situation you're describing.

    • @mishikookropiridze
      @mishikookropiridze Před 9 měsíci

      Also possible problem is that ULID doesn't have RFC, so implementation can differ.

    • @avinashboppudi
      @avinashboppudi Před 9 měsíci +2

      @alegon2007, that is true for NO SQL DB's as it does not use B+ tree datastructure to store data. If I take DynamoDB as an example,
      It relies on something like partition key to hash and figure out the partition. This makes sure your single partition is not going to be overloaded with requests. While for sort key, it's recommended to keep it sequential if possible as internally in the shard it must have been using a B+ tree to sort on the sort keys

  • @gam1l
    @gam1l Před 7 měsíci

    This one cracked me up 09:59 😂
    Great work as usual 👏
    Please consider making a video on Pinecone and vector databases 🙏

  • @utkarshsaxena8034
    @utkarshsaxena8034 Před 9 měsíci

    How about when we go ahead and distribute the data. Won't this make the last shard hot all the time?

  • @ddanielsandberg
    @ddanielsandberg Před 9 měsíci +16

    MySQL 8 uses UUID v1 (kind of compatible with UUID v2).
    By using the UUID_TO_BIN()/BIN_TO_UUID() function with the optional second argument 'swap_flag' it will reformat the UUID before converting it to a BINARY(16). This will make them sequential (since UUID v1/v2 is based on the timestamp).

    • @someoneanonymous6707
      @someoneanonymous6707 Před 9 měsíci

      How costly is that operation tho?

    • @cedricbrisson7240
      @cedricbrisson7240 Před 9 měsíci +1

      uuidv1 is not secure tho

    • @joesb
      @joesb Před 9 měsíci +6

      @@cedricbrisson7240 "What security and why should it apply for this case?" uuidv1 has some characteristic that make it possible for someone who want to *intentionally* clash/guess your id, they might be able to. If YOUR OWN service is creating this id, why would you intentionally do it?
      Don't just parrot that something is insecure. Knows if it applies to your use case and attack vector as well.

    • @cedricbrisson7240
      @cedricbrisson7240 Před 9 měsíci

      Jesus chill lol It's simply good practice to avoid using vulnerable services/tools when safer alternatives can be used. As someone on a CSEC team I'd rather pre-emptively protect myself than wait for someone to find a use case for a specific flaw :) But sure, I'm just "parroting". You're the kind that praises himself as a software engineer thinking their above "programmers" right?

    • @joesb
      @joesb Před 9 měsíci

      @@cedricbrisson7240 do you think “integer” is secure? Do you avoid using integer? It’s nonsense statement. Uuid v1 is insecure for a certain use case. Using it as an id you generate yourself is not one of them.

  • @julianpurse956
    @julianpurse956 Před 9 měsíci +1

    What about database performance of using a uuid stored in varchar column vs new db versions which support a native uuid type

    • @jacob_90s
      @jacob_90s Před 9 měsíci +1

      Doesn't change the problem, and in fact you'll probably get even worse performance since you'll have a less compact data type.

  • @PabloGnesutta
    @PabloGnesutta Před 9 měsíci +1

    What about NOSql databases like mongo db?

  • @RealEvangelizer
    @RealEvangelizer Před 8 měsíci +1

    Ive used UUID generated outside the database and stored as primary key varchar within the db. Its worked a lot better for over 100 million devices that I was working with. Primary goal was to use it as hashtables when use other systems like redis/dynamo etc. For lookups, with a bit off magic in UUID generation which is not being generated by DB, you can apply Bloom filters as well. I've been out of engineering field for a decade+ but these sort of videos are always much needed to discuss the fundamentals of how things work.

  •  Před 9 měsíci +3

    does it mean that, generally, indexing on a random value column is hurt?

    • @fullstackdotnet
      @fullstackdotnet Před 9 měsíci +2

      it is, but dont start avoiding it and enforcing new guidelines yet, wait till you reach shopify level scale where you can benefit from that sort of performance improvement. its all about ROI

  • @usun_development3947
    @usun_development3947 Před 9 měsíci

    So what are we supposed to use as a primary key identifier in MySQL, if we use distributed database setup?

    • @ferrysuhandri
      @ferrysuhandri Před 9 měsíci +1

      using UUID for distributed database is fine. the video is about using identifier in general case. using distributed database is a special case.

  • @Iandavidandrino
    @Iandavidandrino Před 9 měsíci

    I was just thinking about this 😵

  • @alfrendosilalahi6652
    @alfrendosilalahi6652 Před 9 měsíci +3

    Any one can explain about what is page in that explanation? i don't really understand

    • @muratdemirturk2847
      @muratdemirturk2847 Před 9 měsíci +8

      it is like a file that DB put indexes in. Lets say you want to get a record from DB with id number 80. if indexing in pages were not used it would have to scan entire DB to find where a record with id=80 saved. This wont be an eficient way of getting data. this is actualy how file system work. In DB it uses indexes that points which sector that record are saved. And pages are basicly where those indexesare saved and its pointers are where actual data/record are saved. So DB first goes to pages and finds index. then looks for the exact record from where index is pointing. Long story short pages are phonebook for DB...

  • @OzoneGrif
    @OzoneGrif Před 8 měsíci +3

    The worst is using UUID with SQL Server; since the PK is Clustered; performance becomes awful very quickly.
    I use Postgresql and never had performance issues with UUIDs.

    • @ShinigamiZone
      @ShinigamiZone Před 8 měsíci

      Can you tell me a bit more about this? What means clustered in that case for SQL Server?
      I just structured a whole DB with PK UUIDv4 in SQL Server but it’s not shipped to prod yet 👀

    • @ShinigamiZone
      @ShinigamiZone Před 8 měsíci

      At least the prod system is not high-performance required
      I expect around 2000 writes per day and 1_000_000 reads per day
      And let’s say 10_000_000 rows in total with around 10 tables that are read mainly, 5 more tables for not so much read tables and 2 views (maybe need to be materialized views)

    • @OzoneGrif
      @OzoneGrif Před 8 měsíci +1

      @@ShinigamiZone Clustered means physically ordered on the storage. Random UUID used as a Clustered PK (the default on SQL Server) means your records will be fragmented everywhere on your storage, leading to awful performance.

  • @vmaxxk
    @vmaxxk Před 9 měsíci +10

    I've seen DBs with millions of records across various tables perform just fine with UUIDs leveraged everywhere.
    Either the performance difference is negligible or it only affects extremely large datasets.

    • @willi1978
      @willi1978 Před 9 měsíci +3

      i think so too. usually data has no correlation between each other for example customers in a table. for timeseries data uid v7 can be used

    • @IvanRandomDude
      @IvanRandomDude Před 9 měsíci +1

      Databases are much more powerful than we think.

  • @redpillsatori3020
    @redpillsatori3020 Před 9 měsíci +4

    Pro tip: watch at 1.5x speed. You're welcome

  • @anuragbhagsain9724
    @anuragbhagsain9724 Před 9 měsíci +1

    lowkey "That's what she said" reference at 9:57 😂

  • @jameswestbrook5709
    @jameswestbrook5709 Před 9 měsíci +3

    how about ULID thought?

    • @mardix
      @mardix Před 9 měsíci

      ULID leaks timestamp. For sensitive data, you may not want to expose this info.

  • @RZhuAmpere
    @RZhuAmpere Před 9 měsíci +2

    10:00 would have never expected that joke from you 😂

  • @simon3121
    @simon3121 Před 9 měsíci +2

    Missing the point. A UUID allows an application to create the ID rather having a two step approach of asking the DB for an ID first.

    • @AnOmegastick
      @AnOmegastick Před 9 měsíci +1

      Applications can create a ULID.

  • @AnkurVermachords
    @AnkurVermachords Před 9 měsíci

    I am confused at the statement that indexes are b+ tree and it gets indexed and it gets balanced but if gets balanced isnt the order gets messed up...i am a bit confused 😐 🤔

    • @Aspartame12
      @Aspartame12 Před 9 měsíci +1

      The tree gets balanced by shifting which node is treated as root , balanced binary search tree concept.

    • @AnkurVermachords
      @AnkurVermachords Před 9 měsíci

      @@Aspartame12 but if change the root isnt the root becomes the first node then where is the order still same confuision

    • @Aspartame12
      @Aspartame12 Před 9 měsíci +1

      @@AnkurVermachords order in a balanced BST is never there, it's for search only , in a balanced bst search takes less time compared to unbalanced that's the whole premise, it also supports less than and greater than queries easily.

    • @AnkurVermachords
      @AnkurVermachords Před 9 měsíci

      ​@@Aspartame12he mentioned the ds to be ordered at 3.40 .. aaaaahhhhhhh i am confused now, had the idea of tree now doesnt have the idea of plant as well😂

    • @Aspartame12
      @Aspartame12 Před 9 měsíci

      @@AnkurVermachords Okay, so a BST is ordered in the sense that there is a order in which the nodes are arranged left child is smaller right child is bigger and so on, if you balance the tree your root value changes , that means you do not have ready access to smallest value or largest value, but the middle value ( approx ) , so while searching for something you start from middle value ( root ) and go left or right to find it.
      You can also write a simple recursive/stack based code to print contents of a BST in ascending sorted order or descending sorted order.

  • @rollinOnCode
    @rollinOnCode Před 9 měsíci

    how do you detect and troubleshoot bad page splits?

  • @ankurdutta3277
    @ankurdutta3277 Před 8 měsíci

    I learned this the hard way. In mongodb I keep getting 'document not found' even though it does.

  • @malibudancer8472
    @malibudancer8472 Před 9 měsíci +1

    This is very narrowly scoped on dbs that don't optimize for distributed cases...

  • @CaleMcCollough
    @CaleMcCollough Před 9 měsíci +1

    Great info. I need a tutorial on how to do this with PlanetScale. I ordered your Udemy class. You should add it to that.

  • @yadneshkhode3091
    @yadneshkhode3091 Před 9 měsíci +2

    👍

  •  Před 9 měsíci

    if a B-tree index on UUID is hurt, will the hash index on UUID be better?

    • @ferrysuhandri
      @ferrysuhandri Před 9 měsíci

      hash index for secondary index is possible, but for primary key not all systems can set it with hash. also, uuid is large, we should only use it as primary key when we really need.

    • @amrojjeh
      @amrojjeh Před 8 měsíci

      My guess is that it'd be better but would still suffer because inserts would be stored far apart, and so you'd still have the caching issue

    • @michawhite7613
      @michawhite7613 Před měsícem

      Updates are slower on hash indices in the worst case. When a bucket becomes full, you now have to update every single bucket, since ID % (number of buckets) is now resulting in a different value for every single ID

  • @bernard-ng
    @bernard-ng Před 9 měsíci

    I messed up 😂😂😂 gotta talk to my CTO as soon as possible

  • @shammahrashad
    @shammahrashad Před 8 měsíci

    His videos are so interesting but so slow. I can easily watch it at 2x speed.

  • @MrAtomUniverse
    @MrAtomUniverse Před 9 měsíci +7

    For postgresql, why would anyone use UUID , SERIAL is the default method for incremental ID right?

    • @dz5483
      @dz5483 Před 9 měsíci +17

      Perhaps you don't want to reveal the IDs of your records such that it becomes easy to query your records incrementally. This could be a security risk in API available publicly

    • @gamemoves2415
      @gamemoves2415 Před 9 měsíci +2

      ​@@dz5483I'm thinking when you get the record from the DB on the front end rather you show the UUID.

    • @gamemoves2415
      @gamemoves2415 Před 9 měsíci

      ​@@dz5483I'm thinking when you get the record from the DB on the front end rather you show the UUID.

    • @sasonbraha4599
      @sasonbraha4599 Před 9 měsíci

      I remember reading about an incident in GithHub involving auto-incremented counters:
      "For example, in one incident at GitHub [13], an out-of-date MySQL follower was promoted to leader. The database used an autoincrementing counter to assign primary keys to new rows, but because the new leader’s counter lagged behind the old leader’s, it reused some primary keys that were previously assigned by the old leader. These primary keys were also used in a Redis store, so the reuse of primary keys resulted in inconsistency between MySQL and Redis, which caused some private data to be disclosed to the wrong users."
      Kleppmann, Martin. Designing Data-Intensive Applications (p. 250). O'Reilly Media. Kindle Edition.

    • @MrAtomUniverse
      @MrAtomUniverse Před 9 měsíci

      @@dz5483 in this case, people use slug which is unique , otherwise it's like youtube watch?v=qRv5g5MEF2k , let me know i anyone has a better method.

  • @tmbarral664
    @tmbarral664 Před 9 měsíci +4

    Hum…Hussein, I’m a kind of surprised here…. To a point I maybe doubtful….
    Here you’re showing the ordering of the whole rows ( as stating item as key value) when we’re talking about indexes (indices).
    Indexes are ordered (by nature ;)) and are pointing to the value, no matter where the value is.
    But here, you’re saying the primary index, the one for the primary key, is not behaving as a normal index….
    If that’s the case, that’s a major (I should use uppercase here) problem.
    That means that we shouldn’t be allowed, in any case(uppercase again) to use anything but autoincrement number !
    See, that’s so big it makes me doubt.

    • @berndeckenfels
      @berndeckenfels Před 8 měsíci

      For clustered index, also called IOT the value is in the index - however even if the value is not there you need to shuffle them

  • @Notion615
    @Notion615 Před 9 měsíci +9

    set page fill to 71% and periodically rebuild your index at low fragmentation percent and problem solved for indexes on uuid4

    • @durgeshchoudhary
      @durgeshchoudhary Před 9 měsíci

      can you please point to documentation for these? would be helpful.

    • @ehfoss
      @ehfoss Před 9 měsíci

      What shower of a db requires that you rebuild it all the time haha

    • @Notion615
      @Notion615 Před 9 měsíci

      @@ehfoss set up a service that runs every so often that rebuild your index programatically during periods of low load.

    • @Notion615
      @Notion615 Před 9 měsíci

      ​@@durgeshchoudhary To my fellow neckbeards who wish to know the black arts.
      czcams.com/video/jx-FuNp4fOA/video.html

  • @saggitt
    @saggitt Před 8 měsíci +1

    If your IDs are public and you make them sequential, be aware that third parties will know the number of entities created over a time period.

  • @michawhite7613
    @michawhite7613 Před měsícem

    If you need an index on usernames, won't you have the same problem?

  • @experimentalhypothesis1137
    @experimentalhypothesis1137 Před 7 měsíci

    does this apply also for uuid5?

    • @philheathslegalteam
      @philheathslegalteam Před 6 měsíci

      Yes.
      UUIDv6 is the implementation which is proposed to fix this

  • @AbhishekSingh-pu6dg
    @AbhishekSingh-pu6dg Před 8 měsíci

    The silent joke at 10:00 That's what she said. Haha, classic.

  • @SmartK8
    @SmartK8 Před 9 měsíci +17

    You never had to distribute databases into multiple locations, that insert independently and then access global data. I use UUIDs for all the data tables. You have 100 databases with millions of IDs. Just insert independently whenever you want, the worst thing that can happen is a rare collision hypothetically (never happened to me yet). They will still be OK with UUIDs. Now imagine this with auto-increment integer. Insanity. How would you keep track of this auto-increment ID and ensure it's unique across all databases, other then synchronizing them. Auto-increment integers as a key are for script kiddie databases, when you have one local database. I still wouldn't use integers even then. Because you never know, whether you'll need distributed database later.

    • @ConsuelaPlaysRS
      @ConsuelaPlaysRS Před 9 měsíci +1

      "Auto-increment integers as a key are for script kiddie databases" I'm willing to bet the house that most tables in most databases powering most applications use auto-incrementing PKs lol

    • @SmartK8
      @SmartK8 Před 9 měsíci +1

      @@ConsuelaPlaysRS That doesn't automatically mean there's a good reason to have such keys. Also most databases are probably local file databases. Everyone can do whatever they want. I'm just saying after some gathered experience, you realize that little gain in being sequential is worth sacrificing for a more robust and secure database.

    • @shubhambravo
      @shubhambravo Před 9 měsíci

      I work for a major b2c org and we use auto increment IDs to store order info 😅

    • @Ryan-gf1sz
      @Ryan-gf1sz Před 9 měsíci +2

      @@ConsuelaPlaysRS While I agree that's true and it's convenient in many case,it just won't work in large distributed system.

    • @omniphage9391
      @omniphage9391 Před 9 měsíci

      for the longest time you wouldnt distribute your databases if you could avoid it, i know that sql sharding exists but that is something you would have to build your persistence layer around. With the rise of nosql databases this has changed, although that in turn also has implications for design.
      Its engineering. The perfect solution is the one that works.

  • @VTOLfreak
    @VTOLfreak Před 9 měsíci

    And now you get locking contention on that last page. Fix one problem, create another.

  • @mitchdigitalnetwork
    @mitchdigitalnetwork Před 9 měsíci +1

    😂😂😂😂
    Seems to be a little bit small, so let's make it bigger............ that's what she said
    9:56

  • @banterfc2073
    @banterfc2073 Před 7 měsíci

    Why not use auto incremented IDs instead

    • @michawhite7613
      @michawhite7613 Před měsícem

      My understanding is that's not a problem in-of-itself. But when you have another vulnerability, this will make it way worse. For example, say I found a way to see a user's phone number. I do this for user 1, then user 2, etc. I can run the attack on every single user rather quickly. If the IDs are random, then I have to guess and check user IDs, and I won't be able to affect as many people.

  • @kraigochieng6395
    @kraigochieng6395 Před 9 měsíci +1

    10:02

  • @shubhambravo
    @shubhambravo Před 9 měsíci

    Use escalidraw

  • @ttst3
    @ttst3 Před 8 měsíci

    10:01 we said it at the same time 😂

  • @DavisTibbz
    @DavisTibbz Před 9 měsíci +1

    What about turning the uuid to binary(16) ?

  • @ArunGordon
    @ArunGordon Před 9 měsíci +1

    That's what she said 🤭

  • @jwbonnett
    @jwbonnett Před 9 měsíci

    UULD is not 100% accurate, the issue is the datetime ticks, not every generation, generates a tick. So you can still have this shuffling, though it won't be to the same extent.

  • @CaimAstraea
    @CaimAstraea Před 9 měsíci

    They never made sense to me. There's no force powerful enough in the universe to convience me I need them over autoincremented integers.

  • @adarshkr532
    @adarshkr532 Před 9 měsíci +1

    did he just make a that's what she said joke at 10:00 😂

  • @DirkSteinkamp
    @DirkSteinkamp Před 8 měsíci +2

    Sorry, but having a B+ tree with a page size of 2 is very close to degenerate the B+ tree to a binary tree -- and binary trees either need to be rebalanced regularly, or might degenerate into a linear list.
    Rebalancing is an expensive operation, especially if it involved a secondary storage like HDD / SSD. And that's what B+ trees are optimized for: to minimize HDD access, leveraging having some leeway for inserts and deletes without "reshuffling", and generally keep the tree depth low. This requires the page size to be considerably larger than two, and be harmonized with the block size on the disk.
    Besides that, different DBMS might offer different variants of indexes, e.g. hash indexes, that are not implemented as trees, but as hash tables. Depending on the use case this might be the index type to choose, and your argument becomes void.
    I personally don't like UUIDs very much in many use cases, but for other reasons: first is they are not easily graspable for humans, they don't tell a story, don't have some inherent use case information in their structure, ... yet they are very useful when randomness is required, because the systems involved only synchronize their data eventually. But besides that: too much cognitive load ...
    Anyhow: I don't buy your argument, sorry.

    • @amrojjeh
      @amrojjeh Před 8 měsíci

      Hash table would still suffer from performance as it would have to load from disk if inserts aren't stored close to each other

  • @hipertracker
    @hipertracker Před 9 měsíci

    Using numbers as the example is misleading. UUID are not ordered by design so you cannot sort them. I do not understand why the index has to be sorted? What about hash indexes?

    • @berndeckenfels
      @berndeckenfels Před 8 měsíci

      You can sort them by comparing their raw byte or hex string representations, both is used depending on your db type

    • @hipertracker
      @hipertracker Před 8 měsíci

      @@berndeckenfels yes, but it doesn't make sense, you have no control over the sort order.

    • @berndeckenfels
      @berndeckenfels Před 8 měsíci

      @@hipertracker for an index which is supposed to find a specific UUID the sort order does not matter as long as it can be used to find it.

    • @michawhite7613
      @michawhite7613 Před měsícem

      Hash indices are even slower to update. Instead of updating a few nodes, you now have to update every single bucket, since ID % (number of buckets) is now resulting in a different value for every single ID.

  • @ByteBite_Tech
    @ByteBite_Tech Před 9 měsíci

    the speed of the video is 1.25x

  • @BrianWoodruff-Jr
    @BrianWoodruff-Jr Před 8 měsíci

    I get the sense you hate indexed random UUIDs in databases.

  • @Cthulch
    @Cthulch Před 8 měsíci

    Do you need to have your index ordered? A linked list? TERRIBLE. A tree? gosh! I always imagined random uuids would be utilizing hash table indexes, which is completely useless to sort. Using random numbers with a sorted index seems like a pretty dumb mistake. I feel like when you say "index", you don't even consider that there's more than one index type.

    • @michawhite7613
      @michawhite7613 Před měsícem

      Hash indices are really fast for select queries, but they make the updates even worse. If a page gets full, now instead of just updating a few nodes, you have to update every single bucket, since the ID % (number of buckets) will result in a different value now.

    • @Cthulch
      @Cthulch Před měsícem

      @@michawhite7613 If it was a student who wrote the hashing lib then yes. In SQL, however, it's so well optimized that the page overflow happens extremely rarely and it's not as dramatic as it may seem. Plus in vast majority of cases it's totally worth it to have inserts longer if it means having O(1) for reading. And the complexity of the inserts is also linear or at least log(n), so insignificant.

  • @mmm-ie5ws
    @mmm-ie5ws Před 6 měsíci

    why do you have to make this a 20 min video???

  • @jamesh4129
    @jamesh4129 Před 9 měsíci +1

    First!

  • @silicalnz
    @silicalnz Před 9 měsíci +12

    you've never worked with distributed databases 😂

    • @roy20021
      @roy20021 Před 9 měsíci +1

      can you elaborate?

    • @voicevy3210
      @voicevy3210 Před 9 měsíci

      ​@@roy20021you need uuid to handle data migrations, data should be independent of where it's located

    • @malibudancer8472
      @malibudancer8472 Před 9 měsíci

      @@roy20021 he is correct.

    • @roy20021
      @roy20021 Před 8 měsíci

      @@malibudancer8472 ok, can you elaborate a bit more?

    • @malibudancer8472
      @malibudancer8472 Před 8 měsíci

      @@roy20021 I don't really have time to spend time on a stranger on youtube, and you can't really post links on youtube without getting your content deleted. So I can only guide you what to google for.
      Read about the partitioner of Cassandra and the hash function used. Search for partitioning hot spots and educate yourself about it.
      Also all relevant databases optimize their indexes for truly random UUIDs and allow tuples to be indexed.
      That is all I can do for you, if I could I would have posted you direct links instead. Thank youtube for being a P**e of s***.

  • @andrewbrown8463
    @andrewbrown8463 Před 5 měsíci

    There is a big difference between a database that understands a uuid type and is storing it as a number compared with a uuid stored as a string (varchar). Ultimately though using either a int64/bigint or uuid is the right way to create a primary key for a table. Never should a user entered value be a primary key. Otherwise the rest of this video is mostly garbage or irrelevant in the scheme of things when it comes to performance in a database.

  • @solitary200
    @solitary200 Před 8 měsíci

    ULID > UUID

  • @RaduCruceru
    @RaduCruceru Před 9 měsíci +24

    All talk without showing numbers. You can theorize on how the index performs all you want, but you should take the time to run some performance tests before-hand and present the data. Otherwise this video is just basic guessing and a waste of time.

    • @nbme-answers
      @nbme-answers Před 7 měsíci

      I appreciate the conceptual overview

    • @AtomicCodeX
      @AtomicCodeX Před 5 měsíci +1

      What you expect? Watch him calculate how long it takes to conclude which method they use?
      You do realise he’s talking about how it works, a performance test won’t help with that.
      If you don’t have time to learn and just need to know which is faster, then just do that. Instead of shutting on other peoples work.

  • @opticalmouse2
    @opticalmouse2 Před 8 měsíci +2

    You're like my college professor who lives in a fantasy world.