ULID vs UUID: Which One Should You Use?

  • Added 4 June 2024
  • ULID, UUID v7 (time-based IDs), and UUID v4 (random IDs) offer completely different characteristics.
    Learn about the data layout and when to prefer one or the other, based on your type of database and the behaviour you want to obtain.
    Learn how ULID can cause hotspots in distributed databases, and how UUID v4 can cause fragmentation and increased disk I/O in traditional databases.
    Slides are available here: speakerdeck.com/matteobertozz...
    Chapters:
    0:00 Data Layout
    0:50 Rand-based IDs Storage behaviors
    1:21 Time-based IDs Storage behaviors
    2:10 String Encoding characteristics
    2:33 Use cases and usage
    #database #backend #coding #codingtips #programming #databasedesign #backenddeveloper #databases
  • Science & Technology

Comments • 16

  • @rajughorai7483
    6 months ago +3

    🎯 Key Takeaways for quick navigation:
    00:00 🧱 Both UUID and ULID are 128-bit data blocks, but ULID's 48-bit timestamp field allows for lexicographical sorting and time-based locality.
    00:28 🔄 ULID is not directly compatible with UUIDs; however, UUID version 7 shares similar properties with ULID.
    00:57 🗃️ Random-based IDs like UUID v4 are beneficial for distributed databases due to good data distribution across nodes, but can lead to fragmentation in single-machine databases.
    01:23 ⏳ Time-based IDs like ULID improve disk I/O and cache efficiency in single-machine databases, but may create hotspots in distributed systems.
    02:18 🔢 ULID uses Crockford's base32 for a more compact string encoding, maintaining lexicographical order, and is suitable for generating offline IDs in multi-machine environments.
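
The layout and encoding summarized above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `new_ulid_int` and `encode_crockford` are made up for this example, and it skips the monotonic handling that real ULID libraries may add for IDs created in the same millisecond.

```python
import os
import time

# Crockford's base32 alphabet: digits plus uppercase letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid_int(timestamp_ms=None):
    """Build a 128-bit integer: 48-bit millisecond timestamp + 80 random bits."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    random_part = int.from_bytes(os.urandom(10), "big")  # 80 bits from the OS CSPRNG
    return ((timestamp_ms & ((1 << 48) - 1)) << 80) | random_part


def encode_crockford(value):
    """Encode a 128-bit integer as 26 characters, 5 bits each, most significant first."""
    chars = []
    for shift in range(125, -1, -5):
        chars.append(CROCKFORD[(value >> shift) & 0x1F])
    return "".join(chars)


# The timestamp sits in the high bits, so a later ID sorts after an earlier one,
# both as an integer and as a string.
older = encode_crockford(new_ulid_int(timestamp_ms=1_700_000_000_000))
newer = encode_crockford(new_ulid_int(timestamp_ms=1_700_000_000_001))
print(older < newer)
```

Because the alphabet is in ascending ASCII order and every string has the same length, plain string comparison matches numeric comparison, which is what gives the textual form its lexicographical ordering.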

  • @Aceptron
    7 months ago +1

    Great explanation! Concise, yet covered great questions.
    Loved it! ❤

    • @th30z-code
      7 months ago +1

      Glad you liked it! Thanks for watching!

  • @AK-vx4dy
    8 months ago +2

    Quick and good explanation! Good job!

  • @vikingthedude
    5 months ago +1

    This is good stuff. I hope to see more backend/architecture/system design content like this. This reminds me of Hussein Nasser. I love it.

    • @th30z-code
      5 months ago

      Thank you! Really appreciated.

  • @user-jq4li8kj8o
    2 months ago

    Great video, thanks!

  • @mofo78536
    9 months ago +2

    A quick rundown on ULID vs UUIDv7 and the practical considerations between them would be nice (e.g. can IDs be converted between the two?)

    • @th30z-code
      9 months ago +1

      From a data-block point of view, ULID and UUIDv7 are almost the same thing, since both are a 48-bit timestamp followed by "random data".
      But the 6 bits that the UUID format reserves for version and variant make them incompatible.
      Converting from ULID to UUID loses information.
      Overwriting those 6 bits with the v7 version and the variant should not be a problem, but in the worst case it may generate collisions.
      Converting from UUIDv7 to ULID is not a problem, since there is no data loss.
      From a textual-representation point of view, ULID has the advantage of being shorter.
      Unfortunately there is no standard support for that, but with a few lines of code you can also encode a UUID to Base32 and decode Base32 back to a UUID, if you are interested in converting them (e.g. if you are using the UUID type that your language provides).
      I hope I have answered your question.
      Thanks for watching!
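
The conversion described in this reply can be sketched with Python's standard `uuid` module. This is an illustration under stated assumptions: the helper names `ulid_int_to_uuid7` and `uuid_to_ulid_int` are invented for the example, and the ULID is assumed to be available as a 128-bit integer. The bit positions follow the UUID layout (4 version bits in the high nibble of byte 6, 2 variant bits at the top of byte 8).

```python
import uuid

VERSION_MASK = 0xF << 76   # 4 version bits (high nibble of byte 6)
VARIANT_MASK = 0x3 << 62   # 2 variant bits (top bits of byte 8)


def ulid_int_to_uuid7(ulid_value):
    """Overwrite 6 of the 80 random bits with version 7 and the RFC variant (lossy)."""
    value = ulid_value & ~(VERSION_MASK | VARIANT_MASK)
    value |= (0x7 << 76) | (0x2 << 62)
    return uuid.UUID(int=value)


def uuid_to_ulid_int(u):
    """Reinterpret all 128 bits of a UUID as a ULID value (lossless)."""
    return u.int


# Hypothetical ULID value: a fixed timestamp with every random bit set to 1.
ulid_value = (1_700_000_000_000 << 80) | ((1 << 80) - 1)
as_uuid = ulid_int_to_uuid7(ulid_value)
print(as_uuid.version)                           # version field now reads 7
print(uuid_to_ulid_int(as_uuid) == ulid_value)   # the 6 overwritten bits are gone
```

The 48-bit timestamp survives in both directions; only the 6 overwritten random bits are lost when going from ULID to UUIDv7, which is why two distinct ULIDs can, in the worst case, map to the same UUID.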

    • @mofo78536
      9 months ago

      @th30z-code Yeah, I wonder whether the difference of 6 bits would matter in practical applications, since the length of course affects how likely a birthday collision is in a distributed system.
      I am also checking with the spec writer whether it would make sense to standardise an optional extended ULID that includes a checksum, as there is a chance that people will be writing these IDs down on paper as well. I was looking at the Crockford base32 check character, which adds 5 more symbols, but its use of = among the check characters would be hard to use in a URL... So it might be better to append a _ to the original ULID string, followed by extra characters holding a CRC-8 or some other easily implemented check instead.
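
For context on the check character discussed here: in Crockford's base32, the check symbol is the encoded number modulo 37, drawn from the 32 regular symbols plus `*`, `~`, `$`, `=` and `U`. The sketch below applies that rule to a ULID string. It is only an illustration of Crockford's scheme, not part of the ULID spec and not the CRC-8 variant proposed in the comment; the function names are made up for this example.

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
CHECK_SYMBOLS = ALPHABET + "*~$=U"  # 37 symbols, used only for the check character


def decode_crockford(text):
    """Decode a Crockford base32 string into an integer."""
    value = 0
    for ch in text.upper():
        value = value * 32 + ALPHABET.index(ch)
    return value


def with_check_symbol(ulid_text):
    """Append the check symbol: the decoded value modulo 37."""
    return ulid_text + CHECK_SYMBOLS[decode_crockford(ulid_text) % 37]


def is_valid(text_with_check):
    """Recompute the check symbol and compare it with the last character."""
    body, check = text_with_check[:-1], text_with_check[-1]
    return CHECK_SYMBOLS[decode_crockford(body) % 37] == check


tagged = with_check_symbol("01ARZ3NDEKTSV4RRFFQ69G5FAV")
print(is_valid(tagged))                 # True
print(is_valid("1" + tagged[1:]))       # False: one mistyped character is caught
```

Since 37 is prime and larger than 32, any single mistyped character changes the value modulo 37, so it is always detected. The URL concern raised in the comment applies to the `=` symbol (and arguably `*`, `~`, `$`), which is the motivation for considering a different check.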

    • @th30z-code
      9 months ago

      In practice you can ignore the probability of generating a collision with both UUID and ULID.
      I've seen many systems in production blindly trusting the random implementation (but don't use JavaScript's Math.random(), or any other non-cryptographic pseudo-random generator, for your implementation).
      But yes, from a numeric point of view, the missing 6 bits make a difference:
      with ULID, the probability that two IDs generated in the same millisecond collide is 1 in 1208925819614629174706176 (2^80);
      with UUIDv7, it is 1 in 18889465931478580854784 (2^74).
      That is still a really, really low probability.
      To me, as with all error-handling logic, the question is:
      is my code putting a life at risk or not? If not, you can trust the implementation and not worry about the collision probability.
      If your ULIDs/UUIDs are generated by a known, enumerable set of machines,
      you can switch to a different ID generation strategy that removes the problem of collisions completely and even reduces the length of your IDs.
      Check out the Twitter Snowflake IDs: czcams.com/users/shortseHr2EIRcYZY
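
The figures quoted in this reply can be checked, and extended to more than two IDs per millisecond, with a small calculation. This is a rough sketch using the standard birthday approximation; it assumes every non-timestamp bit is filled uniformly at random, which may not hold for implementations that reserve some of those bits for a counter. The function name `collision_probability` is made up for the example.

```python
import math

ULID_RANDOM_BITS = 80    # 128 bits minus the 48-bit timestamp
UUID7_RANDOM_BITS = 74   # 80 bits minus 4 version bits and 2 variant bits


def collision_probability(ids_in_same_ms, random_bits):
    """Birthday approximation: chance that at least two of n IDs share the same random bits."""
    n = ids_in_same_ms
    space = 2 ** random_bits
    return -math.expm1(-n * (n - 1) / (2 * space))


print(2 ** ULID_RANDOM_BITS)    # 1208925819614629174706176
print(2 ** UUID7_RANDOM_BITS)   # 18889465931478580854784

# Even with a million IDs generated in the same millisecond, both stay tiny.
print(collision_probability(1_000_000, ULID_RANDOM_BITS))
print(collision_probability(1_000_000, UUID7_RANDOM_BITS))
```

For two IDs in the same millisecond the result reduces to 1 in 2^80 and 1 in 2^74 respectively, matching the numbers above; the 6 missing bits make UUIDv7 collisions 64 times more likely, but still negligible for most workloads.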

  • @kennedydre8074
    6 months ago

    Great video! Thank you for the amazing explanation. I have a question: how can I use UUIDv7 or ULID in my TypeScript project?

    • @th30z-code
      6 months ago

      You can use third-party libraries that already provide simple functions to generate UUIDv7/ULID values:
      github.com/LiosK/uuidv7
      github.com/kripod/uuidv7
      github.com/aarondcohen/id128
      github.com/ulid/javascript

  • @abdelrahmandwedar
    2 months ago

    At 1:21, I thought that random-based IDs would be more suitable for databases that are frequently sharded, like MongoDB, while time-based IDs would be more suitable for most other databases built on B-tree data structures.

  • @AxioMATlC
    12 days ago

    For a primary key, neither ULID nor UUID is preferable to an integer, because the larger key wastes storage and takes up precious memory if you want to keep all your indices in memory (you do). A global ID is great for generating IDs without asking the database for an auto-incrementing value, or for use as an external ID that users cannot guess or exploit. You should have an int/bigint primary key and, optionally, an additional UUID or ULID column, depending on your use case.