Algorithms behind Modern Storage Systems

  • Added on 6 September 2024
  • QCon San Francisco International Software Conference is back this November 18-22, 2024. Software leaders at early adopter companies will come together to share actionable insights to help you adopt the right technologies and practices.
    Get exposed to new ideas and innovative approaches to software development and engineering, guaranteed to inspire and challenge you.
    Don’t miss this opportunity to take your knowledge and skills to the next level and stay ahead in the fast-paced world of technology.
    Register now: bit.ly/3Tc73rM
    ------------------------------------------------------------------------------------------------------------------------------------
    Video with transcript included: bit.ly/2WZvYzb
    Alex Petrov talks about modern storage system approaches, discussing storage internals and evaluation techniques for choosing a database with the read, write, or memory overhead best suited to a given kind of data.
    This presentation was recorded at QCon San Francisco 2018: bit.ly/2uYyHLb
    The next QCon is QCon London 2019 - March 4-6, 2019: bit.ly/2hxsoN1 . Save GBP75 with “INFOQ75”
    For more awesome presentations on innovator and early adopter topics check InfoQ’s selection of talks from conferences worldwide bit.ly/2tm9loz
    Interested in Artificial Intelligence, Machine Learning and Data Engineering? Follow the topic on InfoQ: bit.ly/2rrEicK
    #DataStorage #Algorithms #Database #InfoQ #QConSanFrancisco

Comments • 17

  • @Xeoncross 2 years ago +6

    Starts at 16:00 with LSM-trees, if you're already familiar with sequential vs. random access.

  • @ameynaik2743 2 years ago +12

    Not for beginners. A good talk for revising the concepts. I highly recommend reading Chapter 3 of the DDIA book.

  • @arijit_ad 5 years ago +6

    Enjoyed the talk. Thanks.

  • @charan7240 1 year ago

    One of the best talks about database reads and writes.

  • @mullergyula4174 2 years ago +1

    It was a joy to watch.

  • @mr-boo 4 years ago +4

    Great talk, much appreciated! :)

  • @manan4436 2 years ago +1

    Amazing talk

  • @jonnytheponny5753 3 years ago +4

    Good talk, but it has one flaw: there are too few slides. It is not good (for beginners/learners) when so much is explained without slides to back it up.

    • @KPTalksStuff 3 years ago

      Yeah, true. With a lot of talking over the same slide, the slide just becomes a distraction, and also boring, I guess. I can see people commenting on the lack of visualizations in talks about databases. There's a lot of scope for improvement and for database content, I guess! ;)

  • @benevolent6705 3 years ago +5

    At 19:44 it is assumed that SSTables have a synchronized clock because their entries carry a key and a timestamp. What method is used to synchronize the clocks of the separate nodes that hold the SSTables?

    • @altanozlu8268 3 years ago

      Use NTP

    • @SimonBuchanNz 3 years ago +1

      It's handy to think about what it actually looks like for this to matter: multiple nodes are being written to with different values for the same key at close to the same time, so this is essentially just the multiple-master (multi-primary) problem. Either it's fine for one of those writes to win, or you already need something like an optimistic-update mechanism, where the nodes agree on which existing value is the latest one being replaced and that the incoming write came from a client that knew about it.
      The simple answer is to have a single primary node that all writes go to, and to use its clock. You can be more clever and pick a different primary for each key based on its hash to spread the load, with each primary then replicating to the other nodes for resilience. You can still have multiple primaries for a key, but that generally involves them knowing about each other and pushing any received updates to each other along with the common timestamp, so that communication has to take clock differences, time lag, and concurrency issues into account.
      Note that it doesn't even need to be a timestamp: an auto-increment version number works too with most of these approaches, but a timestamp can be handy.
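
The two ideas in the reply above (one primary per key chosen by hash, and "highest version wins" when reconciling entries) can be sketched roughly as follows. This is only an illustrative sketch, not how any particular database does it: the names `primary_for` and `reconcile`, the `(key, version, value)` entry layout, and the use of SHA-256 for hashing are all assumptions made for the example. The `version` field stands in for either a timestamp issued by the key's primary or a plain auto-increment counter.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

# Hypothetical entry layout: (key, version, value). "version" may be a
# timestamp taken from the key's primary node or an auto-increment counter.
Entry = Tuple[str, int, str]


def primary_for(key: str, nodes: List[str]) -> str:
    """Pick one primary node per key by hashing the key (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def reconcile(tables: Iterable[Iterable[Entry]]) -> Dict[str, Tuple[int, str]]:
    """Merge several tables, keeping the highest-versioned value per key.

    On equal versions the entry seen first is kept, which is an arbitrary
    tie-break chosen for this sketch.
    """
    latest: Dict[str, Tuple[int, str]] = {}
    for table in tables:
        for key, version, value in table:
            if key not in latest or version > latest[key][0]:
                latest[key] = (version, value)
    return latest


# Example: two tables holding different versions of key "a".
older = [("a", 1, "x"), ("b", 1, "y")]
newer = [("a", 2, "z")]
merged = reconcile([older, newer])
```

Because every write for a given key goes through the node returned by `primary_for`, the versions compared in `reconcile` all come from a single clock or counter, so clocks on different nodes never have to be compared for the same key.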

  • @subusrable 3 years ago

    Awesome

  • @milossimicsimo 2 years ago

    Great talk