Writing Garbage Collector in C

  • Added 12 June 2021
  • References:
    - Source Code: github.com/tsoding/memalloc
    - Previous Episode (malloc/free): • Writing My Own Malloc ...
    - Data alignment: Straighten up and fly right: developer.ibm.com/technologie...
    - My JSON Serialization Library: github.com/tsoding/jim
    - Print out value of Stack Pointer: stackoverflow.com/questions/2...
    - gcc.gnu.org/onlinedocs/gcc/Re...
    Support:
    - Patreon: / tsoding
    - Twitch Subscription: / tsoding
    - Streamlabs Donations: streamlabs.com/tsoding/tip
    Feel free to use this video to make highlights and upload them to YouTube (also please put the link to this channel in the description)
  • Science & Technology

Comments • 70

  • @InmuAyuayu
    @InmuAyuayu 6 months ago +91

    This guy: writes Garbage Collector in C
    Me: writes Garbage in C

  • @rogo7330
    @rogo7330 2 years ago +62

    Garbage collector: *collects itself*

    • @Mrkenjoe1
      @Mrkenjoe1 1 year ago +7

      "I'm trash, so I took myself out." -The garbage collector probably.
      (P.S. this is a joke)

    • @rogo7330
      @rogo7330 1 year ago

      @Mrkenjoe1 Are you writing code in C? You're probably doing something silly with your data (I mean variables). If you still have trouble describing what exactly this code is doing and why it's here, or if it's here because of "C didn't give me something-something", try writing down what you want the program to do (from the user's POV, not the programmer's) and start again.

    • @Mrkenjoe1
      @Mrkenjoe1 1 year ago

      Removed

  • @OdyseeEnjoyer
    @OdyseeEnjoyer 2 years ago +68

    I have a lot of experience with C and padding (especially with bitfields), but your way of teaching is so interesting that you made me stay for the entire video

  • @LuisMatos_ahoy
    @LuisMatos_ahoy 2 years ago +27

    "I am a random person from the internet allowing you to use global variables!"
    Nice 😂😂

  • @nivelis91
    @nivelis91 3 months ago +1

    After watching this I immediately hate this channel for how many videos you have, how long they are, and that I'm not gonna be able to see them all in one weekend.

  • @playerguy2
    @playerguy2 2 years ago +12

    14:11 you are mostly correct.
    Essentially, the compiler guarantees the alignment of the datatypes involved.
    Most machines have memory alignment requirements for most datatypes.
    On x86_64:
    Chars only need 1-byte alignment.
    Shorts are aligned to 2 bytes.
    Ints are aligned to 4 bytes.
    etc, etc..
    The same applies to (hardware) vector datatypes (look up vectorized instructions for more info).
    So you have rightly observed that chars have a smaller alignment requirement than the pointer, and as such can be packed more tightly.
    Printing the addresses would've confirmed that.
    If that behavior is not desirable (for example, if you'd rather pack data as densely as possible), you can use the pack pragma (#pragma pack), a compiler extension available in the major C/C++ compilers.
    The downside is potentially more costly reads and writes, and faults on architectures that do not allow unaligned access.

    • @playerguy2
      @playerguy2 2 years ago +2

      bonus: the exception you mentioned is often called a "bus fault" (similar to a segmentation fault).
      It is an error the cpu catches (or doesn't, depends on the arch) and often calls a vector to handle and cleanup.
      Linux handles this pretty well even on the Raspberry pi.

    • @heraldo623
      @heraldo623 2 months ago

      Machines have no concept of data types. They operate on words. Registers are word-sized. A memory load reads a word from memory into a register. A memory store writes a word from a register to memory.
      Data types are defined by the programming language standard and enforced by its compilers.

  • @greob
    @greob 3 years ago +2

    That was really interesting, thanks for sharing.

  • @tresuvesdobles
    @tresuvesdobles 2 years ago +18

    Pretty cool video! BTW: If there is a cycle of pointers among several chunks in the heap (not necessarily a two-cycle), your code will enter a never-ending loop until it reaches the recursion limit and causes a stack overflow. A naive solution would be to set a manual recursion limit on mark_region, but then cyclic chunks would never be deallocated.

    • @beyondcatastrophe_
      @beyondcatastrophe_ 2 years ago +11

      No, it won't, the recursion is only entered if the chunk wasn't reachable before, so it won't try to check in a cycle

  • @heraldo623
    @heraldo623 2 months ago +1

    You should mark as alive only the heap chunks that can be reached from the stack. That way, self-referencing chunks (circular pointers) will be collected too. If you take the heap itself as the root of the "reference tree", circular refs will stay alive forever, leading to memory leaks.

  • @adamkrawczyk9261
    @adamkrawczyk9261 2 years ago +3

    15:06, just bookmarking but good shit. It's pretty interesting

  • @liebranca
    @liebranca 2 years ago +2

    sizeof(size_t) is not guaranteed to equal sizeof(intptr_t), though generally it doesn't matter c:
    Cool stuff btw. Subscribed.

  • @xirate7091
    @xirate7091 9 months ago +1

    45:55 there's some Terry Davis energy here

  • @aFlyingElefant
    @aFlyingElefant 3 years ago +15

    In what video did you develop your arena allocator? I cannot find any reference to it in the faq or the repo or anything.

    • @TsodingDaily
      @TsodingDaily 3 years ago +13

      I don't remember. I implemented my own arenas many times in different projects throughout the years, sorry.

  • @freestyle85
    @freestyle85 3 years ago +13

    I think you don't need the GCC builtin function to get the frame address. Just declare a variable of type uintptr_t on the stack and get a pointer to it with the "&" operator.

    • @TsodingDaily
      @TsodingDaily 3 years ago +11

      Yep, there are many ways to obtain the stack pointer as discussed in stackoverflow.com/questions/20059673/print-out-value-of-stack-pointer which I linked in the description as the reference. :)

    • @freestyle85
      @freestyle85 3 years ago +7

      @TsodingDaily How could I have missed this? Sorry! :-)

  • @Local_Nerd
    @Local_Nerd 1 year ago

    Thanks

  • @pechasetentidos5854
    @pechasetentidos5854 8 months ago

    And what if the stack frame holds some argument passed to a function whose value falls within the range of the heap base address + size?

  • @berndeckenfels
    @berndeckenfels 1 month ago

    I debugged the traditional GC of gofer (Haskell variant) on DOS because it did not use pointers but “cell index” which way too often tainted memory for integer value (with only a few hundred cells available that hurt very much).

  • @siddharthsinghchauhan8664
    @siddharthsinghchauhan8664 2 years ago +16

    Although his sessions are not educational per se... But damn they are awesome... I watched this particular one many times.. to figure stuff out for design interviews

  • @FsKir
    @FsKir 1 year ago +1

    I don't know C, but it was interesting to watch. Thanks for the video

  • @chrisalexthomas
    @chrisalexthomas 1 year ago +1

    you wrote a what?? wow, you woke up today and chose violence... :D

  • @alexandrohdez3982
    @alexandrohdez3982 1 year ago

    👏👏👏👏

  • @_luisgerardo
    @_luisgerardo 2 years ago

    Good thumbnails haha

  • @tsilikitrikis
    @tsilikitrikis 1 year ago

    I FCNG LOVE YOU

  • @cheebadigga4092
    @cheebadigga4092 10 months ago

    I think the struct Foo is 16 bytes instead of 9 because of how GCC does structs. It has a special "packed" mechanic for structs which prevents it from aligning to the full next 8 bytes, but only if you're explicit about it (GCC extension to the C standard). Maybe clang would've behaved differently. Maybe not. But either way, it's the compiler who does the alignment, not the kernel or libc.
    Tried it out: typedef struct { char i; void *ptr; } __attribute__((packed)) FooPacked;
    results in 9 bytes instead of 16.
    So essentially, as long as you're on x86_64 GCC, you can be 99% certain that the alignment is done for you by GCC unless you explicitly tell it to do otherwise. I don't know about other architectures, so I can't say anything about them.

    • @user-oe4id9eu4v
      @user-oe4id9eu4v 9 months ago

      Actually not really ..
      C struct size and field offset is strictly defined by rules. One rule defines that pointer type should always be aligned to architecture size.
      That is C standard well-defined behaviour.

    • @cheebadigga4092
      @cheebadigga4092 9 months ago +1

      @user-oe4id9eu4v How does that deny anything I said? If GCC implements the C standard correctly, then "how GCC does structs" effectively means the same as "how the C standard defines how structs should behave".

  • @galbalandroid
    @galbalandroid 2 years ago +2

    What is the editor you use in the terminal? great vid by the way! :D

  • @berndeckenfels
    @berndeckenfels 1 month ago

    With struct #pragma pack applies - but that does not apply to the location of data segments

  • @Jurasebastian
    @Jurasebastian 1 year ago

    i can fill heap with random data and get accidental pointer to heap

  • @koftabalady
    @koftabalady 10 months ago +2

    How can I learn this stuff? can someone give me a simple roadmap or anything?

    • @khuntasaurus88
      @khuntasaurus88 9 months ago

      Learn what stuff

    • @koftabalady
      @koftabalady 9 months ago

      @khuntasaurus88 Operating systems, compilers, interpreters, garbage collectors, assembly, low level networking, advanced machine learning without libraries and being good at doing all of this...

    • @FF-hy4ro
      @FF-hy4ro 9 months ago +1

      @koftabalady going through Tsoding videos and pausing to try and implement the solutions yourself before he does may be a good way

    • @_start
      @_start 2 months ago

      @koftabalady There is a free online book called "Crafting Interpreters" in which you will learn how to make your own programming language in C. It will teach you how to create 4 data structures (rope, hashtable with open addressing and a simple string hash function, dynamically allocated array and implicit linked list). Additionally, you will implement a simple mark-and-sweep garbage collector. It will also introduce you to Pratt parsing, which is the best method for parsing with precedence. The final product will be a garbage-collected interpreted programming language that emits its own bytecode; you can also expand on it by making it much more embeddable with C/C++.
      If you wish to learn about low-level networking, you should probably check out UNIX Network Programming, Volume 1: The Sockets Networking API by W. Richard Stevens (3rd edition). This will teach you about IPv4/IPv6 (Internet Protocol v4/v6), TCP (Transmission Control Protocol), UDP (User Datagram Protocol), SCTP (Stream Control Transmission Protocol), etc. You will be coding this in C as well; it basically teaches you all there is to know about programming with sockets in C, and you will know how these protocols work under the hood.
      For operating systems I would recommend the book "Operating Systems: Three Easy Pieces". You should also write a few basic programs in assembly (I recommend the fasm or nasm assemblers) before trying to write your own OS, because you will need to write some assembly. I would also recommend coding along with osdev.org and taking it easy; creating your own OS is considered the hardest thing you can do as a programmer (depending on which features your OS will have, what hardware it will support, etc.).

    • @rusi6219
      @rusi6219 25 days ago

      @koftabalady try a computer science degree and lots of practice

  • @mixail844
    @mixail844 1 month ago

    I didn't get it: why did he use recursion in the mark_reachable function?

  • @alexandersemionov5790
    @alexandersemionov5790 2 years ago +1

    Can you implement Ownership like in Rust?

  • @sangalamballa6975
    @sangalamballa6975 6 months ago

    I have a question: how can we install it into the JDK?

    • @MrChelovek68
      @MrChelovek68 6 months ago

      Me too, but by the looks of it this is one of the best languages. Direct work with memory and absolute flexibility, hello pointers.

  • @kelali
    @kelali 1 year ago

    9:36 really weird

  • @FsKir
    @FsKir 1 year ago

    Hey, which graphics editor did you use in this video?

  • @brightprogrammer
    @brightprogrammer 1 year ago

    I love the way u sarcastically praise other languages 😂

  • @dragonlp7144
    @dragonlp7144 2 years ago +2

    which ide is he using

    • @celdaemon
      @celdaemon 2 years ago +2

      It looks like vim

    • @tcroyce8128
      @tcroyce8128 2 years ago +7

      @celdaemon It's vanilla Emacs with his own customization, IMO. It does not feel like any off-the-shelf Emacs variant. Vim can do a lot of his text navigation and redirecting of command output into buffers, but that is stretching Vim's abilities a bit. The distinction comes from how the status bar is rendered in Emacs.

    • @joekerr5418
      @joekerr5418 2 years ago +1

      the best ide you could ever possibly use in your life

  • @user-lt9ey6gw3x
    @user-lt9ey6gw3x 5 months ago

    Not really Garbage Collector but nice video.

  • @IndellableHatesHandles
    @IndellableHatesHandles 1 year ago +1

    Python's GC is garbage, so will it collect itself?

  • @meJevin
    @meJevin 2 years ago +1

    Cringed hard at the start of the video.
    In short: the guy discovered cache lines and memory alignment.
    Congratulations, I guess.

    • @DelphiPro
      @DelphiPro 11 months ago

      And to avoid guessing for so long, he could have simply dumped the memory occupied by the struct to the log to see what moves where ))) Writing this after watching 18 minutes of the video

  • @chrisalexthomas
    @chrisalexthomas 1 year ago +1

    You're only allowed to use global variables, when you know exactly and precisely why you're NOT allowed to use them and all the reasons against using them and can explain the reasons why in detail. Then you qualify. Until then. You aren't allowed. End of discussion :)

    • @pattyspanker8955
      @pattyspanker8955 10 months ago

      explainnnn

    • @drednaught608
      @drednaught608 8 months ago

      I don't know, global constants seem to be common enough and are not bad practice if immutable.