I made JIT Compiler for Brainf*ck lol
Vložit
- čas přidán 29. 06. 2024
- References:
- Project idea by github.com/bit69tream
- Wikipedia - Brainfuck - en.wikipedia.org/wiki/Brainfuck
- Computer Enhance - Byte Positions Are Better Than Line Numbers - www.computerenhance.com/p/byt...
- flat assembler - flatassembler.net/
- ChromiumOS - Linux Syscall Table -chromium.googlesource.com/chr...
- Tsoding - bfjit - github.com/tsoding/bfjit
Chapters:
- 00:00:00 - Announcement
- 00:00:42 - Intro
- 00:06:00 - Hello, World
- 00:09:36 - Intermediate Representation
- 00:52:09 - Interpreter
- 01:02:03 - Flat Assembler
- 01:14:24 - Raw Binary Image
- 01:18:41 - runbin
- 01:36:08 - JIT compiler
- 03:02:45 - Outro - Věda a technologie
"Your scientists were so preoccupied with whether or not they could, they didn't start to think if they should"
Stop* not start lol. But yeah, this quote has become so real today. You could say this about an innumerable number of things.
@@iWillAvert The quote I used was from the book where it says "start" rather than "stop"
Not for any other reason than I was lazy about copy and paste and that is the quote that showed up first
I don't believe you, what book? If it was Jurassic Park, the phrase was only used in the movie and they said stop. There is no phrase in the book, ctrl-f "Particle accelerators" and look above, where it would be: "They never stop to ask if they should do
something." @@jayshartzer844
Excuse me but I do not believe you sir, what book? If it was Jurassic Park, the phrase was only used in the movie and they said stop. There is no such phrase in the book, ctrl-f "Particle accelerators" less than two paragraphs above, where it would be: "They never stop to ask if they should do something."@@jayshartzer844
You should do more low level programming like compilers and vm's. Very entertaining and great learning experience.
I agree, it reminds me of A86 assembler... fun time.
tsoding will literally build whole architecture with lexer, ir compiler, interpreter and jit for language that only supports 8 characters. beautiful
I'd love to see the TVM (Tsoding Virtual Machine) with async, multi threading and JIT. It's gonna be the next BEAM
amistah azozeen should get back to bm (birtual machine) for that
0 is usually reserved as 'illegal instruction'.
Most memories will statically have mostly 0 after power-up and also flip-flops reset to 0 in most cases (unless you have a specific reason to change that).
So if you have anything just at reset value and not explicitly initialized, it will trigger an illegal instruction exception, which is what you want.
RISC-V does that as well, in fact RISC-V does not even have NOP, it is a Pseudo-instruction performing an ADD with target register fixed 0, so it does not have any effect, but NOP in RISC_V is actually an ADD.
Same goes here for x86. All 0 is not really a desirable value for a valid instruction.
Huh, RISC-V not having a NOP is interesting, although it does make sense
@@chri-kNot sure about other architectures but I could imagine there are other instruction sets doing the same. No reason to waste decoder space if you can model a 'do nothing' through other means.
@@markusdd5 on x86 nop is an actual instruction, as far as i can tell the reason is some edge-case optimisation: If it actually did anything, it could cause a pipeline stall or block OOO and waste a few cycles ( oh no, the instruction that's supposed to waste cycles is wasting cycles )
I just want to say I think you're brilliant and it's a pleasure to watch you craft code and explain your methods
I think this could be a very useful resource for people who want to make their own languages, especially for the code generation part.
Also would love to see some benchmarks of interpreter vs JIT compiled assembly.
Can confirm: Watched all the way to the end. Us youtube plebs aren't weak
Love your thumbnails. And the content too of course :)
Great video, really enjoy this low level stuff, hard to find gems like this! THIS IS POGGERS!
Great video, easy and steady explained "complex" theory! Thanks for the great stuff!
Mate, I watched this whole thing on YT to the end. Awesome stuff. If you ever feel the inclination to go mega deep into obscure lexer/parser methodology, I'll be well up for watching. Awesome video as always.
Never seen any of your videos but this is right up my alley. Laughed out loud when you said "Was ist das für ein Haufen Scheiße meine Freunde?"
I had to rewind to make sure my brain wasn't playing me
Awesome video, what you made is way too cool for only three hours
I haven't watched the video yet, but I read the Computer Enhance article in the description and already implemented byte-aware diagnostic messages as an option in my compiler. Very good idea, very easy to implement, I will be part of the change.
C++ developers be like: Is this operator overloading?
Around 3:00:30 it's nice to see other people have the weird feeling, when you have mental list of tasks that need to be written to be "done" with the problem, and you write the last part and it works, there is the weird disconnect of the mind of being in a - write what you thought of - state. And it refuses to leave that and run it for a bit, expecting there to be more on the todo list.
It's hard to describe that state of mind, leaving writing mode and going into testing mode - which sometimes might show a bug and back to writing mode but when it doesn't, there is a weird void for a bit.
this was an epic episode !!
Brainf*ck was originally designed in an attempt to write the simplest and smallest compiler. It's a fascinating programming language that is surprisingly slow to write and slow to run. But it IS Turing-complete.
And I think it is a lot of fun!
it lacks the infinite tape part.
There’s really not much surprising about how slow it is to write… ;)
@@replikvltyoutube3727every language lacks that part. consensus in computer science is, that it does not matter, because that theoretical turing machine is physically impossible to create.
@@replikvltyoutube3727 just buy more ram lol
@@replikvltyoutube3727just download more ram and store it in your RAM, but make sure to save a copy of it to disk too.
Exponentially growing RAM.
That was a heroic effort by the end man -- way cool and very inspiring!
Do you script these things before you start? I love the way you just bite down, break the problem up and solve it like that - You've got a really good process and it'd be awesome if you have a blog post about how you do it! Seriously, Thank you and well done!!
When I read the title I knew it was you Tsoding. I have had this itch before, lol.
Minor nitpick: you can just do "TEST al,al"; (which is a two byte instruction) no need to zero out the higher bits with "XOR ax,ax"
A real legend
When are you going to release merchandise?!?! We love you
wow, I had a similar idea a couple of days ago. I was thinking about compiler optimizations to the intermediate representations. Of course the reduction of many +++++ into one is nice, maybe also [[[? There are also more advanced stuff like [-] that can be optimized, looking forward to it
Really cool!
Those null bytes in the opcode strings are making me ANSI
Also heads-up, signed overflow is UB in C (but unsigned overflow is not)
This is peak tsoding material.
An actual compiler, neat!
Your programming skills are very good, but what I am really impressed by is at 2:03:11
Bro, I am german, I have a suggestion for you. Let out please Fäkal Sprache. Fäkalsprache your mama will say it is not nice, ok? lol Nichtsdestotrotz, nice contribution my friend. Appreciating learning from such a crazy brain like you lol Best Regards from Berlin
Yes, I'm still watching 😁 Btw, can you do some perl script video?
Similar to your convenient use of increment/decrement in this video, you can shorten your while loops by a line by using assignment as an expression, from:
char s = lexer_next(&l);
while (s == c) {
++count;
s = lexer_next(&l);
}
to
char s;
while ((s = lexer_next(&l)) == c) ++count;
NO
it was awesome. like super duper awesome 😎👍🏻
You don't need the asserts for adding / subtracting. The modulo operation you're looking for is with respect to 256 not 255. Since you're adding to a single byte, all multiples of 256 do not change the result, so the single (operand & 0xFF) is already sufficient.
Watched the whole video, it was really interesting.
I think if I wrote something like this, I would try to use inline assembly, since you can take pointer of label(like goto one) or pointer of function.
I ma binge watch this
NGL, I first read the title as "I made a JIT Compiler *in* Brainf\*ck lol", and was way more concerned.
This title is promising.
This is how programming content should be,
unlike others wasting 80% of time talking, 10% watching mems and sometimes coding.
Awesome content.
That reminded me the joy when I did BF run in constexpr in C++.
1:41:21 the Russian in him finally spoke out
i scream every time zozin writes % 255 instead of % 256
you can kind of intuitively get modulo if you think about it like decimal
7 + 255 = 6 if 8 bits
same way
7 + 9 = 6 if 1 "bit" (decimal)
BUT
7 + 256 = 7 if 8 bits
and
7 + 10 = 7 if 1 "bit" (decimal)
which is the same as (7 + 10) % 10 = 7
and the same rule applies to binary
an easy to remember rule is to modulo with value that causes an "overflow" (9 + 1 = 10, we overflow to using two digits, so we modulo with 10, 0xFF + 0x01 = 0x100 = 256, so we modulo with 256)
sorry for a wall of text, i hope my schizo explanation helped someone 🙏🙏🙏
another way that doesnt involve any math is to modulo with the lowest value that's not possible with a given representation
for example
the lowest value not possible to represent with 8 bits is 256
the lowest value not possible to represent with 1 decimal digit is 10
etc etc
I helped you scream while watching, too, hehehe:)
Also: Didn't you assembler n00bs see that x86-64 opcode 0x8a is "Move r/m8 to r8"? (RM byte stands for register or memory, size). Shouldn't he using imm8, immediate mode? His "mov al, byte [rdi] (the brackets!!!)" fetches the byte (m8), from the memory location where rdi points(!) to. I am not exactly sure, but there wasn't going much pointer magic in the brainf. implementation going on, just something will be stored INSIDE rdi and not at the location where rdi points to. But I may be wrong. Have a good one Redstonek and coders, and greetings to Poland from Germany:)
Edit: Got it, that are the "memory" locations the brainf. implementation uses as intermediate calculation storage, its memory. All good, I missed that on the first look, because I would've used the stack for that:)
my actual instant reaction "YOU MADE WHAT??"
Awesome project! Will nmap work similar on windows? How much of a headache would this project be in windows?
43:03 You don’t have to apologise to us, we find it hilarious (at least I do)
JIT compiler hmmm...
"Haufen scheiße, meine Freunde" :D You know some German!? wtf :D Привет из Германии.
Don't sweat English spelling. We all have trouble with it too. We Americans simplified the original English spelling a bit, but there's a reason why Autocorrection is so popular.
In the early '90s, I had a manager in our computer department who said he really couldn't work until they got a spell-check program installed on our system.
About 200 years ago the Brits decided to change all of their spellings because they hated the Americans for breaking away and starting their own country.
Tsoding always celebrates „first try“ after literally fixing a boo boo :)
Me going back to center my div after watching tsoding implement a jit compiler : 🤡
It'd be amazing if Mr. Tsoding handled memory bounds checking, not by checking the pointers, but by letting the OS check the pointers. Aka, handling the segmentation fault.
Don’t know if he fixes this but if you have a [ and no ] it doesn’t throw an error up as of 1:02:00.
Three hours, man
Yeah imagine, and entire compiler in just three hours. Impressive af
1:16:55 this part was great :D
1:52:54 if that was the case, then if execution hit a block of 0x00's it would continue executing instead of raising an exception. so not sure if that that was a design choice but yeah
may be now its time to implement translator from one of high level languages to bf? from python, for example or c )
Maybe you should make a short video how you done it I swear this would hit viral D
So there was no need to actually make it? I could've just claimed that I made it?
Voodoo I tells ya, voodoo!
I learnt more in 3 hours then most of the tutorials can provide in 100 < hours.
Fascinating :)
2:02:30 you have to mod it by 256 or and it by 255 (0xff). Either one in compile time
02:00:44 why split?
wrap around addition is commutative ( (a+b)+c=a+(b+c) ) so doesn't
```
add byte[rdi], 255
add byte[rdi], 255
```
do same thing as
```
add byte[rdi], 254
```
?
02:01:18 yes, mod 256 but at jit compile time and &255 instead of %256
1:57:56 javascript is a real programming language (one of the most used ones) and python also has the eval function lol
I think the interpreter has the same out of bounds error as the compiler, of having a ']' at the end of the program.
Self compiling RUST compiler please!
Why you dont make a JIT for BASH or make JIT DIY SHELL? I always wanted one
dang
44:00 Tsoding, look at the shavian alphabet, it does have predictable spelling.
at 8:08 how did you jump to the end of the line in insert mode? i would think to use alt+shift+a but i'm wondering if there's a better way to do this
edit: oh my god i'm 30 minutes in and i just realized this is emacs not vim xd
5:37 new language: brainfart
weird coincidence, i made a brainfuck jit last week, but only becayse my brainfuck code was running too slow...
gotta watch this when i have a few hours to kill
I once did this using libgccjit but to think you can just directly output raw assembly...
show us some other language :( I would love to see some nim but any variety would be awesome!
Quick question for people in the comments who can provide an answer : when you execute (from C/C++/any language) a function which is "raw-encoded" in memory, does this function need to setup the stack and stuff like that ? Or does the compiler just emit a "CALL [insert the address of your function entry point]" instruction, so you can later just "RET" like Tsoding is doing (at least on Linux x86-64) ? If this is the latter case, then it's just incredibly cool
PS : I'm not completely familiar with the Linux x86-64 calling convention, hence the question
I assume with "setup the stack" you mean stuff like saving certain registers and "allocating" stack space (i.e. incrementing the stack pointer and decrementing it again at the end of the function)? The calling convention defines which registers need to be preserved. You don't need to preserve arguments but for pretty much everything else, you do need to save them to the stack if you overwrite them. And ofc, you only need to increment the stack pointer if you want to place stuff there. Although on the "Linux x86-64" calling convention, there's a 128 byte red zone so I guess you can even just placed stuff there without incrementing the stack pointer.
So isn't brainfuck just a good intermediate representation for an arbitrary programming language? Doing optimizations on brainfuck like collapsing '+', etc. will be very easy. I think loop unrolling (something llvm does) will also be very easy to do.
i liked the was ist das für ein haufen scheiße meine freunde part
1:15:50 could have save few minutes if you have asked phind:
q :fasm 64 bit mode
Answer | Phind V9 Model
To use FASM in 64-bit mode, you can use the use64 directive at the start of your code. ...
For the "dot streaks" Wouldn't it make sense to just repeat the syscalls instead of repeating the whole setup?
That only works if the syscall doesn't clobber the argument registers. At minimum, it overwrites rax, since that's where the return value is placed, so you'd have to reset that. Not sure if the other arguments are kept.
Byte offsets might be easier for machine interpretation, but there's few things I hate more than getting "Error at character 578" in a 30-line SQL query... and the shitty editor doesn't know how to jump to that offset
for the operand > 255 you could simply make a while loop that adds 255 by 255 and so on
while (operand > 255) {
char subop = operand % 255;
operand -= subop;
...
}
You can just mod it by 256 before adding once. (x + y) % 256 = x % 256 + y % 256. Tbh I'm surprised tsoding didn't know that. It's pretty intuitive, if you add 256, it just does nothing, so you can remove 256 from the value you're adding until the value is below 256. i.e. just mod by 256. Or for more efficiency, "x & 255" but the compiler is gonna optimize it to that anyways.
You have to mod the last some one more time in the end so that it's the same value.
(a+b) % n = (a%n+b%n)%n@@1vader
JIT a load of this guy
we wont more compiler thingssssssssssssssssssssssssssssssssssss
Right
I wonder if it would've been easier and more educational to go through x86 architecture spec to lookup opcodes instead of poking around with fasm.
I can't see jit in this mist
It was indeed pretty long for youчube
Lol :D ""Haufen Scheiße " means "Pile of Shit" :D
That "opa" on 1:41:15 was soooo Greek, it felt like you were a native. Are you sure you are Russian?
If you really wanted to write this for iOS, you'll be disappointed to know that you'll need term rewriting in a hand-crafted interpreter with human curated assembly instructions like the LuaJIT interpreter.
And don't get me started on Harvard Architecture CPUs, which make this task literally impossible since memory can't be used as code.
I wanna do a llvm based bf compiler and call it "fucklang" :D
duckduckgo is just bing with a different front now
they still track you.
Can we also have DreamBerd?
🔴
Assembly is polymorphic.
next challenge: copy and patch compilation
Mr. Tsoding I looked up your shirt and found out that company sells tshirts for 380$ I didn’t know you were ballin like that 😮 (this is a joke) but also who buys 380$ tshirts
tsoding is Tolstoy or clang
and if you not?!?! It was sayed! Now you need to be it!!!! >_-
1:56:30 you shiftarg before if argc guard
sometimes this guy speaks german (Scheisse), and i like that