How do computers add numbers so quickly?

  • Published 16 May 2024
  • For computers to be able to perform billions of operations per second, they need strategies to make calculations quickly. Carry-lookahead adders make addition much more efficient by reducing the amount of time circuits spend waiting for carries to be calculated.
    ***
    Spanning Tree is an educational video series about computer science and mathematics. See more at spanningtree.me
    To be notified when a new video is released, sign up for the Spanning Tree mailing list at spanningtree.substack.com/
    Spanning Tree is created by Brian Yu. brianyu.me/
    Email me at brian@spanningtree.me to suggest a future topic.
    0:00 Ripple Carry Adders
    2:40 Carry-Lookahead
    7:40 Combining CLAs
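    As a rough illustration of the difference, a minimal Python sketch of both schemes (not the video's own code; bits are least-significant first):
    ```python
    def ripple_carry_add(a, b):
        # Each full adder must wait for the carry from the bit below it.
        carry, out = 0, []
        for x, y in zip(a, b):
            out.append(x ^ y ^ carry)
            carry = (x & y) | (x & carry) | (y & carry)
        return out, carry

    def lookahead_add(a, b):
        # Generate/propagate for every bit is computed independently, then
        # each carry is a flat OR of AND terms - no bit-to-bit ripple.
        g = [x & y for x, y in zip(a, b)]  # bit i creates a carry
        p = [x | y for x, y in zip(a, b)]  # bit i passes a carry along
        carries = [0]
        for i in range(len(a)):
            # c[i+1] = g[i] OR p[i]g[i-1] OR p[i]p[i-1]g[i-2] OR ...
            c = 0
            for j in range(i, -1, -1):
                term = g[j]
                for k in range(j + 1, i + 1):
                    term &= p[k]
                c |= term
            carries.append(c)
        return [x ^ y ^ c for x, y, c in zip(a, b, carries)], carries[-1]

    assert ripple_carry_add([1, 1], [1, 0]) == lookahead_add([1, 1], [1, 0])
    ```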

Comments • 179

  • @funtechu 2 months ago +78

    Chip designer here. One thing to note is that above a certain size, eventually you can't compute the carry lookahead in a reasonable amount of time. In this case we will often combine carry lookahead logic with a technique called *carry select*. For example, let's say we can compute the sum of two 32 bit numbers in one clock cycle using carry lookahead (CL), but we don't have enough time to compute a 64 bit sum. In that case, we could combine three 32 bit CL adders. One of the adders computes the sum of the 32 least-significant bits, one computes the sum of the most significant 32 bits assuming the carry in bit will be 0, and one computes the sum of the most significant 32 bits assuming the carry in bit will be 1. Then once we know the actual carry bit from the LSB adder, we just use it to select (hence the name) which of the two MSB possibilities was the real answer (via a simple 2:1 mux), and ignore the incorrect result.
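    A Python sketch of that carry-select trick (an illustration of the idea described above, not actual hardware; the widths and names are made up):
    ```python
    MASK32 = (1 << 32) - 1

    def add64_carry_select(a, b):
        a_lo, b_lo = a & MASK32, b & MASK32
        a_hi, b_hi = a >> 32, b >> 32

        # In hardware, these three 32-bit adds all run at the same time:
        lo = a_lo + b_lo                      # LSB half, produces the real carry
        hi_if_0 = (a_hi + b_hi) & MASK32      # MSB half, assuming carry-in = 0
        hi_if_1 = (a_hi + b_hi + 1) & MASK32  # MSB half, assuming carry-in = 1

        carry = lo >> 32                      # actual carry out of the low half
        hi = hi_if_1 if carry else hi_if_0    # the 2:1 mux "selects" the winner
        return (hi << 32) | (lo & MASK32)

    assert add64_carry_select(2**32 - 1, 1) == 2**32
    ```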

    • @adamalbee3433 2 months ago +22

      On a software level, graphics shaders often contain a bunch of this kind of *branchless* decision making: calculate *every possible result*, and then discard the incorrect ones.

    • @sammy5576 2 months ago +1

      What a legend

  • @GordonWrigley 2 months ago +231

    Thank you for not stopping at ripple adders. I get tired of the dozens of YouTube videos that cover the basics and stop there. Like all the different videos that cover pathfinding up to A* but no further.

    • @ArneChristianRosenfeldt 2 months ago +5

      I think it is interesting that every microprocessor used carry look ahead for the program counter. Maybe the slow RCA 1802 did not.

    • @GordonWrigley 2 months ago

      @@ArneChristianRosenfeldt What do you find interesting about that? PC needs a fast increment on a decent word size, carry look ahead is good for that without being too many gates.

    • @ArneChristianRosenfeldt 2 months ago +2

      @@GordonWrigley Microprocessors in the '70s sacrificed a lot to fit into a single chip. But they did not sacrifice CLA, a technology unknown to a lot of computer “experts”.

  • @Darkev77 2 months ago +218

    Wow wow wow, I swear I was diving into the hardware side of computer science just days ago to get a better, lower-level understanding of CPUs and hardware as a software developer, and you just uploaded this!!! Please continue explaining how the hardware works at a low level, because it's so complex

    • @DFPercush 2 months ago

      You can find people who design their own custom CPUs on YouTube. Ben Eater and James Sharman are two I know of. A primer course on digital logic design might help as well. Stuff like combinational circuits, sum of products or product of sums, Karnaugh maps (until you learn that somebody made a program in the 80s called Espresso that is infinitely better at minimizing truth tables), finite state machines and latches, registers, multiplexers and selectors - it helps to be familiar with all that. It's not too bad. The basic building blocks are incredibly simple, there are just so many ways to put them together.

  • @MaxPicAxe 2 months ago +94

    I like the smile that my face generated when you mentioned combining CLAs, unfortunately that smile didn't propagate to everyone else around me who thought I was some lunatic staring at nonsense on his phone

    • @SunroseStudios 2 months ago +17

      maybe you need to generate smiles instead of propagating them :)

  • @elijahf 2 months ago +36

    The robots make this a lot easier to understand. Thank you for taking the time to make this

  • @neshploda17 16 days ago +3

    I love how the little robots react to your dialog, following along with each step. It makes it super clear and easy to follow. Especially at 1:35, when the full adder robot squints at you, annoyed that you called what they do "very simple" xD

  • @canaDavid1 2 months ago +90

    8:40 actually, it is usually picoseconds (1000 ps = 1 ns, 10¹² ps = 1 s) for highly integrated circuits, with CPUs reaching several GHz and cycle times on the sub-nanosecond scale

    • @ArneChristianRosenfeldt 2 months ago +2

      The N64 could add 64 bits in a single cycle at 90 MHz and had a 3 to 5 stage pipeline. In current deep pipelines, is an Add still single cycle?

    • @khatharrmalkavian3306 2 months ago +4

      Addition on modern desktops is usually 1/4 cycles, meaning that a single ALU can perform 4 additions in parallel in a single cycle.

    • @ArneChristianRosenfeldt 2 months ago

      @@khatharrmalkavian3306 so like the Pentium MMX from 1997? The Atari Jaguar from 1993 has vec4 add. I just could not find out why it needs 2 cycles. I guess something with the bus. You could always leave some bits at zero to stop the carry between your fields in a struct. Then AND

    • @khatharrmalkavian3306 2 months ago

      @@ArneChristianRosenfeldt Nah, they just have multiple adders in the ALU because they're so simple and easy to slip in between registers. They can execute in a single cycle because they don't actually wait for a clock signal to run the calculation, so you just read the existing output from the relevant adder into a register.

    • @ArneChristianRosenfeldt 2 months ago

      @@khatharrmalkavian3306 I think that CPU design lost me with DEC's Alpha. But an FPGA-like design with a big matrix sounds interesting: a checkerboard made up of register and adder fields. Then the register renaming algorithm routes the data through it.
      And a router may know when we need the carry flag / do bit logic, and can decide to use 4-bit full adders and not fully propagate the carry. A shift not aligned to 4 needs to propagate the carry up to 3 digits.

  • @cmilkau 2 months ago +23

    Actually at uni we learned to make a hierarchical CLA consisting of just copy-pasting a slightly modified single-bit CLA. It's a very small step from what is shown here. Missed opportunity, but maybe someone reads this comment and solves it as homework (see the sketch below). The hierarchical CLA takes O(n) hardware pieces and O(log n) time steps to compute all carries for n-bit addition, compared to O(n²) hardware pieces and O(1) time steps for the variant shown in the video
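    One way to sketch the hierarchical idea in Python (a hedged sketch assuming the standard rule for merging two adjacent blocks' generate/propagate pairs; n is assumed to be a power of two):
    ```python
    def combine(lo, hi):
        # Two adjacent blocks act as one bigger block: it generates a carry if
        # the high half generates one, or propagates one made by the low half.
        g_lo, p_lo = lo
        g_hi, p_hi = hi
        return g_hi | (p_hi & g_lo), p_hi & p_lo

    def block_gp(a_bits, b_bits):
        # Tree-reduce per-bit (generate, propagate) pairs; depth is O(log n).
        # Bits are least-significant first.
        gps = [(x & y, x | y) for x, y in zip(a_bits, b_bits)]
        while len(gps) > 1:
            gps = [combine(gps[i], gps[i + 1]) for i in range(0, len(gps), 2)]
        return gps[0]
    ```
    This pass only yields the whole block's carry-out; recovering every intermediate carry takes a second sweep back down the same tree (as in the Brent-Kung construction), which is what keeps the hardware at O(n) and the depth at O(log n).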

    • @Henrix1998 2 months ago +8

      Isn't 7:52 basically it?

    • @CheaterCodes 2 months ago +1

      I've seen the same number online, but I can't understand how the cost can be O(n); I would think it's O(n log(n)) like in the video

  • @vampire_catgirl 2 months ago +86

    Carry Cancel Adders in Minecraft are a thing that is very cool as well, though they don't have a real world equivalent

    • @ghasttastic1912 2 months ago +4

      what is that

    • @kitlith 2 months ago

      @@ghasttastic1912 It's essentially a compact version of a carry lookahead adder that takes advantage of signal strength mechanics and the ability to create instant diodes by having redstone run up a transparent block.
      (grain of salt for the following explanation; it's based on my understanding of the original post describing it. If you want more details, do a search for Carry Cancel Adder and click the link that points to the Open Redstone Engineers forum)
      A CCA generates two signals for each input bit: a Generate and a Carry Cancel signal, the latter of which is the inverse of Propagate. These are all output into the same Generate tower and Carry Cancel tower, and their signal strength is compared at every bit/height to determine the carry status. (in short: is the previous generate closer than the previous cancel)
      so, really I'd say that this video *does* describe the real-life equivalent of the carry cancel adder, it's just that the carry cancel adder has a bunch of Minecraft-specific optimizations. (It's a similar story for the CLE or Carry Look Everywhere, though I think that one only relies on instant diode behavior.)

    • @Hubcat_ 2 months ago +35

      @@ghasttastic1912 They work essentially the same way as explained in this video, but they make clever use of signal strength to do the carry lookahead, which allows the adder to run much faster than Minecraft would otherwise allow, since it essentially turns a two-operation process into a one-operation process, at least in the realm of Minecraft. Under the hood it may still take more operations, but the intentional delays in Minecraft redstone components matter much more than the lag
      Edit: For a better explanation/more info, watch mattbatwings' video logical redstone reloaded #4

    • @stanleydodds9 2 months ago +15

      Carry cancel adders do exactly the same thing as ripple adders. They aren't a different algorithm. They are only faster than a simple ripple carry adder in Minecraft because the carry propagation is based on methods that send signals instantly (within a single tick) in Minecraft; this includes changing the signal strength of distant redstone dust, or changing the state of a wall much higher up, both of which happen instantly.
      It's totally possible to make a normal ripple carry adder in Minecraft that's just as fast, using whatever instant-wire method you want. People just started calling the simplest ones "carry cancel" adders because of how they interpret carry propagation when it uses redstone components efficiently, but it does exactly the same thing as a full adder.

    • @a-bombmori7393 2 months ago

      I actually was wondering if technical Minecraft players did something like this while watching this video, so thanks for sharing.

  • @Valo_iO 2 months ago +26

    This is amazing! I'm studying CS, and they never taught this to us in our Computer Architecture/Logic Gates classes...

    • @ArneChristianRosenfeldt 2 months ago

      Yeah, how would you lay out those gates?

    • @sporksto4372 2 months ago

      Well, they did teach us.

    • @ArneChristianRosenfeldt 2 months ago +1

      @@sporksto4372 I mean, I would love to see the circuit with gates in the video and how the information flows through it. I wanted this to be the heart of my CPU simulation, but it eats so much space if untangled and still needs bidirectional flow. I could probably read it, but it lacks clarity for noobs and will not impress them.

  • @yash1152 2 months ago +14

    6:23 that cute nod oh lol

  • @DrunkenUFOPilot 2 months ago +7

    I was fascinated by this kind of stuff since I was a kid, (mumble) decades ago. In all my time studying electronics, chip design, fiddling with logic chips, and explaining stuff as a lab instructor, I've never encountered a decent explanation of look-ahead logic. Until now. This is very well done!

  • @theutz 2 months ago +10

    This video is great and your whole channel is fantastic! Keep up the good work!

  • @bathfun 2 months ago +11

    Crazy, i really did not know this. Thank you

  • @proturtleman 2 months ago +1

    It’s been a while, but I’m so glad to see another video from you! It’s great, as always. The robots are really cool and they help make the subject simple enough to understand while still having depth and helping us understand the core details. Thank you for the video!

  • @yoonseongdo3303 2 months ago +2

    I understood everything about logic circuits except this one. And this explains it so well, awesome!

  • @timsmith2525 2 months ago

    This is such a clear explanation. Great job! Thank you.

  • @memesalldayjack3267 2 months ago +1

    this is awesome, very well explained and easy to follow

  • @christopherjaya342 2 months ago +2

    It's actually nuts that the carry-lookahead adder has a time complexity of O(log n) instead of O(n)

  • @oscarvanslijpe581 2 months ago +1

    Very good and clear animation!

  • @nikbl4k 2 months ago +1

    This is the first time i watched a video this intricate and understood it on the first watch.

  • @milakohen630 2 months ago +8

    Love these animations ❤

  • @ritwikchandra4612 2 months ago +1

    Great explanation with awesome animation. Now I really get the 4-bit lookahead carry adder

  • @dixztube 2 months ago

    Boy, reading about this in a book and then seeing the video really makes it make sense. Great job!!!!

  • @i.am.abhi747 2 months ago

    Love to see well-crafted deep knowledge. It would be awesome to watch a similar video on multiplication algorithms

  • @vladislavvlasov5825 2 months ago

    Thanks a lot, waiting for new videos! Wanna see trees and graphs here

  • @eric-seastrand 1 month ago

    Love this channel.

  • @roronoa_d_law1075 2 months ago +4

    I don't understand why we can compute all the carries at the same time in one step. What's the difference between computing the carry of a pair of bits and computing the addition of a pair of bits?
    Why aren't the carries computed one by one for each pair, just like the addition?

    • @capability-snob 2 months ago +3

      Well, "one step" is a bit of a lie. In practice, fan-in for multi-input AND and OR gates is not quite linear, so the "tree" structure of several lookahead circuits demonstrated later in the video is used. So, you get one step for each payer in the tree.

  • @maxmn5821 2 months ago +2

    Very nice video, thank you!
    Waiting for Mr Parker to do it in dominoes in a sports hall, and Mr Mould to do it hydraulically with a blue fluid

  • @evansjahja711 2 months ago

    Thank you

  • @jamashe 2 months ago +1

    Great vid bro. Thank u so much.
    How do you do these animations by the way?

  • @old486whizz 2 months ago +4

    I don't quite get how time is saved... with the lookahead logic, it essentially has to perform the add anyway to generate the carry bit. It reads the bits in, calculates, etc., so how does it end up speeding up the calculation if you're still waiting?

    • @ArneChristianRosenfeldt 2 months ago +1

      The carry jumps over the groups. This allows you to have modern CPUs with 64 bit. They do a ripple add for all the groups of 4 bits. In the next stage they repeat this for the 16 bit groups. And finally they calculate the carry bit for 64 bit. Thus, a 64 bit CPU is only 3 times as slow as an Intel 4004.

  • @yensteel 2 months ago

    Great video! will you continue to explain other concepts of computing in regards to mathematics?

  • @flameofthephoenix8395 2 months ago

    Very nice!

  • @User-pi3nf 8 days ago

    It is awesome, thank you

  • @fdk7014 2 months ago

    That's really clever!

  • @agsystems8220 2 months ago

    I'm a fan of the 3:2 adder. When you are adding more than two numbers, you don't have to wait for each calculation to finish before putting more information in. If each bit takes 3 inputs and outputs a value and a carry, then you can run it as a ripple by replacing one of the inputs with the values and another with the carries shifted, and iterating till the carry disappears. The cool bit is that this still leaves one input free to fold a new number in every tick. You can add lots of numbers very efficiently, as in the sketch below.
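    A minimal Python sketch of that idea (my illustration; the 3:2 compressor is often called a carry-save adder):
    ```python
    def csa(x, y, z):
        # 3:2 compressor: three inputs in, a sum word and a carry word out.
        # Every bit position is independent - no carry propagation at all.
        return x ^ y ^ z, (x & y | x & z | y & z) << 1

    def add_many(nums):
        # Fold one new number in per step; only the final add propagates carries.
        s, c = 0, 0
        for n in nums:
            s, c = csa(s, c, n)
        return s + c

    assert add_many([5, 7, 11, 13]) == 36
    ```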

  • @zuksofalter 2 months ago +3

    Teacher Brian! #AlwaysInsightful

  • @AliceErishech 2 months ago

    I used a schematic of a 4-bit look-ahead carry adder to design an 8-bit one in Logisim Evolution a few years ago to ensure I understood how it worked. The difference in complexity was something like 4-5x. The 8-bit one was *way* more complex than the 4-bit one.

  • @tacticalassaultanteater9678
    I imagine in circuitry most of this can be streamlined; adders can set their Carry and Carry Ready eagerly if A == B, and a big AND tree can collect all the CR values into the Output Ready of the whole adder. This results in a completely flat chain and a single large aggregated signal.

  • @LaserFur 2 months ago +2

    Given the number of transistors available nowadays, it could be faster to have each group calculate both results: one with a carry coming in and one without. Then have the result selected at the end. That way the group adder does not have to wait for the carries to be calculated.

    • @wayando 2 months ago

      But then the selection is done after waiting, which doesn't solve the problem of waiting.

    • @LaserFur 2 months ago

      @@wayando But the wait for each group is smaller since it's fewer total gates. So an 8-bit group would have 16 gates to get to the output carry, and all the groups would be done in parallel. Then the carries would just take 8 groups to get to 64 bits, which would be another 16 gates to get the final result. I see it as easier to explain since the lookahead math is the same as the carry math.

  • @iganic7574 8 days ago

    As a non-computer-science student, half of this goes over my head

  • @Chrisionvision 2 months ago +1

    Boys, new Spanning Tree video just dropped!

  • @shy-watcher 2 months ago

    It's good to note that in a ripple adder, both the number of adders and the time-to-add grow linearly as we process numbers with more bits. With a carry-lookahead the time stays the same, but the amount of components grows as the square of the number of bits. See how the formulas create a triangle shape at 6:31. So this is not a free optimization: you pay with more silicon and more complexity.
    I didn't know about carry-lookahead, this seems kind of fundamental to a practically useful CPU. Thanks for a great video!
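    A quick way to see the quadratic growth mentioned above (my arithmetic, for the flat scheme in the video): the carry into bit i+1 needs i+2 product terms, so an n-bit flat lookahead needs
    ```latex
    \sum_{i=0}^{n-1} (i + 2) = \frac{n(n+3)}{2} = O(n^2)
    ```
    product terms in total - the triangle shape at 6:31.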

    • @ArneChristianRosenfeldt 2 months ago

      You pay with silicon, but as others have said only log (n). Not a real triangle. A ripple adder in CMOS switches twice: once when the inputs are applied and then when the ripple runs through. Best to use two lines: low means: latch inputs. One line high means that a carry ripples through, other line high means that a non-carry ripples through. Then pull down both lines starting from lsb

    • @shy-watcher 2 months ago

      Unexpected, I was sure the triangle was real. I should probably try and implement this in some model to really understand it.

    • @ArneChristianRosenfeldt 2 months ago

      @@shy-watcher the upfront cost is quite big though. Z80 gets away with a single 4 bit adder for 16 bit adds. Now let’s say we accept the carry from the last instruction late in the cycle, then for full speed we would need 4x2 adders in the leaf of the tree alone. ( like 8 16-pin packages on the breadboard).

  • @CrazyGaming-zh3qk 2 months ago +2

    Why is this channel so underrated 😭😭 Truly one of the best, if not the best, content on YouTube

  • @Kanibulus 2 months ago +4

    Incredible! What books would you recommend to read to know more about how a cpu works?

    • @zokalyx 2 months ago +1

      Haven't finished it myself but I've seen good reviews of "The Elements of Computing Systems"

    • @AmanYadav-ry3xr 2 months ago +1

      "but how do it know" by J. clark Scott. It starts with simple gates(and, or, not, nand ....) and using those gates make a single 8 bit cpu and memory.

    • @ArneChristianRosenfeldt 2 months ago

      I wonder how a CPU can route the data from a register to the integer units so fast. Surely carry lookahead and bit logic don't run on the same ALU anymore. Is the barrel shifter extra?

  • @David_Box 2 months ago +1

    If you just do the operations as described, the carry-calculators still effectively rely on the output of all the previous bits. The only way in which I see it being faster is if, instead of simply ORing the results together, you iteratively check over them from the "closest" to the furthest bit in order, adding significant overhead to each operation. This would make it vary in time depending on the input, something you cannot really take advantage of unless you modify how clock cycles work.

    • @ArneChristianRosenfeldt 2 months ago

      N64 could add 64 bits in a single cycle and check the sign in the next instruction (cycle), and you tell me how. Interestingly, the 64 bits on the Jaguar were actually vectors of 16 bits - no carry all the way.
      6502 timing depends on the carry, and I hate it. I also hate the 68k MUL time dependency. I kinda understand the early-out on the 386, but that thankfully also went away.
      Floating point addition can lead to a very small sum. So a floating point unit had better detect leading 0s in a single cycle (while adding with CLA) and also have access to a barrel shifter to normalize the float in constant time.
      But I also sinned. I thought of a CPU which just waits a state after any ADD. ARM has this dedicated fast 4-bit counter for loops. Larger loops would be made of nested loops.
      The SH2 loads two instructions in a pair. So the instruction pointer has 2 cycles available to increment. One could even interleave it on two shared 16-bit adders with an intermediate carry flag. Logic functions get their own ALU.

    • @Ryrzard 2 months ago +1

      There's no clock involved. The whole propagation is asynchronous.
      The carry calculator uses bits generated in parallel all at once. There's no signal that has to propagate through each bit of the number sequentially, unlike a ripple adder.

    • @ArneChristianRosenfeldt 2 months ago

      @@Ryrzard for low power consumption and high speed, I wondered if a skewed timing pulse could run through the register file and lift the chosen register bits onto the bitlines just in time for the ripple.

    • @Ryrzard 2 months ago

      @@ArneChristianRosenfeldt No real reason to do that. Just let it propagate through naturally and latch the result on the next clock edge like normal.

    • @ArneChristianRosenfeldt 2 months ago

      @@Ryrzard The reason I had in mind was that in CMOS every switch costs. So there had better be only one switch per gate per cycle. All latencies need to match. The other reason is that the CPU could run skewed all the time and at a higher clock rate. The problem is that the flags would be latched too late to be available for the next instruction. So ADC and branches would need a wait state.

  • @kabuuu_voky 12 days ago

    Really great explanation, need you as a teacher lol, thanks

  • @abhishekrajanofficial 2 months ago

    Superb information.
    How do you create such wonderful animations? Can you please tell me the name of the tool?

  • @merseyless 2 months ago

    I did not know this. Where can I go to find more digital logic shortcuts like this?

  • @yaus0527 2 months ago +1

    Great video. Could you make a booth multiplier video?

    • @ArneChristianRosenfeldt 2 months ago

      Does anyone actually do this? I just don't get it. Why not use ripple carry for multiplication? You store a carry per bit and use it in the next stage. The ARM in the GBA only does 8 stages per cycle. So a 32x32 bit multiply takes 4 cycles? There would need to be a final add. So 2-5 cycles!
      MAC should need one less cycle because it could start with the carries from the last MAC. Similar for other instructions. Only when branches, saturation, or div are involved would the accumulator need to be transferred into the register file.
      In the other comment they mention Dadda and Wallace, which use (full) adders within each column and only process carries in the next stage. Seems like this reduces latency even if we use an accumulator made of two registers which represent a sum.
      Doesn't Booth need to supplement each adder with an inverter and a barrel shifter? Sounds expensive. So it is for microcode, which makes the instruction so slow that we don't even…
      Since the tree multiply algorithms don't really care where the operands come from, I think that the Booth 10 and 01 cases are no problem. A full adder is kinda expensive, so skipping may be interesting with Dadda or Wallace. Also we need 3 bits to add. So there would be a fabric which can do some skips. But with a 10101 pattern, a second cycle is needed. And we need a register to remember how far we got.
      The fabric is pretty cheap in CMOS if we construct it as power switches to a lot of busses. So the output driver gets power, or the bus sensor gets power. Then fill the 3-bit groups. Looks like if we exceed a skip length, stuff gets nasty. So better have a full fabric. Only the last and first groups of 3 can utilize the law of commutation to remove some gates.
      Booth may be great for the Z80 with its sole 4-bit adder, or for 32-bit factors on the 16-bit ALU of the 68k.

  • @TinusBruins 2 months ago

    The biggest revelation for me was that the basis for a carry-lookahead is the same circuit as a half-adder. And as a full-adder is built from 2 half-adders, you already have all your generate and propagate values available, and you only have to add the second stage of the carry-lookahead. A sketch of that shared structure follows below.
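    A small Python sketch of that observation (my illustration):
    ```python
    def half_adder(a, b):
        # The two outputs are exactly the CLA's propagate (XOR) and
        # generate (AND) signals.
        return a ^ b, a & b

    def full_adder(a, b, c_in):
        p, g = half_adder(a, b)      # first half-adder: P and G come for free
        s, c2 = half_adder(p, c_in)  # second half-adder folds in the carry
        return s, g | c2             # carry out = G OR (P AND c_in)
    ```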

  • @mehmetdemir-lf2vm 2 months ago +4

    good video. please explain fast hardware integer multipliers too.

    • @capability-snob 2 months ago

      100%. Dadda and Wallace multipliers are just lots of full adders, but are a complete mind-bend when first examined.

    • @ArneChristianRosenfeldt 2 months ago

      @@capability-snob They look so similar. Both seem to need a carry-lookahead add at the end.

    • @ArneChristianRosenfeldt 2 months ago

      Now I guess I may have some misconception about what a division circuit looks like, one which in a parallel operation spits out 2 bits plus a carry. Like doing 4 comparisons in parallel.

    • @capability-snob 2 months ago

      @@ArneChristianRosenfeldt Yes - this is why many processors can offer fused multiply-add at the same performance as just a multiply - the network just simplifies a large number of additions into one final addition!
      I don't know of any pipelined hardware division circuitry, which is why a lot of systems (notably, GPUs) just approximate the reciprocal and then perform multiplications to get it accurate. It takes a bit longer, but you can perform several divisions at once, or mix in other operations. Modern CPUs use this mechanism for integer division internally, which is why division tends to take 21-29 cycles.
      From what I can guess from the timing and power data, the floating-point division unit on modern CPUs does parallel trial subtraction. The divisor is statically multiplied by a sequence of small numbers, and these are compared with (effectively, subtracted from) the remainder. The index of the smallest positive number is taken as the next few bits of the quotient. It takes 8 cycles to perform division on 64-bit mantissas, but only one instruction on the core can be using this unit at a time.

    • @ArneChristianRosenfeldt 2 months ago

      @@capability-snob I would build a pipeline of that digit based method for 8 bits followed by that iterative multiplication algorithm for 16, 32, 64 followed by one 64 bit carry look ahead check to get 100% correct integer division.

  • @kelpR 2 months ago

    let's go!! a vid!

  • @jamesalewis 2 months ago

    I built a Rust arbitrary size integer system, and a carry look-ahead may speed things up! Especially if I figure out how to parallelize the process.

    • @joshuascholar3220 2 months ago +1

      On a von Neumann architecture you can't get enough parallelism for the extra steps to break even. But if you were adding super large numbers on a GPU architecture, then it would pay off.

  • @plemli 2 months ago +1

    It is possible to optimize the special cases when
    - adding or subtracting 1 as in an up or down counter
    - one of the operands is an arbitrary but fixed number as in an increment by n counter
    - operands are known to have a (widely) different number of bits e.g. adding an 8 bit to a 64 bit number.
    It would be interesting to compare the output of your favorite VHDL or Verilog synthesiser for the general and these special cases.
    After that, consider adding more than two operands efficiently.

    • @ArneChristianRosenfeldt 2 months ago

      Program counter circuits look almost like an adder, even if the CPU does branch jump-length calculation on the ALU. The only exception is counting down in a sprite engine for the x ordinate. The zero flag is calculated from MSB to LSB. The MSBs don't change before zero. So the zero flag is faster than carry. Or keep a carry every 4 bits and compensate for the delay when loading the counter.
      Same thing with 8+64. There can always be a carry. For the 6502 it would be cool to have two values of the high byte prepared. Jumps are within 256 targets. Still, the high byte needs to steal a cycle from the low-byte ALU (concurrently with the first fetch at the target).

    • @ArneChristianRosenfeldt 2 months ago

      MUL comments show how to add multiple operands. X86 does this for some addressing modes with segment.

  • @user-pr6ed3ri2k 2 months ago

    1:55 the intro immediately made me realize that the faster method of look ahead just does the adding and carrying at the same time and FASTER

  • @yash1152 2 months ago +1

    1:48 TIL: chaining digital units gives them the name "ripple". thanks.

  • @milakohen630 2 months ago

    Brian 🎉

  • @sphynx_owner8224 2 months ago

    how do the byte's generate and propagate values get decided?

  • @jalapenogaming9740 2 months ago

    we need videos on algorithms

  • @darqed 2 months ago

    Great video, although I already knew the main idea. Wouldn't packing up into 2-bit pairs each time be faster though, because then it would just be a binary tree to search in?

    • @trevinbeattie4888 2 months ago +3

      I think that would be slower because you’re doubling the propagation delays. Remember that the point of the carry-lookahead circuit is to eliminate propagation delays for the entire group at the cost of increasing circuit complexity at a rate of N², so the choice of group size is a matter of balancing speed versus transistor count.

    • @ArneChristianRosenfeldt 2 months ago

      Ripple is quite fast per bit and in CMOS can use this inversion trick. Not sure if it works for look ahead. Also 4-NAND is kinda the default gate and you can apply it more if you use groups of 4.

  • @hlibprishchepov322 1 month ago +1

    Either I'm too stupid or the explanation is too hard, but I came up with my own solution. The way I see it: we compute the obvious carry and obvious no-carry situations, and fill in all the unknowns using the regular algorithm. Doing it that way we split up our addition into chunks of two on average, but the longest one is going to slow the system down. Is my solution the same as in the video?

  • @johnopalko5223 2 months ago

    Thank you for the wonderful explanation. I had run across the concept of carry-lookahead ages ago, but it all seemed like witchcraft. But my question is: can it be done with purely combinatorial logic, or do you need sequential logic to synchronize things?

    • @ArneChristianRosenfeldt 2 months ago

      Combinatorial. Multiplication also. You could even unroll division, but with so many gates in the critical path, every synthesiser will insert pipeline stages.

    • @johnopalko5223 2 months ago +1

      @@ArneChristianRosenfeldt Thanks! Just for grins, I recently designed a simple 4-bit clocked adder/subtractor using ripple carry. I'll see if I can design one with carry-lookahead.
      My challenge to myself is to design these things without looking up how others have done it. I start with truth tables, derive the Boolean equations, and translate those into gates.
      A lot of people my age do Sudoku puzzles to keep their brains agile. I do this. Yeah, I know: weird.

  • @adissentingopinion848 2 months ago

    Aw heck yeah. Computer Engineering 201. Hit me with some of that Double Dabble with a real life 7 segment. Maybe even get some sweet AI hype with Wallace and Dadda multipliers and MAC operations.

  • @godofcows4649 2 months ago

    7:53
    The circuit is the same for all the adders. The reason he says it's grouped into 4-bit adders is because that is the practical number of adders that fit in a 16-pin DIP package

    • @ArneChristianRosenfeldt 2 months ago

      Ben Eater's SAP!

    • @godofcows4649 2 months ago +1

      @@ArneChristianRosenfeldt His videos are really cool - though anyone who uses breadboards is a madman lol

  • @Z3rgatul 2 months ago +1

    The last part is a bit confusing.
    We can't just calculate carry-lookahead for groups of 4 bits. We actually have to do the 4-bit sums for them, right? And once we have CL for the groups, we run the 4-bit sums again?

    • @kitlith 2 months ago

      Propagate for a group of 4 bits should be the AND of the Propagate signals for all 4 of the individual bits. The group's Generate signal is the most significant bit's Generate, OR any lower bit's Generate ANDed with all the Propagates above it, I believe.
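      For reference, the standard 4-bit group formulas (my addition, spelling out the reply above):
      ```latex
      P_{\text{group}} = P_3 P_2 P_1 P_0, \qquad
      G_{\text{group}} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0
      ```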

    • @ArneChristianRosenfeldt 2 months ago

      I think this is all much easier to understand if you are used to the zero and carry flags in an 8-bit CPU. Then let us do a SUB. If the SUB internal to a group sets the zero flag, we know that the group will propagate externally.
      If the group receives a carry, we need to DEC. For minimal delay we could have precalculated this.

  • @samwalko 2 months ago

    It might be worth mentioning that sometimes, there is a carry-in into the least significant bit, which means there's an additional term to the CLU formulas. But very good video nonetheless!

    • @ArneChristianRosenfeldt 2 months ago

      Like the ADC instruction?

    • @samwalko 2 months ago +1

      @@ArneChristianRosenfeldt I was actually just thinking of subtraction, which is typically performed by bitwise inverting the second operand and carrying in a 1 to the LSB.
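      A tiny Python sketch of that trick (an illustration, not anyone's actual hardware; the 8-bit width is arbitrary):
      ```python
      BITS = 8
      MASK = (1 << BITS) - 1

      def sub(a, b):
          # a - b in two's complement: a + NOT(b) + 1, all through the adder;
          # the "+ 1" is the carry-in to the least significant bit.
          return (a + (~b & MASK) + 1) & MASK

      assert sub(9, 3) == 6
      assert sub(3, 9) == (3 - 9) & MASK  # wraps to 250 in 8 bits
      ```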

  • @kakyoindonut3213 2 months ago

    Binary logic was the first thing I learned before starting programming, yet I only found out about this carry-lookahead algorithm after watching this video

  • @kenrowe167 2 months ago

    But doesn't the generate/propagate have to "ripple" from low lookahead to high lookahead? Since each successive lookahead generator needs to know the status of all preceding lookahead generator decisions in order to make its own decision.

    • @ArneChristianRosenfeldt 2 months ago

      It is speculation. You could say that each block of 4 bits calculates the result for both cases: incoming carry or not.
      See the Spectre bug in modern CPUs and branch prediction.

  • @DominikJaniec 2 months ago

    interesting ;)

  • @matteofrattini9133 2 months ago

    I'm confused... If the carry-lookahead formulas at 6:30 are calculated in a recursive way, aren't they all waiting for the previous ones to finish their work just like the adders?
    I get that this would be very convenient for longer operations like multiplication, but single-bit addition is literally two logic gates 🤔

    • @capability-snob 2 months ago

      Recursive in the sense that when you break up a 64-bit addition into a 32-bit addition, and those into 16-bit additions etc, you only have 6 steps before you're adding 1-bit numbers. These 6 steps are much fewer than the 63 steps you need to take in order when using a ripple-carry addition. By breaking the problem into similar sized chunks at every level and computing them in parallel, you arrive at the answer in a logarithmic number of recursive steps. You have to do more work, sure - but by focusing on breaking the problem up in this way, you can do more of the work in parallel and get the output much sooner.

  • @AmanYadav-ry3xr 2 months ago

    Make the next video on CRC (Cyclic Redundancy Check)

  • @poutineausyropderable7108 2 months ago

    Doesn't seem faster to me.
    You still have to wait for the end of the carry lookahead to do the addition step.
    It feels like doing the same task, but one after another before the addition rather than during it.
    The only advantage is hard-wiring the 4-way | and & circuits.

  • @ferenccseh4037 2 months ago

    But if the generate/propagate formula needs the generate/propagate of all the bits before, how do they not have to wait for each other?

    • @Ryrzard 2 months ago

      The P and G of each pair of bits can be computed in parallel, without looking at any other bits. The last adder can get its carry-in from the calculated P/G and 2 stages of normal logic (ANDs and ORs) instead of having to wait for the carry to propagate through 4/8/whatever full adders.

    • @ferenccseh4037 2 months ago

      @@Ryrzard Well, thanks for summarising the video, but that still doesn't answer my question. The video states that the P bits need the last bit's P&G to figure out if it propagates. But if it needs that, how does it not have to wait for it?

    • @Ryrzard 2 months ago

      @@ferenccseh4037 It didn't state that. If you listen carefully, P and G can be figured out entirely based on the bit pair and nothing else.

    • @ferenccseh4037 2 months ago

      @@Ryrzard Doesn't Bit(n) only propagate if Bit(n-1) also propagates or generates in some cases?

    • @Ryrzard 2 months ago

      @@ferenccseh4037 But you don't need to perform the actual addition to determine that. All the information needed to determine every bit's carry is available in a single step.

  • @prosamis 2 months ago

    I don't get it.
    So we fix having to wait for a process... by adding another process? Which also has terms that cannot be known simultaneously and relies on previous numbers?
    I don't see what we accomplished here besides trimming down the time exactly for the generating and non-propagating cases,
    while it sounds like propagating carries run into the same issue

    • @dertechl6628 2 months ago

      The propagating/generating properties are not computed sequentially. They are done simultaneously for each digit.

    • @prosamis 2 months ago

      @@dertechl6628 how is the outcome of the propagating determined without looking at parts before, and hence having a sequential part?

    • @dertechl6628 2 months ago +1

      @@prosamis At 6:50 you can see that after the propagate/generate properties for each digit were computed (which happens in parallel), the carry values only depend on these results. If we have OR and AND gates with sufficiently many inputs, we can then compute the carries in two steps, by applying the ANDs in parallel and then the ORs in parallel.

  • @byeronkactus 2 months ago

    not me using this for redstone builds! (srsly though, it was really useful)

  • @bytesandbikes 2 months ago

    This is (as far as I can tell) the same as a tree of full adders that pass carries forward rather than along

  • @netheritecraftondrugs5126 2 months ago

    For an 8-bit number, what you could do is divide it into two 4-bit numbers and do the calculations simultaneously

    • @ArneChristianRosenfeldt 2 months ago

      That is what the video shows, no? Anyway, 8-bit CPUs need to deal with 16-bit instruction pointers. The Z80 actually uses a 4-bit adder for bytes. The 6502 uses an 8-bit ripple. Ben Eater does what you propose in his SAP.

  • @nevokrien95 2 months ago +1

    This is like SIMD but for adders

  • @weckar 2 months ago +1

    I 100% still don't get this. With every carry still being dependent on every previous carry, how is it any different from ripple propagation directly?
    I understand that if an addition generates a carry you can ignore everything prior to that, of course.

    • @perplexedon9834 16 days ago

      I think I've got my head around it. The big-picture idea is that even though heaps of comparisons are made in the carry lookahead, those comparisons can be implemented in parallel, whereas ripple propagation is inherently serial and thus slower.
      Review 6:40. Even though the formula for C_n+1 depends on C_n, we can walk that back to get a closed formula. This closed formula is the same for all numbers; the parameters just depend on the bits. The formula itself, even though it has many different G and P parameters, can be completed in two logical steps: a set of AND operations computed in parallel, then a set of OR operations computed in parallel. Calculating each value of G requires only an AND gate, and calculating P requires only an OR gate, and these can also be done in parallel with one another for all values.
      Now it's important to understand the actual metal of the chip. The physical silicon is all NAND gates, so we can estimate the time an operation takes by the maximum number of NAND gates a signal has to traverse. A full adder has 9 NAND gates, with the longest path consisting of 6 NAND gates. An AND gate is implemented with two NAND gates, and an OR gate is implemented with 3 NAND gates, with a longest path of 2.
      All of that together means that, regardless of length, it takes only a constant number of NAND-gate delays to generate the carry for the nth bit, no matter how high n is, and you can do this for all bits in parallel. Waiting on the ripple would take 6n delays, and obviously 6 ≤ 6n for all n > 0. So once you have the carry, you are only one full adder away, which is 6 NAND-gate delays. Therefore the carry-lookahead algorithm requires that no signal travel through more than about 12 NAND gates to complete the addition of two numbers of any size, whereas the naive adder algorithm would require the signal to pass through 6n NAND gates, where n is the number of bits! This is clearly better for 3 or more bits.
      I'm not sure of this, but I think the reason we stop at 4 bits is a practical geometry issue. Implementing the very large formulas for the 64th bit would mean a lot of NAND gates in a small space, all densely connected. At some point the bottleneck becomes the time it takes for a signal to reach the next gate, rather than to go through it.
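      To make the "walk it back" step concrete, these are the standard closed-form carry equations (my transcription; each right-hand side is one level of ANDs feeding one level of ORs):
      ```latex
      \begin{aligned}
      C_1 &= G_0 + P_0 C_0 \\
      C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
      C_{n+1} &= G_n + P_n G_{n-1} + P_n P_{n-1} G_{n-2} + \dots + P_n \cdots P_0 C_0
      \end{aligned}
      ```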

  • @chipichipichapachapaWHY 2 months ago

    I thought the last bit would be a half adder (since it will never have a carry in)

    • @urgtuiop5455 2 months ago +2

      ADC: add with carry-in. The instruction needs a carry-in so you can chain instructions together for adding/subtracting larger numbers. Full adders in a CPU implementation need carry-in and carry-out circuitry; otherwise you would need an additional ADD #1 and some extra conditional jump instructions.

    • @TildaAzrisk 2 months ago +1

      If all you are doing is adding 2 numbers that are the same number of bits as what the addition circuitry takes in, a half adder on the first bit works no problem.
      However, with some clever extra circuitry, adder circuitry can do a lot more than just addition. When an adder is part of an Arithmetic Logic Unit (ALU), having an optional carry-in on the first bit makes the ALU able to do more things than if the carry-in were not possible.

    • @ArneChristianRosenfeldt 2 months ago

      @@TildaAzrisk Like subtract and increment.

  • @wolf7115 2 months ago

    We have to go deeper

  • @nicholas_0x7f 2 months ago +2

    Cute robots

  • @spezial-m9146 2 months ago

    I did not understand why it is faster. :(

  • @filipwolinski8915 2 months ago +1

    The fact that this is more efficient seems paradoxical to me. Isn't stacking so many gates together to look ahead for the carry the same as stacking full adders together? I know it isn't, but it doesn't seem intuitive.

    • @MuyGordoGato 2 months ago +1

      Do an experiment. Assign a delay to each gate type (nand, and, or, nor, invert etc). Then design a full adder with ripple carry, and another with CLA. Chain eight bits together, and count the number of gates of each type the carry propagates through. I guarantee your paradox will resolve itself.
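      A back-of-the-envelope version of that experiment in Python (a sketch; the unit delays below are assumptions, not measurements of real gates):
      ```python
      def ripple_delay(n_bits, fa_carry_delay=2):
          # The carry passes through every full adder's AND-OR stage in series.
          return n_bits * fa_carry_delay

      def cla_delay(gp_delay=1, and_or_levels=2):
          # One delay to form all G/P bits at once, then one wide AND level and
          # one wide OR level, regardless of width (ignoring fan-in limits).
          return gp_delay + and_or_levels

      for n in (4, 8, 16, 64):
          print(f"{n:2} bits: ripple {ripple_delay(n):3} vs lookahead {cla_delay()}")
      ```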

  • @SamiSaba2 2 months ago

    1/0? HOW

  • @bayurukmanajati1224 2 months ago

    Now we should apply it to decimal numbers, so humans can do it as well. XD

  • @matthewn2559 2 months ago

    I just found out Ime stewpid


  • @NopeNopeNope9124 1 month ago

    For someone with the name "Spanning Tree" which pops up when I search spanning tree, I'd expect more spanning tree videos, but all I get is YOU talking about NOT spanning tree stuff, BRUH 😭😭😭😭
    I wanna know about spanning trees man what the HECK
    Like yeah lemme call myself Planar Graph or Dijkstra's Algorithm or Fibonacci Sequence so every term links to a youtube channel NOT talking about the topic
    BRO!!!!!!

  • @lukamtc9188 2 months ago

    So basically, you use the fact that computers can do many things at once whereas humans can't; so the fact that for humans all these "propagate" and "generate" values are a waste of time doesn't mean they are for computers too.
    I wonder if AI is going to do math in a fashion similar to us, even using intuitive rules for whether a number is divisible by 7 (the VSauce video), or if the computer's methods will always be faster?

  • @Grimziez 2 months ago

    What.

  • @Huntracony 2 months ago +6

    I've always wanted to know how carry-lookahead worked (but not enough to go read about it on Wikipedia), thanks!

  • @simcore999bernard6 2 months ago

    Isn't it just wasting space?

    • @ArneChristianRosenfeldt 2 months ago

      Yeah, it costs some. That is why the first ARM CPU used ripple despite its 32 bits. But then most of the die was used up by a barrel shifter. They could have stuck with 8-bit and 1-bit shifts and done the others over multiple cycles…

  • @traybern 2 months ago

    There is NO SUCH THING as “at the same time.” The computer does ONE calculation at a TIME!

    • @mrcryptozoic817 2 months ago

      Well, for small-scale CPUs anyway (Windows and Macs). Not multiple-core machines.

    • @capability-snob 2 months ago

      The millions of logic gates in a modern ALU can all be doing different calculations at the same time. There are synchronous components that ensure that all of the outputs are stable at the start and end of a clock cycle, but all of the inputs and outputs are different physical components that do not have to wait for one another.

  • @user-ml1ux4ve7w 1 month ago

    Thank you soo much for this🥹🥹