Why Information Theory is Important - Computerphile

  • Added 24 May 2022
  • Zip files & error correction depend on information theory. Tim Muller takes us through how Claude Shannon's early Computer Science work is still essential today!
    / computerphile
    / computer_phile
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottscomputer
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Komentáře (Comments) • 148

  • @mba4677
    @mba4677 2 years ago +314

    "a bit"
    "a bit more"
    after years living with the pack of geniuses, he had slowly become one

    • @laurenpinschannels
      @laurenpinschannels 2 years ago +6

      ah yes I recognize this sense of genius. it's the same one people use when I say that doors can be opened. "thanks genius" I am so helpful

    • @VivekYadav-ds8oz
      @VivekYadav-ds8oz 9 days ago +1

      "he had slowly become *one* "

  • @louisnemzer6801
    @louisnemzer6801 2 years ago +209

    This is the best unscripted math joke I can remember!
    How surprised are you?
    >A bit
    One bit?

    • @JavierSalcedoC
      @JavierSalcedoC 2 years ago +69

      _Flips 2 coins_ "And now, how surprised are you?"
      "A bit more"
      *exactly*

    • @068LAICEPS
      @068LAICEPS 2 years ago +1

      I noticed during the video but after reading here now I am laughing

  • @Ziferten
    @Ziferten 2 years ago +295

    EE chiming in: you stopped as soon as you got to the good part! Shannon channel capacity, equalization, error correction, and modulation are my jam. I'd love to see more communications theory on Computerphile!

    • @Mark-dc1su
      @Mark-dc1su 2 years ago +12

      If anyone wants an extremely accessible intro to these ideas, Ashby's Introduction to Cybernetics is the gold standard.

    • @hellowill
      @hellowill 2 years ago +4

      Yeah feels like this video was a very simple starter

    • @travelthetropics6190
      @travelthetropics6190 2 years ago +1

      Greetings EE! those are the first topics on our "communications theory" subject back at Uni.

    • @OnionKnight541
      @OnionKnight541 2 years ago +1

      Hey! What channel is that stuff on ? I'm still a bit confused by IT

    • @mokovec
      @mokovec 2 years ago +2

      Look at the older videos on this channel - Prof. Brailsford already covered a lot of the details and history.

  • @roninpawn
    @roninpawn 2 years ago +78

    Nice. This explanation ties so elegantly to the hierarchy of text compression. While I've been told many times that it's mathematically provable that there is no more efficient method, this relatively simple explanation leaves me feeling like I understand HOW it is provable.

  • @gaptastic
    @gaptastic 2 years ago +19

    I'm not gonna lie, I didn't think this video was going to be interesting, but man, it's making me think about other applications. Thank you!

  • @LostTheGame6
    @LostTheGame6 2 years ago +95

    The way I like to reach that conclusion is to say: OK, let's describe a population where everyone plays once.
    In the case of the coin flip, if a million people play, you need to, on average, give the names of 500k people who got tails (or heads). Otherwise your description is incomplete.
    In the case of the lottery, you can just say "no one won", or just give the name of the winner. So you can clearly see how much more information is needed in the first case. (A short numeric sketch follows at the end of this thread.)

    • @MrKohlenstoff
      @MrKohlenstoff 2 years ago +2

      That's a nice explanation!

    • @sanferrera
      @sanferrera 2 years ago

      Very nice, indeed!

    • @NathanY0ung
      @NathanY0ung 1 year ago +1

      This makes me think of the ability to guess correctly. For a coin flip, which carries more information, it's harder to guess the outcome than it is to guess whether a lottery ticket wins.
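
A minimal numeric sketch of the point made in this thread (Python; the fair coin and the one-in-a-million lottery probability are illustrative assumptions, not figures from the video):

```python
import math

def surprisal_bits(p):
    """Shannon self-information of an event with probability p, in bits."""
    return -math.log2(p)

# Fair coin flip: each outcome carries 1 bit.
print(surprisal_bits(0.5))        # 1.0

# A one-in-a-million lottery win is very surprising on its own (~19.9 bits)...
p_win = 1e-6
print(surprisal_bits(p_win))      # ~19.93

# ...but the *expected* information per ticket (the entropy) is tiny,
# because almost every ticket loses and "you lost" is no surprise at all.
H_lottery = -(p_win * math.log2(p_win) + (1 - p_win) * math.log2(1 - p_win))
H_coin = 1.0
print(H_lottery)                  # ~2.1e-05 bits per ticket

# For a million players: describing every coin result needs about a million
# bits, while describing the lottery outcome needs only a handful (roughly
# "name the winner"), matching the comment above.
print(1_000_000 * H_coin, 1_000_000 * H_lottery)
```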

  • @Double-Negative
    @Double-Negative 2 years ago +56

    The reason we use the logarithm is that it turns multiplication into addition.
    The chance of 2 independent events X and Y both happening is P(X)*P(Y),
    so if entropy(X) = -log(P(X)), then
    entropy(X and Y) = -log(P(X)*P(Y)) = -log(P(X))-log(P(Y)) = entropy(X) + entropy(Y)
    (A quick numeric check follows at the end of this thread.)

    • @PetrSojnek
      @PetrSojnek 2 years ago +23

      Isn't that more a result of using the logarithm than the reason for using it? It feels like using the logarithm for better scaling was still the primary factor.

    • @entropie-3622
      @entropie-3622 2 years ago +8

      @@PetrSojnek There are lots and lots of choices for functions that model diminishing returns, but only the log functions will turn multiplication into addition.
      Considering how often independent events show up in probability theory, it makes a lot of sense to use the log function for this specific property, and it will yield all kinds of nice results that you would not see if you were to use another diminishing-returns model.
      If we go by the heuristic of it representing information, this property is fairly integral, because you would expect that the total information for multiple independent events should come out as the sum of the information about the individual events.

    • @GustavoOliveira-gp6nr
      @GustavoOliveira-gp6nr 2 years ago +1

      Exactly, the choice of the log function is due more to the addition property than to diminishing returns.
      Also, it is directly related to the number of binary digits needed to code a sequence of fair coin flips: one more digit in a sequence changes the sequence's probability by a factor of 2 while adding exactly 1 more bit of information, which works well with the logarithm formula.

    • @temperedwell6295
      @temperedwell6295 1 year ago +1

      The reason for using the logarithm to base 2 is that there are 2^N different words of length N formed with the alphabet {H,T}; i.e., length of word = log_2(number of words). The reason for the minus sign is that a word of probability 2^-N then gives a positive measure, N, of the amount of information.
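
A quick numeric check of the additivity property discussed in this thread (a sketch in Python; the coin and die probabilities are just example values):

```python
import math

def surprisal_bits(p):
    """Self-information in bits: -log2(p)."""
    return -math.log2(p)

# Two independent events: a fair coin showing heads (p = 1/2)
# and a fair die showing six (p = 1/6).
p_coin, p_six = 1/2, 1/6

joint = surprisal_bits(p_coin * p_six)                  # surprisal of both happening
summed = surprisal_bits(p_coin) + surprisal_bits(p_six) # sum of individual surprisals

print(joint, summed)                 # both ~3.585 bits
assert math.isclose(joint, summed)   # the log turns the product of probabilities into a sum
```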

  • @elimgarak3597
    @elimgarak3597 2 years ago +41

    I believe Popper made this connection between probability and information a bit earlier in his Logik der Forschung (1934; Shannon's information theory paper was published in 1948). That's why he says that we ought to search for "bold" theories, that is, theories with low probability and thus more content. Except, at first, he used a simpler formula: Content(H) = 1 - P(H), where H is a scientific hypothesis.
    Philosophers' role in the history of logic and computer science is a bit underrated and obscured imo (see, for example, Russell's type theory).
    Btw, excellent explanation. Please bring this guy on more often.

    • @yash1152
      @yash1152 2 years ago +3

      thanks a lot for bringing philosophy up in here 😇

    • @Rudxain
      @Rudxain 1 year ago

      This reminds me of quantum superposition

  • @travelthetropics6190
    @travelthetropics6190 2 years ago +10

    This and the Nyquist-Shannon sampling theorem are two of the building blocks of communication as we know it today. So we can say even this video is brought to us by those two :D

  • @agma
    @agma 2 years ago +8

    The bit puns totally got me 🤣

  • @Jader7777
    @Jader7777 2 years ago +8

    Coffee machine right next to computer speaks louder than any theory in this video.

  • @CristobalRuiz
    @CristobalRuiz 2 years ago +4

    Been seeing lots of documentary videos about Shannon lately. Thanks for sharing.

  • @scitortubeyou
    @scitortubeyou 2 years ago +35

    "million-to-one chances happen nine times out of ten" - Terry Pratchett

    • @-eurosplitsofficalclanchan6057
      @-eurosplitsofficalclanchan6057 2 years ago +2

      how does that work?

    • @AntonoirJacques
      @AntonoirJacques 2 years ago +6

      @@-eurosplitsofficalclanchan6057 By being a joke?

    • @IceMetalPunk
      @IceMetalPunk 2 years ago +5

      "Thinking your one-in-a-million chance event is a miracle is underestimating the sheer number of things.... that there are...." -Tim Minchin

    • @davidsmind
      @davidsmind 2 years ago +2

      Given enough time and iterations million to one chances happen 100% of the time

    • @hhurtta
      @hhurtta 2 years ago +4

      @@-eurosplitsofficalclanchan6057 Terry Pratchett knew human behavior and reasoning really well. We tend to exaggerate a lot, we have trouble comprehending large numbers, and we are usually very bad at calculating probabilities. Hence we often say one-in-a-million chance when it's actually much lower. On the other hand, one-in-a-million events do occur much more often than we intuitively expect, when iterating enough, like brute forcing guessing a 5 letter password (abt 1 in 12 millions).

  • @DeanHorak
    @DeanHorak 2 years ago +3

    Greenbar! Haven’t seen that kind of paper used in years.

  • @drskelebone
    @drskelebone 2 years ago +6

    Either I missed a note, there's a note upcoming, or there is no note stating that these are log_2 logarithms, not natural or common logarithms.
    @5:08: "upcoming" is the winner, giving me log_2(1/(1/3)) = log_2(3) ≈ 1.585 bits of information.
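
For comparison, a tiny sketch (Python) showing why the base matters; the 1/3 probability is the event referenced in the comment above:

```python
import math

# Self-information of a 1-in-3 event in three common log bases:
p = 1/3
print(-math.log2(p))   # ~1.585 bits (base 2, as used in the video)
print(-math.log(p))    # ~1.099 nats (natural log), for comparison
print(-math.log10(p))  # ~0.477 hartleys (base 10), for comparison
```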

  • @TheFuktastic
    @TheFuktastic 2 years ago

    Beautiful explanation!

  • @gdclemo
    @gdclemo 2 years ago +5

    You really need to cover arithmetic coding, as this makes the relationship between Shannon entropy and compression limits much more obvious. I'm guessing this will be in a followup video?

  • @elixpo
    @elixpo 1 year ago

    This explanation was really awesome

  • @clearz3600
    @clearz3600 2 years ago +1

    Alice and Bob are sitting at a bar when Alice pulls out a coin, flips it and says "heads or tails?"
    Bob calls out "heads" while looking on in anticipation.
    Alice reveals the coin to be indeed heads and asks "how surprised are you?"
    "A bit," proclaims Bob.

  • @CarlJohnson-jj9ic
    @CarlJohnson-jj9ic 1 year ago

    Boolean algebra is awesome!!!: Person(Flip(2), Coin(Heads,Tails)) = Event(Choice1, Choice2) == (H+T)^2 == (H+T)(H+T) == H^2 + 2HT + T^2 (notice coefficient orderings) where the constant coefficient is the frequency of the outcome and the exponent or order is the amount of times the identity is present in the outcome. This preserves lots of the algebraic axioms which are largely present in expanding operations. If you try to separate out the object and states from agents using denomination of any one of the elements, you can start to be able to combine relationships and quantities with standard algebra words with positional notation(I like abstraction be used as the second quadrant, like exponents are in the first, to resolve differences of range in reduction operations from derivatives and such) polynomial equations to develop rich descriptions of the real world and thus we may characterize geometrically the natural paths of systems and their components. These become extraordinarily useful when you consider quantum states and number generators which basically describe the probability of events in a world space which allows one to rationally derive the required relationships elsewhere, events or agents involved by stating with a probability based on seemingly disjoint phenomena, i.e. coincident and if we employ a sophisticated field ordering, we can look at velocities of gravity to discern what the future will bring. Boolean algebra is awesome! Right up there with the placeholder-value string system using classification of identities.

  • @Mark-dc1su
    @Mark-dc1su 2 years ago +2

    I'm reading Ashby at the moment and we recently covered Entropy. He was very heavy handed with making sure we understood that the measure of Entropy is only applicable when the states are Markovian, or that the state the system is currently in is only influenced by the state immediately preceding it. Does this still hold?

    • @ConnorMcCormick
      @ConnorMcCormick 2 years ago +2

      You can relax the markovian assumption if you know more about your environment. You can still compute the entropy of a POMDP, it just requires guesses at the underlying generative models + your confidence in those models

  • @sean_vikoren
    @sean_vikoren 2 years ago +1

    I find my best intuition of Shannon Entropy flows from Chaos Math.
    Plus I get to stare at clouds while pretending to work.

  • @adzmarsh
    @adzmarsh 2 years ago

    I listened to it all. I hit the like button.
    I did not understand it.
    I loved it

  • @nathanbrader7591
    @nathanbrader7591 2 years ago +12

    3:41 "So 1 in 2 is an odds of 2, 1 in 10 is an odds of 10" That's not right: If the probability is 1 in x then the odds is (1/x)/(1-(1/x)). So, 1 in 2 is an odds of 1 and 1 in 10 is an odds of 1/9.

    • @patrolin
      @patrolin 2 years ago +2

      yes, probability 1/10 = odds 1:9

    • @BergenVestHK
      @BergenVestHK 2 years ago +4

      Depends on the system, I guess. Where I am from, we would say that the odds are 10, when the probability is 1/10. I know you could also call it "one-to-nine" (1:9), but that's not in common use here. Odds of 10 would be correct here.

    • @nathanbrader7591
      @nathanbrader7591 2 years ago

      @@BergenVestHK Interesting. Where are you from?

    • @BergenVestHK
      @BergenVestHK 2 years ago

      @@nathanbrader7591 I'm from Norway. I just googled "odds systems", and found that there are supposedly three main types of odds: "fractional (British) odds, decimal (European) odds, and moneyline (American) odds".
      I must say, that seeing as Computerphile is UK based, I do agree with you. I am a little surprised that they didn't use the fractional system in this video.
      However, I see that Tim, the talker in this video, previously studied in Luxembourg and the Netherlands, so perhaps he imported the European decimal odds systems from there. :-)

    • @nathanbrader7591
      @nathanbrader7591 2 years ago +2

      @@BergenVestHK Thanks for this. That explains his usage which I take to be intentionally informal for an audience perhaps more familiar with gambling lingo. I'd expect (hope) that with a more formal discussion, the term "odds" would be reserved for the fractional form as it is used in statistics.
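
A small sketch (Python) of the two conventions being discussed in this thread, assuming the "decimal odds" reading the video seems to use; both conversions follow directly from the definitions:

```python
def decimal_odds(p):
    """European 'decimal odds': the fair payout per unit staked, 1/p."""
    return 1 / p

def odds_in_favor(p):
    """Statistical odds in favor, p/(1-p), which is what the parent comment computes."""
    return p / (1 - p)

for p in (1/2, 1/10):
    print(f"p={p}: decimal odds {decimal_odds(p)}, odds in favor {odds_in_favor(p):.4f}")
# p=0.5: decimal odds 2.0,  odds in favor 1.0000  (evens, 1:1)
# p=0.1: decimal odds 10.0, odds in favor 0.1111  (1:9, i.e. 9 to 1 against)
```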

  • @Juurus
    @Juurus 2 years ago +1

    I like how there's almost every source of caffeine on the same computer desk.

  • @DrewNorthup
    @DrewNorthup 2 years ago

    The DFB penny is a great touch

  • @MrVontar
    @MrVontar 2 years ago

    Stanford has a page about the entropy of the English language; it is interesting as well.

  • @TheNitramlxl
    @TheNitramlxl 2 years ago +1

    A coffee machine on the desk 🤯this is end level stuff

  • @oussamalaouadi8521
    @oussamalaouadi8521 2 years ago +8

    I guess information theory is - historically - a subset of communications theory which is a subset of EE.

    • @sean_vikoren
      @sean_vikoren 2 years ago +8

      Nice try. Alert! Electrical Engineer in building, get him!

    • @eastasiansarewhitesbutduet9825
      @eastasiansarewhitesbutduet9825 2 years ago +2

      Not really. Well, EE is a subset of Physics.

    • @oussamalaouadi8521
      @oussamalaouadi8521 2 years ago

      @@eastasiansarewhitesbutduet9825
      Yes EE is a subset of Physics.
      Information theory was coined while solving EE problems (transmission of information, communication channel characterisation and capacity, the minimum compression limit, a theoretical model for transmission, etc.), and Shannon himself was an EE.
      Despite the extended use of information theory in many fields such as computer science, statistics and physics, it's historically an EE thing.

    • @nHans
      @nHans 2 years ago +2

      ​@@oussamalaouadi8521 Dude! Engineering is nobody's subset! It's an independent and a highly rewarding profession-and it predates science by several millennia.
      Engineering *_uses_* science. It also uses modern management, finance, economics, market research, law, insurance, math, computing and other fields. That doesn't make it a "subset" of any of those fields.

  • @068LAICEPS
    @068LAICEPS 2 years ago

    Information Theory and Claude Shannon 😍

  • @David-id6jw
    @David-id6jw 2 years ago

    How much information/entropy is needed to encode the position of an electron in quantum theory (either before or after measurement)? What about the rest of its properties? More generally, how much information is necessary to describe any given object? And what impact does that information have on the rest of the universe?

    • @ANSIcode
      @ANSIcode 2 years ago +1

      Surely, you don't expect to get an answer to that here in a YouTube comment? Maybe start with the wiki article on "Quantum Information"...

  • @danielg9275
    @danielg9275 2 years ago +2

    It is indeed

  • @Lokesh-ct8vt
    @Lokesh-ct8vt 1 year ago +3

    Question: is this entropy in any way related to the thermodynamic one?

    • @temperedwell6295
      @temperedwell6295 1 year ago +3

      I am no expert, so please correct me if I am wrong. As I understand it, entropy was first introduced by Carnot, Clausius, and Kelvin as a macroscopic quantity whose differential, integrated with temperature as the weight (dQ = T dS), gives the heat energy exchanged. Boltzmann was the first to relate the macroscopic quantities of thermodynamics, i.e., heat and entropy, to what is happening on the molecular level. He discovered that entropy is related to the number of microstates associated with a macrostate, and as such is a measure of the disorder of the system of molecules. Nyquist, Hartley, and Shannon extended Boltzmann's work by replacing statistics on microstates of molecules with statistics on messages formed from a finite set of symbols. (A small sketch of the formal link follows at the end of this thread.)

    • @danielbrockerttravel
      @danielbrockerttravel 3 months ago

      Related but not identical because the thermodynamic one still hasn't been worked out and because Shannon never defined meaning. I strongly suspect that solving those two will allow for a unification.
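
A small sketch (Python) of the formal link described in the replies above: Shannon's entropy and the Gibbs/Boltzmann entropy share the same -Σ p·log p form, differing only in the log base and Boltzmann's constant. The three-state distribution is just an example, not anything from the video:

```python
import math

def shannon_entropy_bits(probs):
    """H = -sum p_i * log2(p_i)  (information theory, in bits)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gibbs_entropy(probs, k_B=1.380649e-23):
    """S = -k_B * sum p_i * ln(p_i)  (statistical mechanics, in J/K)."""
    return -k_B * sum(p * math.log(p) for p in probs if p > 0)

# Same distribution over microstates/messages, same functional form;
# only the log base and the constant k_B differ.
probs = [0.5, 0.25, 0.25]
print(shannon_entropy_bits(probs))  # 1.5 bits
print(gibbs_entropy(probs))         # ~1.44e-23 J/K
```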

  • @tlrndk123
    @tlrndk123 8 months ago

    the comments in this video are surprisingly informative

  • @laurenpinschannels
    @laurenpinschannels 2 years ago +1

    if you don't specify what base of log you mean, it's base NaN

  • @assepa
    @assepa 2 years ago

    Nice workplace setup, having a coffee machine next to your screen 😀

  • @TheArrogantMonk
    @TheArrogantMonk 2 years ago +2

    Extremely clever bit on such a fascinating subject!

  • @arinc9
    @arinc9 2 years ago

    I didn't understand much because of my bad math, but this was fun to watch.

  • @sanderbos4243
    @sanderbos4243 1 year ago

    I loved this

  • @dixztube
    @dixztube 1 year ago

    I got the tails-tails one on a guess, and now I understand the allure of gambling and casinos; it's fun psychologically.

  • @sdutta8
    @sdutta8 1 month ago

    We claim Shannon as a communication theorist, rather than a computer theorist, but concede with Shakespeare: what’s in a name.

  • @pedro_8240
    @pedro_8240 2 years ago

    6:58 in absolute terms, no, not really, but when you start taking into consideration the chances of just randomly getting your hands on a winning ticket, without actively looking for a ticket, any ticket, that's a whole other story.

  • @Veptis
    @Veptis 1 year ago

    Variance, as the deviation from the expected value, is the interesting concept of statistics; entropy, as the expected amount of information, is the interesting concept of information theory.
    But I feel like they kind of do the same thing.

  • @filipo4114
    @filipo4114 2 years ago

    1:54 - "A bit more." - "That's right - one bit more" ;D

  • @user-fd9rx8dh9b
    @user-fd9rx8dh9b 9 months ago

    Hey, I wrote an article using information theory, I was hoping I could share it and receive some feedback?

  • @h0w1347
    @h0w1347 2 years ago

    thanks

  • @retropaganda8442
    @retropaganda8442 2 years ago +1

    4:02 Surprise, the paper has changed! ;p

  • @inuwara6293
    @inuwara6293 2 years ago

    Wow 👍Very interesting

  • @YouPlague
    @YouPlague 1 year ago +1

    I already knew everything he talked about, but boy this was such a nice concise way of presenting it to laymen!

  • @AntiWanted
    @AntiWanted 2 years ago

    Nice

  • @juliennapoli
    @juliennapoli 2 years ago +1

    Can we imagine a binary lottery where you bet on a 16-bit sequence of 0s and 1s?

  • @abiabi6733
    @abiabi6733 2 years ago

    wait, so this is based on probability?

  • @jimjackson4256
    @jimjackson4256 9 months ago

    Actually I wouldn’t be surprised at any combination of heads and tails. If it was purely random, why would any combination be surprising?

  • @johnhammer8668
    @johnhammer8668 2 years ago

    how can a bit be floating point

  • @pedropeixoto5532
    @pedropeixoto5532 1 year ago

    It is really maddening when someone calls Shannon a Computer Scientist. It would be a terrible anachronism if Electrical Engineering didn't exist!
    He was really (a mathematician and) an Electrical Engineer, and not only the father of Information Theory but the father of Computer Engineering (as a subarea of Electronics Engineering), i.e., the first to systematize the analysis of logic circuits for implementing computers in his famous master's thesis, "A Symbolic Analysis of Relay and Switching Circuits", before gifting us with Information Theory.
    CS diverges from EE in the sense that EE cares about the computing "primitives". Quoting Brian Harvey:
    "Computer Science is not about computers and it is not a science [...] a more appropriate term would be 'Software Engineering'".
    Finally, I think CS is beautiful and has a father that is below no one, Turing.

  • @blayral
    @blayral 2 years ago

    i said head for the first throw, tail-tail for the second. i'm 3 bits surprised...

  • @sedrickalcantara9588
    @sedrickalcantara9588 2 years ago

    Shoutout to Thanos and Nebula in the thumbnail

  • @GordonjSmith1
    @GordonjSmith1 2 years ago +2

    I am not sure that the understanding of 'information theory' has been moved forward by this vlog, which is unusual for Computerphile. In 'digital terms' it might have been better to explain Claude Shannon's paper first, but from an 'Information professional's perspective' this was not an easy watch.

  • @desmondbrown5508
    @desmondbrown5508 2 years ago

    What is the known minimum compressed size for things like raw text or raw image files? I'm very curious. I wish they'd given some examples of known quantities for common file types. (A rough sketch follows at the end of this thread.)

    • @damicapra94
      @damicapra94 2 years ago +4

      It's not really the file type, but rather the file contents that determine its ideal minimum size.
      At the end of the day, files are simply a collection of bits, whether they represent text, images, video or more.

    • @Madsy9
      @Madsy9 2 years ago

      @@damicapra94 The content *and* the compressor and decompressor. Different file formats use different compression algorithms or different combinations of them. And lossy compression algorithms often care a great deal about the structure of the data (image, audio, ..).
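
A small sketch (Python) of the point made in the replies above: the achievable minimum depends on the content's statistics, not the file type. This only estimates the zeroth-order (per-byte) entropy, which is just one crude bound; real compressors can do better by exploiting repetition and structure:

```python
import math
from collections import Counter

def byte_entropy_bits_per_byte(data: bytes) -> float:
    """Zeroth-order (i.i.d. byte) entropy estimate: a rough floor for how far
    a memoryless byte-by-byte coder could compress this particular content."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Example content (assumed, purely illustrative).
text = b"to be or not to be, that is the question " * 100
h = byte_entropy_bits_per_byte(text)
print(h, "bits/byte")              # well under 8 for ordinary English text
print(len(text) * h / 8, "bytes")  # rough size floor under a memoryless model
# zip/PNG/etc. can beat this estimate because the text repeats, which is why
# content, not file type, decides the real limit.
```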

  • @levmarcus8198
    @levmarcus8198 2 years ago

    I want an espresso machine right on my desk.

  • @joey199412
    @joey199412 2 years ago +1

    Amazing video, title should have been something else because I was expecting something mundane, not to have my mind blown and look at computation differently forever.

  • @Maynard0504
    @Maynard0504 1 year ago

    I have the same coffee machine

  • @Wyvernnnn
    @Wyvernnnn 2 years ago +15

    The formula log(1/p(n)) was explained as if it were arbitrary; it's not.

    • @OffTheWeb
      @OffTheWeb 2 years ago +2

      experiment with it yourself.

  • @KX36
    @KX36 2 years ago

    after all that you could have at least given us some lottery numbers at the end

  • @liambarber9050
    @liambarber9050 2 years ago

    My surprisal was very high @4:58

  • @GordonjSmith1
    @GordonjSmith1 2 years ago +3

    Let me add a 'thought experiment'. Some people spend money every week on the Lottery; their chance of winning is very small. So what is the difference between a 'smart' investment strategy and an 'information'-based strategy? Answer: rational investors will consider their chances of winning and conclude that for every extra dollar they invest (say from one dollar to two dollars) their chance increases proportionally. An 'information-engaged' person will see that the chance of winning is entirely remote, and that increasing the investment hardly improves the chances. In this case they know that in order to 'win' they need to be 'in', but even the smallest amount spent is nearly as likely to win as the bets of those who spend more. 'No!!' scream the 'numbers' people, but 'Yes!!' screams anyone who has considered the opposite case. The chance of winning is so small that paying for more Lotto numbers really does not do that much to improve the payback from entering; better to be 'just in' than 'in for a lot'...

  • @jamsenbanch
    @jamsenbanch 1 year ago

    It makes me uncomfortable when people flip coins and don’t catch them

  • @anorak9383
    @anorak9383 2 years ago +2

    Eighth

  • @user-js5tk2xz6v
    @user-js5tk2xz6v 2 years ago

    So there is one arbitrary equation, and I don't understand where it came from or what its purpose is.
    And at one point he said that 0.0000000X is the minimal amount of bits, but then he says he needs 1 bit for information about winning and 0 for losing, so it seems the minimal amount of bits to store information is always 1. How can it be smaller than 1? (See the sketch at the end of this thread.)

    • @shigotoh
      @shigotoh 2 years ago +1

      A value of 0.01 means that you can store on average 100 instances of such information in 1 bit. It is true that when storing only one piece of information it cannot use less than one bit.

    • @hhill5489
      @hhill5489 2 years ago

      You typically take the ceiling of that function's output when thinking practically about it, or for computers. Essentially, the information contained was that minuscule number, but realistically you still need 1 bit to represent it. For an event that is guaranteed, i.e. probability 100% / 1.0, there is 0 information gained by observing it... therefore it takes zero bits to represent that sort of event.

    • @codegeek98
      @codegeek98 2 years ago

      You only have fractional bits in _practice_ with amortization (or reliably if the draws are batched).
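
A short sketch (Python) of what the replies describe: a single event can't take less than one bit on its own, but the average over many events can be fractional; the 0.99 probability is just an assumed example, and arithmetic coding is the usual way to approach this average in practice:

```python
import math

def entropy_bits(p):
    """Entropy of a binary event with probability p, in bits."""
    if p in (0, 1):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

p = 0.99                       # e.g. "you almost certainly lost"
print(entropy_bits(p))         # ~0.0808 bits per outcome on average

# A single outcome still needs a whole bit on its own, but a batch of
# outcomes can share bits: 1000 such outcomes need only ~81 bits in total.
print(1000 * entropy_bits(p))  # ~80.8 bits for 1000 outcomes
```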

  • @rmsgrey
    @rmsgrey 2 years ago +2

    "We will talk about the lottery in one minute".
    Three minutes and 50 seconds later...

  • @Andrewsarcus
    @Andrewsarcus 2 years ago

    Explain TLA+

  • @filda2005
    @filda2005 2 years ago

    8:34 No one, really no one, has been rolling on the floor?
    LOOL, and on top of that the cold-blooded straight face. It's like the Visa card: you can't buy that with money.

  • @CalvinHikes
    @CalvinHikes 1 year ago

    I'm just good enough at math to not play the lottery.

  • @eliavrad2845
    @eliavrad2845 2 years ago

    The "reasonable intuition" about this formula is that, if there are two independent things, such as a coin flip and a lottery ticket, the information about them should be a sort of sum
    H(surprise about a coinflip and a lottery result)=H(surprise about coinflip result)+H(surprise about lottery result)
    but the probabilities should be multiplication
    p(head and win lottery)=p(head)p(win)
    and the best way to get from multiplication to addition is a log
    Log(p(head)p(win))=Log(p(head)) + Log(p(win))

  • @hypothebai4634
    @hypothebai4634 2 years ago

    So, Claude Shannon was a figure in communications electronics - not computer science. And, in fact, the main use of the Shannon Limit was in RF modulation (which is not part of computer science).

  • @mcjgenius
    @mcjgenius 2 years ago

    wow ty🦩

  • @thomassylvester9484
    @thomassylvester9484 2 years ago

    “Expected amount of surprisal” seems like quite an oxymoron.

  • @danielbrockerttravel
    @danielbrockerttravel 3 months ago

    I cannot believe that philosophers, who always annoyingly go on about what stuff 'really means', never thought to try to update Shannon's theory to include meaning. Shannon very purposefully excludes meaning from his analysis of information, which means it provides an incomplete picture.
    In order for information to be surprising, it has to say something about a system that a recipient doesn't know. This provides a clue as to what meaning is- a configuration of a system. If a system configuration is already known, then no information about it will be surprising to the recipient. If the system configuration changes, then the amount of surprise the information contains will increase in proportion.
    In order for information to be informative there must be meanings to communicate, which means that meaning is ontologically prior to information.
    All of reality is composed of networks and these networks exhibit patterns. In networks with enough variety of patterns to be codable, you create the preconditions for information.

  • @karavanidet
    @karavanidet 11 months ago

    Very difficult :)

  • @hypothebai4634
    @hypothebai4634 2 years ago

    The logs that Shannon originally used were natural logs (base e) for obvious reasons.

  • @TheCellarGuardian
    @TheCellarGuardian 2 years ago +1

    Great video! But terrible title... Of course it's important!

  • @atrus3823
    @atrus3823 2 years ago

    This explains why they don't announce the losers!

  • @pgriggs2112
    @pgriggs2112 2 years ago

    Lies! I zip my zip files to save even more space!

  • @ThomasSirianniEsq
    @ThomasSirianniEsq 7 months ago

    Wow. Reminds me how stupid I am

  • @BAMBAMBAMBAMBAMval
    @BAMBAMBAMBAMBAMval 7 months ago

    A bit 😂

  • @kofiamoako3098
    @kofiamoako3098 2 years ago +1

    So no jokes in the comments??

  • @artic0203
    @artic0203 2 years ago

    i solved AI join me now before we run out of time

  • @atsourno
    @atsourno 2 years ago +3

    First 🤓

  • @zxuiji
    @zxuiji 2 years ago +1

    Hate to be pedantic, but a coin flip has more than 2 possible outcomes; there's the edge, after all, which is the reason why getting either side is not a flat 50%.
    Likewise with dice: they have edges and corners, which can also be an outcome; it's just made rather unlikely by the air circulation and the lack of resistance vs the full drag of the landing zone. By full drag I mean the earth dragging it along while rotating, and by lack of resistance I mean that not enough air molecules slam into it through their own drag, thereby allowing it to just roll over/under the few that do.

    • @galliman123
      @galliman123 2 years ago +1

      Except you just rule those out and skew the probability 🙃

    • @roninpawn
      @roninpawn 2 years ago +1

      There is no indication, whatsoever, that you "hate to be pedantic" about this. ;)

    • @zxuiji
      @zxuiji 2 years ago

      @@roninpawn ever heard of OCD, it's similar, I couldn't ignore the compulsion to correct the info

    • @zxuiji
      @zxuiji 2 years ago

      @@galliman123 except that gives erroneous results, the bane of experiments and utilization

    • @JansthcirlU
      @JansthcirlU 2 years ago

      doing statistics is all about confidence intervals, the reason why you're allowed to ignore those edge cases is that they only negligibly affect the odds of those events you are interested in

  • @elijahromer6544
    @elijahromer6544 2 years ago

    IN FIRST

  • @muskduh
    @muskduh 1 year ago

    thanks