Mutual Information
Is the Future of Linear Algebra... Random?
The machine learning consultancy: truetheta.io
Want to work together? See here: truetheta.io/about/#want-to-work-together
"Randomization is arguably the most exciting and innovative idea to have hit linear algebra in a long time." - First line of the Blendenpik paper, H. Avron et al.
Follow up post: truetheta.io/concepts/linear-algebra/randomized-numerical-linear-algebra/
SOCIAL MEDIA
LinkedIn : www.linkedin.com/in/dj-rich-90b91753/
Twitter : DuaneJRich
Github: github.com/Duane321
SUPPORT
www.patreon.com/MutualInformation
SOURCES
Source [1] is the paper that caused me to create this video. [3], [7] and [8] provided a broad and technical view of randomization as a strategy for NLA. [9] and [12] informed me about the history of NLA. [2], [4], [5], [6], [10], [11], [13] and [14] provide concrete algorithms demonstrating the utility of randomization.
[1] Murray et al. Randomized Numerical Linear Algebra. arXiv:2302.11474v2 2023
[2] Melnichenko et al. CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT). arXiv:2311.08316v1 2023
[3] P. Drineas and M. Mahoney. RandNLA: Randomized Numerical Linear Algebra. Communications of the ACM. 2016
[4] N. Halko, P. Martinsson, and J. Tropp. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. arXiv:0909.4061v2 2010
[5] Tropp et al. Fixed Rank Approximation of a Positive-Semidefinite Matrix from Streaming Data. NeurIPS Proceedings. 2017
[6] X. Meng, M. Saunders, and M. Mahoney. LSRN: A Parallel Iterative Solver for Strongly Over- Or Underdetermined Systems. SIAM. 2014
[7] D. Woodruff. Sketching as a Tool for Numerical Linear Algebra. IBM Research Almaden. 2015
[8] M. Mahoney. Randomized Algorithms for Matrices and Data. arXiv:1104.5557v3. 2011
[9] G. Golub and H. van der Vorst. Eigenvalue Computation in the 20th Century. Journal of Computational and Applied Mathematics. 2000
[10] J. Duersch and M. Gu. Randomized QR with Column Pivoting. arXiv:1509.06820v2 2017
[11] Erichson et al. Randomized Matrix Decompositions Using R. Journal of Statistical Software. 2019
[12] J. Gentle et al. Software for Numerical Linear Algebra. Springer. 2017
[13] H. Avron, P. Maymounkov, and S. Toledo. Blendenpik: Supercharging LAPACK's Least-Squares Solver. SIAM. 2010
[14] M. Mahoney and P. Drineas. CUR Matrix Decompositions for Improved Data Analysis. Proceedings of the National Academy of Sciences. 2009
TIMESTAMPS
0:00 Significance of Numerical Linear Algebra (NLA)
1:35 The Paper
2:20 What is Linear Algebra?
5:57 What is Numerical Linear Algebra?
8:53 Some History
12:22 A Quick Tour of the Current Software Landscape
13:42 NLA Efficiency
16:06 Rand NLA's Efficiency
18:38 What is NLA doing (generally)?
20:11 Rand NLA Performance
26:24 What is NLA doing (a little less generally)?
31:30 A New Software Pillar
32:43 Why is Rand NLA Exceptional?
34:01 Follow Up Post and Thank You's
views: 206,615

Videos

The Most Important (and Surprising) Result from Information Theory
views 83K · 7 months ago
Information Theory contains one idea in particular that has had an incredible impact on our society. David MacKay's lecture: czcams.com/play/PLruBu5BI5n4aFpG32iMbdWoRVAA-Vcso6.html
What happens at the Boundary of Computation?
views 56K · 9 months ago
In this video, we look inside the bizarre busy beaver function. REFERENCE NOTES As mentioned, [1] was th...
The Boundary of Computation
views 924K · 10 months ago
There is a limit to how much work algorithms can do.
Why Do Neural Networks Love the Softmax?
views 63K · 11 months ago
Neural Networks see something special in the softmax function.
Policy Gradient Methods | Reinforcement Learning Part 6
views 23K · a year ago
Policy Gradient Methods are among the most effective techniques in Reinforcement Learning. In this video, we'll motivate their design, observe their behavior and understand their background theory.
Function Approximation | Reinforcement Learning Part 5
views 16K · a year ago
Here, we learn about Function Approximation. This is a broad class of methods for learning within state spaces that are far too large for our previous methods to work. This is part five of a six part series on Reinforcement Learning.
Temporal Difference Learning (including Q-Learning) | Reinforcement Learning Part 4
views 26K · a year ago
Part four of a six part series on Reinforcement Learning. As the title says, it covers Temporal Difference Learning, Sarsa and Q-Learning, along with some examples.
Monte Carlo And Off-Policy Methods | Reinforcement Learning Part 3
views 35K · a year ago
Part three of a six part series on Reinforcement Learning. It covers the Monte Carlo approach: solving a Markov Decision Process with mere samples. At the end, we touch on off-policy methods, which enable RL when the data was generated by a different agent.
Bellman Equations, Dynamic Programming, Generalized Policy Iteration | Reinforcement Learning Part 2
views 49K · a year ago
Part two of a six part series on Reinforcement Learning. We discuss the Bellman Equations, Dynamic Programming and Generalized Policy Iteration.
Reinforcement Learning, by the Book
views 73K · a year ago
Part one of a six part series on Reinforcement Learning. If you want to understand the fundamentals in a short amount of time, you're in the right place.
The Kelly Criterion
views 85K · 2 years ago
The Kelly Criterion provides the optimal strategy when betting on random outcomes with known probabilities.
Importance Sampling
views 54K · 2 years ago
Calculating expectations is a frequent task in Machine Learning. Monte Carlo methods are some of our most effective approaches to this problem, but they can suffer from high-variance estimates. Importance Sampling is a clever technique to obtain lower variance estimates.
Factor Analysis and Probabilistic PCA
views 18K · 2 years ago
Factor Analysis and Probabilistic PCA are classic methods to capture how observations 'move together'.
The Exponential Family (Part 2)
views 7K · 2 years ago
Gaussian Processes
views 114K · 2 years ago
The Exponential Family (Part 1)
views 20K · 2 years ago
The Principle of Maximum Entropy
views 25K · 2 years ago
The Bernstein Basis
views 9K · 2 years ago
Linear Regression in 12 minutes
views 6K · 3 years ago
The Fisher Information
views 60K · 3 years ago
How to Learn Probability Distributions
views 38K · 3 years ago
The Bias Variance Trade-Off
views 14K · 3 years ago
Jensen's Inequality
views 25K · 3 years ago

Comments

  • @pepinzachary
    @pepinzachary · 15 hours ago

    Fantastic video, well done! I'm watching for path tracing rather than ML :)

  • @kephalopod3054
    @kephalopod3054 · 23 hours ago

    The cost of housing grows faster than the busy beaver.

  • @David-ld5us
    @David-ld5us · a day ago

    Random projections enthusiast here, had some expectations for the video until the definition: "Linear Algebra is the 'mathematics' of vectors and matrices"... hmmmm... but one can do linear algebra with polynomials and "derivative" operators, I don't know... that seems far beyond an oversimplification.

  • @Redstoner34526
    @Redstoner34526 · a day ago

    I wonder what the largest computable, but not stupid, function would be, like not just going from powers to tetration to pentation to hexation and so on.

  • @cziffras9114
    @cziffras9114 · a day ago

    Now the true question is: how can one be clearer than that? Wonderful work, thank you so much

  • @user-uf4rx5ih3v
    @user-uf4rx5ih3v · 2 days ago

    Actually, randomizing algorithms is very common and has little to do with linear algebra specifically. The reason why it works depends on the algorithm; it's a bit like magic honestly, not well understood at all. But more often than not it has to do with choice. To solve many problems, you have to explore certain paths. This can be recursive: once you start exploring a choice path, you then have to explore more choices. The choice function matters, and some functions work better than others, but computing statistics to make the best choice is expensive. One option is to choose at random instead. The reason this sometimes works is that making one "bad" choice based on the statistics you know will likely mean making more "bad" choices with the same statistics in many situations. Thus, by choosing randomly, you tend to settle for an "average". This can be applied to other choice functions. Say you want to compute some statistics to drive a choice function, but computing them at every branch is expensive, so you want to do it at intervals only. But what interval is good? One way is to randomize the intervals. Again it's the same thing: you don't know the best interval, so you settle for an "average".
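
The "choose randomly instead of computing statistics" strategy this comment describes has a classic textbook instance outside of linear algebra: randomized quicksort, where a random pivot replaces any computed pivot statistic. A minimal sketch (my own illustration, unrelated to the video's algorithms):

```python
import random

def quicksort(xs):
    # Instead of computing statistics to pick a good pivot (expensive),
    # pick one uniformly at random. No fixed input can then consistently
    # force bad choices, so the expected cost is O(n log n) on every input.
    if len(xs) <= 1:
        return xs
    pivot = random.choice(xs)
    return (quicksort([x for x in xs if x < pivot])
            + [x for x in xs if x == pivot]
            + quicksort([x for x in xs if x > pivot]))

print(quicksort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```

The output is the same on every run; only the work done to reach it varies with the random pivots.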

  • @jordancobb7553
    @jordancobb7553 · 2 days ago

    These are some busy ass beavers

  • @montyhall2805
    @montyhall2805 · 2 days ago

    You're already doing least squares, so you're already accepting that you're going to have an approximate answer. If the randomized solution adds little to the error but is significantly faster, that's great.

  • @TheG0ldx
    @TheG0ldx · 3 days ago

    I don't get the part about the 27-state machine. Let's imagine that I create a 27-state machine and define it so that the state to transition into is Halt for every state. Then it will halt at the first step. Have I just proven that the Goldbach conjecture is false? I know I haven't, so could anyone be so kind as to clarify what he meant by "there exists"?

    • @TheG0ldx
      @TheG0ldx · 3 days ago

      Ah sorry, I guess I understand now. We know that there is one specific 27-state machine that would halt iff the Goldbach conjecture is false. Now, do we know which one it is but we're not capable of checking whether it halts because the computations are too long, or do we just not know the definition of this specific machine?

  • @bestechdeals4539
    @bestechdeals4539 · 3 days ago

    Linear Algebra doesn't really deserve to be attacked like this. What did he do to you?

  • @billsimons4113
    @billsimons4113 · 3 days ago

    It looks like sketch-and-solve is an automated version of dimensionality reduction, akin to principal component analysis?

    • @Mutual_Information
      @Mutual_Information · 3 days ago

      Yeah, that's certainly fair to say. SA projects A into a lower dimension.
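
The sketch-and-solve idea discussed in this thread fits in a few lines of NumPy. This is a generic illustration with a Gaussian sketch matrix, not the specific algorithm from the video, and all dimensions are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tall least-squares problem: minimize ||A x - b|| over x.
m, n = 20_000, 20
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 0.01 * rng.standard_normal(m)

# Sketch-and-solve: compress the m rows down to d << m with a random
# sketch S, then solve the much smaller d x n problem S A x ≈ S b.
d = 400
S = rng.standard_normal((d, m)) / np.sqrt(d)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# The sketched solution lands close to the exact one.
print(np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```

The small problem has d rows instead of m, which is where the speedup comes from when m is huge.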

  • @raulgalets
    @raulgalets · 3 days ago

    So, analog computers are back at the game?

  • @snapman218
    @snapman218 · 3 days ago

    If you actually wrote this out as Python code, rather than a bunch of fancy-looking math, it's not that impressive.

  • @chunheichau7947
    @chunheichau7947 · 4 days ago

    You brought up a good point though. Maybe the current limitation is not the speed of compute; rather, it is the speed of data transfer.

  • @theclockmaster
    @theclockmaster · 4 days ago

    Where is the paper?

  • @yobabadakong8137
    @yobabadakong8137 · 4 days ago

    1:32 Could you make a video that goes through these 10 algorithms? That'd be awesome!

  • @everyfuckingnametake

    max cringe detected

  • @usernameisamyth
    @usernameisamyth · 5 days ago

    great stuff

  • @justme7415
    @justme7415 · 5 days ago

    The Theory of Computation has some of the most beautiful and surprising ideas.

  • @TripImmigration
    @TripImmigration · 5 days ago

    I would love for you to do a video applying this in code.

  • @ericc6820
    @ericc6820 · 5 days ago

    Man, I really wish these kinds of videos existed when I was in school. I would have reached my math potential instead of getting bored and losing interest because my teachers didn't know how to teach.

  • @sillysausage4549
    @sillysausage4549 · 5 days ago

    Parts of this borrow too heavily from Tony Padilla's exploration of Graham's Number

  • @karsultimatelifeform2620

    How would this function do against Rayo(N)?

  • @zerotwo7319
    @zerotwo7319 · 6 days ago

    Man, I hate that this has nothing to do with neurons or anything biologically inspired. Great explanation to see what is really going on, but this has nothing to do with intelligence.

  • @JHillMD
    @JHillMD · 6 days ago

    What a terrific video and channel. Great work! Subbed.

  • @huckleberryfinn8795

    To be fair, back in Kelly's time there was significant misinformation on the dangers of smoking. It didn't really become common knowledge how bad it is for us until decades after he passed away.

    • @Houshalter
      @Houshalter · 4 days ago

      Yeah, experiments at the time put rats in cages with much higher cigarette smoke than any human would ever breathe, and didn't find any higher rate of cancer. Among other experiments. It was basically considered debunked pseudoscience at the time.

  • @sahilx4954
    @sahilx4954 · 7 days ago

    I came here to learn something, and after watching the video, I learned that I should've paid more attention to the maths because I couldn't understand it. 🙃

    • @Mutual_Information
      @Mutual_Information · 7 days ago

      Hey it's never too late to get interested :)

    • @sahilx4954
      @sahilx4954 · 6 days ago

      @@Mutual_Information I'll never stop trying if you promise not to stop making great videos like these.

    • @Mutual_Information
      @Mutual_Information · 6 days ago

      @@sahilx4954 Deal!

  • @thygrrr
    @thygrrr · 7 days ago

    I don't understand how Fortran is still relevant. Is it?!? Is cuBLAS used by any Fortran compiler?!?

  • @Datamining101
    @Datamining101 · 7 days ago

    The matrix shapes you're discussing here are common in machine learning but uncommon in the scientific computing programs that you started the video with. We do almost no dense computation in modern scientific computing. For instance, the finite element algorithm you rendered is dominated by sparse Krylov solves.

    • @Mutual_Information
      @Mutual_Information · 7 days ago

      Interesting. Yeah, during scripting Riley actually pointed out that the wording 'typical in practice' wasn't true in the general case. In retrospect, since I was selling the general case, I should have worded it differently. Thanks for the note anyway.

  • @jerryware1970
    @jerryware1970 · 7 days ago

    Fortune’s Formula…great book on the subject

  • @abdullahsheriff_
    @abdullahsheriff_ · 8 days ago

    This. Is. Beautiful. Intuition 100.

  • @GhostZeroGZ
    @GhostZeroGZ · 8 days ago

    I can compute all these numbers for humanity if we made me immortal. We should do that

  • @leslyscarlettineofranco9093

    Math? = Meth

    Amazing and engaging description, thank you!

  • @TheDoc-Worker
    @TheDoc-Worker · 8 days ago

    I subscribed, but if you fix the backgrounds on your monitors to actually align continuously, I'll share your channel with a few friends. As it stands, I have to keep this between us, I can't be associated with this kind of thing.

  • @mtteslian9159
    @mtteslian9159 · 9 days ago

    Amazing!!

  • @JaredJeyaretnam
    @JaredJeyaretnam · 9 days ago

    This is really interesting. I spend all day solving linear algebra problems for my work (quantum physics research), and this involves matrices that scale exponentially with the size of the problem. Accuracy does matter (in what I'm working on now, machine precision is a genuinely annoying limit I'm butting up against!), but a controlled approximation that brings down the cost of these problems by an entire complexity class would be incredible. However, we're almost always working with square matrices, not tall and skinny ones; still, if this is as big a game changer as this video presents, that hurdle should be overcome. We also frequently employ low-rank approximations, so it's not a problem if some of these techniques only work on those. I'd be most interested to see how this can help solve eigenvalue problems: that's a special case of SVD, and I believe finding the QR decomposition essentially solves it too.

  • @ferenccseh4037
    @ferenccseh4037 · 10 days ago

    I didn't understand a lot of things said in this video, but I know one thing for sure: I want to see games that use this with a relatively low accuracy. I just want to see what that looks like! (I'm guessing geometry would jiggle)

  • @amdenis
    @amdenis · 10 days ago

    Great video. There is a lot of utility for this, as well as prior art related to it dating back over 30 years. Good stuff. BTW, linear does not mean quite what you asserted; objects like straight lines and flat planes are really only limiting cases.

  • @elias_toivanen
    @elias_toivanen · 10 days ago

    I wonder if we are just now discovering what intuition is from a computational point of view. From the point of view of how the brain works, this is really fascinating stuff.

  • @christophgouws8311
    @christophgouws8311 · 10 days ago

    So by how much must you shrink them?

  • @firefoxmetzger9063
    @firefoxmetzger9063 · 10 days ago

    I realize that YT comments are not the best place to explain "complex" ideas, but here goes anyway: the head-bending relative difference piece is "just" a coordinate transformation.

    At 29:45, you lay ellipses atop each other and show the absolute approximation difference between the full sample and the sketch. The "trick" is to realize that this happens in the common (base) coordinate system and that nothing stops you from changing this coordinate system. For example, you can (a) move the origin to the centroid of the sketch, (b) rotate so that X and Y align with the semi-axes of the sketch, and (c) scale (asymmetrically) so that the sketch's semi-axes have length 1.

    What happens to the ellipsoid of the full sample in this "sketch space"? Two things happen when plotting in the new coordinate system: (1) the ellipsoid of the sketch becomes a circle around the origin (semi-axes are all 1) by construction; (2) the ellipsoid of the full sample becomes an "almost" circle, depending on how well the sketch approximates the full sample. As sample size increases, centroids converge, semi-axes start aligning, and (importantly) semi-axes get stretched/squashed until they reach length 1. Again, this is for the full sample; the sketch is already a perfect circle by construction. In other words, as we increase the sample size of the sketch, the full sample looks more and more like a unit circle in "sketch space".

    We can now quantify the quality of the approximation using the ratio of the full sample's semi-axes in "sketch space". If there are no relative errors (perfect approximation), these become the ratio of radii of a circle, which is always 1. Any other number is due to (relative) approximation error; lower is better, and it can't be less than 1. The claim now is that, even for small samples, this ratio is already low enough for practical use, i.e., sketches with just 10 points already yield good results.

    • @firefoxmetzger9063
      @firefoxmetzger9063 · 10 days ago

      If you understand the above, then the high-dimensional part becomes clear as well: in N dimensions a "hyper-ellipsoid" has N semi-axes, and the claim is that for real (aka sparse) problems some of these semi-axes are really large and some are really small when measured in "problem space". Applied to the 2D ellipse you show at 29:45, this means that the primary axis becomes really large (stretches beyond the screen) and the secondary axis becomes really small (squished until the border lines touch each other due to line thickness). This will make the ellipse plot "degenerate", and it will look like a line, which is boring to visualize.
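
The coordinate change this thread describes (mapping the sketch's covariance ellipse to the unit circle, then reading the full sample's semi-axes in that space) is easy to check numerically. A small 2-D illustration; the distribution, sample sizes, and seed are all invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Full sample and a small "sketch" (subsample) from the same 2-D distribution.
full = rng.standard_normal((50_000, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
sketch = full[rng.choice(len(full), size=300, replace=False)]

# Transform that maps the sketch's covariance ellipse to the unit circle:
# with L = chol(inv(C_sketch)), the map x -> L.T @ x gives L.T @ C_sketch @ L = I.
L = np.linalg.cholesky(np.linalg.inv(np.cov(sketch, rowvar=False)))

# In "sketch space" the full sample's ellipse is nearly a unit circle,
# so its semi-axes (square roots of eigenvalues) measure the relative error.
cov_full = L.T @ np.cov(full, rowvar=False) @ L
semi_axes = np.sqrt(np.linalg.eigvalsh(cov_full))
print(semi_axes)  # both close to 1 for a decent sketch
```

A larger sketch pushes both semi-axes toward 1, matching the convergence argument above.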

  • @Rudxain
    @Rudxain · 10 days ago

    6:08 I'd like to point out that there's a "shortcut" Collatz fn, which is even more similar to the BB one!

  • @StevenSiew2
    @StevenSiew2 · 10 days ago

    Why can't you change the estimate of the probability as you get new data about the bet? More data equals a better estimate. Continuous improvement.

  • @EzBz982
    @EzBz982 · 10 days ago

    Great video. Subscribed!

  • @TripImmigration
    @TripImmigration · 11 days ago

    We can feel how much you need to control yourself to not ask "class, are you understanding?" 😂 Thanks for reminding me of Monte Carlo though.

  • @r.hazeleger7193
    @r.hazeleger7193 · 11 days ago

    Great vid bruv

  • @YiqianWu-dh8nr
    @YiqianWu-dh8nr · 13 days ago

    I roughly followed the overall line of thought. Using lots of plain, easy-to-understand descriptions in place of many complicated mathematical formulas let me at least understand the underlying principle. Thank you!

  • @rajatkumar.j
    @rajatkumar.j · 13 days ago

    Finally, after watching this 3 times I got the intuition of this method. Thank you for uploading a great series!

  • @katurday
    @katurday · 14 days ago

    Jesus Christ, 2:23 "What is linear algebra?" is akin to "At first there was nothing... then there was the Big Bang". Like come on, don't waste time, just drop a LA textbook in the description and get to the point.

    • @Mutual_Information
      @Mutual_Information · 13 days ago

      If I do that, others won't have context. For those who already understand the background, like yourself, there are timecodes. You can skip the 'What is LA?' section.

  • @pedrocolangelo5844
    @pedrocolangelo5844 · 14 days ago

    This video is incredible. I'm really looking forward to watching other videos of yours.