Machine Learning TV
  • 137 videos
  • 1,844,228 views
Limitations of the ChatGPT and LLMs - Part 3
If you haven't watched Part 1 and Part 2, I highly suggest watching them before Part 3.
Large Language Models (LLMs) have shown huge potential and have recently drawn much attention. In this presentation, Ameet Deshpande and Alexander Wettig give a detailed explanation of how Large Language Models and ChatGPT work. They make clear that they do not assume the audience has any prior knowledge of language models. They start with embeddings and explain Transformers as well. This is the last episode of this amazing series. Thanks for watching.
677 views

Videos

Understanding ChatGPT and LLMs from Scratch - Part 2
866 views · a year ago
Large Language Models (LLMs) have shown huge potential and have recently drawn much attention. In this presentation, Ameet Deshpande and Alexander Wettig give a detailed explanation of how Large Language Models and ChatGPT work. They make clear that they do not assume the audience has any prior knowledge of language models. They start with embeddings and give an explanation abou...
Understanding ChatGPT and LLMs from Scratch - Part 1
3.4K views · a year ago
Large Language Models (LLMs) have shown huge potential and have recently drawn much attention. In this presentation, Ameet Deshpande and Alexander Wettig give a detailed explanation of how Large Language Models and ChatGPT work. They make clear that they do not assume the audience has any prior knowledge of language models. They start with embeddings and give an explanation abou...
Understanding BERT Embeddings and How to Generate them in SageMaker
4.5K views · a year ago
Course link: www.coursera.org/learn/ml-pipelines-bert In this course, you will use BERT for the same purpose. Before diving into the BERT algorithm, I will highlight a few differences between BlazingText and BERT at a very high level. As you can see here, BlazingText is based on Word2Vec, whereas BERT is based on the transformer architecture. Both BlazingText and BERT generate word embeddings. Howe...
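
A rough sketch of generating a BERT sentence embedding outside SageMaker, using the Hugging Face transformers library rather than the course's own pipeline (the model name and the [CLS] pooling are illustrative choices, not the course's):

    # Minimal sketch; assumes `pip install torch transformers`.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    batch = tokenizer("BlazingText is based on Word2Vec.", return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)

    # One common convention: use the [CLS] token's final hidden state as the
    # sentence embedding (mean pooling over tokens is another option).
    embedding = output.last_hidden_state[:, 0, :]
    print(embedding.shape)  # torch.Size([1, 768]) for bert-base
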
Understanding Coordinate Descent
7K views · a year ago
Course link: www.coursera.org/learn/ml-regression Let's just have a little aside on the coordinate descent algorithm, and then we're gonna describe how to apply coordinate descent to solving our lasso objective. So, our goal here is to minimize some function g. So, this is the same objective that we have whether we are talking about our closed-form solution, gradient descent, or this coordinate d...
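
To make the lasso update concrete, here is a minimal sketch of cyclic coordinate descent with the closed-form soft-threshold step (toy code under my own conventions, not the course's; the intercept and feature normalization are ignored):

    import numpy as np

    def soft_threshold(rho, lam):
        # Closed-form minimizer of the one-dimensional lasso subproblem.
        return np.sign(rho) * max(abs(rho) - lam, 0.0)

    def lasso_cd(X, y, lam, n_sweeps=100):
        # Optimize one weight at a time, holding the others fixed,
        # and sweep over the coordinates repeatedly.
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_sweeps):
            for j in range(d):
                r = y - X @ w + X[:, j] * w[j]   # residual with feature j left out
                rho = X[:, j] @ r
                w[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j])
        return w

    # Toy usage: a sparse ground truth is approximately recovered.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + rng.normal(size=200)
    print(lasso_cd(X, y, lam=10.0).round(2))
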
Bootstrap and Monte Carlo Methods
7K views · a year ago
Here we look at the two main concepts that are behind this revolution, the Monte Carlo method and the bootstrap. We will discuss the main principles behind these methods and then see how to apply them in various important contexts, such as in regression and for constructing confidence intervals. Course link: www.coursera.org/learn/stanford-statistics/
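
As a concrete toy version of the idea (a sketch with made-up data, not the course's example): resample the data with replacement many times, recompute the statistic each time, and read a confidence interval off the resulting Monte Carlo distribution.

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(loc=170, scale=10, size=100)  # e.g. 100 measured heights

    # Bootstrap: resample with replacement, recompute the statistic each time.
    boot_means = np.array([
        rng.choice(data, size=data.size, replace=True).mean()
        for _ in range(10_000)
    ])

    # Percentile 95% confidence interval for the mean.
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    print(f"95% CI for the mean: ({lo:.2f}, {hi:.2f})")
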
Maximum Likelihood as Minimizing KL Divergence
2.8K views · 2 years ago
While Bayes' formula for the posterior probability of parameters given the data is very general, there are some interesting special cases that can be analyzed separately. Let's look at them in sequence. The first special case arises when the model is fixed once and for all. In this case, we can drop the conditioning on M in this formula. The Bayesian evidence, in this case, is ...
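
The identity behind the title can be checked numerically: the average negative log-likelihood of a model q decomposes into the entropy of the empirical distribution plus KL(p̂ ‖ q), so maximizing likelihood is exactly minimizing that KL term. A small sketch with toy categorical data of my own choosing:

    import numpy as np

    data = np.array([0, 0, 1, 2, 2, 2])                 # toy observations
    p_hat = np.bincount(data, minlength=3) / data.size  # empirical distribution

    q = np.array([0.2, 0.3, 0.5])                       # some candidate model

    avg_nll = -np.mean(np.log(q[data]))                 # average negative log-likelihood
    entropy = -np.sum(p_hat * np.log(p_hat))
    kl = np.sum(p_hat * np.log(p_hat / q))

    # avg NLL = H(p_hat) + KL(p_hat || q), so minimizing either over q agrees.
    assert np.isclose(avg_nll, entropy + kl)
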
Understanding The Shapley Value
14K views · 2 years ago
The Shapley Value is one of the most prominent ways of dividing up the value of a society, the productive value of some set of individuals, among its members. The Shapley Value is based on Lloyd Shapley's idea that members should basically receive shares proportional to their marginal contributions. So, basically, we look at what a person adds when we add them to a group...
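
The "average marginal contribution" definition can be written out directly for a small game. A toy sketch (the coalition worths are made up for illustration):

    from itertools import permutations

    # Toy cooperative game: worth of every coalition of players A, B, C.
    players = ("A", "B", "C")
    v = {
        frozenset(): 0, frozenset("A"): 10, frozenset("B"): 20,
        frozenset("C"): 30, frozenset("AB"): 40, frozenset("AC"): 50,
        frozenset("BC"): 60, frozenset("ABC"): 90,
    }

    # Shapley value: average each player's marginal contribution over
    # every order in which the group could be assembled.
    shapley = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            shapley[p] += v[coalition | {p}] - v[coalition]
            coalition = coalition | {p}
    shapley = {p: s / len(orders) for p, s in shapley.items()}
    print(shapley)  # the values sum to v(grand coalition) = 90
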
Kalman Filter - Part 2
26K views · 3 years ago
Course Link: www.coursera.org/learn/state-estimation-localization-self-driving-cars Let's consider our Kalman Filter from the previous lesson and use it to estimate the position of our autonomous car. If we have some way of knowing the true position of the vehicle, for example, an oracle tells us, we can then use this to record a position error of our filter at each time step k. Since we're dea...
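
In the scalar case the filter collapses to a few lines, which makes the position error at each time step k easy to inspect. A toy sketch (a static 1-D position with the measurement matrix reduced to 1; the noise settings are my own assumptions, not the course's):

    import numpy as np

    def kalman_step(x_hat, P, z, Q=0.01, R=1.0):
        # Predict: the state is assumed static, so only the uncertainty grows.
        x_pred, P_pred = x_hat, P + Q
        # Update: blend the prediction with measurement z via the Kalman gain.
        K = P_pred / (P_pred + R)
        x_new = x_pred + K * (z - x_pred)   # correct with the innovation
        P_new = (1 - K) * P_pred
        return x_new, P_new

    rng = np.random.default_rng(1)
    true_pos = 5.0                           # the "oracle" ground truth
    x_hat, P = 0.0, 1.0
    for k in range(20):
        z = true_pos + rng.normal(scale=1.0)   # noisy position measurement
        x_hat, P = kalman_step(x_hat, P, z)
        print(f"k={k:2d}  position error={true_pos - x_hat:+.3f}")
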
Kalman Filter - Part 1
97K views · 3 years ago
This course will introduce you to the different sensors and how we can use them for state estimation and localization in a self-driving car. By the end of this course, you will be able to: - Understand the key methods for parameter and state estimation used for autonomous driving, such as the method of least-squares - Develop a model for typical vehicle localization sensors, including GPS and I...
Recurrent Neural Networks (RNNs) and Vanishing Gradients
8K views · 3 years ago
For one, the way plain or vanilla RNNs model sequences, by recalling information from the immediate past, allows you to capture dependencies to a certain degree, at least. They're also relatively lightweight compared to n-gram models, taking up less RAM and space. But there are downsides: the RNN architecture, optimized for recalling the immediate past, causes it to struggle with longer sequ...
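
The trouble with longer sequences is the vanishing-gradient effect: backpropagation through time multiplies by the recurrent Jacobian once per step, so when its norm is below one the signal from distant inputs decays geometrically. A minimal numeric sketch (weights chosen for illustration; the tanh derivative factor, which only shrinks things further, is omitted):

    import numpy as np

    W = np.array([[0.5, 0.1],
                  [0.0, 0.4]])       # recurrent weights with norm < 1
    grad = np.eye(2)
    for _ in range(50):              # backprop through 50 time steps
        grad = W.T @ grad
    print(np.linalg.norm(grad))      # ~1e-15: the signal from 50 steps back is gone
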
Transformers vs Recurrent Neural Networks (RNN)!
21K views · 3 years ago
Course link: www.coursera.org/learn/attention-models-in-nlp/lecture/glNgT/transformers-vs-rnns Using an RNN, you have to take sequential steps to encode your input, and you start from the beginning of your input making computations at every step until you reach the end. At that point, you decode the information following a similar sequential procedure. As you can see here, you have to go throug...
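
The contrast in the excerpt fits in a few lines of numpy: the RNN needs T dependent steps, while self-attention produces all T outputs from one batch of matrix products. A rough sketch (untrained random weights; the learned Q/K/V projections of a real Transformer are omitted):

    import numpy as np

    rng = np.random.default_rng(0)
    T, d = 8, 4                      # sequence length, embedding width
    X = rng.normal(size=(T, d))      # one embedded input sequence

    # RNN encoding: T sequential steps; step t must wait for step t - 1.
    Wx, Wh = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    h = np.zeros(d)
    for t in range(T):
        h = np.tanh(X[t] @ Wx + h @ Wh)

    # Self-attention: every position attends to every other at once.
    scores = X @ X.T / np.sqrt(d)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over positions
    out = weights @ X                # shape (T, d), computed in parallel
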
Language Model Evaluation and Perplexity
18K views · 3 years ago
Course Link: www.coursera.org/lecture/probabilistic-models-in-nlp/language-model-evaluation-SEO4T Transcript: In this video I'll show you how to evaluate a language model. The metric for this is called perplexity, and I will explain what it is. First, you'll divide the text corpus into train, validation, and test data; then you will dive into the concepts of perplexity, an important metric used t...
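
For reference, the metric is PP(W) = P(w_1 … w_m)^(−1/m), with m the total number of words in the test set, i.e. the exponential of the average negative log-probability per word. A tiny sketch with made-up per-word probabilities:

    import numpy as np

    # Probabilities some language model assigns to the m test-set words
    # (the values are illustrative only).
    word_probs = np.array([0.2, 0.1, 0.05, 0.3])
    m = word_probs.size

    # PP(W) = exp(-(1/m) * sum(log p_i)); lower is better.
    perplexity = np.exp(-np.log(word_probs).mean())
    print(perplexity)  # ~7.6, the geometric mean of the inverse probabilities
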
Common Patterns in Time Series: Seasonality, Trend and Autocorrelation
8K views · 4 years ago
Course link: www.coursera.org/learn/tensorflow-sequences-time-series-and-prediction Time series come in all shapes and sizes, but there are a number of very common patterns, so it's useful to recognize them when you see them. For the next few minutes we'll take a look at some examples. The first is trend, where a time series has a specific direction that it's moving in. As you can see from th...
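
The three patterns named in the excerpt are easy to synthesize, which also makes them easy to recognize. A rough sketch (all parameters are arbitrary):

    import numpy as np

    t = np.arange(365)
    rng = np.random.default_rng(0)

    trend = 0.05 * t                               # steady upward direction
    seasonality = 10 * np.sin(2 * np.pi * t / 30)  # repeating 30-step cycle

    # Autocorrelation: an AR(1) process, where each value leans on the last.
    noise = rng.normal(scale=2.0, size=t.size)
    ar = np.zeros(t.size)
    for i in range(1, t.size):
        ar[i] = 0.8 * ar[i - 1] + noise[i]

    series = trend + seasonality + ar              # all three patterns combined
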
Limitations of Graph Neural Networks (Stanford University)
14K views · 4 years ago
Understanding Metropolis-Hastings algorithm
69K views · 4 years ago
Learning to learn: An Introduction to Meta Learning
27K views · 4 years ago
Page Ranking: Web as a Graph (Stanford University 2019)
3.4K views · 4 years ago
Deep Graph Generative Models (Stanford University - 2019)
19K views · 4 years ago
Graph Node Embedding Algorithms (Stanford - Fall 2019)
67K views · 4 years ago
Graph Representation Learning (Stanford University)
94K views · 4 years ago
Understanding Word Embeddings
10K views · 4 years ago
Variational Autoencoders - Part 2 (Modeling a Distribution of Images)
1.6K views · 4 years ago
Variational Autoencoders - Part 1 (Scaling Variational Inference & Unbiased Estimates)
2.9K views · 4 years ago
DBSCAN: Part 2
21K views · 5 years ago
DBSCAN: Part 1
29K views · 5 years ago
Gaussian Mixture Models for Clustering
90K views · 5 years ago
Understanding Irreducible Error and Bias (By Emily Fox)
7K views · 5 years ago
Python Libraries for Machine Learning You Must Know!
1.9K views · 5 years ago
Conditional Probability
1.4K views · 5 years ago

Comments

  • @homeycheese1 · 10 days ago

    will coordinate descent always converge using LASSO even if the ratio of number of features to number of observations/samples is large?

  • @muhammadaneeqasif572 · 20 days ago

    amazing great to see some good content again thank yt algorithm keep it up

  • @stewpatterson1369 · 22 days ago

    best video i've seen on this. great visuals & explanation

  • @pnachtwey · 24 days ago

    This works OK on nice functions like g(x, y) = x^2 + y^2, but real data often looks more like the Grand Canyon, where the path is very narrow and winding.

  • @sELFhATINGiNDIAN · a month ago

    No

  • @kacpersarnowski7969 · a month ago

    Great video, you are the best :)

  • @frielruambil6275 · a month ago

    Thanks very much, I was looking for such videos to answer my assignment questions and you answered all of them at once within 3 minutes. I salute you; please keep on making more videos to assist students in passing their exams and assignments.

  • @NeverHadMakingsOfAVarsityAthle

    Hey! Thanks for the fantastic content :) I'm trying to understand the additivity axiom a bit better. Is this axiom the main reason why Shapley values for machine learning forecast can just be added up for one feature over many different predictions? Let's say we can have predictions for two different days in a time series and each time we calculate the shapley value for the price value. Does the additivity axiom then imply that I can add up the Shapley values for price for these two predictions (assuming they are independent) to make a statement about the importance of price over multiple predictions?

  • @somerset006 · 3 months ago

    What about self-driving rockets?

  • @paaabl0. · 4 months ago

    Shapley values are great, but not gonna help you much with complex non-linear patterns, especially in terms of global feature importance

  • @williamstorey5024 · 4 months ago

    what is text regression?

  • @yandajiang1744 · 5 months ago

    Awesome explanation

  • @user-vh9de5dy9q · 5 months ago

    Why are the given weights for the distributions not really matching the distributions shown on the graph? I mean, I would choose π1 = 45, π2 = 35, π3 = 20.

  • @thechannelwithoutanyconten6364

    Two things: 1. What the H matrix is has not been described. 2. One non-1x1 matrix cannot be smaller or greater than another. This is sloppy. Besides that, it is great work.

  • @obensustam3574 · 5 months ago

    I wish there was a Part 3 :(

  • @DenguBoom · 6 months ago

    Hi, about the sample X1 to Xn: do X1 and Xn have to be different? Because you have a previous sample of 100 heights from 100 different people. Or can it be, as we treated in the bootstrap, that X1* to Xn* are drawn randomly from X1 to Xn, so we could basically draw the same person's height more than once?

  • @feriyonika7078 · 6 months ago

    Thanks, I understand the KF much better now.

  • @usurper1091 · 6 months ago

    7:10

  • @lingfengzhang2943 · 7 months ago

    Thanks! It's very clear

  • @user-uk2rv4kt8d · 7 months ago

    very good video. perfect explanation!

  • @sadeghmirzaei9330 · 7 months ago

    Thank you so much for your explanation.🎉

  • @laitinenpp · 7 months ago

    Great job, thank you!

  • @SCramah13 · 8 months ago

    Clean explanation. Thank you very much...cheers~

  • @felipela2227 · 8 months ago

    Your explanation was great, thx

  • @vambire02 · 8 months ago

    Disappointed ☹️ no part 3

  • @Commonsenseisrare · 9 months ago

    Amazing lecture on GNNs.

  • @cmobarry · 10 months ago

    I like your term "Word Algebra". It might be an unintended side effect, but I have been pondering it for years!

  • @rakr6635 · 10 months ago

    no part 3, sad 😥

  • @vgreddysaragada · 10 months ago

    Great work..

  • @boussouarsari4482 · 11 months ago

    I believe there might be an issue with the perplexity formula. How can we refer to 'w' as the test set containing 'm' sentences, denoting 'm' as the number of sentences, and then immediately after state that 'm' represents the number of all words in the entire test set? This description lacks clarity and coherence. Could you please clarify this part to make it more understandable?

  • @GrafBazooka · 11 months ago

    i cant concentrate she is too hot 🤔😰

  • @sunnelyeh · 11 months ago

    this video means the F/A-18 has the capability to lock onto a UFO!

  • @thefantasticman · 11 months ago

    hard to focus on the ppt, can anyone explain to me why?

  • @nunaworship · 11 months ago

    Can you please share the link for the books you recommended!

  • @AoibhinnMcCarthy · a year ago

    Hard to follow, not concise.

  • @jcorona4755 · a year ago

    They pay so that people see it has more followers. In fact, you pay $10 pesos per video.

  • @g-code9821 · a year ago

    Isn't the positional encoding done with the sinusoidal function?

  • @homataha5626 · a year ago

    Hello, thank you for sharing. Do you have the code repository? I only learn after I've implemented it.

  • @because2022 · a year ago

    Great content.

  • @robinranabhat3125 · a year ago

    Anyone: at 31:25, shouldn't the final equation at the bottom right be about minimizing the loss? I think that's a typo.

  • @Karl_with_a_K · a year ago

    I have run into token exhaustion while working with GPT-4, specifically when it is giving programming-language output. I'm assuming resolving this will be a component of GPT-5...

  • @yifan1342 · a year ago

    sound quality is terrible

    • @nehalkalita · 10 months ago

      Turning on subtitles can be helpful to some extent.

  • @majidafra · a year ago

    I deeply envy those who have been in your NN & DL class.

  • @josephzhu5129 · a year ago

    Great lecture, he knows how to explain complicated ideas, thanks a lot!

  • @chris-dx6oh · a year ago

    Great video

  • @ssvl2204 · a year ago

    Very nice and concise presentation, thanks!

  • @zhaobryan4441 · a year ago

    super super clear!

  • @lara6893 · a year ago

    Emily and Carlos rock, heck yeah!!

  • @StratosFair · a year ago

    Great video ! Are you guys planning to upload follow up lectures on this topic ?

  • @StratosFair · a year ago

    Where is the video on recursive least squares though ?