Machine Learning TV
  • 137 videos
  • 1,844,228 views
Limitations of the ChatGPT and LLMs - Part 3
If you haven't watched Part 1 and Part 2, I highly suggest watching them before Part 3.
Large Language Models (LLMs) have shown huge potential and have recently drawn much attention. In this presentation, Ameet Deshpande and Alexander Wettig give a detailed explanation of how Large Language Models and ChatGPT work. They make clear that they do not assume the audience has any prior knowledge of language models. They start with embeddings and explain Transformers as well. This is the last episode of this amazing series. Thanks for watching.
677 views

Videos

Understanding ChatGPT and LLMs from Scratch - Part 2
866 views · a year ago
Large Language Models (LLMs) have shown huge potential and have recently drawn much attention. In this presentation, Ameet Deshpande and Alexander Wettig give a detailed explanation of how Large Language Models and ChatGPT work. They make clear that they do not assume the audience has any prior knowledge of language models. They start with embeddings and give an explanation abou...
Understanding ChatGPT and LLMs from Scratch - Part 1
3.4K views · a year ago
Large Language Models (LLMs) have shown huge potential and have recently drawn much attention. In this presentation, Ameet Deshpande and Alexander Wettig give a detailed explanation of how Large Language Models and ChatGPT work. They make clear that they do not assume the audience has any prior knowledge of language models. They start with embeddings and give an explanation abou...
Understanding BERT Embeddings and How to Generate them in SageMaker
4.5K views · a year ago
Course link: www.coursera.org/learn/ml-pipelines-bert In this course, you will use BERT for the same purpose. Before diving into the BERT algorithm, I will highlight a few differences between BlazingText and BERT at a very high level. As you can see here, BlazingText is based on Word2Vec, whereas BERT is based on the transformer architecture. Both BlazingText and BERT generate word embeddings. Howe...
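
A rough sketch of generating a BERT sentence embedding outside SageMaker, using the Hugging Face transformers library rather than the course's own pipeline (the model name and the [CLS] pooling are illustrative choices, not the course's):

    # Minimal sketch; assumes `pip install torch transformers`.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    batch = tokenizer("BlazingText is based on Word2Vec.", return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)

    # One common convention: use the [CLS] token's final hidden state as the
    # sentence embedding (mean pooling over tokens is another option).
    embedding = output.last_hidden_state[:, 0, :]
    print(embedding.shape)  # torch.Size([1, 768]) for bert-base
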
Understanding Coordinate Descent
7K views · a year ago
Course link: www.coursera.org/learn/ml-regression Let's just have a little aside on the coordinate descent algorithm, and then we're gonna describe how to apply coordinate descent to solving our lasso objective. So, our goal here is to minimize some function g. So, this is the same objective that we have whether we are talking about our closed-form solution, gradient descent, or this coordinate d...
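
To make the lasso update concrete, here is a minimal sketch of cyclic coordinate descent with the closed-form soft-threshold step (toy code under my own conventions, not the course's; the intercept and feature normalization are ignored):

    import numpy as np

    def soft_threshold(rho, lam):
        # Closed-form minimizer of the one-dimensional lasso subproblem.
        return np.sign(rho) * max(abs(rho) - lam, 0.0)

    def lasso_cd(X, y, lam, n_sweeps=100):
        # Optimize one weight at a time, holding the others fixed,
        # and sweep over the coordinates repeatedly.
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_sweeps):
            for j in range(d):
                r = y - X @ w + X[:, j] * w[j]   # residual with feature j left out
                rho = X[:, j] @ r
                w[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j])
        return w

    # Toy usage: a sparse ground truth is approximately recovered.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + rng.normal(size=200)
    print(lasso_cd(X, y, lam=10.0).round(2))
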
Bootstrap and Monte Carlo Methods
7K views · a year ago
Here we look at the two main concepts that are behind this revolution, the Monte Carlo method and the bootstrap. We will discuss the main principles behind these methods and then see how to apply them in various important contexts, such as in regression and for constructing confidence intervals. Course link: www.coursera.org/learn/stanford-statistics/
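
As a concrete toy version of the idea (a sketch with made-up data, not the course's example): resample the data with replacement many times, recompute the statistic each time, and read a confidence interval off the resulting Monte Carlo distribution.

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(loc=170, scale=10, size=100)  # e.g. 100 measured heights

    # Bootstrap: resample with replacement, recompute the statistic each time.
    boot_means = np.array([
        rng.choice(data, size=data.size, replace=True).mean()
        for _ in range(10_000)
    ])

    # Percentile 95% confidence interval for the mean.
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    print(f"95% CI for the mean: ({lo:.2f}, {hi:.2f})")
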
Maximum Likelihood as Minimizing KL Divergence
2.8K views · 2 years ago
While Bayes' formula for the posterior probability of parameters given the data is very general, there are some interesting special cases that can be analyzed separately. Let's look at them in sequence. The first special case arises when the model is fixed once and for all. In this case, we can drop the conditioning on M in this formula. The Bayesian evidence, in this case, is ...
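
The identity behind the title can be checked numerically: the average negative log-likelihood of a model q decomposes into the entropy of the empirical distribution plus KL(p̂ ‖ q), so maximizing likelihood is exactly minimizing that KL term. A small sketch with toy categorical data of my own choosing:

    import numpy as np

    data = np.array([0, 0, 1, 2, 2, 2])                 # toy observations
    p_hat = np.bincount(data, minlength=3) / data.size  # empirical distribution

    q = np.array([0.2, 0.3, 0.5])                       # some candidate model

    avg_nll = -np.mean(np.log(q[data]))                 # average negative log-likelihood
    entropy = -np.sum(p_hat * np.log(p_hat))
    kl = np.sum(p_hat * np.log(p_hat / q))

    # avg NLL = H(p_hat) + KL(p_hat || q), so minimizing either over q agrees.
    assert np.isclose(avg_nll, entropy + kl)
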
Understanding The Shapley Value
14K views · 2 years ago
The Shapley Value is one of the most prominent ways of dividing up the value of a society, the productive value of some set of individuals, among its members. The Shapley Value is based on Lloyd Shapley's idea that members should basically receive shares proportional to their marginal contributions. So, basically, we look at what a person adds when we add them to a group...
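
The "average marginal contribution" definition can be written out directly for a small game. A toy sketch (the coalition worths are made up for illustration):

    from itertools import permutations

    # Toy cooperative game: worth of every coalition of players A, B, C.
    players = ("A", "B", "C")
    v = {
        frozenset(): 0, frozenset("A"): 10, frozenset("B"): 20,
        frozenset("C"): 30, frozenset("AB"): 40, frozenset("AC"): 50,
        frozenset("BC"): 60, frozenset("ABC"): 90,
    }

    # Shapley value: average each player's marginal contribution over
    # every order in which the group could be assembled.
    shapley = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            shapley[p] += v[coalition | {p}] - v[coalition]
            coalition = coalition | {p}
    shapley = {p: s / len(orders) for p, s in shapley.items()}
    print(shapley)  # the values sum to v(grand coalition) = 90
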
Kalman Filter - Part 2
26K views · 3 years ago
Course Link: www.coursera.org/learn/state-estimation-localization-self-driving-cars Let's consider our Kalman Filter from the previous lesson and use it to estimate the position of our autonomous car. If we have some way of knowing the true position of the vehicle, for example, an oracle tells us, we can then use this to record a position error of our filter at each time step k. Since we're dea...
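
In the scalar case the filter collapses to a few lines, which makes the position error at each time step k easy to inspect. A toy sketch (a static 1-D position with the measurement matrix reduced to 1; the noise settings are my own assumptions, not the course's):

    import numpy as np

    def kalman_step(x_hat, P, z, Q=0.01, R=1.0):
        # Predict: the state is assumed static, so only the uncertainty grows.
        x_pred, P_pred = x_hat, P + Q
        # Update: blend the prediction with measurement z via the Kalman gain.
        K = P_pred / (P_pred + R)
        x_new = x_pred + K * (z - x_pred)   # correct with the innovation
        P_new = (1 - K) * P_pred
        return x_new, P_new

    rng = np.random.default_rng(1)
    true_pos = 5.0                           # the "oracle" ground truth
    x_hat, P = 0.0, 1.0
    for k in range(20):
        z = true_pos + rng.normal(scale=1.0)   # noisy position measurement
        x_hat, P = kalman_step(x_hat, P, z)
        print(f"k={k:2d}  position error={true_pos - x_hat:+.3f}")
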
Kalman Filter - Part 1
97K views · 3 years ago
This course will introduce you to the different sensors and how we can use them for state estimation and localization in a self-driving car. By the end of this course, you will be able to: - Understand the key methods for parameter and state estimation used for autonomous driving, such as the method of least-squares - Develop a model for typical vehicle localization sensors, including GPS and I...
Recurrent Neural Networks (RNNs) and Vanishing Gradients
8K views · 3 years ago
For one, the way plain or vanilla RNNs model sequences, by recalling information from the immediate past, allows you to capture dependencies to a certain degree, at least. They're also relatively lightweight compared to n-gram models, taking up less RAM and space. But there are downsides: the RNN architecture, optimized for recalling the immediate past, causes it to struggle with longer sequ...
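
The trouble with longer sequences is the vanishing-gradient effect: backpropagation through time multiplies by the recurrent Jacobian once per step, so when its norm is below one the signal from distant inputs decays geometrically. A minimal numeric sketch (weights chosen for illustration; the tanh derivative factor, which only shrinks things further, is omitted):

    import numpy as np

    W = np.array([[0.5, 0.1],
                  [0.0, 0.4]])       # recurrent weights with norm < 1
    grad = np.eye(2)
    for _ in range(50):              # backprop through 50 time steps
        grad = W.T @ grad
    print(np.linalg.norm(grad))      # ~1e-15: the signal from 50 steps back is gone
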
Transformers vs Recurrent Neural Networks (RNN)!
21K views · 3 years ago
Course link: www.coursera.org/learn/attention-models-in-nlp/lecture/glNgT/transformers-vs-rnns Using an RNN, you have to take sequential steps to encode your input, and you start from the beginning of your input making computations at every step until you reach the end. At that point, you decode the information following a similar sequential procedure. As you can see here, you have to go throug...
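
The contrast in the excerpt fits in a few lines of numpy: the RNN needs T dependent steps, while self-attention produces all T outputs from one batch of matrix products. A rough sketch (untrained random weights; the learned Q/K/V projections of a real Transformer are omitted):

    import numpy as np

    rng = np.random.default_rng(0)
    T, d = 8, 4                      # sequence length, embedding width
    X = rng.normal(size=(T, d))      # one embedded input sequence

    # RNN encoding: T sequential steps; step t must wait for step t - 1.
    Wx, Wh = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    h = np.zeros(d)
    for t in range(T):
        h = np.tanh(X[t] @ Wx + h @ Wh)

    # Self-attention: every position attends to every other at once.
    scores = X @ X.T / np.sqrt(d)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over positions
    out = weights @ X                # shape (T, d), computed in parallel
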
Language Model Evaluation and Perplexity
18K views · 3 years ago
Course Link: www.coursera.org/lecture/probabilistic-models-in-nlp/language-model-evaluation-SEO4T Transcript: In this video I'll show you how to evaluate a language model. The metric for this is called perplexity, and I will explain what it is. First, you'll divide the text corpus into train, validation, and test data; then you will dive into the concepts of perplexity, an important metric used t...
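
For reference, the metric is PP(W) = P(w_1 … w_m)^(−1/m), with m the total number of words in the test set, i.e. the exponential of the average negative log-probability per word. A tiny sketch with made-up per-word probabilities:

    import numpy as np

    # Probabilities some language model assigns to the m test-set words
    # (the values are illustrative only).
    word_probs = np.array([0.2, 0.1, 0.05, 0.3])
    m = word_probs.size

    # PP(W) = exp(-(1/m) * sum(log p_i)); lower is better.
    perplexity = np.exp(-np.log(word_probs).mean())
    print(perplexity)  # ~7.6, the geometric mean of the inverse probabilities
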
Common Patterns in Time Series: Seasonality, Trend and Autocorrelation
8K views · 4 years ago
Course link: www.coursera.org/learn/tensorflow-sequences-time-series-and-prediction Time series come in all shapes and sizes, but there are a number of very common patterns, so it's useful to recognize them when you see them. For the next few minutes we'll take a look at some examples. The first is trend, where a time series has a specific direction that it's moving in. As you can see from th...
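
The three patterns named in the excerpt are easy to synthesize, which also makes them easy to recognize. A rough sketch (all parameters are arbitrary):

    import numpy as np

    t = np.arange(365)
    rng = np.random.default_rng(0)

    trend = 0.05 * t                               # steady upward direction
    seasonality = 10 * np.sin(2 * np.pi * t / 30)  # repeating 30-step cycle

    # Autocorrelation: an AR(1) process, where each value leans on the last.
    noise = rng.normal(scale=2.0, size=t.size)
    ar = np.zeros(t.size)
    for i in range(1, t.size):
        ar[i] = 0.8 * ar[i - 1] + noise[i]

    series = trend + seasonality + ar              # all three patterns combined
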
Limitations of Graph Neural Networks (Stanford University)
14K views · 4 years ago
Understanding Metropolis-Hastings algorithm
69K views · 4 years ago
Learning to learn: An Introduction to Meta Learning
27K views · 4 years ago
Page Ranking: Web as a Graph (Stanford University 2019)
3.4K views · 4 years ago
Deep Graph Generative Models (Stanford University - 2019)
19K views · 4 years ago
Graph Node Embedding Algorithms (Stanford - Fall 2019)
67K views · 4 years ago
Graph Representation Learning (Stanford University)
94K views · 4 years ago
Understanding Word Embeddings
10K views · 4 years ago
Variational Autoencoders - Part 2 (Modeling a Distribution of Images)
1.6K views · 4 years ago
Variational Autoencoders - Part 1 (Scaling Variational Inference & Unbiased Estimates)
2.9K views · 4 years ago
DBSCAN: Part 2
21K views · 5 years ago
DBSCAN: Part 1
29K views · 5 years ago
Gaussian Mixture Models for Clustering
90K views · 5 years ago
Understanding Irreducible Error and Bias (By Emily Fox)
7K views · 5 years ago
Python Libraries for Machine Learning You Must Know!
1.9K views · 5 years ago
Conditional Probability
1.4K views · 5 years ago

Comments

  • @homeycheese1 · 10 days ago

    will coordinate descent always converge using LASSO even if the ratio of number of features to number of observations/samples is large?

  • @muhammadaneeqasif572 · 20 days ago

    amazing great to see some good content again thank yt algorithm keep it up

  • @stewpatterson1369 · 22 days ago

    best video i've seen on this. great visuals & explanation

  • @pnachtwey · 24 days ago

    This works OK on nice functions like g(x, y) = x^2 + y^2, but real data often looks more like the Grand Canyon, where the path is very narrow and winding.

  • @sELFhATINGiNDIAN · a month ago

    No

  • @kacpersarnowski7969 · a month ago

    Great video, you are the best :)

  • @frielruambil6275 · a month ago

    Thanks very much, I was looking for such videos to answer my assignment questions and you answered all of them at once within 3 minutes. I salute you; please keep on making more videos to assist students in passing their exams and assignments.

  • @NeverHadMakingsOfAVarsityAthle

    Hey! Thanks for the fantastic content :) I'm trying to understand the additivity axiom a bit better. Is this axiom the main reason why Shapley values for machine learning forecast can just be added up for one feature over many different predictions? Let's say we can have predictions for two different days in a time series and each time we calculate the shapley value for the price value. Does the additivity axiom then imply that I can add up the Shapley values for price for these two predictions (assuming they are independent) to make a statement about the importance of price over multiple predictions?

  • @somerset006 · 3 months ago

    What about self-driving rockets?

  • @paaabl0. · 4 months ago

    Shapley values are great, but not gonna help you much with complex non-linear patterns, especially in terms of global feature importance

  • @williamstorey5024 · 4 months ago

    what is text regression?

  • @yandajiang1744 · 5 months ago

    Awesome explanation

  • @user-vh9de5dy9q · 5 months ago

    Why are the given weights for the distributions not really matching the distributions shown on the graph? I mean, I would choose π1 = 45, π2 = 35, π3 = 20.

  • @thechannelwithoutanyconten6364

    Two things: 1. What the H matrix is has not been described. 2. One non-1x1 matrix cannot be smaller or greater than another. This is sloppy. Besides that, it is great work.

  • @obensustam3574 · 5 months ago

    I wish there was a Part 3 :(

  • @DenguBoom · 6 months ago

    Hi, about the sample X1 to Xn: do X1 and Xn have to be different? Because you have a previous sample of 100 heights from 100 different people. Or can it be, as we treated in the bootstrap, that X1* to Xn* are drawn randomly from X1 to Xn, so we could basically draw the same person's height more than once?

  • @feriyonika7078 · 6 months ago

    Thanks, I understand the KF much better now.

  • @usurper1091 · 6 months ago

    7:10

  • @lingfengzhang2943 · 7 months ago

    Thanks! It's very clear

  • @user-uk2rv4kt8d · 7 months ago

    very good video. perfect explanation!

  • @sadeghmirzaei9330 · 7 months ago

    Thank you so much for your explanation.🎉

  • @laitinenpp · 7 months ago

    Great job, thank you!

  • @SCramah13 · 8 months ago

    Clean explanation. Thank you very much...cheers~

  • @felipela2227 · 8 months ago

    Your explanation was great, thx

  • @vambire02 · 8 months ago

    Disappointed ☹️ no part 3

  • @Commonsenseisrare · 9 months ago

    Amazing lecture on GNNs.

  • @cmobarry · 10 months ago

    I like your term "Word Algebra". It might be an unintended side effect, but I have been pondering it for years!

  • @rakr6635 · 10 months ago

    no part 3, sad 😥

  • @vgreddysaragada · 10 months ago

    Great work..

  • @boussouarsari4482 · 11 months ago

    I believe there might be an issue with the perplexity formula. How can we refer to 'w' as the test set containing 'm' sentences, denoting 'm' as the number of sentences, and then immediately after state that 'm' represents the number of all words in the entire test set? This description lacks clarity and coherence. Could you please clarify this part to make it more understandable?

  • @GrafBazooka · 11 months ago

    i cant concentrate she is too hot 🤔😰

  • @sunnelyeh · 11 months ago

    this video means the F/A-18 has the capability to lock onto a UFO!

  • @thefantasticman · 11 months ago

    hard to focus on the ppt, can anyone explain to me why?

  • @nunaworship · 11 months ago

    Can you please share the link for the books you recommended!

  • @AoibhinnMcCarthy · a year ago

    Hard to follow, not concise.

  • @jcorona4755 · a year ago

    They pay so that people see it has more followers. In fact, you pay $10 pesos per video.

  • @g-code9821 · a year ago

    Isn't the positional encoding done with the sinusoidal function?

  • @homataha5626 · a year ago

    Hello, thank you for sharing. Do you have the code repository? I only learn after I've implemented it.

  • @because2022 · a year ago

    Great content.

  • @robinranabhat3125 · a year ago

    Anyone: at 31:25, shouldn't the final equation at the bottom right be about minimizing the loss? I think that's a typo.

  • @Karl_with_a_K · a year ago

    I have run into token exhaustion while working with GPT-4, specifically when it is giving programming-language output. I'm assuming resolving this will be a component of GPT-5...

  • @yifan1342 · a year ago

    sound quality is terrible

    • @nehalkalita · 10 months ago

      Turning on subtitles can be helpful to some extent.

  • @majidafra · a year ago

    I deeply envy those who have been in your NN & DL class.

  • @josephzhu5129 · a year ago

    Great lecture, he knows how to explain complicated ideas, thanks a lot!

  • @chris-dx6oh · a year ago

    Great video

  • @ssvl2204 · a year ago

    Very nice and concise presentation, thanks!

  • @zhaobryan4441 · a year ago

    super super clear!

  • @lara6893 · a year ago

    Emily and Carlos rock, heck yeah!!

  • @StratosFair · a year ago

    Great video ! Are you guys planning to upload follow up lectures on this topic ?

  • @StratosFair · a year ago

    Where is the video on recursive least squares though ?