6.4 Laplace Approximation | Machine Learning
- Added 7 October 2022
- LAPLACE APPROXIMATION
One strategy
Pick a distribution to approximate $p(w \mid x, y)$. We will say
$$p(w \mid x, y) \approx \mathrm{Normal}(\mu, \Sigma).$$
Now we need a method for setting $\mu$ and $\Sigma$.
Laplace approximations
Using a condensed notation, notice from Bayes rule that
$$p(w \mid x, y) = \frac{e^{\ln p(y, w \mid x)}}{\int e^{\ln p(y, w \mid x)}\, dw}.$$
We will approximate $\ln p(y, w \mid x)$ in the numerator and denominator.
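To make the ratio above concrete, here is a minimal sketch that evaluates it by brute force for a one-dimensional weight. It assumes a hypothetical toy Bayesian logistic regression (labels $y_i \in \{-1, +1\}$, prior $w \sim \mathrm{Normal}(0, 1/\lambda)$); the data, the helper names `log_joint` and `posterior_on_grid`, and the grid are illustrative only. The denominator is computed as a Riemann sum, which is feasible on a 1-D grid but scales exponentially with dimension, which is why an approximation of $\ln p(y, w \mid x)$ is needed.

```python
import numpy as np

def log_joint(w, x, y, lam):
    """f(w) = ln p(y, w | x), up to an additive constant, for a hypothetical 1-D
    Bayesian logistic regression: prior w ~ Normal(0, 1/lam), y_i in {-1, +1}."""
    w = np.asarray(w, dtype=float)
    # ln sigma(t) = -ln(1 + e^{-t}); logaddexp(0, -t) evaluates ln(1 + e^{-t}) stably
    log_lik = -np.logaddexp(0.0, -np.multiply.outer(w, y * x)).sum(axis=-1)
    return log_lik - 0.5 * lam * w**2

def posterior_on_grid(grid, x, y, lam):
    """p(w | x, y) = e^{f(w)} / integral e^{f(w)} dw on a uniform grid (Riemann sum)."""
    f = log_joint(grid, x, y, lam)
    unnorm = np.exp(f - f.max())  # a constant shift cancels between numerator and denominator
    dw = grid[1] - grid[0]
    return unnorm / (unnorm.sum() * dw)

# Usage on made-up data
x = np.array([0.5, -1.2, 2.0, 0.3, -0.7, 1.5])
y = np.array([1.0, -1.0, 1.0, -1.0, -1.0, 1.0])
grid = np.linspace(-6.0, 6.0, 2001)
post = posterior_on_grid(grid, x, y, lam=1.0)
```

Subtracting `f.max()` before exponentiating uses the same idea as the cancellation of $e^{f(w_{\mathrm{MAP}})}$ later in the derivation: a term that does not depend on $w$ drops out of the ratio.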
Let’s define $f(w) = \ln p(y, w \mid x)$.
Taylor expansions
We can approximate $f(w)$ with a second-order Taylor expansion.
Recall that $w \in \mathbb{R}^{d+1}$. For any point $z \in \mathbb{R}^{d+1}$,
$$f(w) \approx f(z) + (w - z)^T \nabla f(z) + \tfrac{1}{2} (w - z)^T \nabla^2 f(z)\, (w - z).$$
The notation $\nabla f(z)$ is short for $\nabla_w f(w)\big|_{z}$, and similarly for the matrix of
second derivatives $\nabla^2 f(z)$. We just need to pick $z$.
The Laplace approximation defines $z = w_{\mathrm{MAP}}$.
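The expansion can be checked numerically. The sketch below builds the second-order Taylor approximation of an arbitrary smooth $f$ around a point $z$, estimating $\nabla f(z)$ and $\nabla^2 f(z)$ by central finite differences. The helper names (`num_grad`, `num_hess`, `taylor2`) and the step sizes are assumptions of this sketch; in the Laplace approximation one would pass $f(w) = \ln p(y, w \mid x)$ and $z = w_{\mathrm{MAP}}$, and would usually use analytic derivatives instead.

```python
import numpy as np

def num_grad(f, z, h=1e-5):
    """Central-difference estimate of the gradient of f at z."""
    z = np.asarray(z, dtype=float)
    g = np.zeros(z.size)
    for j in range(z.size):
        e = np.zeros(z.size)
        e[j] = h
        g[j] = (f(z + e) - f(z - e)) / (2.0 * h)
    return g

def num_hess(f, z, h=1e-4):
    """Central-difference estimate of the Hessian: differentiate the gradient once more."""
    z = np.asarray(z, dtype=float)
    d = z.size
    H = np.zeros((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = h
        H[:, j] = (num_grad(f, z + e, h) - num_grad(f, z - e, h)) / (2.0 * h)
    return 0.5 * (H + H.T)  # symmetrize away round-off

def taylor2(f, z):
    """Second-order Taylor expansion of f around z, returned as a function of w."""
    z = np.asarray(z, dtype=float)
    f_z, g, H = f(z), num_grad(f, z), num_hess(f, z)
    def approx(w):
        r = np.asarray(w, dtype=float) - z
        return f_z + r @ g + 0.5 * r @ H @ r
    return approx

# Usage: a concave quadratic is reproduced (up to round-off) from any expansion point.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
f = lambda w: 3.0 + b @ w - 0.5 * w @ Q @ w
approx = taylor2(f, np.array([0.3, -0.2]))
```

For a quadratic $f$ the expansion is exact, which is why the Laplace approximation is exact whenever the log joint is itself quadratic in $w$ (a Gaussian posterior); for logistic regression it is only accurate near $z$.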
LAPLACE APPROXIMATION (SOLVING)
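Recall $f(w) = \ln p(y, w \mid x)$ and $z = w_{\mathrm{MAP}}$. From Bayes rule and the Laplace
approximation we now have
$$p(w \mid x, y) = \frac{e^{f(w)}}{\int e^{f(w)}\, dw} \approx \frac{e^{f(z) + (w - z)^T \nabla f(z) + \frac{1}{2} (w - z)^T \nabla^2 f(z)\, (w - z)}}{\int e^{f(z) + (w - z)^T \nabla f(z) + \frac{1}{2} (w - z)^T \nabla^2 f(z)\, (w - z)}\, dw}$$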
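This can be simplified in two ways:
1. The term $e^{f(w_{\mathrm{MAP}})}$ appears in both the numerator and the denominator and does not vary with $w$, so it is a constant that cancels out.
2. By definition of how we find $w_{\mathrm{MAP}}$, the gradient $\nabla_w \ln p(y, w \mid x)\big|_{w_{\mathrm{MAP}}} = 0$, so the linear term $(w - w_{\mathrm{MAP}})^T \nabla f(w_{\mathrm{MAP}})$ vanishes.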
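Point 2 can be seen numerically. The slides do not say here how $w_{\mathrm{MAP}}$ is computed, so this sketch assumes plain gradient ascent on the log joint of a Bayesian logistic regression with labels $y_i \in \{-1, +1\}$ and prior $w \sim \mathrm{Normal}(0, \lambda^{-1} I)$; the function names, the fixed step size $1/L$, and the random data are illustrative choices. The loop stops exactly when the gradient (near) vanishes, which is the property used to drop the linear term.

```python
import numpy as np

def sigmoid(t):
    """Logistic function, written via tanh to avoid overflow for large |t|."""
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def grad_log_joint(w, X, y, lam):
    """Gradient of ln p(y, w | X) = sum_i ln sigma(y_i x_i^T w) - (lam/2)||w||^2 + const,
    assuming y_i in {-1, +1} and prior w ~ Normal(0, lam^{-1} I)."""
    return X.T @ (y * (1.0 - sigmoid(y * (X @ w)))) - lam * w

def find_w_map(X, y, lam, iters=10_000, tol=1e-10):
    """Gradient ascent on the (strictly concave) log joint; w_MAP is where the gradient vanishes."""
    # Step 1/L, where L = lam + max eig(X^T X) / 4 bounds the Hessian's magnitude,
    # because sigma(t) * (1 - sigma(t)) <= 1/4.
    step = 1.0 / (lam + 0.25 * np.linalg.eigvalsh(X.T @ X).max())
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        g = grad_log_joint(w, X, y, lam)
        if np.linalg.norm(g) < tol:
            break
        w = w + step * g
    return w

# Usage on made-up data
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.where(rng.normal(size=20) > 0, 1.0, -1.0)
w_map = find_w_map(X, y, lam=1.0)
print(np.linalg.norm(grad_log_joint(w_map, X, y, 1.0)))
```

Because the log joint is strictly concave ($\lambda > 0$), the point where the gradient is zero is the unique maximum, so the stopping rule and the definition of $w_{\mathrm{MAP}}$ coincide.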
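We’re therefore left with the approximation
$$p(w \mid x, y) \approx \frac{e^{-\frac{1}{2}(w - w_{\mathrm{MAP}})^T \left(-\nabla^2 \ln p(y, w_{\mathrm{MAP}} \mid x)\right)(w - w_{\mathrm{MAP}})}}{\int e^{-\frac{1}{2}(w - w_{\mathrm{MAP}})^T \left(-\nabla^2 \ln p(y, w_{\mathrm{MAP}} \mid x)\right)(w - w_{\mathrm{MAP}})}\, dw}$$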
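The solution comes from observing that the numerator is an unnormalized multivariate normal density in $w$ and the denominator is its normalizing constant, so
$$p(w \mid x, y) \approx \mathrm{Normal}(\mu, \Sigma),$$
where
$$\mu = w_{\mathrm{MAP}}, \qquad \Sigma = \left(-\nabla^2 \ln p(y, w_{\mathrm{MAP}} \mid x)\right)^{-1}.$$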
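In one dimension the whole recipe fits in a few lines and can be compared against the exact posterior. This sketch assumes the same kind of hypothetical toy model as before (1-D logistic regression, $y_i \in \{-1, +1\}$, prior $w \sim \mathrm{Normal}(0, 1/\lambda)$, made-up data), finds $w_{\mathrm{MAP}}$ with Newton's method (a choice of this sketch), sets $\mu = w_{\mathrm{MAP}}$ and $\Sigma = (-f''(w_{\mathrm{MAP}}))^{-1}$, and measures how far the Gaussian is from the grid-normalized posterior.

```python
import numpy as np

# Hypothetical 1-D toy data: scalar inputs x_i, labels y_i in {-1, +1},
# prior w ~ Normal(0, 1/lam). None of these values come from the lecture.
x = np.array([0.5, -1.2, 2.0, 0.3, -0.7, 1.5])
y = np.array([1.0, -1.0, 1.0, -1.0, -1.0, 1.0])
lam = 1.0

def sigmoid(t):
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def f(w):
    """ln p(y, w | x) up to an additive constant."""
    return -np.logaddexp(0.0, -y * x * w).sum() - 0.5 * lam * w**2

def df(w):
    """First derivative f'(w)."""
    return np.sum(y * x * (1.0 - sigmoid(y * x * w))) - lam * w

def d2f(w):
    """Second derivative f''(w); it is at most -lam, so f is strictly concave."""
    s = sigmoid(y * x * w)
    return -np.sum(s * (1.0 - s) * x**2) - lam

# Newton's method for w_MAP, the point where f'(w) = 0.
w_map = 0.0
for _ in range(50):
    w_map -= df(w_map) / d2f(w_map)

# Laplace approximation: mu = w_MAP, Sigma = (-f''(w_MAP))^{-1}.
mu, var = w_map, -1.0 / d2f(w_map)

# Compare with the exact posterior, normalized numerically on a grid.
grid = np.linspace(mu - 8 * np.sqrt(var), mu + 8 * np.sqrt(var), 4001)
dw = grid[1] - grid[0]
log_post = np.array([f(w) for w in grid])
exact = np.exp(log_post - log_post.max())
exact /= exact.sum() * dw
laplace = np.exp(-0.5 * (grid - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
print("L1 distance between exact and Laplace densities:", np.abs(exact - laplace).sum() * dw)
```

Both densities peak at the same point by construction; any mismatch reflects skew in the true posterior that a single Gaussian centered at the mode cannot capture.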
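For logistic regression with labels $y_i \in \{-1, +1\}$ and prior $w \sim \mathrm{Normal}(0, \lambda^{-1} I)$, we can take the second derivative (Hessian) of the log joint likelihood to find
$$\nabla^2 \ln p(y, w_{\mathrm{MAP}} \mid x) = -\lambda I - \sum_{i=1}^{n} \sigma(y_i x_i^T w_{\mathrm{MAP}}) \left(1 - \sigma(y_i x_i^T w_{\mathrm{MAP}})\right) x_i x_i^T,$$
where $\sigma$ is the logistic sigmoid function.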
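Putting the pieces together for $d$-dimensional weights, the sketch below computes the Laplace approximation using the Hessian formula above. It assumes rows of `X` are the $x_i$ (with any bias feature already appended) and uses Newton's method to find $w_{\mathrm{MAP}}$; the function names, the Newton solver, and the synthetic data are assumptions of this sketch rather than something the slides specify.

```python
import numpy as np

def sigmoid(t):
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def grad(w, X, y, lam):
    """Gradient of ln p(y, w | X): sum_i (1 - sigma(y_i x_i^T w)) y_i x_i - lam w."""
    return X.T @ (y * (1.0 - sigmoid(y * (X @ w)))) - lam * w

def hessian(w, X, y, lam):
    """Hessian from the formula above: -lam I - sum_i sigma_i (1 - sigma_i) x_i x_i^T."""
    s = sigmoid(y * (X @ w))
    return -lam * np.eye(X.shape[1]) - (X * (s * (1.0 - s))[:, None]).T @ X

def laplace_logreg(X, y, lam, iters=100, tol=1e-10):
    """Laplace approximation Normal(mu, Sigma) to p(w | X, y)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):  # Newton's method: w <- w - H^{-1} g
        step = np.linalg.solve(hessian(w, X, y, lam), grad(w, X, y, lam))
        w = w - step
        if np.linalg.norm(step) < tol:
            break
    Sigma = np.linalg.inv(-hessian(w, X, y, lam))  # Sigma = (-Hessian at w_MAP)^{-1}
    return w, Sigma

# Usage on synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = np.where(rng.uniform(size=50) < sigmoid(X @ np.array([1.0, -0.5, 0.25])), 1.0, -1.0)
mu, Sigma = laplace_logreg(X, y, lam=1.0)
```

Since every term $\sigma(1 - \sigma)\, x_i x_i^T$ is positive semidefinite and $\lambda > 0$, the negative Hessian is positive definite, so $\Sigma$ is always a valid covariance matrix.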