We present Skip-Gram, a method for representing words as vectors. We present it as an alternative to the Continuous Bag of Words (CBOW) model and discuss the key differences between the two.
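One way to see the difference between the two models is in how they slice the same context window into training pairs. Below is a minimal sketch of that pairing step (the sentence and window size are made-up examples, not from the lesson):

```python
# Hypothetical example: build training pairs for Skip-Gram and CBOW
# from the same sliding window over a toy sentence.

sentence = ["the", "quick", "brown", "fox", "jumps"]
window = 2  # number of words taken on each side of the center word

skipgram_pairs = []  # Skip-Gram: (center word, one context word) per pair
cbow_pairs = []      # CBOW: (all context words together, center word)

for i, center in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window),
                              min(len(sentence), i + window + 1))
               if j != i]
    for c in context:
        skipgram_pairs.append((center, c))       # predict context from center
    cbow_pairs.append((tuple(context), center))  # predict center from context

print(skipgram_pairs[:4])
print(cbow_pairs[0])
```

Note that Skip-Gram produces one training example per (center, context) pair, while CBOW averages the whole context into a single example per center word.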
thank you so much, great content, seriously.
Wow. This is actually golden. Keep it up!
amazing explanation!
Great explanation sir
In Skip-Gram, if I multiply the output of the identity activation by W', wouldn't it give the same vectors for all four context words?
But what if the gradient changes a lot after each step - then using sum of the losses OR using each loss one after the other, would be different, right?
Yes indeed. It is merely a practical heuristic that seems to work well and is inspired by this intuition.
u almost killed me