Great content! Can you provide few examples of other aggregation techniques to create user profiles?
Thanks in a million. Awesome. Where have you been all these years.
The definition of TF (Term Frequency) is explained incorrectly in the video compared with what is written. It should be the frequency of term (feature) i in document j divided by the maximum frequency of any term k in the same document j.
Yes, I wanted to comment on it, but I saw you had already mentioned it :)
That is how the formula reads, but is it right? It means we divide by the count of the most frequent term in document j.
Yes, that is covered in section 3.1.3. I wanted to comment on it and saw your comment. Thanks!
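For anyone who wants to check the written definition by hand, here is a minimal sketch of the max-normalized TF discussed above. The function name `tf_max_normalized` and the toy document are my own assumptions for illustration, not something from the video.

```python
from collections import Counter


def tf_max_normalized(tokens):
    """Return TF_ij = f_ij / max_k f_kj for a single document.

    `tokens` is the document as a list of terms (hypothetical input format).
    """
    counts = Counter(tokens)
    if not counts:
        return {}
    # Count of the most frequent term in this document.
    max_count = max(counts.values())
    return {term: count / max_count for term, count in counts.items()}


# Toy document: "apple" appears 3 times, "banana" 2 times, "cherry" once.
doc = "apple banana apple cherry apple banana".split()
tf = tf_max_normalized(doc)
```

With this normalization the most frequent term always gets TF = 1, and every other term is scaled relative to it.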
Great work!
I also recommend making a tutorial on "Matrix Factorization" methods as used in recommender systems.
Did you find any tutorials on that topic? I'm struggling to find more advanced, real-world recommender system tutorials with Python.
Thank you sir, your videos are excellent. I am in a Data Science bootcamp and your videos are the perfect complexity level for me. Huge help! Please keep them coming!
For image features, can you just use the ResNet output?
Thank you professor.
Thanks that helps
Amazing video!!
Thank you for putting it all together! Just one small question on TF: the simplest understanding is count(term i in doc j) / sum(count(term x in doc j) for every term x in doc j), right? Not only the word 'apple' itself.
Based on the cosine similarity, we get a value in [0, 1]. How do you get the user rating after this step?
I think you could divide 1 by the number of stars.
For example, with 5 stars each band is 1/5 = 0.2 wide, and you split the range across the stars:
Star 1: [0, 0.2)
Star 2: [0.2, 0.4)
Star 3: [0.4, 0.6)
Star 4: [0.6, 0.8)
Star 5: [0.8, 1]
It's just a simple idea, not a theoretical method.
Good luck.
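A rough sketch of that equal-width banding, in case it helps. The function name `similarity_to_stars` is hypothetical, and it assumes the similarity is already in [0, 1] as stated in the question; this is a simple heuristic, not an established rating-prediction method.

```python
def similarity_to_stars(similarity, num_stars=5):
    """Map a similarity in [0, 1] to a star rating using equal-width bands."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be in [0, 1]")
    # Each band is 1 / num_stars wide; the band index runs from 0 to num_stars - 1.
    band = int(similarity * num_stars)
    # A similarity of exactly 1.0 would fall past the last band, so clamp it.
    band = min(band, num_stars - 1)
    return band + 1


stars_low = similarity_to_stars(0.1)
stars_mid = similarity_to_stars(0.5)
stars_top = similarity_to_stars(1.0)
```

If the user profile contains negative weights, the cosine similarity can also be negative, so the score would have to be rescaled into [0, 1] before using bands like these.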
Thanks Sir
Thank you! How can I take the different sizes of documents into account? For example, a 200-word document contains the word "apple" 20 times, while a 10,000-word document contains it 300 times.
TF-IDF
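To make the short answer concrete: the TF part is what compensates for document length, because the raw count is divided by something that grows with the document. A small sketch using the numbers from the question and length normalization (dividing by the total word count, which is one common choice; the helper name `relative_frequency` is my own):

```python
def relative_frequency(term_count, doc_length):
    """Length-normalized term frequency: occurrences divided by total words."""
    if doc_length <= 0:
        raise ValueError("doc_length must be positive")
    return term_count / doc_length


# Numbers taken from the question above.
tf_short = relative_frequency(20, 200)       # 20 of 200 words are "apple"
tf_long = relative_frequency(300, 10_000)    # 300 of 10,000 words are "apple"
```

Although the long document has 15 times as many raw occurrences, "apple" makes up 10% of the short document and only 3% of the long one, so the short document gets the higher TF.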
Thank you for your clear explanation and great effort, sir. What evaluation techniques are used to evaluate these content-based recommendation systems? An answer would be very helpful.
I dispute the normalized ratings for Actor A and B at 11:51 (1 and -2/3)...
I don't think it's right to examine these two movie attributes in isolation.
The representation of the five movies should be as follows:
(Actor A, Actor B) ... (1,0) (1,0) (0,1) (0,1) (0,1)
Multiply each of these by the user's normalized ratings and sum them, and divide by FIVE. You get:
(+0.4, -0.4).
NOT (+1, -0.66)
Anyone disagree?
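The two aggregation choices can be compared side by side. The sketch below assumes normalized ratings of 0, +2, -2, -1, +1 for the five movies; those individual values are my assumption (chosen because they reproduce both pairs of numbers quoted above), and the function names are hypothetical.

```python
# One-hot item vectors in the order (Actor A, Actor B), as listed above.
items = [(1, 0), (1, 0), (0, 1), (0, 1), (0, 1)]
# ASSUMED normalized ratings, one per movie (not stated in this thread).
normalized_ratings = [0, 2, -2, -1, 1]


def profile_divide_by_all(items, ratings):
    """Weight each item vector by its rating, sum, and divide by the number of items."""
    n = len(items)
    dims = len(items[0])
    return [sum(r * v[d] for v, r in zip(items, ratings)) / n for d in range(dims)]


def profile_per_feature_mean(items, ratings):
    """For each feature, average the ratings of only the items that have that feature."""
    dims = len(items[0])
    profile = []
    for d in range(dims):
        rated = [r for v, r in zip(items, ratings) if v[d]]
        profile.append(sum(rated) / len(rated) if rated else 0.0)
    return profile


by_all = profile_divide_by_all(items, normalized_ratings)
by_feature = profile_per_feature_mean(items, normalized_ratings)
```

Under these assumed ratings, dividing by all five movies gives (+0.4, -0.4), while averaging per feature gives (+1, -2/3). Both point in a similar direction; they differ in how much weight a feature gets when it appears in few items.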
So in that case, if there are 'a' actors across all movies, each item vector would be 'a' long, and if there are also 'g' genres across the movies, the item vector would be a+g long. So an item vector is as long as the total number of distinct values across all features. Say Actor and Genre are the two features per movie, and across all movies you have 2 actors and 4 genres. Then the item vector would have 6 components (Actor1, Actor2, Genre1, Genre2, Genre3, Genre4) and NOT just 2 components (Actor, Genre). Is this correct?
Another doubt I have: for a new item in the item catalog, would the item vector just be 1s and 0s (presence of Actor A = 1, absence = 0, and 1 if it's a comedy, else 0)? So if the item vector's features are (Actor A, Actor B, Comedy, Action, Thriller, Drama) and Movie 6 is a comedy starring Actor A, would the item vector for Movie 6 be (1,0,1,0,0,0)? And you then compare this with the user profile vector using cosine similarity. Is this understanding correct?
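Here is a small sketch of that reading, using the six features named above. The user profile values are made up purely for illustration, and `one_hot` and `cosine_similarity` are hypothetical helper names.

```python
import math

# Feature order taken from the comment above.
FEATURES = ["Actor A", "Actor B", "Comedy", "Action", "Thriller", "Drama"]


def one_hot(present):
    """Build a 0/1 item vector: 1 where the feature is present, 0 otherwise."""
    return [1 if feature in present else 0 for feature in FEATURES]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Movie 6: a comedy starring Actor A.
movie_6 = one_hot({"Actor A", "Comedy"})

# HYPOTHETICAL user profile over the same six features (illustrative values only).
user_profile = [1.0, -0.5, 0.5, 0.0, 0.0, -0.25]

score = cosine_similarity(user_profile, movie_6)
```

The item vector has one component per distinct actor and genre, so its length is a+g, and a new item needs no ratings to be scored against an existing user profile.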
very low volume
Is content-based recommendation considered an algorithm in machine learning?
I am lost, man.
damn I was actually looking for "based content" on youtube wtf
Bro, you speak so slowly.