NLP Demystified 8: Text Classification With Naive Bayes (+ precision and recall)
- Uploaded: Aug 2, 2024
- Course playlist: • Natural Language Proce...
In this module, we'll apply everything we've learned so far to a core task in NLP: text classification. We'll learn:
- how to derive Bayes' theorem
- how the Naive Bayes classifier works under the hood
- how to train a Naive Bayes classifier in scikit-learn and, along the way, deal with issues that come up
- how things can go wrong when using accuracy for evaluation
- precision, recall, and using a confusion matrix
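The under-the-hood mechanics listed above (class priors, per-token likelihoods, and summing in log space) can be sketched in a few lines. This is a minimal hand-rolled Naive Bayes on a made-up spam/ham toy set, not code from the video; the data and labels are hypothetical:

```python
import math
from collections import Counter

# Toy training data: (tokens, label). Hypothetical spam-detection example.
train = [
    (["win", "money", "now"], "spam"),
    (["free", "money", "offer"], "spam"),
    (["meeting", "tomorrow", "agenda"], "ham"),
    (["project", "meeting", "notes"], "ham"),
]

labels = {y for _, y in train}
priors = {c: sum(1 for _, y in train if y == c) / len(train) for c in labels}
token_counts = {c: Counter() for c in labels}
for tokens, y in train:
    token_counts[y].update(tokens)
vocab = {t for tokens, _ in train for t in tokens}

def log_posterior(tokens, c):
    # Work in log space: summing logs avoids numeric underflow from
    # multiplying many small probabilities together.
    total = sum(token_counts[c].values())
    score = math.log(priors[c])
    for t in tokens:
        # Laplace (add-one) smoothing so unseen tokens don't zero out the score.
        p = (token_counts[c][t] + 1) / (total + len(vocab))
        score += math.log(p)
    return score

def classify(tokens):
    return max(labels, key=lambda c: log_posterior(tokens, c))

print(classify(["free", "money"]))      # -> spam
print(classify(["meeting", "agenda"]))  # -> ham
```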
In the demo, we'll apply everything from the slides to build a full text classifier with spaCy and scikit-learn. We'll go from a bunch of raw text, preprocess and vectorize it, and build multiple versions of our text classifier, improving it each iteration.
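As a rough sketch of that pipeline (with a tiny made-up dataset in place of the real corpus, and scikit-learn's built-in tokenizer standing in for the spaCy preprocessing the demo uses):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical toy corpus; the actual demo works from a larger raw-text dataset.
texts = [
    "win free money now", "limited offer, claim your prize",
    "meeting agenda for tomorrow", "notes from the project meeting",
]
labels = ["spam", "spam", "ham", "ham"]

# Vectorize raw text into token counts, then fit a multinomial Naive Bayes.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)

test_texts = ["claim your free prize", "project meeting tomorrow"]
test_labels = ["spam", "ham"]
preds = clf.predict(test_texts)

# Per-class precision and recall, plus the confusion matrix,
# rather than relying on accuracy alone.
print(classification_report(test_labels, preds))
print(confusion_matrix(test_labels, preds, labels=["spam", "ham"]))
```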
Colab notebook: colab.research.google.com/git...
Timestamps:
00:00:00 Naive Bayes
00:00:25 Classification as a core task in NLP
00:01:11 Revisiting conditional probability
00:03:26 Deriving Bayes' Theorem
00:04:12 The parts of Bayes' Theorem
00:05:43 A spatial example using Bayes' Theorem
00:07:33 Bayes' Theorem applied to text classification
00:08:30 The "naive" in Naive Bayes
00:09:34 The need to work in log space
00:10:05 Naive Bayes training and usage
00:13:27 How the "accuracy" metric can go wrong
00:14:10 Precision, Recall, and Confusion Matrix
00:17:47 DEMO: Training and using a Naive Bayes classifier
00:36:28 Naive Bayes recap and other classification models
This video is part of Natural Language Processing Demystified, a free, accessible course on NLP.
Visit www.nlpdemystified.org/ to learn more.
This is one of the best videos I've ever enjoyed while learning machine learning. Explaining everything from conditional probability to the Naive Bayes demo in a detailed yet concise way is an art. Wow, this is an excellent playlist.
Very well done
Thank you it was very well done!
Thanks for making it enjoyable and exciting by explaining clearly!
This video made a lot of things "click" for me, thank you!
Love to hear it!
Your explanations are simply amazing! Congrats!!!
"My toilet is haunted" at 10:30 had me cracking up 😂
Thanks so much for your clear and concise explanations and steps
Thanks for the feedback! I'm glad you're finding it useful.
@@jett_royce Please, what's the time frame for completion, and are you open to working on private tutorship or projects?
@@CharlesOkwuagwu My aim is to publish the remaining modules before the end of summer (a new module was published today). Unfortunately, I don't have time for private tutorship but you can email me and I'll try to point you in the right direction.
Sir, is it possible for you to provide a link to the slides? Please! I would really appreciate it.
I think when we use Naive Bayes for text classification, it calculates probabilities at the token level rather than for intact words. For example, at 10:52, when considering the word "vaccine", the model is actually looking at the likelihood of the token "vaccine" rather than the exact word. This is an important distinction: tokenizing the input texts allows the model to handle different word forms, capture useful n-gram contexts, and focus on predictive tokens. So the probabilities are calculated for tokens, P(token|class), rather than for unique words, P(word|class). Tokenization before applying Naive Bayes is a key step, as it allows the model to better capture patterns and meaning from the input texts. Correct me if I'm wrong.