NLP Demystified 4: Advanced Preprocessing (part-of-speech tagging, entity tagging, parsing)
- Added 2 Aug 2024
- Course playlist: • Natural Language Proce...
We'll look at tagging our tokens with useful information, including part-of-speech tags and named entity tags. We'll also explore different types of sentence parsing to help extract the meaning of a sentence. In the demo, we'll explore how to get these things done with spaCy and how to use the library's "matchers" and other features to build simple rule-based tools.
Colab notebook: colab.research.google.com/git...
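To give a flavor of what attribute-based matching looks like, here is a toy sketch of the idea in plain Python. This is not spaCy's Matcher API (the function name and token representation here are made up for illustration): a pattern is a list of constraints, and each constraint must match the corresponding token's attributes.

```python
# Toy illustration of attribute-based pattern matching, in the spirit of
# spaCy's Matcher. This is NOT spaCy's actual API; the names are made up.
# A pattern is a list of dicts; each dict constrains one token's attributes.

def match(pattern, tagged_tokens):
    """Return (start, end) spans where the pattern matches consecutive tokens."""
    spans = []
    n, m = len(tagged_tokens), len(pattern)
    for start in range(n - m + 1):
        window = tagged_tokens[start:start + m]
        if all(all(tok.get(key) == value for key, value in constraint.items())
               for constraint, tok in zip(pattern, window)):
            spans.append((start, start + m))
    return spans

# Tokens paired with part-of-speech tags (Universal POS tagset labels).
doc = [
    {"text": "the",   "pos": "DET"},
    {"text": "quick", "pos": "ADJ"},
    {"text": "fox",   "pos": "NOUN"},
    {"text": "jumps", "pos": "VERB"},
]

# Pattern: an adjective immediately followed by a noun.
adj_noun = [{"pos": "ADJ"}, {"pos": "NOUN"}]
print(match(adj_noun, doc))  # [(1, 3)]
```

spaCy's real Matcher works on the same principle but supports operators, lemma/shape attributes, and more; see the demo notebook for the actual API.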
Timestamps:
00:00:00 Advanced Preprocessing
00:00:18 Part-of-Speech (PoS) Tagging
00:01:06 Uses of PoS tags
00:02:24 Named Entity Recognition (NER)
00:03:17 Uses of NER tags
00:04:08 The challenges of NER
00:04:57 PoS- and NER-tagging as sequence labelling tasks
00:07:30 Constituency parsing
00:10:24 Dependency parsing
00:12:22 Uses of parsing
00:13:46 Which parsing approach to use
00:14:14 DEMO: advanced preprocessing with spaCy
00:21:11 Preprocessing recap
This video is part of Natural Language Processing Demystified, a free, accessible course on NLP.
Visit www.nlpdemystified.org/ to learn more.
Your series is a life-saver. You're an EXCELLENT teacher, and your voice is radio-like. Thanks so much!
Thank you, Davy!
This series is incredible! I can't believe we get to access such content for free online... what an era
This series is excellent! I'm glad I came across this series while searching for TF-IDF & cosine similarity related materials.
The best NLP course on YouTube, at least to me!! Thanks so much.
With your great NLP lectures, I've come to understand a lot of NLP concepts.
Great video (and great course). Thank you, thank you, thank you.
Nitpick at 4:25
Hamilton was never a president. He was "the ten-dollar founding father without a father", but was never a president
LOL
OMFG of all the things I didn't google LOL
Great lectures.
Hello Sir!
Your course is great!
Please suggest us a course on Generative AI to easily learn the concepts like we are learning in this course!
Very grateful Sir!
Thank you!
Have you found any good course, bro?
After transformer is used, are these preprocessing steps still very useful?
It depends on what you want to do. The transformer is a model architecture. It's not something that automatically takes care of NLP tasks for you end to end.
When you load a transformer-based model under the hood using a library from Hugging Face, the library itself is taking care of things like tokenization (and you can customize what tokenizer it uses).
When you want to use an LLM to embed a document, you might need to do certain types of preprocessing depending on the LLM's context length or what you're trying to accomplish (e.g. you might need to enrich your data).
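As a sketch of the kind of tokenization work a Hugging Face library handles under the hood, here is a toy greedy longest-match subword tokenizer in the style of WordPiece. The vocabulary below is invented for illustration; real models ship vocabularies of tens of thousands of entries.

```python
# Toy greedy longest-match subword tokenizer, WordPiece-style, sketching
# what a Hugging Face tokenizer does internally. The vocabulary is made up.

VOCAB = {"un", "##aff", "##able", "##ord", "play", "##ing", "the", "[UNK]"}

def wordpiece(word, vocab=VOCAB):
    """Split one word into subwords by greedy longest-match against the vocab."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:                 # non-initial pieces get the ## prefix
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1                      # shrink the candidate and retry
        if piece is None:                 # nothing matches: the word is unknown
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces

print(wordpiece("unaffordable"))  # ['un', '##aff', '##ord', '##able']
print(wordpiece("playing"))       # ['play', '##ing']
```

With a real library, all of this is hidden behind a tokenizer object loaded alongside the model, which is why it's easy to miss that it's happening at all.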
@@futuremojo Thank you so much. When I learned Hugging Face, it skipped a lot of the concepts you mentioned. After your explanation, I understand now: HF provides functions that take care of these things in its tokenization process.
Which software u used to generate that voice?
What makes you think it's software?
@@futuremojo It's so clean, no disturbance or anything, and it sounded a bit artificial. That's why I had this doubt. BTW, your explanatory skills are amazing. Hope you make more technical videos on LLMs, fine-tuning, or transfer learning.
Why is there no 'end of entity' tag? I'm sure some might say it's redundant and unnecessary because when you come to the 'O' you are *obviously* at the end of the entity. But it is just as possible that the 'O' is a mistake, especially in a long multi-word name. An end tag would be both more explicit and eliminate any ambiguity. But maybe that's just me....
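Schemes with an explicit end tag do exist: BIOES (also called BILOU) adds E- (end) and S- (single-token) tags on top of BIO, addressing exactly this ambiguity. A small sketch of the conversion (the function name here is made up for illustration):

```python
# Converting BIO tags to BIOES, which adds explicit E-nd and S-ingle tags.
# In BIO, an entity ends implicitly; BIOES marks the boundary outright.

def bio_to_bioes(tags):
    out = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        if tag == "O":
            out.append("O")
        elif tag.startswith("B-"):
            ent = tag[2:]
            # Single-token entity if the next tag doesn't continue it.
            out.append(("B-" if nxt == "I-" + ent else "S-") + ent)
        elif tag.startswith("I-"):
            ent = tag[2:]
            # Last token of the entity if the next tag doesn't continue it.
            out.append(("I-" if nxt == "I-" + ent else "E-") + ent)
    return out

# "Alexander Hamilton visited Paris"
bio = ["B-PER", "I-PER", "O", "B-LOC"]
print(bio_to_bioes(bio))  # ['B-PER', 'E-PER', 'O', 'S-LOC']
```

Some CoNLL-style NER pipelines report slightly better boundary accuracy with BIOES, at the cost of a larger tag set for the model to learn.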
Machine learning is a niche topic, sure, but how does this have so few views?
Thanks for the comment. I think it's just the search patterns. The three most-viewed videos in this series are:
- The first introduction video. This is probably people looking for introductions to NLP.
- Neural networks from scratch. This is probably because of deep learning hype and people wanting to learn fundamentals.
- Transformers. Self-explanatory.
None of these are surprising. I think the people most interested in NLP are the ones who go over the rest.
If you look at the most popular NLP-related videos now, they're about prompt engineering, Langchain, etc. Whatever's current and applied rather than foundational.
I have no idea! It beats me as well! The views on these courses should be running into millions!