NLP Demystified 4: Advanced Preprocessing (part-of-speech tagging, entity tagging, parsing)

  • Published Aug 2, 2024
  • Course playlist: • Natural Language Proce...
    We'll look at tagging our tokens with useful information including part-of-speech tags and named entity tags. We'll also explore different types of sentence parsing to help extract the meaning of a sentence. In the demo, we'll explore how to get these things done with spaCy and how to use the library's "matchers" and other features to build simple rules-based tools.
    Colab notebook: colab.research.google.com/git...
    Timestamps:
    00:00:00 Advanced Preprocessing
    00:00:18 Part-of-Speech (PoS) Tagging
    00:01:06 Uses of PoS tags
    00:02:24 Named Entity Recognition (NER)
    00:03:17 Uses of NER tags
    00:04:08 The challenges of NER
    00:04:57 PoS- and NER-tagging as sequence labelling tasks
    00:07:30 Constituency parsing
    00:10:24 Dependency parsing
    00:12:22 Uses of parsing
    00:13:46 Which parsing approach to use
    00:14:14 DEMO: advanced preprocessing with spaCy
    00:21:11 Preprocessing recap
    This video is part of Natural Language Processing Demystified, a free, accessible course on NLP.
    Visit www.nlpdemystified.org/ to learn more.
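    The demo covers spaCy's rules-based "matchers". As a small taste of the idea, here is a minimal sketch using spaCy's `Matcher` on a blank English pipeline (tokenizer only; the sample text and pattern are illustrative — PoS and NER attributes would additionally require a trained model such as `en_core_web_sm`):

    ```python
    import spacy
    from spacy.matcher import Matcher

    # A blank pipeline gives us spaCy's tokenizer without any trained components.
    nlp = spacy.blank("en")

    # Match the two-token phrase "named entity", case-insensitively.
    matcher = Matcher(nlp.vocab)
    matcher.add("NE_PHRASE", [[{"LOWER": "named"}, {"LOWER": "entity"}]])

    doc = nlp("Named entity recognition tags spans of text.")
    for match_id, start, end in matcher(doc):
        print(doc[start:end].text)  # prints "Named entity"
    ```

    With a trained pipeline loaded, the same pattern syntax can match on `POS`, `ENT_TYPE`, and other token attributes, which is what makes the matcher useful for simple rules-based tools.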

Comments • 22


  • @davypeterbraun
    @davypeterbraun 1 year ago +8

    Your series is a life-saver. You're an EXCELLENT teacher, and your voice is radio-like. Thanks so much!

  • @Momiji1998
    @Momiji1998 3 months ago

    This series is incredible! I can't believe we get to access such content for free online... what an era

  • @Engineering_101_
    @Engineering_101_ 1 year ago +1

    This series is excellent! I'm glad I came across this series while searching for TF-IDF & cosine similarity related materials.

  • @joely2k83
    @joely2k83 1 year ago +1

    The best NLP course on YouTube, at least for me!! Thanks so much.

  • @caiyu538
    @caiyu538 1 year ago

    With your great NLP lectures, I now understand a lot of NLP concepts.

  • @toyomicho
    @toyomicho 1 year ago +1

    Great video (and great course). Thank you, thank you, thank you.
    Nitpick at 4:25
    Hamilton was never a president. He was "the ten-dollar founding father without a father", but was never a president
    LOL

    • @futuremojo
      @futuremojo  1 year ago

      OMFG of all the things I didn't google LOL

  • @caiyu538
    @caiyu538 1 year ago +1

    Great lectures.

  • @Ahbab91
    @Ahbab91 11 months ago +1

    Hello Sir!
    Your course is great!
    Please suggest a course on Generative AI so we can learn its concepts as easily as we're learning them in this course!
    Very grateful, Sir!
    Thank you!

  • @user-nm5jl8gy1u
    @user-nm5jl8gy1u 1 year ago

    Now that transformers are used, are these preprocessing steps still very useful?

    • @futuremojo
      @futuremojo  1 year ago

      It depends on what you want to do. The transformer is a model architecture. It's not something that automatically takes care of NLP tasks for you end to end.
      When you load a transformer-based model using a library from Hugging Face, the library takes care of things like tokenization under the hood (and you can customize which tokenizer it uses).
      When you want to use an LLM to embed a document, you might need to do certain types of preprocessing depending on the LLM's context length or what you're trying to accomplish (e.g. you might need to enrich your data).
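      The context-length preprocessing mentioned above can be sketched as a simple chunking step. This is a minimal, hypothetical illustration: the whitespace split and the 512-word limit stand in for a real model's tokenizer and context length.

      ```python
      # A minimal sketch of context-length-aware chunking before embedding a
      # long document. The whitespace "tokenizer" and limits are hypothetical
      # stand-ins; a real pipeline would count tokens with the model's own
      # tokenizer.
      def chunk_by_tokens(text, max_tokens=512, overlap=64):
          """Split text into overlapping chunks of at most max_tokens words."""
          words = text.split()
          chunks = []
          step = max_tokens - overlap  # slide forward, keeping some overlap
          for start in range(0, len(words), step):
              chunks.append(" ".join(words[start:start + max_tokens]))
              if start + max_tokens >= len(words):
                  break
          return chunks

      doc = ("word " * 1000).strip()
      pieces = chunk_by_tokens(doc)
      print(len(pieces))  # 1000 words with step 448 -> 3 chunks
      ```

      The overlap keeps sentences that straddle a chunk boundary visible to at least one chunk, a common choice when embedding documents for retrieval.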

    • @user-nm5jl8gy1u
      @user-nm5jl8gy1u 1 year ago +1

      @@futuremojo Thank you so much. When I learned Hugging Face, it skipped a lot of the concepts you mentioned. After your explanation, I understand now: HF provides functions that take care of these things in their tokenization process.

  • @shivaram8930
    @shivaram8930 1 year ago

    Which software did you use to generate that voice?

    • @futuremojo
      @futuremojo  1 year ago

      What makes you think it's software?

    • @shivaram8930
      @shivaram8930 1 year ago

      @@futuremojo It's so clean, no disturbance at all, and it sounded a bit artificial. That's why I had this doubt. BTW, your explanatory skills are amazing. Hope you make more technical videos on LLMs, fine-tuning, or transfer learning.

  • @malikrumi1206
    @malikrumi1206 9 months ago

    Why is there no 'end of entity' tag? I'm sure some might say it's redundant and unnecessary, because when you come to the 'O' you are *obviously* at the end of the entity. But it's just as possible that the 'O' is a mistake, especially in a long multi-word name. An end tag would be both more explicit and eliminate any ambiguity. But maybe that's just me....
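    The commenter's instinct matches real practice: schemes such as BILOU (which spaCy uses internally for NER training) and IOBES extend plain IOB tagging with an explicit Last/End tag and a Unit tag for single-token entities. A small, self-contained sketch of the conversion (pure Python, not spaCy code; tag names follow the common convention):

    ```python
    def iob2_to_bilou(tags):
        """Convert IOB2 tags (B-, I-, O) to BILOU, which marks the Last token
        of a multi-token entity (L-) and Unit-length entities (U-) explicitly."""
        bilou = []
        for i, tag in enumerate(tags):
            if tag == "O":
                bilou.append("O")
                continue
            prefix, label = tag.split("-")
            nxt = tags[i + 1] if i + 1 < len(tags) else "O"
            entity_continues = nxt == f"I-{label}"
            if prefix == "B":
                bilou.append(("B-" if entity_continues else "U-") + label)
            else:  # prefix == "I"
                bilou.append(("I-" if entity_continues else "L-") + label)
        return bilou

    # "Alexander Hamilton lives in New York City"
    print(iob2_to_bilou(["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC", "I-LOC"]))
    # → ['B-PER', 'L-PER', 'O', 'O', 'B-LOC', 'I-LOC', 'L-LOC']
    ```

    The trade-off is that more tag classes make the model's prediction task finer-grained, which can help or hurt depending on the data; both schemes are common in practice.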

  • @dmytrokulaiev9083
    @dmytrokulaiev9083 1 year ago +1

    Machine learning is a niche topic, sure, but how does this have so few views?

    • @futuremojo
      @futuremojo  1 year ago +3

      Thanks for the comment. I think it's just the search patterns. The three most-viewed videos in this series are:
      - The first introduction video. This is probably people looking for introductions to NLP.
      - Neural networks from scratch. This is probably because of deep learning hype and people wanting to learn fundamentals.
      - Transformers. Self-explanatory.
      None of these are surprising. I think the people most interested in NLP are the ones who go over the rest.
      If you look at the most popular NLP-related videos now, they're about prompt engineering, Langchain, etc. Whatever's current and applied rather than foundational.

    • @hamidadesokan6528
      @hamidadesokan6528 1 year ago

      I have no idea! It beats me as well! The views on these courses should be running into millions!