Tokenization in Spacy: NLP Tutorial For Beginners - S1 E8

  • Added on 25. 07. 2024
  • Word and sentence tokenization can be done easily using the spaCy library in Python. In this NLP tutorial, we will cover tokenization and a few related topics.
    NLP platform: www.firstlanguage.in/
    ⭐️ Timestamps ⭐️
    00:00 What is tokenization
    02:35 Install spacy
    02:49 Coding starts
    03:23 Basic English word tokenization
    14:15 Span object
    15:00 Token attributes
    18:40 Grab emails from the student information doc
    23:58 Tokenization in Hindi
    26:13 Customize tokenization rule
    29:52 Sentence tokenization (or segmentation)
    33:15 Exercise
    Code: github.com/codebasics/nlp-tut...
    Exercise: In the above code, go to the end and you will find exercises
    Complete NLP Playlist: • NLP Tutorial Python
    🔖Hashtags🔖
    #nlp #nlptutorial #nlppython #spacytutorial #spacytutorialnlp #wordtokenization #tokenizerspacy #tokenizationnlp #wordtokenizerspacy #tokenizationandspacy #spacynlp
    Do you want to learn technology from me? Check codebasics.io/?... for my affordable video courses.
    Need help building software or data analytics and AI solutions? My company www.atliq.com/ can help. Click on the Contact button on that website.
    🎥 Codebasics Hindi channel: / @codebasicshindi
    #️⃣ Social Media #️⃣
    🔗 Discord: / discord
    📸 Instagram: / codebasicshub
    🔊 Facebook: / codebasicshub
    📱 Twitter: / codebasicshub
    📝 Linkedin (Personal): / dhavalsays
    📝 Linkedin (Codebasics): / codebasics
    🔗 Patreon: www.patreon.com/codebasics?fa...
    ❗❗ DISCLAIMER: All opinions expressed in this video are my own and not those of my employers.
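
The topics in the timestamps above (word tokenization, the Span object, token attributes, and a custom tokenization rule) can be illustrated with a minimal sketch. It assumes spaCy v3 is installed; the sample sentences are made up for illustration and are not taken from the video's notebook.

```python
import spacy
from spacy.symbols import ORTH

# A blank English pipeline contains only the tokenizer, so no model download is needed.
nlp = spacy.blank("en")

# Basic word tokenization: trailing punctuation is split off as its own token.
doc = nlp("Tony gave two $ to Peter.")
tokens = [token.text for token in doc]

# A Span is a slice of the Doc.
span = doc[0:3]

# Token attributes are derived from each token's text.
numbers = [token.text for token in doc if token.like_num]
currencies = [token.text for token in doc if token.is_currency]

# Custom rule: a special case that splits "gimme" into two tokens.
# The ORTH pieces must concatenate back to the original string.
nlp.tokenizer.add_special_case("gimme", [{ORTH: "gim"}, {ORTH: "me"}])
custom_tokens = [token.text for token in nlp("gimme double cheese")]

print(tokens)
print(span.text)
print(numbers, currencies)
print(custom_tokens)
```

`spacy.blank("en")` is used here because tokenization alone does not need a trained pipeline such as `en_core_web_sm`.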

Comments • 58

  • @codebasics
    @codebasics  2 years ago +2

    Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced

  • @aakuthotaharibabu8244
    @aakuthotaharibabu8244 1 year ago +10

    spaCy makes NLP implementation easy, just like the way Codebasics is making NLP learning easy.

  • @dikshyakasaju7541
    @dikshyakasaju7541 2 years ago +19

    Really enjoying this playlist, and I've reached the 8th tutorial already just in 1 day. Thank you for making it interesting!

    • @codebasics
      @codebasics  2 years ago +2

      Glad you liked it 👍☺️

    • @somendrew
      @somendrew 11 months ago +1

      In one day? Holy moly, it's my 3rd day... How do you prepare mentally for it?

  • @PrathmeshBodas
    @PrathmeshBodas 7 months ago +2

    Thanks! Your videos are really helpful. You are doing a great job of explaining complex topics. Thanks once again.

  • @gautamnayak8847
    @gautamnayak8847 1 year ago +1

    Pretty much loved it all. On a watching spree: 8th lesson in 24 hours :)

  • @parttimelarry
    @parttimelarry 1 year ago +3

    Thanks!

  • @Breaking_Bold
    @Breaking_Bold 10 months ago +1

    Very nice video ...explaining NLP !!!

  • @nadianizam6101
    @nadianizam6101 11 months ago

    Excellent Explanation👍

  • @rajiv7
    @rajiv7 1 month ago

    Simply Superb !!! Thanks a ton !!!

  • @balamuralisrinivasan7297
    @balamuralisrinivasan7297 10 months ago

    Excellent and insightful

  • @ajaythapar6169
    @ajaythapar6169 9 months ago +2

    I want to write this every time I go through your YouTube videos (earlier Deep Learning and now NLP)....
    You are an outstanding educator. Your practice of illustrating complex concepts with pertinent use cases adds an engaging dimension to the learning experience.
    Your proficiency in simplifying intricate ideas with clarity is truly remarkable. Your sense of timing in presenting crucial details is impeccable, and your suggested reading resources are exceptionally valuable.
    Thank you for putting your efforts into creating such useful learning material.

    • @codebasics
      @codebasics  9 months ago +2

      Ajay, thanks for the detailed feedback and I am glad these videos are helpful to you 👍🙏

  • @nimishshirodkar
    @nimishshirodkar 2 years ago +3

    You are the best Dhaval. I have seen many tutorials on different ML/DL/NLP topics but the way you teach is something different. It is very hands on and easy to understand. I really look forward to your videos. I recently did post graduate program in Data Science from Great Lakes but frankly, the teaching you provide is much better than some of the professors I had there. Keep it up!

  • @umeshtiwari800
    @umeshtiwari800 1 year ago

    Always very good👍

  • @harshalbhoir8986
    @harshalbhoir8986 1 year ago

    Thank you so much!!

  • @codebasics
    @codebasics  2 years ago

    Do you want to learn technology from me? codebasics.io is my website for video courses. First course going live in the last week of May, 2022

  • @amandaahringer7466
    @amandaahringer7466 2 years ago

    Thank you!

  • @saarthaksangamnerkar
    @saarthaksangamnerkar 2 years ago +3

    Good intro to NLP concepts, Dhaval. Btw, as someone who has worked on large-scale NLP projects here in Toronto, I can vouch that FirstLanguage NLP APIs are right up there with one of the biggest cloud service providers' speech SDK, and at a fraction of the cost! And the co-founder is a PhD specializing in NLP herself.

    • @codebasics
      @codebasics  2 years ago

      Yup, indeed the co-founder is quite knowledgeable and the platform is also very well built. I suggest people try it out; it saves you a lot of money 💵

  • @anilgupta4801
    @anilgupta4801 1 year ago

    Great videos

  • @ajaythapar6169
    @ajaythapar6169 9 months ago

    You are exceptional in the way you explain.

  • @datayogi_
    @datayogi_ 2 years ago +4

    Hi sir, can you please share your views about data analyst jobs in government bodies in India, and their pros and cons?

  • @celalrehmanov7052
    @celalrehmanov7052 6 months ago

    Thank you for your compliment, I am one of your sensor student :D

  • @santoshsaklani5019
    @santoshsaklani5019 2 years ago

    Kindly make some videos on how to vectorize source code for training DL model

  • @leensmits
    @leensmits 1 month ago

    The referred book at page vi: "If you have never studied statistics, I think this book is a good place to start. And if you have taken a traditional statistics class, I hope this book will help
    repair the damage." 😄

  • @shashankk5953
    @shashankk5953 2 years ago +2

    Sir is it possible to create voice recreation??
    Please make video on it☺☺☺

  • @nimishshirodkar
    @nimishshirodkar 2 years ago

    I tried the first problem on the entire pdf using PyPDF2 library but I get some non-urls also picked up

  • @vigneshpadmanabhan
    @vigneshpadmanabhan 1 year ago

    That's because "Dr. Strange" has a space in between. When it's removed, "Dr.Strange" stays together in one sentence. Thanks for the videos!
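
For readers following the sentence tokenization part (29:52), here is a minimal sketch of rule-based sentence splitting, assuming spaCy v3 and its built-in `sentencizer` component. The sample text is invented for illustration. To my knowledge spaCy's English tokenizer keeps "Dr." as a single token, so the sentencizer should not break after it, but print the result on your own version to confirm.

```python
import spacy

# Blank English pipeline plus the rule-based sentence splitter (no model download).
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("Dr. Strange loves pav bhaji. Hulk loves chaat.")

# doc.sents yields Span objects, one per detected sentence.
sentences = [sent.text for sent in doc.sents]
for sentence in sentences:
    print(sentence)
```

The `sentencizer` only looks at punctuation tokens; a trained pipeline such as `en_core_web_sm` instead sets sentence boundaries from its parser, which can give different splits.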

  • @payalGupta-jc4ow
    @payalGupta-jc4ow 8 months ago

    Indeed, it's a very good playlist on NLP, but can you please do some hands-on work with audio files as well? I mean, it would help if you could cover audio files instead of text as the dataset.

  • @anirudhsom6590
    @anirudhsom6590 2 months ago

    Sir, how are you getting syntax recommendations while you are typing the function?

  • @jaswanth220
    @jaswanth220 2 years ago +1

    Hello Dhaval, do you have any tutorial on spiking neural networks, or a guide that could help?
    By the way, I have been following your awesome tutorials on neural networks. Thanks a million.

    • @codebasics
      @codebasics  2 years ago +2

      I don't have a video on that. But I can make a note of adding that one in the future. Thanks for your appreciation.

  • @Pooria.Khorrami
    @Pooria.Khorrami 9 months ago

    Perfectttttttttttttttttttttttttttttttttt

  • @jesuyanmifeegbewale3883
    @jesuyanmifeegbewale3883 9 months ago

    I made it here.
    Let's see how far I can go.

  • @enggm.alimirzashortclipswh6010

    Love from Pakistan 🇵🇰

  • @PrabinKumarDas001
    @PrabinKumarDas001 11 months ago

    My spacy is tokenizing words like #hello to # and hello, I want to prevent that. Is there something I can do?
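
One possible way to keep hashtags together, as a minimal sketch: it assumes spaCy v3 and that "#" appears as an entry in the default English prefix rules, which is why it gets split off. Dropping that entry and recompiling the prefix regex should leave "#hello" intact. The variable names are illustrative.

```python
import spacy
from spacy.util import compile_prefix_regex

# Default behaviour, for comparison.
default_nlp = spacy.blank("en")
before = [token.text for token in default_nlp("I love #hello")]

# Separate pipeline with "#" removed from the prefix rules.
custom_nlp = spacy.blank("en")
prefixes = [p for p in custom_nlp.Defaults.prefixes if p != "#"]
custom_nlp.tokenizer.prefix_search = compile_prefix_regex(prefixes).search
after = [token.text for token in custom_nlp("I love #hello")]

print(before)
print(after)
```

Note that this changes the rule for every leading "#", not only hashtags, so text such as "#1" is affected as well.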

  • @kirtipant949
    @kirtipant949 17 days ago

    In my code like_email is giving empty list
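
Without the code it is hard to say why the list is empty. For comparison, here is a minimal sketch of email extraction with `like_email`, assuming spaCy v3; the student records and addresses are made-up sample data. Printing the tokens first is a quick way to check whether each address survived tokenization as a single token.

```python
import spacy

nlp = spacy.blank("en")

# Made-up sample data, not the document used in the video.
text = (
    "Asha 5 June 2005 asha@example.com\n"
    "Ravi 12 May 2006 ravi@example.org\n"
)
doc = nlp(text)

# like_email is a boolean attribute on each token, not a method.
emails = [token.text for token in doc if token.like_email]
print(emails)
```

If the result is still empty on your own text, print `[token.text for token in doc]` and check how the addresses were tokenized.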

  • @saurabhupadhyay1015
    @saurabhupadhyay1015 1 year ago

    Sir I tried this code: python -m spacy download en_core_web_sm again and again but getting errors. Help

    • @priyasahu7595
      @priyasahu7595 8 months ago

      I am facing same problem. Did you find how to correct that issue?

  • @Prim0rdiaL7
    @Prim0rdiaL7 2 years ago

    Data Analytics by Abhay Deol

    • @codebasics
      @codebasics  2 years ago +4

      Ha ha.. you are probably the 5th person comparing me with Abhay Deol. Others have called me Arvind SA and also Satya Nadella with hair 😂😂🤗🧐

  • @ankitverma1790
    @ankitverma1790 3 months ago

    Why is spaCy tokenizing "ice" and "cream" separately in "I love ice cream"?
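
The tokenizer splits on whitespace first, so a two-word phrase always starts out as two tokens. If a single token is wanted, one option is to merge the span after tokenization. This is a minimal sketch assuming spaCy v3 and its `Doc.retokenize` context manager.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("I love ice cream")
before = [token.text for token in doc]

# Merge the span covering "ice cream" (tokens 2 and 3) into one token.
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[2:4])
after = [token.text for token in doc]

print(before)  # ['I', 'love', 'ice', 'cream']
print(after)   # ['I', 'love', 'ice cream']
```

The merge applies only to this `Doc`; other texts processed with the same pipeline are tokenized as usual.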

  • @anidea8012
    @anidea8012 2 years ago +9

    "Hindi is the language of my country": please don't use this sentence next time. This information is misleading.

    • @ChildhoodSaver
      @ChildhoodSaver 3 months ago +1

      it is the national language 👍

    • @ayushbhosale2004
      @ayushbhosale2004 1 month ago +1

      @ChildhoodSaver India does not have any national language. It has 22 official languages and 2 administrative languages, Hindi and English. Hindi is not our national language.

  • @ramandeepbains862
    @ramandeepbains862 2 years ago

    Exercise 1 solution:
    for token in doc:
        if token.like_url:
            print(token)

  • @varshagupta328
    @varshagupta328 1 year ago

    Please help! I am not able to download the "en_core_web_sm"

    • @priyasahu7595
      @priyasahu7595 8 months ago

      I am facing the same problem. Did you find a way how to correct that issue?

  • @nitinverma_121
    @nitinverma_121 1 year ago

    The answer to exercise question 2 is slightly wrong for cases like "I have 500 $ and the quantity of good people in the company is 10".
    This is correct:
    # Extract money
    transactions = "Tony gave two $ to Peter, Bruce gave 500 € to Steve 10"
    doc = nlp(transactions)
    ans = []
    count = len(doc)
    for token in doc:
        # Skip the last token: it has no following token to check
        if token.i != count - 1:
            if token.like_num and doc[token.i + 1].is_currency:
                ans.append(token.text + ' ' + doc[token.i + 1].text)
    ans

  • @amandaahringer7466
    @amandaahringer7466 2 years ago +2

    Thanks!

    • @codebasics
      @codebasics  2 years ago

      Thanks Amanda for your generous contribution