Tokenization in Spacy: NLP Tutorial For Beginners - S1 E8

codebasics

66,578 views

Added on 25 July 2024
Word and sentence tokenization can be done easily using the spaCy library in Python. In this NLP tutorial, we will cover tokenization and a few related topics.
NLP platform: www.firstlanguage.in/
⭐️ Timestamps ⭐️
00:00 What is tokenization
02:35 Install spacy
02:49 Coding starts
03:23 Basic English word tokenization
14:15 Span object
15:00 Token attributes
18:40 Grab emails from the student information doc
23:58 Tokenization in Hindi
26:13 Customize tokenization rule
29:52 Sentence tokenization (or segmentation)
33:15 Exercise
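The topics in the timestamps above can be sketched in a few lines. This is a minimal sketch and not the code from the video: it assumes spaCy v3 is installed, uses a blank English pipeline (tokenizer only, no trained model), and the sample sentences are made up for illustration.

```python
import spacy
from spacy.symbols import ORTH

# Blank English pipeline: only the rule-based tokenizer, no trained model needed.
nlp = spacy.blank("en")

# Word tokenization: iterating over a Doc yields Token objects.
doc = nlp("Dr. Strange paid 2$ for pav bhaji.")
tokens = [token.text for token in doc]
print(tokens)

# Slicing a Doc gives a Span object.
span = doc[0:2]
print(span.text)

# Token attributes such as like_num and is_currency are available per token.
numbers = [t.text for t in doc if t.like_num]
currency = [t.text for t in doc if t.is_currency]
print(numbers, currency)

# Customizing a tokenization rule: split "gimme" into "gim" + "me".
nlp.tokenizer.add_special_case("gimme", [{ORTH: "gim"}, {ORTH: "me"}])
custom = [t.text for t in nlp("gimme double cheese")]
print(custom)

# Sentence segmentation with the rule-based sentencizer component (spaCy v3 API).
nlp.add_pipe("sentencizer")
sentences = [s.text for s in nlp("Dr. Strange loves pav bhaji. Hulk loves chaat.").sents]
print(sentences)
```

The abbreviation "Dr." stays a single token, so the sentencizer does not treat its period as a sentence boundary.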
Code: github.com/codebasics/nlp-tut...
Exercise: In the above code, go to the end and you will find exercises
Complete NLP Playlist: • NLP Tutorial Python
🔖Hashtags🔖
#nlp #nlptutorial #nlppython #spacytutorial #spacytutorialnlp #wordtokenization #tokenizerspacy #tokenizationnlp #wordtokenizerspacy #tokenizationandspacy #spacynlp
Do you want to learn technology from me? Check codebasics.io/?... for my affordable video courses.
Need help building software or data analytics and AI solutions? My company www.atliq.com/ can help. Click on the Contact button on that website.
🎥 Codebasics Hindi channel: / @codebasicshindi
#️⃣ Social Media #️⃣
🔗 Discord: / discord
📸 Instagram: / codebasicshub
🔊 Facebook: / codebasicshub
📱 Twitter: / codebasicshub
📝 Linkedin (Personal): / dhavalsays
📝 Linkedin (Codebasics): / codebasics
🔗 Patreon: www.patreon.com/codebasics?fa...
❗❗ DISCLAIMER: All opinions expressed in this video are my own and not those of my employers.

Comments • 58

@codebasics 2 years ago ⁺²
Check out our premium machine learning course with 2 industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced
@aakuthotaharibabu8244 1 year ago ⁺¹⁰
SPACY makes NLP implementation easy, just the way CODEBASICS makes NLP learning easy.
@dikshyakasaju7541 2 years ago ⁺¹⁹
Really enjoying this playlist, and I've already reached the 8th tutorial in just 1 day. Thank you for making it interesting!
@codebasics 2 years ago ⁺²
Glad you liked it 👍☺️
@somendrew 11 months ago ⁺¹
In one day? Holy moly, it's my 3rd day... How do you mentally prepare for it?
@PrathmeshBodas 7 months ago ⁺²
Thanks! Your videos are really helpful. You are doing a great job of explaining complex topics. Thanks once again.
@gautamnayak8847 1 year ago ⁺¹
Pretty much loved it all. On a watching spree: 8th lesson in 24 hours :)
@parttimelarry 1 year ago ⁺³
Thanks!
@Breaking_Bold 10 months ago ⁺¹
Very nice video... explaining NLP!!!
@nadianizam6101 11 months ago
Excellent Explanation👍
@rajiv7 1 month ago
Simply Superb !!! Thanks a ton !!!
@balamuralisrinivasan7297 10 months ago
Excellent and insightful
@ajaythapar6169 9 months ago ⁺²
I want to write this every time I go through your YouTube videos (earlier Deep Learning and now NLP)...
You are an outstanding educator. Your practice of illustrating complex concepts with pertinent use cases adds an engaging dimension to the learning experience.
Your proficiency in simplifying intricate ideas with clarity is truly remarkable. Your sense of timing in presenting crucial details is impeccable, and your suggested reading resources are exceptionally valuable.
Thank you for putting your effort into creating such useful learning material.
@codebasics 9 months ago ⁺²
Ajay, thanks for the detailed feedback and I am glad these videos are helpful to you 👍🙏
@nimishshirodkar 2 years ago ⁺³
You are the best, Dhaval. I have seen many tutorials on different ML/DL/NLP topics, but the way you teach is something different. It is very hands-on and easy to understand. I really look forward to your videos. I recently did a postgraduate program in Data Science from Great Lakes, but frankly, the teaching you provide is much better than some of the professors I had there. Keep it up!
@umeshtiwari800 1 year ago
Always very good👍
@harshalbhoir8986 1 year ago
Thank you so much!!
@codebasics 2 years ago
Do you want to learn technology from me? codebasics.io is my website for video courses. First course going live in the last week of May, 2022
@amandaahringer7466 2 years ago
Thank you!
@saarthaksangamnerkar 2 years ago ⁺³
Good intro to NLP concepts, Dhaval. Btw, as someone who has worked on large-scale NLP projects here in Toronto, I can vouch that FirstLanguage NLP APIs are right up there with one of the biggest cloud service providers' speech SDK, and at a fraction of the cost! And the co-founder is a PhD specializing in NLP herself.
@codebasics 2 years ago
Yup, indeed the co-founder is quite knowledgeable and the platform is also very well built. I suggest people try it out; it saves you a lot of money 💵
@anilgupta4801 1 year ago
Great videos
@ajaythapar6169 9 months ago
You are exceptional in the way you explain.
@datayogi_ 2 years ago ⁺⁴
Hi sir, can you please share your views on data analyst jobs in government bodies in India, and their pros and cons?
@celalrehmanov7052 6 months ago
Thank you for your compliment, I am one of your sensor student :D
@santoshsaklani5019 2 years ago
Kindly make some videos on how to vectorize source code for training a DL model.
@leensmits 1 month ago
The referenced book, on page vi: "If you have never studied statistics, I think this book is a good place to start. And if you have taken a traditional statistics class, I hope this book will help repair the damage." 😄
@shashankk5953 2 years ago ⁺²
Sir, is it possible to do voice recreation??
Please make a video on it ☺☺☺
@nimishshirodkar 2 years ago
I tried the first problem on the entire PDF using the PyPDF2 library, but some non-URLs also get picked up.
@vigneshpadmanabhan 1 year ago
That's because "Dr. Strange" has a space in between. When it's removed, "Dr.Strange" stays together in one sentence. Thanks for the videos!
@payalGupta-jc4ow 8 months ago
Indeed, it's a very good playlist on NLP, but can you please do some hands-on work with audio files as well? I mean, could you help with using audio files instead of text as the data set?
@anirudhsom6590 2 months ago
Sir, how are you getting syntax suggestions while typing the function?
@jaswanth220 2 years ago ⁺¹
Hello Dhaval, do you have any tutorial on spiking neural networks, or a guide that could help?
By the way, I have been following your awesome tutorials on neural networks, thanks a million.
@codebasics 2 years ago ⁺²
I don't have a video on that, but I can make a note to add one in the future. Thanks for your appreciation.
@Pooria.Khorrami 9 months ago
Perfectttttttttttttttttttttttttttttttttt
@jesuyanmifeegbewale3883 9 months ago
I made it here.
Let's see how far I can go.
@enggm.alimirzashortclipswh6010 2 years ago ⁺¹
Love from Pakistan 🇵🇰
@PrabinKumarDas001 11 months ago
My spaCy is tokenizing words like #hello into # and hello. I want to prevent that. Is there something I can do?
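One possible approach to the question above, as a hedged sketch rather than an official answer: spaCy's tokenizer splits leading punctuation using a list of prefix patterns, and the sketch assumes that "#" appears in the default English prefix list as its own entry. Removing it and recompiling the prefix regex should keep hashtags together. The sample sentence is made up.

```python
import spacy
from spacy.util import compile_prefix_regex

nlp = spacy.blank("en")

# Drop "#" from the default prefix patterns so it is no longer split off.
# Assumption: "#" is listed as a standalone prefix entry in nlp.Defaults.prefixes.
prefixes = [p for p in nlp.Defaults.prefixes if p not in ("#", r"\#")]
nlp.tokenizer.prefix_search = compile_prefix_regex(prefixes).search

hashtag_tokens = [t.text for t in nlp("I wrote #hello in my post")]
print(hashtag_tokens)
```

This changes the rule for every "#" at the start of a token, so it is worth checking the output on your own text before relying on it.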
@kirtipant949 17 days ago
In my code, like_email is giving an empty list.
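For comparison, here is a minimal sketch of collecting emails with `like_email`. It assumes a blank English pipeline, and the text and addresses are placeholders rather than the student information file from the video.

```python
import spacy

nlp = spacy.blank("en")

# Hypothetical sample text; the addresses are placeholders.
text = "Write to alice@example.com or bob@example.org for the student records"
doc = nlp(text)

# like_email is a boolean attribute on each Token, so it is read from the
# tokens of a processed Doc, not from the raw string.
emails = [token.text for token in doc if token.like_email]
print(emails)
```

If this works but your own code returns an empty list, it may help to print the tokens of your Doc and check whether the addresses survive tokenization as single tokens.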
@saurabhupadhyay1015 1 year ago
Sir, I tried this command again and again: python -m spacy download en_core_web_sm, but I keep getting errors. Help!
@priyasahu7595 8 months ago
I am facing the same problem. Did you find out how to fix it?
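The errors themselves are not shown here, so the cause is unknown. As a possible workaround, assuming the tokenizer alone is enough for what you are doing, the sketch below tries to load the trained model and falls back to a blank English pipeline when the model package is not installed.

```python
import spacy

try:
    # Requires the model package, installed with:
    #   python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # spacy.load raises OSError when the model package cannot be found.
    # A blank pipeline still provides the rule-based tokenizer.
    nlp = spacy.blank("en")

words = [token.text for token in nlp("Let's go to N.Y.!")]
print(words)
```

A blank pipeline has no tagger, parser, or named entity recognizer, so later steps that depend on those components would still need the trained model.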
@Prim0rdiaL7 2 years ago
Data Analytics by Abhay Deol
@codebasics 2 years ago ⁺⁴
Ha ha.. you are probably the 5th person comparing me with Abhay Deol. Others have called me Arvind SA and also Satya Nadella with hair 😂😂🤗🧐
@ankitverma1790 3 months ago
Why is spaCy tokenizing ice and cream separately in "I love ice cream"?
@pranavkanumuri1441 1 month ago
Because they are separate words.
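If a multi-word expression should be treated as one token, spaCy's retokenizer can merge a span after tokenization. This is a minimal sketch assuming a blank English pipeline. The token positions are hard-coded for this one sentence.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("I love ice cream")
before = [t.text for t in doc]

# Merge the two-token span "ice cream" (tokens 2 and 3) into a single token.
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[2:4])

after = [t.text for t in doc]
print(before)
print(after)
```

For real text the span positions would have to be found first, for example with a matcher, instead of being written by hand.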
@anidea8012 2 years ago ⁺⁹
"Hindi is the language of my country": please don't use this sentence next time. This information is misleading.
@ChildhoodSaver 3 months ago ⁺¹
It is the national language 👍
@ayushbhosale2004 1 month ago ⁺¹
@ChildhoodSaver India does not have any national language; it has 22 official languages and 2 administrative languages, Hindi and English. Hindi is not our national language.
@ramandeepbains862 2 years ago
Exercise 1 solution:
for token in doc:
    if token.like_url:
        print(token)
@varshagupta328 1 year ago
Please help! I am not able to download "en_core_web_sm".
@priyasahu7595 8 months ago
I am facing the same problem. Did you find a way to fix it?
@nitinverma_121 1 year ago
The answer to exercise question 2 is slightly wrong for cases like "I have 500 $ and the quantity of good people in the company is 10".
This is correct:
# Extract money
transactions = "Tony gave two $ to Peter, Bruce gave 500 € to Steve 10"
doc = nlp(transactions)
ans = []
count = len(doc)
for token in doc:
    # Skip the last token so that doc[token.i + 1] stays in range
    if token.i != count - 1:
        if token.like_num and doc[token.i + 1].is_currency:
            ans.append(token.text + ' ' + doc[token.i + 1].text)
ans
@amandaahringer7466 2 years ago ⁺²
Thanks!
@codebasics 2 years ago
Thanks, Amanda, for your generous contribution.

Up next

Language Processing Pipeline in Spacy: NLP Tutorial For Beginners - S1 E9