Tokenization in Spacy: NLP Tutorial For Beginners - S1 E8
- Added on 25 July 2024
- Word and sentence tokenization can be done easily using the spaCy library in Python. In this NLP tutorial, we will cover tokenization and a few related topics.
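Below is a minimal sketch of word tokenization. It assumes spaCy v3 is installed and uses spacy.blank("en"), which is a tokenizer-only English pipeline that needs no model download. The video itself may load a trained pipeline instead, and the sample sentence here is only illustrative. The sketch also shows a Span, which you get by slicing a Doc, and two lexical token attributes.

```python
import spacy

# Tokenizer-only English pipeline: no trained model download required.
nlp = spacy.blank("en")
doc = nlp("Peter loves pav bhaji of mumbai as it costs only 2$ per plate.")

# Word tokenization: every Token keeps its original text.
tokens = [token.text for token in doc]
print(tokens)

# Slicing a Doc returns a Span, a view over consecutive tokens.
span = doc[2:4]
print(span.text)

# Lexical attributes such as like_num and is_currency need no trained model.
for token in doc:
    if token.like_num or token.is_currency:
        print(token.text, token.like_num, token.is_currency)
```

Notice that "2$" is split into "2" and "$" because the tokenizer's suffix rules separate a currency symbol that follows a digit.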
NLP platform: www.firstlanguage.in/
⭐️ Timestamps ⭐️
00:00 What is tokenization
02:35 Install spacy
02:49 Coding starts
03:23 Basic English word tokenization
14:15 Span object
15:00 Token attributes
18:40 Grab emails from the student information doc
23:58 Tokenization in Hindi
26:13 Customize tokenization rule
29:52 Sentence tokenization (or segmentation)
33:15 Exercise
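The "Customize tokenization rule" step can be sketched with the tokenizer's add_special_case method. This assumes spaCy v3. The word "gimme" and the pizza sentence are illustrative choices, not taken from the video's notebook.

```python
import spacy
from spacy.symbols import ORTH

nlp = spacy.blank("en")

# Special case: always split "gimme" into two tokens.
# The ORTH pieces must join back to exactly the original string.
nlp.tokenizer.add_special_case("gimme", [{ORTH: "gim"}, {ORTH: "me"}])

doc = nlp("gimme double cheese extra large healthy pizza")
tokens = [token.text for token in doc]
print(tokens)
```

In spaCy v3, a special case is rejected with a ValueError if its pieces would change the text (for example "give" + "me"). If you want to record what a piece means rather than how the word is split, a special case can also carry a NORM value.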
Code: github.com/codebasics/nlp-tut...
Exercise: Go to the end of the code above to find the exercises.
Complete NLP Playlist: NLP Tutorial Python
🔖Hashtags🔖
#nlp #nlptutorial #nlppython #spacytutorial #spacytutorialnlp #wordtokenization #tokenizerspacy #tokenizationnlp #wordtokenizerspacy #tokenizationandspacy #spacynlp
Do you want to learn technology from me? Check codebasics.io/?... for my affordable video courses.
Need help building software or data analytics and AI solutions? My company www.atliq.com/ can help. Click on the Contact button on that website.
🎥 Codebasics Hindi channel: / @codebasicshindi
#️⃣ Social Media #️⃣
🔗 Discord: / discord
📸 Instagram: / codebasicshub
🔊 Facebook: / codebasicshub
📱 Twitter: / codebasicshub
📝 Linkedin (Personal): / dhavalsays
📝 Linkedin (Codebasics): / codebasics
🔗 Patreon: www.patreon.com/codebasics?fa...
❗❗ DISCLAIMER: All opinions expressed in this video are my own and not those of my employers.
Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced
spaCy makes NLP implementation easy, just as Codebasics makes NLP learning easy.
Really enjoying this playlist, and I've reached the 8th tutorial already just in 1 day. Thank you for making it interesting!
Glad you liked it 👍☺️
In one day? Holy moly, it's my 3rd day... How do you mentally prepare for that?
Thanks! Your videos are really helpful. You are doing a great job of explaining complex topics. Thanks once again
Pretty much loved it all. On a watching spree: 8th lesson in 24 hours :)
Thanks!
Very nice video ...explaining NLP !!!
Excellent Explanation👍
Simply Superb !!! Thanks a ton !!!
Excellent and insightful
I want to write this every time I go through your YouTube videos (earlier Deep Learning and now NLP)....
You are an outstanding educator. Your practice of illustrating complex concepts with pertinent use cases adds an engaging dimension to the learning experience.
Your proficiency in simplifying intricate ideas with clarity is truly remarkable. Your sense of timing in presenting crucial details is impeccable, and your suggested reading resources are exceptionally valuable.
Thank you for putting your effort into creating such useful learning material.
Ajay, thanks for the detailed feedback and I am glad these videos are helpful to you 👍🙏
You are the best Dhaval. I have seen many tutorials on different ML/DL/NLP topics, but the way you teach is different. It is very hands-on and easy to understand. I really look forward to your videos. I recently did a postgraduate program in Data Science from Great Lakes, but frankly, your teaching is much better than that of some of the professors I had there. Keep it up!
Always very good👍
Thank you so much!!
Do you want to learn technology from me? codebasics.io is my website for video courses. First course going live in the last week of May, 2022
Thank you!
Good intro to NLP concepts, Dhaval. Btw, as someone who has worked on large-scale NLP projects here in Toronto, I can vouch that the FirstLanguage NLP APIs are right up there with the speech SDK of one of the biggest cloud service providers, and at a fraction of the cost! And the co-founder is a PhD specializing in NLP herself.
Yup, the co-founder is indeed quite knowledgeable, and the platform is also very well built. I suggest people try it out; it saves you a lot of money 💵
Great videos
You are exceptional in the way you explain things
Hi sir, can you please share your views about data analyst jobs in government bodies in India, and the pros and cons of that?
Thank you for your compliment, I am one of your sensor student :D
Kindly make some videos on how to vectorize source code for training a DL model
The referenced book, on page vi: "If you have never studied statistics, I think this book is a good place to start. And if you have taken a traditional statistics class, I hope this book will help repair the damage." 😄
Sir is it possible to create voice recreation??
Please make video on it☺☺☺
I tried the first exercise on the entire PDF using the PyPDF2 library, but some non-URLs also get picked up.
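like_url is a loose heuristic. It flags any token that starts with "www." or looks like a dotted name ending in a known TLD, so text extracted from a PDF can produce false positives, for example when words run together across line breaks. One option is to post-filter the matches with Python's urllib.parse. The sketch below assumes spaCy v3 and uses a hypothetical helper called strict_urls. It trades recall for precision: scheme-less addresses such as www.science.gov are dropped too.

```python
from urllib.parse import urlparse

import spacy

nlp = spacy.blank("en")


def strict_urls(text):
    """Hypothetical helper: URL-like tokens that also parse as http(s) URLs."""
    found = []
    for token in nlp(text):
        if not token.like_url:  # spaCy's loose lexical heuristic
            continue
        parsed = urlparse(token.text)
        # Stricter check: require an explicit scheme and a host.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            found.append(token.text)
    return found


print(strict_urls("Visit http://www.data.gov/ or www.science.gov today."))
```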
I think that's because "Dr. Strange" has a space in between. When it's removed, "Dr.Strange" stays together in one sentence. Thanks for the videos!
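A trained pipeline's sentence boundaries can be thrown off by abbreviations such as "Dr.". For comparison, here is a sketch that assumes spaCy v3 and uses the rule-based sentencizer component on a blank English pipeline. The sentencizer starts a new sentence after punctuation tokens such as ".". Whether "Dr." causes a split therefore depends on the tokenizer keeping "Dr." as a single token.

```python
import spacy

# Rule-based sentence segmentation: split after sentence-final punctuation.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("Dr. Strange loves pav bhaji of mumbai. Hulk loves chat of delhi")
sentences = [sent.text for sent in doc.sents]
print(sentences)
```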
Indeed it's a very good playlist on NLP, but can you please do some hands-on work with audio files too? I mean, could you help me use audio files instead of text as a dataset?
Sir, how are you getting syntax suggestions while typing the function?
Hello Dhaval, do you have any tutorial on spiking neural networks, or a guide that could help?
By the way, I have been following your awesome tutorials on neural networks, thanks a million
I dont have a video on that. But I can make a note of adding that one in the future. Thanks for your appreciation
Perfectttttttttttttttttttttttttttttttttt
I made it here.
Lets see how far i can go
Love from Pakistan 🇵🇰
My spaCy is tokenizing words like #hello into # and hello. I want to prevent that. Is there something I can do?
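One way to keep hashtags whole is to set the tokenizer's token_match. The tokenizer checks token_match before it splits off a prefix such as "#", so any string that matches is kept as a single token. This is a sketch assuming spaCy v3. The hashtag pattern is an illustrative choice, and assigning it replaces any token_match the pipeline already had.

```python
import re

import spacy

nlp = spacy.blank("en")

# Strings matching this pattern ('#' followed by word characters) are kept
# as one token instead of having '#' split off as a prefix.
hashtag = re.compile(r"^#\w+$")
nlp.tokenizer.token_match = hashtag.match

tokens = [token.text for token in nlp("I love #hello and #spacy")]
print(tokens)
```

Another documented route is to remove "#" from the prefix rules. You would filter nlp.Defaults.prefixes, recompile them with spacy.util.compile_prefix_regex, and assign the result's .search to nlp.tokenizer.prefix_search.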
In my code, like_email is giving an empty list.
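like_email is evaluated one token at a time. If an address was split into several tokens, or contains a typo, it will not match, so printing the tokens is a useful first check. Here is a minimal sketch assuming spaCy v3. The addresses are made up.

```python
import spacy

nlp = spacy.blank("en")

text = "Contact virat@kohli.com or maria@sharapova.com for details"
doc = nlp(text)

# like_email is a lexical attribute, so a tokenizer-only pipeline is enough.
emails = [token.text for token in doc if token.like_email]
print(emails)
```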
Sir, I tried this command: python -m spacy download en_core_web_sm again and again but keep getting errors. Help
I am facing the same problem. Did you find a way to correct that issue?
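When python -m spacy download en_core_web_sm fails, the full error message is the best clue. A common cause is running the command with a different Python than the one the notebook uses. Separately, code can degrade gracefully when the model is missing. This sketch assumes spaCy v3 and uses a hypothetical helper called load_pipeline. The fallback pipeline has no tagger, parser or NER, so only tokenization and lexical attributes will work with it.

```python
import spacy


def load_pipeline(name="en_core_web_sm"):
    """Hypothetical helper: load a trained pipeline or fall back to a blank one."""
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when the pipeline package is not installed.
        print(f"{name} not found; using spacy.blank('en') instead")
        return spacy.blank("en")


nlp = load_pipeline()
print(nlp.pipe_names)
```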
Data Analytics by Abhay Deol
Ha ha.. you are probably the 5th person comparing me with Abhay Deol. Others have called me Arvind SA and also Satya Nadella with hair 😂😂🤗🧐
Why is spaCy tokenizing "ice" and "cream" separately in "I love ice cream"?
Because they are separate words
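The tokenizer splits on whitespace, so "ice cream" always starts out as two tokens. If your application needs it as one unit, one option is to merge the tokens after tokenization using the Doc's retokenizer. This is a sketch assuming spaCy v3.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("I love ice cream")

# Merge tokens 2 and 3 ("ice", "cream") into a single token.
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[2:4])

tokens = [token.text for token in doc]
print(tokens)
```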
"Hindi is the language of my country": please don't use this sentence next time. This information is misleading.
it is the national language 👍
@ChildhoodSaver India does not have any national language. It has 22 official languages and 2 administrative languages, Hindi and English. Hindi is not our national language.
Exercise 1 solution:
for token in doc:
    # Print every token that looks like a URL
    if token.like_url:
        print(token)
Please help! I am not able to download the "en_core_web_sm"
I am facing the same problem. Did you find a way to correct that issue?
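The answer to exercise question 2 is a little wrong for cases like "i have 500 $ and the quantity of good people in the company is 10": the last token is a number with no token after it, so looking up the next token goes out of range.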
This is correct:
# Extract money
transactions = "Tony gave two $ to Peter, Bruce gave 500 € to Steve 10"
doc = nlp(transactions)
ans = []
count = len(doc)
for token in doc:
    # Skip the last token: it has no next token to check
    if token.i != count - 1:
        if token.like_num and doc[token.i + 1].is_currency:
            ans.append(token.text + ' ' + doc[token.i + 1].text)
ans
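Here is a self-contained variant of the same idea, assuming spaCy v3 and a hypothetical helper called extract_money. It pairs each token with its successor using zip, which stops before the last token. That way a number at the very end of the text never triggers an out-of-range lookup, and no index bookkeeping is needed.

```python
import spacy

nlp = spacy.blank("en")


def extract_money(text):
    """Hypothetical helper: return 'number currency' pairs such as '500 €'."""
    doc = nlp(text)
    pairs = []
    # zip(doc, doc[1:]) yields (token, next_token) and never runs past the end.
    for token, next_token in zip(doc, doc[1:]):
        if token.like_num and next_token.is_currency:
            pairs.append(f"{token.text} {next_token.text}")
    return pairs


print(extract_money("Tony gave two $ to Peter, Bruce gave 500 € to Steve 10"))
```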
Thanks!
Thanks Amanda for your generous contribution