Text Preprocessing | NLP Course Lecture 3

  • Added 15 June 2024
  • In this video, we'll break down the steps involved in getting text data ready for analysis. Think of it as cleaning and organizing text so that it's easier to understand and work with. This process helps us get valuable insights when we're dealing with large amounts of text information.
    Code used: www.kaggle.com/campusx/text-p...
    Assignment Links:
    api.themoviedb.org/3/movie/to...
    api.themoviedb.org/3/genre/mo...
    ============================
    Do you want to learn from me?
    Check my affordable mentorship program at : learnwith.campusx.in
    ============================
    📱 Grow with us:
    CampusX' LinkedIn: / campusx-official
    CampusX on Instagram for daily tips: / campusx.official
    My LinkedIn: / nitish-singh-03412789
    Discord: / discord
    E-mail us at support@campusx.in
    ✨ Hashtags✨
    #DataScience #TextPreprocessing #Stemming #Tokenization
    ⌚Time Stamps⌚
    00:00 - Intro
    1:01 - Introduction
    4:03 - Lowercasing
    7:53 - Remove HTML Tags
    12:44 - Remove URLs
    15:16 - Remove Punctuation
    23:29 - Chat word treatment
    26:20 - Spelling Correction
    28:11 - Removing Stop words
    31:25 - Handling Emojis
    34:11 - Tokenization
    49:18 - Stemming
    57:50 - Lemmatization
    1:01:33 - Assignment
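
    The first few steps in the time stamps (lowercasing, HTML tag removal, URL removal, punctuation removal) might be combined roughly as follows. This is a minimal sketch, not the notebook from the video: the function name `basic_clean` and both regular expressions are assumptions chosen for illustration, and the patterns are deliberately simple.

```python
import re
import string

# Illustrative patterns only; they are not a full HTML or URL grammar.
HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"https?://\S+|www\.\S+")
# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def basic_clean(text: str) -> str:
    """Lowercase, then strip HTML tags, URLs, and ASCII punctuation."""
    text = text.lower()
    text = HTML_TAG.sub("", text)
    text = URL.sub("", text)
    text = text.translate(PUNCT_TABLE)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


sample = "<p>Check THIS out: https://example.com/page, it's GREAT!!!</p>"
cleaned = basic_clean(sample)
print(cleaned)
```

    The order matters in this sketch: URLs are removed before punctuation, because stripping punctuation first would break the `://` that the URL pattern looks for.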

Comments • 139

  • @harinair3002 6 months ago +36

    To anyone following this playlist: please do the assignment. I was shocked at how little we learn by just watching. I did the assignment and got stuck many times, but I completed it in the end, and now I regularly practice text preprocessing on datasets I build from Rapid APIs. It gives you so much flexibility to work on a dataset you created yourself.

    • @surajnikam3327 5 months ago

      Ma'am, can you explain, or point me to some notes or videos on, using APIs and creating my own DataFrame?

    • @komalkumbhare4789 4 months ago +1

      Hey Hari! The assignment links given above do not lead to the TMDB website, and searching for TMDB directly on Google doesn't work either. Can you tell me how you did that?

  • @sukantb1980 2 years ago +20

    You are a rare gem; I can simply put that in clear, short words ❤️❤️

  • @GamerBoy-ii4jc 2 years ago +6

    Again, sir, you are a great person on YouTube. Your explanation in every domain and for every topic is great. I followed your ML playlist A-Z and have now started watching NLP. I hope you will complete your ML series soon, and this one too, and also make great series for us on new and emerging things. Thanks a lot, sir!

  • @shikhasoni9346 2 years ago +5

    Your lectures really help me understand NLP text preprocessing. Thank you so much!

  • @Riya-zb1iz 1 year ago +2

    This series is amazing!

  • @siddharth4251 10 months ago +1

    Thanks a lot, Nitish. I don't have enough words to express my gratitude.

  • @sarithajaligama9548 2 months ago

    Very good explanation. You explain every single detail, which is very helpful for beginners, and the assignments are also very interesting.
    I wonder why I didn't find your channel before, but I'm lucky to have it now.

  • @sachi-4750 2 years ago +1

    You are really a great teacher, thank you so much for coming up with such informative videos, Thanks a lot

  • @prashantlakde 2 years ago +1

    Your way of explaining shows your conceptual clarity and the effort you put into preparing this topic. Keep it up.

  • @abhishekpathak9654 11 months ago +3

    Your videos are full of knowledge. Thanks a lot for this 🙏 you deserve more subscribers... it can attract more viewers if you divide your videos into smaller parts. People generally don't want to engage with long lectures.

  • @siddharthbhardwaj7664 2 years ago +4

    Hi, Could you please make the next video on the same IMDB data set and show us how to analyze the linguistic features of the training dataset? I have recently gone through your previous NLP (Movie Review Sentiment Analysis) videos. However, I was quite interested in finding out how can we analyze the linguistic features and what all different algorithms can we apply apart from the Naive Bayes on the same IMDB dataset. PS - your videos are amazing!!! the way you teach the concepts has helped me to understand the basics of NLP. Thank you so much!!

  • @miteshkumar7739 2 years ago

    Your lectures are really helpful. All the concepts are very clear.

  • @raj-nq8ke 2 years ago

    Gold content. Thanks for the video.

  • @NishantKumar-dw5er 11 months ago

    very detailed explanation. Kudos to you.

  • @mohaiminrahat4974 2 years ago +2

    Sir you are a lifesaver.Thankyouuuuuu

  • @raj4624 2 years ago

    so far so good.....awesome x 100

  • @samt5682 2 years ago

    Literally, All In One !

  • @rajeevranjan5007 2 years ago +1

    Nice assignment Sir. Thankyou

  • @saurabhdeshmane8714 1 year ago +5

    Sir, could you please share the notebook? It is not available at the given link.

  • @manishachaurasia3405 10 months ago +1

    Series is amazing sir 👏 kindly provide the regex lecture in the description

  • @shipradhiman08 2 years ago +1

    Awesome lecture 🤗🤗🤗❤️❤️❤️❤️

  • @rafibasha4145 2 years ago +3

    Please link the notebook in the description, and please complete the NLP playlist.

  • @NaryVip 2 years ago +2

    You didn't link the regular expressions video in the description. Can you update it?

  • @pankajbeldar9799 1 year ago +1

    You are God for me in learning data science

  • @pralaymondal3324 2 years ago +4

    Thank you, you are just awesome. Much waited for this video. You explain things better than other youtubers. Keep it up...!!!

  • @stunninghealer7442 3 months ago

    You are the best sir😊.

  • @deepankarmullick3121 2 years ago

    Amazing video, but where can I download the notebooks?
    I would also request that you share the notebook URLs in the video description.

  • @cipher4811 2 years ago +1

    Sir, I have been following you for a long time. I'm glad I found your channel, I'm learning so much from you, and for that I am grateful and thank you from the bottom of my heart.
    Until now I was working with Google Colab, but as I am moving towards deep learning, I think it's time for me to buy a high-end laptop.
    But I am at a loss as to which one to pick. If I go for an RTX 3080, the price is way too much for me. I have had this confusion for the past few weeks. Can you please suggest a laptop for ML, AI, and DL learning projects? My budget is $1400-1500.
    I will be grateful.
    Or you could make a video on this topic.

  • @BTStechnicalchannel 1 year ago +2

    Thanks for the great content! One small suggestion: can you also give us some time to write the code you are explaining? Otherwise it becomes theoretical.

  • @jandaabdulla9335 2 years ago

    Congrats, sir, on the third video 🥳🥳

  • @kislaykrishna5599 2 years ago

    great content

  • @MRBAM 1 year ago +1

    It's helpful for me ❤️

  • @tanmayshinde7853 2 years ago +1

    Does anyone know how to apply a word/sentence tokenizer to DataFrame columns? If you know, please reply.
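
    One way to approach the question above is to write a function that tokenizes a single string and then apply it to every value in the column. The sketch below uses only the standard library; `simple_word_tokenize` and the sample rows are hypothetical stand-ins, and a library tokenizer such as NLTK's `word_tokenize` could be passed in the same way.

```python
import re

# Very small regex tokenizer: runs of word characters, or single
# non-space symbols such as punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")


def simple_word_tokenize(text: str) -> list[str]:
    return TOKEN.findall(text)


# Hypothetical rows standing in for a DataFrame with a "review" column.
rows = [
    {"review": "I loved it!"},
    {"review": "Not great, not terrible."},
]
for row in rows:
    row["tokens"] = simple_word_tokenize(row["review"])

print(rows[0]["tokens"])

# With pandas, the same function is applied to each cell of the column:
# df["tokens"] = df["review"].apply(simple_word_tokenize)
```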

  • @gautampatadiya6096 4 months ago +1

    Thanks!

  • @jasonbourn29 8 months ago

    I timed both methods of removing punctuation, but they are similar in speed, and sometimes the second one is slower. Why is that?
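
    A timing comparison like the one described above can be reproduced with `timeit`. The sketch assumes the two methods are a per-character `str.replace` loop and a single `str.translate` call; the function names are made up for illustration. Results depend on text length, how much punctuation the text contains, and the Python version, so it is worth measuring on your own data rather than expecting one method to always win.

```python
import string
import timeit

EXCLUDE = string.punctuation
TABLE = str.maketrans("", "", EXCLUDE)


def remove_punct_loop(text: str) -> str:
    # One str.replace pass over the text per punctuation character.
    for char in EXCLUDE:
        text = text.replace(char, "")
    return text


def remove_punct_translate(text: str) -> str:
    # A single pass using a precomputed translation table.
    return text.translate(TABLE)


short = "Hello, world! How's it going?"
long_text = short * 1_000

# Both approaches should produce the same cleaned string.
assert remove_punct_loop(short) == remove_punct_translate(short)

for label, sample in (("short", short), ("long", long_text)):
    loop_time = timeit.timeit(lambda: remove_punct_loop(sample), number=20)
    translate_time = timeit.timeit(lambda: remove_punct_translate(sample), number=20)
    print(f"{label}: loop={loop_time:.4f}s translate={translate_time:.4f}s")
```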

  • @abdulqadar9580 1 year ago

    You are amazing, sir. Love from Pakistan.

  • @Akashphs7217 27 days ago

    Hi sir. Regarding the assignment, how can we merge the genre ID and genre name with the movies DataFrame?
    I got stuck there.
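
    For the genre question above, one possible approach is to build an ID-to-name lookup from the genre list and map each movie's genre IDs through it. The payloads below are hypothetical, and the field names (`id`, `name`, `genre_ids`) are assumptions about the shape of the data, so adjust them to match what the API actually returns.

```python
# Hypothetical payloads shaped like a genre list and a movie list;
# the field names ("id", "name", "genre_ids") are assumptions.
genres = [{"id": 18, "name": "Drama"}, {"id": 80, "name": "Crime"}]
movies = [
    {"title": "Movie A", "genre_ids": [18, 80]},
    {"title": "Movie B", "genre_ids": [18]},
    {"title": "Movie C", "genre_ids": [99]},
]

# Lookup table from genre ID to genre name.
id_to_name = {g["id"]: g["name"] for g in genres}

for movie in movies:
    # IDs missing from the lookup are labelled "Unknown" instead of failing.
    movie["genres"] = [id_to_name.get(i, "Unknown") for i in movie["genre_ids"]]

print(movies[0]["genres"])

# With pandas, the same lookup can be applied to a column of ID lists:
# df["genres"] = df["genre_ids"].apply(
#     lambda ids: [id_to_name.get(i, "Unknown") for i in ids]
# )
```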

  • @shaiksalavuddin5976 2 years ago

    Sir thank you so much😊

  • @rishabhvarshney2234 2 years ago +1

    Can we get a PDF of the code you wrote in this video?

  • @pradumankumar7607 2 years ago

    Sir, can you please share the link to the "chatword" file used in the chat word treatment?

  • @trackbackresearch 2 years ago

    Thankyou Sir .

  • @satyamtiwari7680 1 year ago +1

    Easy way to remove punctuation:
    import string
    import re

    def remove_punctuation(text):
        # Define the set of punctuation characters
        punctuations = string.punctuation
        # Remove punctuation using a regular expression character class
        text_no_punct = re.sub('[' + re.escape(punctuations) + ']', '', text)
        return text_no_punct

  • @anshuman_madhav 2 years ago +3

    While using the lowercase conversion function shown at 7:23, I get the warning below, even though the conversion succeeds. Is there another way to do the conversion, or can we ignore the warning?
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

  • @youtubekumar8590 1 year ago

    Thank you, Bhaiya

  • @unknown-ho4wk 5 months ago

    That was an awesome tutorial. Can you please link to your regular expressions video?

  • @dilipkumarbk7657 1 year ago

    The way of teaching is cool, loved it.
    One doubt: at 12:00, remove_html_tags() only removes the tags, but when we scrape data from a website in practice, it contains elements like style and script, whose contents aren't needed for text mining or NLP.
    Is there a better approach or method that handles this?
    Thanks in advance to everyone who tries to solve this.
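
    One possible answer to the question above is to parse the HTML instead of deleting tags with a regular expression, and to ignore everything inside `script` and `style` elements. The sketch below uses the standard library's `html.parser`; the class and function names are made up for illustration, and a dedicated library such as BeautifulSoup is a common alternative for messy real-world pages.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text while skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Text from adjacent elements is concatenated without a separator,
        # then runs of whitespace are collapsed.
        return " ".join("".join(self._chunks).split())


def html_to_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = (
    "<html><head><style>p {color: red;}</style>"
    "<script>var x = '<b>not text</b>';</script></head>"
    "<body><p>Great <b>movie</b>!</p></body></html>"
)
result = html_to_text(page)
print(result)
```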

  • @ahmedullahkhan9166 9 months ago

    Where is the notebook link?
    The link above only shows a CSV file.

  • @SurajitDas-gk1uv 5 months ago

    Thank you

  • @pankajnaik1574 10 months ago

    You are the best

  • @kumarabhishek1064 2 years ago +1

    where is the template notebook?

  • @bhushanbowlekar4539 1 year ago

    Sir, at timestamp 3:30 you said you would provide the notebook. Can you please provide it? Thank you.

  • @shrutianand285 2 years ago

    How to use textblob for a large dataset?

  • @gauravverma3700 2 years ago

    Awesome

  • @zkhan2023 2 years ago

    Thanks sir

  • @ShivaniSharma-tk4bl 11 months ago

    @campusX I can't find the code. Can you please share the link?

  • @bhanuprakash5060 1 year ago

    Where is the notebook for this lecture? Could you please upload it?

  • @snrmedia8965 2 years ago

    Nice video👍

  • @manucmgowda 1 year ago

    Sir, the notebook link is broken. Please upload the notebook discussed in the video.

  • @furry2fun 10 months ago

    Can anyone send the link to the notebook? The given link does not work.

  • @riiyyyaaaa 3 months ago

    Hi sir, can you please re-add the data links here? I am unable to load them.

  • @allwithinone1345 2 years ago

    Thanks, sir

  • @ajitkulkarni1702 8 months ago

    Hello sir, can you reshare the code? The link you shared has no code. Thanks!

  • @tusarmundhra5560 7 months ago

    awesome

  • @piyushpathak7311 2 years ago +1

    Sir, when will you start a series on deep learning?

  • @shyamtyagi95 2 years ago

    Nice video

  • @anitabhandari3886 2 months ago

    @campusX : can you please suggest how can we use text for regression (for eg. use comments to predict number of subscribers)

  • @sachin2725 1 year ago

    Please link the notebook used in this video in the description.

  • @ashishsom3849 3 days ago

    I am not able to find the notebook of the code.
    Could anyone please help?

  • @swet_gokugod9382 7 months ago

    Great

  • @siddharthkarale3100 2 months ago

    I'm having trouble with the assignment, as I have no idea how to get data into a DataFrame using an API.
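
    As a rough sketch of the step asked about above: a JSON API returns text that can be parsed into Python objects and then flattened into one dictionary per row. The response below is a hypothetical hard-coded payload (no network call is made), and the keys `results`, `title`, `overview`, and `genre_ids` are assumptions; check the real response for the actual field names.

```python
import json

# Hypothetical response body; a real API call (for example with
# urllib.request or the requests package) would return similar JSON text.
# The keys "results", "title", "overview" and "genre_ids" are assumptions.
response_text = """
{
  "page": 1,
  "results": [
    {"title": "Movie A", "overview": "A quiet drama.", "genre_ids": [18]},
    {"title": "Movie B", "overview": "A heist goes wrong.", "genre_ids": [80, 53]}
  ]
}
"""

payload = json.loads(response_text)

# Keep only the columns of interest, one dict per movie.
columns = ("title", "overview", "genre_ids")
rows = [{col: movie.get(col) for col in columns} for movie in payload["results"]]

print(rows[0])

# If pandas is installed, the list of dicts converts directly:
# df = pandas.DataFrame(rows)
```

    For a paginated API, the same parsing step can be repeated per page and the resulting row lists concatenated before building the DataFrame.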

  • @mridang2064 1 year ago +3

    Thank you. Can you also start a series on web development?
    You're just an excellent teacher.

    • @Codingon_lup 1 year ago

      hey

    • @Codingon_lup 1 year ago

      Are you working in NLP or on something else in Python?
      I need your help.
      Can you help me?

  • @bhushanbowlekar4539 1 year ago +1

    Can you please share the Colab file?

  • @surajnikam3327 5 months ago

    Can anyone explain how to create a DataFrame for the assignment using this API? Please! 🙏

  • @adityasoni1639 2 years ago

    the notebook/code is not available .!!!

  • @user-sk6hn9jm3f 1 year ago

    How do I make this dataset?

  • @waqaralam7519 1 year ago

    Sir, I can't find the code page on Kaggle. Can anyone help?

  • @ritakathrotiya 2 months ago

    In the assignment, does anyone have a solution for changing genre IDs to their names?

  • @dipeshsilwal8098 2 years ago

    Hello sir your code is unavailable please make it available.

  • @SLADE-VA 4 months ago

    Couldn't find the Notebook link!

  • @maheshbhatt1505 10 months ago

    Please, can someone help me convert that chat words file into a dictionary?

  • @KumR 3 months ago

    Done.

  • @anupprasad695 2 years ago +1

    One suggestion: sir, please make a Udemy course... a data science bootcamp...

  • @piyushpawar75 6 months ago

    I got an OSError when using the spaCy library.

  • @faizahmed8015 9 months ago

    56:30 with 'e' it is "probable"...
    I understand, but it was confusing me.
    And thank you, sir, for such a good video ❤

  • @anshumanmahabhoi5771 5 months ago

    where is the notebook ?

  • @rahulrajbhar7012 2 years ago

    Please make a video on how a fresher should explain a data science project in an interview.

  • @anooshkaa 3 months ago

    No saved version of the notebook is showing.

  • @imamasafeer4536 4 months ago

    Where is the video on Regular Expressions?

  • @vijayraghuwanshi4486 10 months ago

    Has anyone tried the assignment? If so, please reply; I have a few doubts.

  • @yashjain6372 1 year ago

    best

  • @potjason2132 3 months ago

    Tokenization doesn't actually work on the dataset. Can you write code to tokenize only the reviews in your dataset?

  • @usmanyousaaf 1 year ago

    Sir, where is the notebook link?

  • @bibasrai752 1 year ago +1

    Do you have videos on NLP with deep learning?

  • @tapanpati9452 11 months ago

    Can anyone share the notebook?

  • @mdaliarmaghan8292 22 days ago

    Can you please provide solution for this assignment

  • @freshersadda8176 2 years ago

    ❤️

  • @MRBAM 1 year ago +1

    👍

  • @AshishSharma-tf3fy 18 days ago

    Sir, the TMDB website is blocked in India.

  • @tanveer9348 2 years ago +1

    How can I convert the chat words text data to a Python dictionary?

    • @rupakjha539 1 year ago

      Did you find a solution for this?

    • @rupakjha539 1 year ago

      text = '''AFAIK=As Far As I Know
      AFK=Away From Keyboard
      ASAP=As Soon As Possible
      ATK=At The Keyboard
      ATM=At The Moment
      A3=Anytime, Anywhere, Anyplace
      BAK=Back At Keyboard'''
      dictionary = {}
      # Split the text by newline and iterate over each line
      for line in text.split('\n'):
          # Split on the first equal sign only to get key and value
          key, value = line.split('=', 1)
          # Add the key-value pair to the dictionary
          dictionary[key] = value
      print(dictionary)
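
      Once such a dictionary exists, the chat word treatment itself might look like the sketch below. The three entries repeat expansions from the reply above; the function name `expand_chat_words` is made up for illustration, and the lookup is a simple whitespace split, so a token with punctuation attached (for example "asap,") would not be matched.

```python
# A few entries in the same ABBREVIATION -> expansion form as the reply above.
chat_words = {
    "AFAIK": "As Far As I Know",
    "ASAP": "As Soon As Possible",
    "ATM": "At The Moment",
}


def expand_chat_words(text: str, mapping: dict[str, str]) -> str:
    # Compare in upper case so "asap" and "ASAP" are both expanded;
    # words that are not in the mapping are kept unchanged.
    return " ".join(mapping.get(word.upper(), word) for word in text.split())


expanded = expand_chat_words("reply asap please", chat_words)
print(expanded)
```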

  • @rupakjha539 1 year ago

    Bhaiya, how did you convert the chat text data to a Python dictionary?

    • @rupakjha539 1 year ago

      text = '''AFAIK=As Far As I Know
      AFK=Away From Keyboard
      ASAP=As Soon As Possible
      ATK=At The Keyboard
      ATM=At The Moment
      A3=Anytime, Anywhere, Anyplace
      BAK=Back At Keyboard'''
      dictionary = {}
      # Split the text by newline and iterate over each line
      for line in text.split('\n'):
          # Split on the first equal sign only to get key and value
          key, value = line.split('=', 1)
          # Add the key-value pair to the dictionary
          dictionary[key] = value
      print(dictionary)