Tutorial 3-End To End ML Project With Deployment-Project Problem Statement,EDA And Model Training

  • Added 13 Sep 2024

Comments • 219

  • @krishnaik06
    @krishnaik06  a year ago +11

    Join this channel membership to get access to materials and connect with me:
    czcams.com/channels/NU_lfiiWBdtULKOw6X0Dig.htmljoin

    • @Pravin33unique95
      @Pravin33unique95 a year ago +1

      Hi sir, I have a problem: how do I get the student CSV data into VS Code?

    • @SylvanusNusetorAtiku
      @SylvanusNusetorAtiku 3 months ago

      @krish naik could you please explain, step by step, how to commit the project's large files to GitHub?

  • @arias2832
    @arias2832 a year ago +13

    I was lost after finishing an IBM course in data science. Nobody would give me a job because I don't have experience. I think that with your videos I will get one. Thanks for your excellent work, it has really helped me a lot. Greetings from Colombia!

  • @javeedtech
    @javeedtech a year ago +8

    After this series I regained my interest in machine learning. Thanks for the timely series 👍

  • @gourabguha3167
    @gourabguha3167 a year ago +8

    Thanks a lot, sir, for extending the series with a CI/CD pipeline and MLOps. Very much looking forward to it.

  • @beingbawe
    @beingbawe 2 months ago +1

    I enjoy your style of teaching. Thank you for all your hard work.

  • @ramkisundararaman3711
    @ramkisundararaman3711 7 months ago +1

    Krish, you are awesome as always. I was out of the market for more than a year and am now getting back into data science, and your videos are helping me refresh my skills. Thanks

  • @jaysamirhegshetye5660
    @jaysamirhegshetye5660 6 months ago +1

    I really wanted to do an ML project where I could use all the ML algorithms on a single dataset, and I found this playlist the best fit for my project. Thank you a lot, Krish sir, for making this informative and instructive tutorial!!!

  • @elnazfathi
    @elnazfathi 3 months ago

    You are doing an amazing job presenting how to get this done in an industrial environment. Thanks for your effort!

  • @talibdaryabi9434
    @talibdaryabi9434 a year ago +20

    I wonder why most of the people who watch this learn something but don't press the like button. It is the least you can do, so please support those who teach world-class methods for free.

  • @bilelkhelifi897
    @bilelkhelifi897 7 months ago +1

    Best playlist and channel I've ever seen. THANK YOU SO MUCH, KRISH

  • @insaneclutchesyt948
    @insaneclutchesyt948 a year ago +1

    Thank you very much. You are one of the very few people who show the errors on screen; it helps a lot. Keep going!!

  • @rajeshvenaganti6797
    @rajeshvenaganti6797 a year ago +2

    I have gone through your Python & ML playlist and it was a great learning experience. Thanks for this end-to-end ML project playlist, and thanks for noting that you will extend it with deep learning & NLP. I am eagerly waiting for the deep learning & NLP implementation sessions in this playlist.

  • @ashishsharma214
    @ashishsharma214 a year ago +2

    Thank you for this amazing playlist Krish!
    God bless you.

  • @sudhirmalik100
    @sudhirmalik100 8 months ago +3

    I think one improvement is needed here: we should split the data first, then call fit_transform on the training data, and only call transform on the test set.
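
A minimal sketch of the order these comments recommend, on synthetic data (the column names echo the tutorial, but the values are made up): split first, fit StandardScaler on X_train only, then reuse those fitted statistics to transform X_test.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric features (hypothetical values)
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "reading_score": rng.integers(20, 100, size=200),
    "writing_score": rng.integers(20, 100, size=200),
})
y = 0.5 * X["reading_score"] + 0.4 * X["writing_score"] + rng.normal(0, 5, size=200)

# 1. Split first, so the test rows never influence preprocessing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Learn mean and standard deviation from the training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# 3. Reuse the training statistics on the test data (no refitting)
X_test_scaled = scaler.transform(X_test)
```

Because the scaler never sees X_test during fit, the test set behaves like genuinely unseen data.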

  • @opman5657
    @opman5657 a year ago +1

    Very well explained, and a good job for learners like me. Thanks. God bless you

  • @Pravin33unique95
    @Pravin33unique95 a year ago +4

    I don't understand how you created the notebook folder and the EDA and model training notebooks inside it.

  • @khirooo
    @khirooo 11 months ago +5

    Hi Sensei, I am following your projects in every detail, and I am very thankful for your valuable content. But I think I found a code mistake in the EDA part.
    With this new code:
    it works well.... Regards

    • @yashpisat9267
      @yashpisat9267 9 months ago

      Thank you for the solution. I am facing a similar issue with this statement: "df.groupby('parental_level_of_education').agg('mean').plot(kind='barh',figsize=(10,10))"

    • @HimanshuBisht94
      @HimanshuBisht94 9 months ago

      @yashpisat9267 Here is the solution.
      df.groupby('parental_level_of_education')[['math_score','reading_score','writing_score','average']].agg('mean').plot(kind='barh',figsize=(10,10))

    • @shivankvishwakarma2994
      @shivankvishwakarma2994 9 months ago

      Thanks man!!

    • @user-dc5ow6ip3j
      @user-dc5ow6ip3j 8 months ago

      How did you solve that error?
      Please tell me as soon as possible, @shivankvishwakarma2994

    • @KAKAROT808
      @KAKAROT808 20 days ago

      @yashpisat9267 Hey, I have a question... can you explain this specific code, please?
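
A likely explanation, assuming a recent pandas (2.0 or later): groupby(...).mean() and .agg('mean') no longer silently skip non-numeric columns, so string columns such as gender or race_ethnicity make them raise a TypeError. The fix above works by selecting only the score columns before aggregating; passing numeric_only=True is an alternative. A small sketch on made-up rows (column names follow the comments; the values are hypothetical):

```python
import pandas as pd

# Made-up rows mimicking the student dataset's columns
df = pd.DataFrame({
    "gender": ["female", "female", "male", "male"],
    "parental_level_of_education": ["master's degree", "high school", "high school", "master's degree"],
    "math_score": [60, 80, 90, 70],
    "reading_score": [70, 90, 60, 80],
    "writing_score": [80, 70, 60, 90],
})
score_cols = ["math_score", "reading_score", "writing_score"]
df["average"] = df[score_cols].sum(axis=1) / 3

# Fix 1: select the numeric columns explicitly, then aggregate
by_gender = df.groupby("gender")[score_cols + ["average"]].agg("mean")

# Fix 2: let pandas drop the non-numeric columns itself
by_gender_alt = df.groupby("gender").mean(numeric_only=True)

# Same idea for the parental-education breakdown; .plot(kind="barh") can be chained after it
by_education = df.groupby("parental_level_of_education")[score_cols + ["average"]].agg("mean")
```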

  • @mayankamble2588
    @mayankamble2588 4 months ago

    Amazing playlist. Learning a lot. One point on the transformation: the StandardScaler should be fit on the training data only, and then that same scaler should be used to transform the test data. This way there won't be any data leakage into the test set.

  • @sudeepmathur9654
    @sudeepmathur9654 a year ago +1

    Excellent video. I retired recently and thought I would keep myself engaged by learning new things; I saw your video and found it very useful. Keep it up, and best wishes for all the hard work you are doing spreading knowledge. -- Sudeep Mathur

  • @ransinghray3688
    @ransinghray3688 7 months ago

    Krish, you are really redefining the tech education system, you are awesome!!

  • @mahdiaspanani8004
    @mahdiaspanani8004 3 months ago

    I love the parts where you hit an error but don't stop recording. It teaches us that everyone runs into these problems.

  • @tatakae6666
    @tatakae6666 11 months ago +1

    This series is absolute gold

  • @apurvtewari3779
    @apurvtewari3779 6 months ago

    How should I describe this entire project on my resume as a fresher?
    Any help is appreciated. :)

  • @harishs-dm8mm
    @harishs-dm8mm a year ago +3

    Hi Krish, thanks for extending the project. Please include data and model versioning and MLOps practices.

  • @utkarshapadhye7656
    @utkarshapadhye7656 11 months ago +3

    I am getting "package not found" errors in VS Code, even though I followed all the steps from the start and all the packages are installed in my global env.

  • @sabinadhikari2643
    @sabinadhikari2643 a year ago +3

    For sns.countplot() we have to pass the value as x, i.e. sns.countplot(x=data) works, whereas sns.countplot(data) gives an error.
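
A small sketch of the keyword form, assuming a seaborn version where countplot accepts data= plus x= (0.11 and later do); the rows are made up and the Agg backend is only there so it runs without a display. Passing the column itself, sns.countplot(x=df["gender"]), is equivalent.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the example runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows; only the 'gender' column matters here
df = pd.DataFrame({"gender": ["female", "male", "female", "female", "male"]})

fig, ax = plt.subplots()
# Name the column with x= and pass the frame as data=, rather than passing data positionally
sns.countplot(data=df, x="gender", ax=ax)

# One bar per category; each bar's height is that category's count
bar_heights = sorted(p.get_height() for p in ax.patches)
plt.close(fig)
```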

  • @sajjaduddin8188
    @sajjaduddin8188 6 days ago

    Thank you sir

  • @list10001
    @list10001 7 months ago

    Thank you for the valuable tutorials.

  • @kumaronlineplay
    @kumaronlineplay a year ago

    Excellent video. Thanks for sharing it.

  • @karanbais1843
    @karanbais1843 5 months ago

    Loving the playlist, sir. Thank you for it.

  • @aslanali9977
    @aslanali9977 a year ago +2

    Thanks!

  • @mahikhan5716
    @mahikhan5716 a year ago

    Nothing could be better than . He just poured in everything that data science needs. We owe him.

  • @mohitpansari6603
    @mohitpansari6603 a month ago +1

    There is one issue: here we applied standard scaling to the whole X, whereas we should have used fit_transform on X_train only and transform on X_test.

  • @shahilshrestha3700
    @shahilshrestha3700 a year ago

    Thank you so much, sir, for your video. I am learning a lot of new concepts, with better explanations, all because of you.

  • @maximilianlossl226
    @maximilianlossl226 a year ago +2

    Sorry, but I really don't understand how to get the data and Jupyter files into Visual Studio Code. Can you help me?

  • @nikhildoye9671
    @nikhildoye9671 9 months ago +2

    Shouldn't we split the data first and then apply transformation? Won't this lead to data leakage?

  • @suvarnapawar3186
    @suvarnapawar3186 a year ago

    Very nice explanation, sir... thank you very much.

  • @pandalanhukuk804
    @pandalanhukuk804 2 months ago

    Thanks.

  • @AkilaDS-kz6yv
    @AkilaDS-kz6yv a year ago +2

    How can we export the data and the EDA code? Could you please explain that part?

  • @rahulsharma5693
    @rahulsharma5693 a year ago +5

    Hi Krish, when I try "from src.logger import logging", it gives the error "No module named 'src'", but if I do "from logger import logging" then it works. Any idea???

    • @karishmamehar4081
      @karishmamehar4081 a year ago

      Because both .py files are in the same folder, you can import directly; if exception.py were outside src, then src.logger would work.

    • @rahulsharma5693
      @rahulsharma5693 a year ago +1

      @karishmamehar4081 Yep, I understand that, but why did it work for Krish in the video while I get an error?

    • @avbendre
      @avbendre a year ago

      @rahulsharma5693 Yes, same doubt. I think it has something to do with magic.

  • @sparshjain7542
    @sparshjain7542 a year ago

    You are the reason I love machine learning.

  • @royalchallengersbangalore535

    Your teaching is like ❤❤

  • @DhaneshRamesh-p9b
    @DhaneshRamesh-p9b a year ago +3

    Hey Krish, I have all my libraries installed, such as numpy, but when I try to run the notebook through the ipykernel it says numpy is not found.

    • @prianshmadan
      @prianshmadan a year ago

      Bro, were you able to resolve this?

    • @riachoudhari7297
      @riachoudhari7297 a year ago

      @prianshmadan Kindly let me know the solution, please.

    • @shitikanthabagh9859
      @shitikanthabagh9859 5 months ago

      Install ipykernel again (this may upgrade it to the latest available package rather than the Python 3.8 version used in this video): conda install -p <environment path> ipykernel --update-deps --force-reinstall. Then select the correct Jupyter kernel in the interpreter picker, and it should work.
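
When a package is installed but the notebook still cannot import it, the kernel is often running a different Python than the environment you installed into. A quick diagnostic sketch to run in a notebook cell (kernel_report is a hypothetical helper, not part of any library): if the printed interpreter path is not inside your project environment, switch to that environment's kernel, or install the packages with the exact interpreter shown (e.g. that path followed by -m pip install numpy).

```python
import importlib.util
import sys


def kernel_report(packages):
    """Report which interpreter this kernel runs on and which packages it cannot find."""
    missing = [name for name in packages if importlib.util.find_spec(name) is None]
    return {"python": sys.executable, "prefix": sys.prefix, "missing": missing}


# The package list is only an example; use the imports your notebook needs
report = kernel_report(["numpy", "pandas", "sklearn"])
print("interpreter:", report["python"])
print("environment:", report["prefix"])
print("missing:", report["missing"])
```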

  • @AnimeAficionado28
    @AnimeAficionado28 a year ago

    We are thankful for your wonderful knowledge of ML. I have a wish: if you could make a deep learning project playlist from scratch, I would be very grateful.

  • @manar4944
    @manar4944 3 months ago

    That's super great work, it really helped me.
    But I'm surprised that you applied the column transformation (StandardScaler) before splitting into train/test sets. Most articles say this results in data leakage. Can you please elaborate?

  • @nanditagautam6310
    @nanditagautam6310 a year ago

    Great! Thanks, Krish

  • @matindram
    @matindram a year ago

    Thank you for the series

  • @asieharati
    @asieharati a year ago +1

    You shouldn't do the preprocessing on the whole X. You should fit on X_train (fit_transform) and only transform X_test.

  • @robinchriqui2407
    @robinchriqui2407 a year ago

    Thank you, Krish. It refreshed a lot of information and skills. I'm looking forward to seeing the automation and deployment part. Will you integrate the MLOps part in the future?

  • @nirbhaysedha8541
    @nirbhaysedha8541 a year ago

    Thanks, sir, it will help us a lot 🙏

  • @sagarthacker5114
    @sagarthacker5114 a year ago +3

    Hello Krish, I was hoping to ask for your opinion on a particular aspect of data preprocessing. Shouldn't we perform data splitting first to prevent data leakage, as standard scaling considers the mean and variance of the entire dataset? This may include the test set, leading to potential data leakage. Would you kindly share your thoughts on this topic? Thank you very much.

    • @krishnaik06
      @krishnaik06  a year ago +1

      Yes, I will take care of it while writing the code in a modular way...
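
One common way to get the modular, leakage-free version, sketched under assumptions: put the encoding and scaling inside a scikit-learn Pipeline so that the only place statistics are learned is the fit on X_train. The column names mirror the tutorial, but the data is synthetic and LinearRegression is a placeholder for whichever model you train.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data: one categorical and two numeric features (hypothetical values)
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "gender": rng.choice(["female", "male"], size=n),
    "reading_score": rng.integers(20, 100, size=n),
    "writing_score": rng.integers(20, 100, size=n),
})
df["math_score"] = (0.6 * df["reading_score"] + 0.3 * df["writing_score"]
                    + np.where(df["gender"] == "male", 5, 0) + rng.normal(0, 3, size=n))

X = df.drop(columns="math_score")
y = df["math_score"]

preprocessor = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender"]),
    ("num", StandardScaler(), ["reading_score", "writing_score"]),
])
model = Pipeline([("preprocess", preprocessor), ("regressor", LinearRegression())])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)            # encoder and scaler are fit on X_train only
test_r2 = model.score(X_test, y_test)  # X_test is only transformed, never fitted on
```

A side benefit of this design is that the fitted pipeline can be saved as one object and applied to new rows at prediction time with the same preprocessing.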

  • @shalakam1617
    @shalakam1617 a year ago

    Thanks for this series

  • @Laizin
    @Laizin a year ago +4

    Is anyone getting an error when installing catboost??
    Or is it just me?

    • @lionsinescanor405
      @lionsinescanor405 a year ago +1

      Same here. Did you find the solution?

    • @mhapich
      @mhapich a year ago

      Me too: Failed to build catboost
      ERROR: Could not build wheels for catboost, which is required to install pyproject.toml-based projects

  • @Mery._.11111
    @Mery._.11111 a year ago

    Thank you so much !

  • @abhijeetjain8228
    @abhijeetjain8228 a year ago

    Thanks a lot, sir

  • @rupindersingh1312
    @rupindersingh1312 a year ago +1

    Thanks for this video
    10-07-2023

  • @sibaprasadnaikbehera3442

    Thank you for the problem statement, sir. We are eagerly waiting for it.

  • @sanket_a2033
    @sanket_a2033 a year ago

    Thank you, sir.

  • @yashsoni1153
    @yashsoni1153 a year ago +2

    Sir, I am not able to open the Jupyter notebook in VS Code. I think there is an error in the file.
    Please help me resolve this...

  • @pankajkumarbarman765
    @pankajkumarbarman765 a year ago

    Thank you so much, sir 💗

  • @shalakam1617
    @shalakam1617 a year ago

    Thank you for the series

  • @sharifdeenashshak4496

    Looking forward to a deep learning end-to-end series

  • @SaranyaDass-l9h
    @SaranyaDass-l9h a year ago +1

    Hi, how do we import the data from the CSV file into VS Code?

  • @brightlyricsmusic
    @brightlyricsmusic a year ago

    I am getting an error while reading the dataset using a relative path. Can someone help?
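
A relative path passed to pd.read_csv is resolved against the current working directory, which for a notebook is usually the notebook's folder but for a script is wherever you launched Python; that mismatch is a common cause of this error. A hedged sketch: resolve the path against an explicit base folder and report the working directory when the file is missing. The helper name and the sample file are hypothetical; the notebook/data layout follows the folder structure described in the replies below.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_from(base_dir, relative_path):
    """Resolve relative_path against base_dir instead of the current working directory."""
    path = (Path(base_dir) / relative_path).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"{path} does not exist (current working directory: {os.getcwd()})")
    return pd.read_csv(path)


# Demo in a throwaway folder laid out as notebook/data/<file>.csv
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "notebook" / "data"
    data_dir.mkdir(parents=True)
    pd.DataFrame({"math_score": [72, 69, 90]}).to_csv(data_dir / "sample.csv", index=False)
    df = read_csv_from(Path(tmp) / "notebook", "data/sample.csv")
```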

  • @user-mn1kz9gx4f
    @user-mn1kz9gx4f 9 months ago +1

    How can I add the notebook folder? You didn't explain the notebook and the CSV.

    • @AmbarGharat
      @AmbarGharat 5 months ago

      Add it from VS Code.

    • @KAKAROT808
      @KAKAROT808 20 days ago +1

      Or go directly to the file explorer: add a notebook folder inside the ML project folder, then inside the notebook folder add a data folder and paste the file there. VS Code will pick it up, and the file will be added.

  • @rahulnakka87
    @rahulnakka87 a month ago

    # Remove duplicate rows, keeping the first occurrence (the default, keep='first')
    df_no_duplicates = df.drop_duplicates()
    # Alternatively, keep the last occurrence of each duplicate instead
    df_no_duplicates = df.drop_duplicates(keep='last')
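
To see what those two calls do, a small sketch on made-up rows: both leave the same number of rows and differ only in which copy of a duplicate survives. Checking df.duplicated().sum() first tells you whether there is anything to drop at all.

```python
import pandas as pd

# Made-up rows: index 0 and 1 are exact duplicates
df = pd.DataFrame({
    "gender": ["female", "female", "male"],
    "math_score": [72, 72, 90],
})

n_duplicates = int(df.duplicated().sum())    # rows flagged as repeats of an earlier row
first_kept = df.drop_duplicates()            # keep='first' is the default
last_kept = df.drop_duplicates(keep="last")  # keeps the later copy instead
```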

  • @aravinda1595
    @aravinda1595 5 months ago

    Sir, amazing video btw.
    Boys have performed really well in MATHSSS

  • @rohitbharti2882
    @rohitbharti2882 a year ago

    So happy, sir ❤❤❤

  • @gauravmishra7591
    @gauravmishra7591 a year ago +2

    Please make a PySpark end-to-end project, like a real-world one.

  • @amazingplaytv2661
    @amazingplaytv2661 a year ago +2

    Where do I get the data files, I mean the contents of the notebook folder?? I am coding along with the series.

  • @abubakarsaddiq4098
    @abubakarsaddiq4098 4 months ago

    Stunning

  • @surajpandey-er4nt
    @surajpandey-er4nt a year ago +1

    Hi Krish, just a small doubt: I am facing an issue installing everything through requirements.txt, so I had to install everything separately. What could be the issue here?

  • @shahirajlakade7921
    @shahirajlakade7921 a year ago +1

    Sir, is there any upcoming data analyst batch? I missed the 50% off offer 😔

  • @laxmanteja
    @laxmanteja a year ago

    Krish, I'm very interested in your teaching techniques and in this end-to-end project. Can I expect automation of the project with code?

  • @shruthakeerthipurushothkum2724

    Hi @krishnaik06, can you kindly show how you worked with Jupyter in VS Code? I mean, did you do the EDA in Jupyter Notebook and then move it to VS Code?

  • @javeedtech
    @javeedtech a year ago

    Thanks again

  • @sh__--
    @sh__-- a year ago

    Thanks 😊

  • @mayankporwal4858
    @mayankporwal4858 a year ago +1

    Here we did not discuss the catboost_info folder that appears. Why is it there, and what is its use??
    Please explain, Krish sir.

    • @AmbarGharat
      @AmbarGharat 5 months ago

      It gets created automatically once you use catboost, and IDK why.

  • @ahmedullahkhan9166
    @ahmedullahkhan9166 11 months ago

    This line of code gives an error:
    gender_group = df.groupby('gender').mean()
    gender_group

    • @areejahmad3998
      @areejahmad3998 10 months ago

      Did you find any solution?

    • @ahmedullahkhan9166
      @ahmedullahkhan9166 10 months ago

      @areejahmad3998 Not yet.

    • @ravularamakrishna9962
      @ravularamakrishna9962 10 months ago

      gender_group = df.set_index('gender').groupby(level=0)[['math_score','reading_score','writing_score','average']].agg('mean')
      Replace it with this and it will work. Credit goes to @khireddinemammar565

  • @video89652
    @video89652 a year ago

    Hello Krish, are there any projects that solve the direction-of-arrival problem in audio signal processing? Can you do a tutorial on it?

  • @pacito6709
    @pacito6709 a year ago

    Can anyone explain why I am unable to get venv (Python) in the kernel list at 8:07?

  • @ShivamPatel-yg3kd
    @ShivamPatel-yg3kd a year ago +1

    Hi Sir, I have a doubt. You created "total marks" and "average marks" as two separate features, did EDA on both, and suggested creating a separate model for each of them as well. But why do we need to do the same things for both separately, when average marks is directly derived from total marks (total marks/3)? Am I missing something? Please clarify. Love your videos 😊

    • @uditthakkar8130
      @uditthakkar8130 a year ago +1

      Hey Shivam,
      It depends on the problem we are trying to solve. What we are doing here is all self-learning, but in practice the target would be decided by the stakeholders/clients.
      If you are trying to predict a student's eligibility for a scholarship, the total marks might matter more than the average marks, since scholarships may be based on total marks.
      If the model indicates that the total marks are a strong predictor of a student's performance, it may be harder to understand how much the average marks contributed to that prediction if they are not considered separately.
      Also, eliminating noise and variability in the data is a factor here!
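
To make Shivam's observation concrete: if average is defined as total / 3, as the question describes, the two columns carry the same information on a different scale. A quick check on made-up scores (total_score is a hypothetical column name for the total):

```python
import numpy as np
import pandas as pd

# Made-up scores for illustration
df = pd.DataFrame({
    "math_score": [72, 69, 90, 47, 76],
    "reading_score": [72, 90, 95, 57, 78],
    "writing_score": [74, 88, 93, 44, 75],
})
df["total_score"] = df[["math_score", "reading_score", "writing_score"]].sum(axis=1)
df["average"] = df["total_score"] / 3

# average is an exact linear rescaling of total, so the two are perfectly correlated
corr = df["total_score"].corr(df["average"])
```

For a linear regression, switching the target from total to average only divides the coefficients by 3 and leaves R² unchanged, so whether a second model is worth it comes down to the stakeholder question Udit raises, not to any new information in the data.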

  • @PavanReddy-xl1uu
    @PavanReddy-xl1uu a year ago +2

    Hi everyone,
    I'm getting the error below when I try to run the "exception.py" file.
    (c:\Users\pavva\OneDrive\Documents\AI Project\venv) C:\Users\pavva\OneDrive\Documents\AI Project>python src/exception.py
    Traceback (most recent call last):
    File "src/exception.py", line 2, in <module>
    from src.logger import logging
    ModuleNotFoundError: No module named 'src'
    I did add the line "from src.logger import logging" in exception.py.
    All the file names are correct and everything is in the proper order.
    Can someone help me?
    Thank you.

    • @ramin.nourizade
      @ramin.nourizade a year ago

      Hi, remove "src." from the import.

    • @PavanKumar-ut2lo
      @PavanKumar-ut2lo a year ago

      @ramin.nourizade If I only use "import logging", then nothing gets written to the log file.

    • @Pravin33unique95
      @Pravin33unique95 a year ago

      I am also getting the same problem: invalid syntax at line 9.

    • @imadsyed6417
      @imadsyed6417 a year ago

      Refresh your VS Code.

    • @saikrishna887
      @saikrishna887 a year ago +2

      import os
      import sys
      sys.path.append(os.path.abspath('C:/Users/xxxx/MLProject/src'))
      # Now you can import 'logging' from the 'logger' module
      from logger import logging
      Try adding this code; it should resolve your issue.
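
Why the same import works for some people and not others: running python src/exception.py puts the script's own folder (src/) at the front of sys.path, not the project root, so there is no src package to find. Running it as a module from the project root, python -m src.exception (which needs src/__init__.py), puts the root on sys.path instead. If the project was installed in editable mode (pip install -e . with a setup.py that finds the src package), src becomes importable from anywhere, which may explain why it works in the video. The sketch below reproduces both cases in a throwaway folder; the file contents are minimal stand-ins, not the tutorial's actual logger.py and exception.py.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def make_project(root):
    """Create a tiny stand-in project: src/ is a package and exception.py imports src.logger."""
    src = root / "src"
    src.mkdir()
    (src / "__init__.py").write_text("")
    (src / "logger.py").write_text("import logging\n")
    (src / "exception.py").write_text("from src.logger import logging\nprint('import ok')\n")


def run(root, *args):
    """Run the current interpreter from the project root, ignoring any inherited PYTHONPATH."""
    env = {k: v for k, v in os.environ.items() if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    return subprocess.run([sys.executable, *args], cwd=root, env=env,
                          capture_output=True, text=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    make_project(root)
    as_script = run(root, "src/exception.py")     # sys.path[0] is root/src, so 'src' is not found
    as_module = run(root, "-m", "src.exception")  # sys.path[0] is the project root, so 'src' resolves

print("as script:", as_script.returncode, as_script.stderr.strip().splitlines()[-1])
print("as module:", as_module.returncode, as_module.stdout.strip())
```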

  • @hmtbt4122
    @hmtbt4122 a year ago

    Thanks

  • @abhijeetrokade2349
    @abhijeetrokade2349 a year ago

    Which kinds of projects should we choose when preparing for interviews?

  • @MartinBurleston
    @MartinBurleston a month ago

    How did you get the dataset and import it into VS Code?
    I don't understand.

  • @nishantverma2966
    @nishantverma2966 a year ago +1

    How can we export the EDA, model training, and data files to Visual Studio?

  • @user-jf7je4yv3b
    @user-jf7je4yv3b a year ago +1

    Error occurred:
    -----------------------------------
    from datetime import datetime
    ModuleNotFoundError: No module named 'datetime'

    • @mhapich
      @mhapich a year ago

      Same here - and in trying to fix it I've created more errors 😵‍💫

  • @mrityunjayupadhyay7332

    Great

  • @swL1941
    @swL1941 a year ago

    @Krish Naik
    Hello Sir, why didn't you use cross-validation instead of a train-test split?

    • @subrataassam
      @subrataassam a year ago

      He could have used CV or another data-splitting technique, but I guess the main aim of this tutorial was to build a framework for an ML project. It can be improved later with new ideas or deeper exploration.
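
A sketch of what cross-validation could look like here, assuming synthetic data from make_regression in place of the student features and Ridge as an example model. Keeping the scaler inside the pipeline means each fold refits it on that fold's training part only, so the CV scores stay leakage-free.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data as a stand-in for the student features
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Scaling lives inside the pipeline, so each fold refits it on that fold's training rows only
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

print(f"R2 per fold: {scores.round(3)}")
print(f"mean R2: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Compared with a single train-test split, the spread across folds shows how much the score depends on which rows happened to land in the test set.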

  • @ncheymbamalu4013
    @ncheymbamalu4013 a year ago

    Use adjusted R² instead of R² as the evaluation metric.
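
Adjusted R² penalizes extra predictors: with n samples and p features, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). scikit-learn's metrics module does not provide it directly as far as I know, so a small helper is sketched below (adjusted_r2 is a hypothetical name, built on sklearn.metrics.r2_score):

```python
from sklearn.metrics import r2_score


def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), with n samples and p features."""
    n = len(y_true)
    if n - n_features - 1 <= 0:
        raise ValueError("adjusted R^2 needs more samples than n_features + 1")
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)


# Example with made-up predictions from a one-feature model
y_true = [3, -0.5, 2, 7, 4]
y_pred = [2.5, 0.0, 2, 8, 4.5]
print(r2_score(y_true, y_pred), adjusted_r2(y_true, y_pred, n_features=1))
```

In the model-training comparison, n_features would be the number of columns after preprocessing (e.g. after one-hot encoding), not the number of raw columns.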

  • @swL1941
    @swL1941 a year ago

    If boosting and bagging methods are so powerful, then why does a simple Ridge regression get a higher R2 score??

  • @ravulapallivenkatagurnadha9605

    Need more videos like this

  • @ridj41
    @ridj41 a year ago

    I have gone mad setting up the environment again and again: even after installing all of the libraries, when I run the code it says the library is not there.

  • @varunraj5543
    @varunraj5543 a year ago

    I'm getting this error: name 'cat_feature' is not defined. Please help.

  • @r.ranjankumar2106
    @r.ranjankumar2106 a year ago

    I'm facing a ModuleNotFoundError even after creating the environment. Can anyone help me fix this?

  • @Pravin33unique95
    @Pravin33unique95 a year ago

    I have a doubt: while running the code I get a syntax error in the exception handling.

  • @krislai7453
    @krislai7453 a year ago +1

    Why is it that when I try to import a library it says "No module named 'numpy'"?

  • @Svilco
    @Svilco a year ago

    What is the correct scikit-learn version for Python 3.8?

  • @KunalSingh-fy1bk
    @KunalSingh-fy1bk 6 months ago

    Bro, your EDA notebook has lots of errors. I tried solving them (I am new to data science) and resolved most of them, but the bivariate analysis still isn't working. Please update the notebook and fix the issues; it's hard to understand with errors because we can't see the output.
    @Krish Naik