Harry's Data Journey
  • 3
  • 9 016
End-to-End ML/Data Science Project (with XGBoost) | Car Insurance Claims Prediction
My Code: github.com/harryallum/Data-Science-Projects/tree/main/Car%20Insurance%20Claim%20Prediction
Dataset: www.kaggle.com/datasets/xiaomengsun/car-insurance-claim-data
📖 Relevant Articles 📖
Stratified Sampling: medium.com/analytics-vidhya/stratified-sampling-in-machine-learning-f5112b5b9cfe
KNN Imputation: medium.com/@kyawsawhtoon/a-guide-to-knn-imputation-95e2dc496e
⏳ TIMESTAMPS ⏳
00:00 | Intro
01:03 | Basic Data Cleaning
04:33 | Create Train/Test Split
07:45 | Exploratory Data Analysis
10:19 | Advanced Data Cleaning & Preprocessing
18:10 | Classification Model Selection
20:16 | Feature Engineering
22:49 | Creating the Model Pipeline
26:56 | Model Tuning
29:13 | Classification Model Evaluation
30:28 | Regression
🔗 KEEP IN TOUCH 🔗
📸 Instagram: harrysdatajourney
💻 GitHub: github.com/harryallum
📝 LinkedIn: www.linkedin.com/in/harry-allum/
WHO AM I?
My name is Harry 👋 I'm an Electro-Mechanical Engineer and aspiring Data Scientist, documenting my journey of trying to land my first job in Data Science. Come and follow along! Along the way, I'll be talking about my favourite learning resources, online course reviews and original tutorials.
⭐️ Tags ⭐️
#XGBoost #ML #DataScienceProjects #DataScience #DataScienceForBeginners #pythonprogramming #Python #SQL #DataAnalyst #Beginners #Tutorial #Data #Analysis #Programming #Coding #DataJourney
Views: 2,783

Video

How to Create and Deploy a Multi-Page Python Dashboard with Plotly Dash | Data Portfolio Project
6K views · 6 months ago
My Dashboard: www.thepropertydashboard.co.uk/
Project GitHub: github.com/harryallum/Dash-Property-Dashboard
Dataset: www.gov.uk/government/statistical-data-sets/price-paid-data-downloads
Plotly Dash: dash.plotly.com/
Dash Bootstrap Components: dash-bootstrap-components.opensource.faculty.ai/
⏳ TIMESTAMPS ⏳
00:00 | Intro
01:51 | Data Processing
11:46 | Creating Single Page Dashboards
22:48 ...
I want to be a DATA SCIENTIST.
383 views · 7 months ago
My first video of a series documenting my journey of trying to break into Data Science as someone with no background in the field.
⏳ TIMESTAMPS ⏳
00:00 | Introduction
03:20 | Education
08:06 | Experience
09:06 | Why Data Science?
11:00 | The Challenge
12:03 | Progress So Far
🔗 KEEP IN TOUCH 🔗
📸 Instagram: harrysdatajourney
💻 GitHub: github.com/harryallum
📝 LinkedIn: www.link...

Comments

  • @michaelgeorgiou7738
    @michaelgeorgiou7738 14 days ago

    I'm completely new to all this; I only started working with Python this month. I'm amazed. How did you make your VS Code work like this? Is this setup specific to data engineering? Specifically, when you execute a function, the output appears below it with a processing time indicator.

    • @harrysdatajourney
      @harrysdatajourney 14 days ago

      I think you might be talking about Jupyter notebooks, or ipynb files. These are used a lot in different data fields. They let you run sections of code with annotations. Give it a search, I hope it helps!

    • @michaelgeorgiou7738
      @michaelgeorgiou7738 14 days ago

      @harrysdatajourney Thanks a million for the reply. I'll check out Jupyter notebooks, I'm sure that's what I was looking for!

  • @clipstok788
    @clipstok788 21 days ago

    What is the name of the VS Code theme?

  • @ejiroerhue
    @ejiroerhue 1 month ago

    I’m a fresh mechanical engineering graduate with an interest in data science. I really enjoyed your story and I look forward to witnessing your journey.

  • @user-vd9nd3gm4n
    @user-vd9nd3gm4n 1 month ago

    Are you actually typing that fast? 😮

  • @TheMISBlog
    @TheMISBlog 1 month ago

    Good Luck Harry, just Subscribed

  • @imfinitiamusic.4632
    @imfinitiamusic.4632 1 month ago

    Respect deserves a sub!!!

  • @vishukumar6477
    @vishukumar6477 1 month ago

    I am an aspiring Data Scientist. It's very helpful and awesome ✌

  • @thelogiclabio
    @thelogiclabio 1 month ago

    Great stuff Allum

  • @bilal-khan
    @bilal-khan 1 month ago

    A question: if you have a large number of features, how do you choose between different categorical encodings? Do you look at each feature individually and then decide which encoding should be used?

  • @pent1162
    @pent1162 1 month ago

    I got this error: "All estimators should implement fit and transform, or can be 'drop' or 'passthrough' specifiers. 'Pipeline(steps=[('col_dropper', ColumnDropper(columns_to_drop=['red_vehicle']))])' (type <class 'sklearn.pipeline.Pipeline'>) doesn't." From ChatGPT: "According to the error message, the issue lies with the cols_to_drop_pipeline in the ColumnTransformer. In the ColumnTransformer, the output of cols_to_drop_pipeline should be directly discarded rather than being processed as a complete transformer." Has anyone else run into this error?

    • @harrysdatajourney
      @harrysdatajourney 1 month ago

      Your pipeline code looks fine. Can you share your code for the custom transformer?

    • @pent1162
      @pent1162 1 month ago

      I found the typo in the class: "The ColumnDropper class in this code has a spelling error; transfrom should be changed to transform. This spelling error occurs in the initial definition of the ColumnDropper class." Once I corrected it, everything worked fine. :P

    • @harrysdatajourney
      @harrysdatajourney 1 month ago

      @pent1162 Glad to hear it!
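
The thread above comes down to duck typing: a step is accepted only if it exposes methods named exactly `fit` and `transform`, so a method spelled `transfrom` is invisible to that check. The sketch below illustrates this with the standard library only. `looks_like_transformer` is a hypothetical, simplified stand-in for the validation scikit-learn performs, not its actual code, and the list-of-dicts input is an assumption made to avoid dependencies. The error presumably names the wrapping `Pipeline` because a pipeline only offers `transform` when its steps do.

```python
class ColumnDropper:
    """Illustrative version of the custom transformer discussed in the thread."""

    def __init__(self, columns_to_drop):
        self.columns_to_drop = columns_to_drop

    def fit(self, X, y=None):
        # Nothing to learn; return self so calls can be chained.
        return self

    def transform(self, X):
        # X is assumed to be a list of dict rows to keep this sketch stdlib-only.
        return [
            {key: value for key, value in row.items()
             if key not in self.columns_to_drop}
            for row in X
        ]


class MisspelledColumnDropper:
    """Same idea, but with the typo reported in the comment."""

    def __init__(self, columns_to_drop):
        self.columns_to_drop = columns_to_drop

    def fit(self, X, y=None):
        return self

    def transfrom(self, X):  # typo: never found under the name "transform"
        return X


def looks_like_transformer(obj):
    """Hypothetical, simplified duck-typing check (not scikit-learn's code)."""
    has_fit = hasattr(obj, "fit") or hasattr(obj, "fit_transform")
    return has_fit and hasattr(obj, "transform")


rows = [{"age": 40, "red_vehicle": "yes"}]
print(looks_like_transformer(ColumnDropper(["red_vehicle"])))
print(looks_like_transformer(MisspelledColumnDropper(["red_vehicle"])))
print(ColumnDropper(["red_vehicle"]).fit(rows).transform(rows))
```

In a real scikit-learn pipeline the class would normally also subclass `BaseEstimator` and `TransformerMixin` from `sklearn.base`; that part is left out here so the example runs without third-party packages.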

  • @learner8324
    @learner8324 1 month ago

    great content, eagerly waiting for the deployment part.....

  • @abhinavmallick3413
    @abhinavmallick3413 2 months ago

    It would be amazing if you could share the Python resources that you mentioned you were studying from! Best of luck, and thank you in advance!

  • @hoangha6680
    @hoangha6680 2 months ago

    thanks for the video. Just a small suggestion that an "end-to-end" data science project also includes model deployment such as on a web-app, etc. I hope that your future 'end-to-end' DS project will also have this part.

    • @harrysdatajourney
      @harrysdatajourney 2 months ago

      You're right! I wanted to cover model deployment in a separate video, as this one was already quite long. Thanks for the suggestion!

    • @hoangha6680
      @hoangha6680 2 months ago

      @harrysdatajourney No prob. In my opinion, long and detailed videos attract me the most since they cover the full picture. It doesn't matter if a video is long, even more than 1 hour ^^

    • @harrysdatajourney
      @harrysdatajourney 2 months ago

      @hoangha6680 Good to know! Thanks for the feedback 😀

  • @DarkOceanShark
    @DarkOceanShark 2 months ago

    Thanks Harry, I am looking forward to more such content from you. :)

    • @harrysdatajourney
      @harrysdatajourney 2 months ago

      Thanks! Let me know if you have any suggestions on what you’d like me to cover next 😀

  • @shailendra_kunwar
    @shailendra_kunwar 2 months ago

    I was just learning about classification models and then I got this recommendation from YouTube. Awesome video Harry.

  • @candypopz7865
    @candypopz7865 2 months ago

    As someone who wants to become a business insights analyst, this is very helpful. Thanks Harry! ❤

  • @santiagotabordagiraldo7759
    @santiagotabordagiraldo7759 2 months ago

    Hi Harry, thanks for your video. A question came up while watching: could those 'pages' be used like ordinary pages in another web framework, something like a Django project, and still work the same?

  • @Louis-cm4er
    @Louis-cm4er 5 months ago

    Thanks for the vid !! it was perfect :)

  • @datawithtess
    @datawithtess 6 months ago

    Harry, you just got a subscribe from me

  • @John-xi2im
    @John-xi2im 6 months ago

    The data (4.7 GB overall) is too large for my laptop processor (AMD Athlon Silver 3050U with Radeon graphics × 2; graphics: RAVEN (raven, LLVM 15.0.7, DRM 3.54, 6.5.0-18-generic)) to complete the ddf.compute() step; the kernel keeps dying at that stage. I guess I'll have to download 3 or 4 individual yearly files from gov.uk, concatenate them and then follow the Plotly tutorial, as the really interesting part is how to create a multi-page Plotly dashboard!

    • @John-xi2im
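      @John-xi2im 6 months ago

      import glob
      import pandas as pd

      def collating_yearly_data():
          # `path` is a glob pattern for the yearly CSV files, defined elsewhere
          raw_data_df = pd.DataFrame()
          for fname in glob.glob(path):
              raw_data_df = pd.concat([raw_data_df, pd.read_csv(fname)])
          return raw_data_df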

    • @John-xi2im
      @John-xi2im 6 months ago

      Using the above function (glob is the only new thing in it), I'm using 6 years of data to move ahead with the project 👍

    • @harrysdatajourney
      @harrysdatajourney 6 months ago

      Yep! It's a very large dataset. Using just a few select years is a great approach if you're more interested in creating the dashboard itself!
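
The "few select years" approach from this thread can be sketched without pandas. The example below swaps in the standard library's `csv` module so that it runs anywhere; the file naming scheme (`prices-<year>.csv`), the function name `collate_yearly_rows` and the sample values are all made up for illustration and are not taken from the video or the dataset.

```python
import csv
import glob
import os
import tempfile


def collate_yearly_rows(pattern, years):
    """Read rows from CSV files matching `pattern` whose name contains one of `years`."""
    rows = []
    for fname in sorted(glob.glob(pattern)):
        if not any(str(year) in os.path.basename(fname) for year in years):
            continue  # skip files for years that were not requested
        with open(fname, newline="") as handle:
            rows.extend(csv.reader(handle))
    return rows


# Tiny demonstration with made-up files (names and values are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    for year, price in [(2019, 100000), (2020, 120000), (2021, 150000)]:
        target = os.path.join(tmp, f"prices-{year}.csv")
        with open(target, "w", newline="") as handle:
            csv.writer(handle).writerow([year, price])
    selected = collate_yearly_rows(os.path.join(tmp, "prices-*.csv"), years=[2020, 2021])

print(selected)
```

With pandas, the same idea is usually written by collecting one DataFrame per file in a list and calling `pd.concat` once at the end, which avoids copying the growing frame on every loop iteration.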

  • @nanshibukawa7576
    @nanshibukawa7576 6 months ago

    Have you ever used Streamlit? If so, which do you think is better: Streamlit or Plotly Dash?

    • @harrysdatajourney
      @harrysdatajourney 6 months ago

      I haven't tried using Streamlit yet. I do plan on trying at some point soon, so I may cover it in a future video!

    • @ntran04299
      @ntran04299 1 month ago

      @harrysdatajourney Yes please! I'm looking into Streamlit myself too

  • @elio3232
    @elio3232 6 months ago

    Hi! Thanks for building this up from the simple to the complex. It really helps. I have a question about a multi-page app. I need a click on the y-axis of a figure on page A to trigger an update of a figure on page B. What do I need to do that?

    • @harrysdatajourney
      @harrysdatajourney 6 months ago

      Thanks! For your situation, I'd use dcc.Store, part of Plotly Dash. You can use this to store your selection on page A in the browser as JSON, then load the plot using this on page B. You can find the documentation on dcc.Store here: dash.plotly.com/dash-core-components/store

    • @elio3232
      @elio3232 6 months ago

      @harrysdatajourney Thank you so much. I will try it and then let you know.

    • @elio3232
      @elio3232 6 months ago

      @harrysdatajourney Hi again, I'm trying to use dcc.Store and dcc.Link on page A to trigger page B when a click event occurs in a figure on page A. If I use target='_self' in dcc.Link, page B loads successfully and prints the value stored from the click, but with target='_blank' page B opens in a new tab (which is what I want) and the stored value seems to be None. I don't know why.
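
A possible explanation for the last question, offered as a guess rather than a diagnosis: a new tab is a fresh page load, and `dcc.Store` defaults to `storage_type='memory'`, which does not survive one, while `'session'` is scoped to a single tab. Trying `storage_type='local'` is one option. Another is to carry the value in the link itself and read it back on page B, for example from the `search` property of `dcc.Location`. The sketch below shows only the URL handling, using the standard library; the path `/page-b`, the parameter name `selected` and both helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def build_page_b_href(clicked_value, base_path="/page-b"):
    """Return a link target for page B that carries the clicked value."""
    return f"{base_path}?{urlencode({'selected': clicked_value})}"


def read_selected_value(search):
    """Extract the value from a query string such as '?selected=Leeds'."""
    values = parse_qs(search.lstrip("?")).get("selected")
    return values[0] if values else None


# Page A would use the href as the link target; page B would parse its query string.
href = build_page_b_href("Greater London")
print(href)
print(read_selected_value(href.split("?", 1)[1]))
```

Because the value travels in the URL, it is available in the new tab regardless of how the browser scopes its storage.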

  • @John-xi2im
    @John-xi2im 6 months ago

    After installing dask, running the dd.read_csv() method raises a pyarrow>=10.0.1 import error even though pyarrow versions above 10 (all from 11 to 15) are already installed. Could it be the effect of an upstream problem (a deprecation warning appears while importing dask, which is replaced by dask-expr)?

    • @harrysdatajourney
      @harrysdatajourney 6 months ago

      It might be worth just trying to install the version the error message is asking for with pip install pyarrow==10.0.1

    • @John-xi2im
      @John-xi2im 6 months ago

      @harrysdatajourney Thanks for your kind response. I had to reinstall Ubuntu; I will try and let you know!

    • @John-xi2im
      @John-xi2im 6 months ago

      This time I tried and no pyarrow error occurred (the Spyder IDE was probably the issue). 👍

    • @harrysdatajourney
      @harrysdatajourney 6 months ago

      Glad to hear it!
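
When an import error complains about a package version that appears to be installed, one quick check is to ask the interpreter that is actually running (for example the one an IDE launches) which version it can see. The sketch below does that with the standard library. The helper names are hypothetical, and the simple version parsing is an assumption that ignores pre-release suffixes.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the version of `package` visible to this interpreter, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '10.0.1' into (10, 0, 1); non-numeric parts are ignored."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


print("Interpreter:", sys.executable)
found = installed_version("pyarrow")
if found is None:
    print("pyarrow is not installed for this interpreter")
else:
    print("pyarrow", found, "meets >= 10.0.1:", version_tuple(found) >= (10, 0, 1))
```

If the interpreter path printed here differs from the one `pip` installed into, the package was installed into a different environment.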

  • @thelogiclabio
    @thelogiclabio 6 months ago

    Great stuff Harry!