Professional Preprocessing with Pipelines in Python
- Published 19 Mar 2022
- In this video, we learn about preprocessing pipelines and how to professionally prepare data for machine learning.
◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾
📚 Programming Books & Merch 📚
🐍 The Python Bible Book: www.neuralnine.com/books/
💻 The Algorithm Bible Book: www.neuralnine.com/books/
👕 Programming Merch: www.neuralnine.com/shop
🌐 Social Media & Contact 🌐
📱 Website: www.neuralnine.com/
📷 Instagram: / neuralnine
🐦 Twitter: / neuralnine
🤵 LinkedIn: / neuralnine
📁 GitHub: github.com/NeuralNine
🎙 Discord: / discord
🎵 Outro Music From: www.bensound.com/ - Science & Technology
Rather than creating a class for each step, another much easier approach is to use sklearn's FunctionTransformer. This lets you write a custom function and turn it into a transformer object, which can then be fed through a pipeline as normal.
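A minimal sketch of that idea, using hypothetical column names ("Name", "Age") for illustration:

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Plain functions instead of custom transformer classes.
def drop_name(df):
    return df.drop(columns=["Name"])

def impute_age(df):
    df = df.copy()
    df["Age"] = df["Age"].fillna(df["Age"].mean())
    return df

# FunctionTransformer wraps each function so it fits into a Pipeline.
pipeline = Pipeline([
    ("drop_name", FunctionTransformer(drop_name)),
    ("impute_age", FunctionTransformer(impute_age)),
])

df = pd.DataFrame({"Name": ["Anna", "Bob", "Cleo"],
                   "Age": [30.0, None, 50.0]})
clean = pipeline.fit_transform(df)
```

One caveat: a stateless function like `impute_age` recomputes the mean on whatever data it sees, so for a real train/test split a fitted imputer (learned on training data only) is the safer choice.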
For those who noticed that the encoder sorts the values alphabetically and mismatches the job column names: instead of manually typing the column names you can do:
matrix = encoder.fit_transform(X[['Job']]).toarray()
column_names = sorted(df['Job'].unique())
This also works if more/new jobs and values are added, and it makes a column for each unique value while keeping the order.
Good tutorial in any case!
Use
pd.get_dummies(X.Job, prefix="Job")
Much neater
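For reference, the `get_dummies` approach in full (columns come out in alphabetical order, like the encoder's):

```python
import pandas as pd

X = pd.DataFrame({"Job": ["writer", "programmer", "teacher"]})

# One column per unique value, named Job_<value>, sorted alphabetically.
dummies = pd.get_dummies(X.Job, prefix="Job")
```

One trade-off worth knowing: `get_dummies` only creates columns for categories present in the data it is given, so unseen categories at prediction time won't get columns; a fitted `OneHotEncoder` inside a pipeline handles that case more robustly.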
Hey man, great channel! Love the topic based tutorials ❤️
Video Suggestion: Can I suggest a video on using Python and a tree algorithm to make an autocomplete Python CLI program?
Haven't seen this anywhere, and I guess it's a great way to understand why a tree might be the best structure for an autocomplete program.
Thanks! Sure we all appreciate what you do for the community ♥️ 🌻
This was awesome and very informative. Many thanks from a machine learning novice!
Nice. For this example I might use the ColumnTransformer class, its perfect for dropping columns and integrating imputers and scalers on select features.
Fantastic video, always wondered the reasoning behind using classes in ml, thank you!!!
This video is pure gold. Thank you so much!
I remember taking ML courses on Udemy that took more time than this video. Please keep creating more videos on the same subject.
I find using FunctionTransformer much easier. It turns each of your custom functions into a transformer and you don't need to write a class, but just a function.
wow this technique is amazing. thanks for sharing this brilliant knowledge with us
Thank you. This is very helpful.
That was really helpful, thanks!
Sick video bro! 😎
Thank you, this was informative 😁
really useful, thank you very much
Very useful! Thx!
bro great video!!
I would really like to find a tutorial on how to pass arguments to a pipeline step you created yourself, like the NameDropper, so I can use grid search to try out dropping different features.
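One way to do this (a sketch, with a hypothetical ColumnDropper class standing in for the video's NameDropper): make the dropped columns a constructor argument. Because BaseEstimator provides get_params/set_params, GridSearchCV can then vary it like any other hyperparameter.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class ColumnDropper(BaseEstimator, TransformerMixin):
    def __init__(self, columns=None):
        # Store the parameter under the same name, so get_params works.
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.drop(columns=self.columns or [])

df = pd.DataFrame({"Name": ["Anna"], "Age": [30]})
out = ColumnDropper(columns=["Name"]).fit_transform(df)
```

With this step named "dropper" in a pipeline, you could pass e.g. `param_grid={"dropper__columns": [["Name"], ["Name", "Age"]]}` to GridSearchCV (column names hypothetical).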
Thank you, sir!
Great!
Thank you!
With an eye toward the attention programming has gotten from the ML community lately, it occurs to me that perhaps ML could also be used more in the data preprocessing role.
For example: Choosing encoding types, handling missing values, flattening, etc could all be automated.
Just a thought.
2nd random thought: I know random noise has been added to features in attempts to get models to generalize better, but it did not fare well.
However, I have not seen anyone try simply using noise generators (normal, Gaussian, etc.) as individual features and letting the model itself choose when and where noise might be effective.
Thank you so much nicely explained
With what you showed, I created a pipeline and dumped it as a pickle file, but when I try to load that model and use it, I get an error: AttributeError: Can't get attribute 'NullEncoder' on
I have one big question: what is the difference between building a machine learning application with a Pipeline and building one with an OOP technique? They seem the same to me.
Yeah, this is a great video but that's something I'm curious about as well.
I think your feature encoder has some faulty logic for the "Job" column. The df2 for example shows 1 x writer, 3 x programmer and 1 x teacher, but afterwards there isn't even a "teacher" column. And if you were to recreate the single columns using 1 or 0 from the features you created you wouldn't get the same dataframe.
Awesome
Thanks Sir
16:42 I think it's wrong to use fit_transform inside the transform method, because it causes data leakage: after you split the data into train/test, calling transform on the test set would refit the imputer on the test data.
Yeah, I thought the same.
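The leakage-free pattern is to learn the statistics in fit() and only apply them in transform(). A sketch with a hypothetical "Age" column:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer

class AgeImputer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        # Learn the mean from the data passed to fit (the training set).
        self.imputer_ = SimpleImputer(strategy="mean").fit(X[["Age"]])
        return self

    def transform(self, X):
        # Reuse the TRAINING mean; never refit on the data being transformed.
        X = X.copy()
        X["Age"] = self.imputer_.transform(X[["Age"]]).ravel()
        return X

train = pd.DataFrame({"Age": [10.0, 20.0, None]})  # training mean = 15
test = pd.DataFrame({"Age": [None]})

imputer = AgeImputer().fit(train)
filled = imputer.transform(test)  # test gap filled with the training mean
```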
What is the name of the opening song of this video?
Could you use the get_dummies pandas method for the One Hot Encoding?
yep
Are you Swedish? 😮
where do you work #admin
1st