Sergey Feldman: You Should Probably Be Doing Nested Cross-Validation | PyData Miami 2019

  • Added 25. 07. 2024
  • It is common to perform model selection while also attempting to estimate accuracy on a held-out set. The traditional solution is to split a dataset into training, validation, and test subsets; on small datasets, however, this strategy suffers from high variance. A common way to reuse a small number of samples for model selection is cross-validation, typically applied across the entire non-test portion of the data, after which the best model is evaluated on the test set. This approach has a fundamental flaw: if the test set is small, the performance estimate has high variance. The solution is double (or nested) cross-validation, which is explained in this talk.
    www.pydata.org
    PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
    PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
    00:00 Welcome!
    00:10 Help us add time stamps or captions to this video! See the description for details.
    Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: github.com/numfocus/YouTubeVi...
  • Science & Technology
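The nested cross-validation scheme described in the abstract can be sketched with scikit-learn. This is a minimal illustration, not the talk's actual code: the dataset, estimator, and hyperparameter grid are placeholder assumptions.

```python
# Minimal sketch of nested (double) cross-validation with scikit-learn.
# Inner loop: hyperparameter selection. Outer loop: performance estimation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Placeholder dataset standing in for a real (small) dataset.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)  # model selection
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # accuracy estimate

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}  # illustrative grid
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=inner_cv)

# For each outer fold, GridSearchCV runs the inner 5-fold CV on the outer
# training split to pick C, refits, and is then scored on the held-out fold.
scores = cross_val_score(search, X, y, cv=outer_cv)
print(scores.mean(), scores.std())
```

Because every outer fold is scored by a model that never saw it during hyperparameter selection, the mean of `scores` is an (approximately) unbiased performance estimate, unlike reusing a single small test set.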

Comments • 6

  • @iancherabier5920 · 5 months ago

    Thanks a lot, an extremely clear explanation of nested CV!

  • @bryanparis7779 · 1 year ago

    THANK YOU so helpful! so interesting so so so :)

  • @BulkySplash169 · 2 years ago

    Nice, thx!

  • @QIQIWU-fd1xz · 1 year ago +1

    This is really helpful! Thanks for sharing. One question: at 10:55, when running the 5-fold CV, shouldn't we use X_train_val instead of X_train? Since the splitting is done by sklearn, we don't need to hold out a separate validation set.
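    A minimal sketch of the pattern this comment describes: pass the combined train+validation pool to GridSearchCV and let it carve out the validation folds itself. The variable name X_train_val follows the comment; the dataset and estimator are illustrative assumptions, not the talk's slides.

    ```python
    # Sketch: sklearn does the train/validation splitting internally,
    # so only a final test set needs to be held out manually.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=150, random_state=0)

    # Hold out a test set; everything else is the train+validation pool.
    X_train_val, X_test, y_train_val, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    search = GridSearchCV(SVC(), {"C": [0.1, 1.0, 10.0]}, cv=5)
    search.fit(X_train_val, y_train_val)  # 5-fold CV splits X_train_val itself
    print(search.best_params_, search.score(X_test, y_test))
    ```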

  • @nespereira · 9 days ago

    Very useful! One question: in many medical datasets, especially in single-group research settings, sample sizes are around 100 or fewer (samples in the thousands are rare). With this number of subjects, one worry is that setting aside subjects for testing removes samples in a context where there is not much data to begin with. Then you need to think about how many features you can afford, etc...
    Don't get me wrong, I'm all in for nested cross-validation, but I'm curious to hear your thoughts on this type of scenario, where getting data is really expensive.