A Data Odyssey
Ireland
Joined 15 Feb 2014
Data exploration, interpretable machine learning, explainable AI and algorithm fairness
Friedman's H-statistic for Analysing Interactions | Maths and Intuition
We dive deep into Friedman's H-statistic also known as the H-stat or H-index. This is a popular explainable AI (XAI) method. It is a powerful metric for analyzing interactions between features in your machine learning model. We will:
- Build intuition for the method by comparing it to PDPs and ICE Plots.
- Explain the mathematics behind the pairwise, overall and unnormalised formulas.
- Discuss its limitations including computational complexity and spurious interactions caused by multicollinearity.
We will see that there are two versions of the statistic. One for the interaction between two features and another for the interaction between a feature and all the other features. We will also explore the difference between the normalized and unnormalized H-stat. We will understand how these all work together to find and analyse interactions in your data and models.
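As a rough illustration of the pairwise formula, here is a from-scratch sketch. The function names and structure are my own, not the video's code; it assumes a fitted scikit-learn-style model with a `predict` method and a NumPy feature matrix:

```python
import numpy as np

def pd_values(model, X, features):
    """Centred partial dependence of `features`, evaluated at each row of X."""
    n = len(X)
    pd_vals = np.empty(n)
    for i in range(n):
        X_rep = X.copy()
        X_rep[:, features] = X[i, features]       # fix these features at row i's values
        pd_vals[i] = model.predict(X_rep).mean()  # average out the other features
    return pd_vals - pd_vals.mean()               # centre so the H-stat formula holds

def h_statistic(model, X, j, k):
    """Normalised pairwise H^2: share of PD_jk variance not explained additively."""
    pd_jk = pd_values(model, X, [j, k])
    pd_j = pd_values(model, X, [j])
    pd_k = pd_values(model, X, [k])
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)
```

For a purely additive model the numerator is zero, so H² = 0; a strong interaction pushes it towards 1. Note the O(n²) model calls per feature pair — this is the computational cost covered under limitations.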
🚀 Free Course 🚀
Signup here: mailchi.mp/40909011987b/signup
XAI course: adataodyssey.com/courses/xai-with-python/
SHAP course: adataodyssey.com/courses/shap-with-python/
🚀 Companion article with link to code (no-paywall link): 🚀
medium.com/towards-data-science/understanding-freidmans-h-statistic-h-stat-for-interactions-43fb5e31a586?sk=33c8f3eee9106d35d069f39077e7fcf9
🚀 Useful playlists 🚀
XAI: czcams.com/play/PLqDyyww9y-1SwNZ-6CmvfXDAOdLS7yUQ4.html
SHAP: czcams.com/play/PLqDyyww9y-1SJgMw92x90qPYpHgahDLIK.html
Algorithm fairness: czcams.com/play/PLqDyyww9y-1Q0zWbng6vUOG1p3oReE2xS.html
🚀 Get in touch 🚀
Medium: conorosullyds.medium.com/
Threads: www.threads.net/@conorosullyds
Twitter: conorosullyDS
Website: adataodyssey.com/
🚀 Chapters 🚀
00:00 Introduction
01:20 Definition of an interaction
02:58 Intuition
05:18 Mathematics
07:50 Pairwise H-stat
09:35 Overall H-stat
11:10 Limitations
Views: 748
Video
Accumulated Local Effect Plots (ALEs) | Explanation & Python Code
406 views · 1 day ago
Highly correlated features can wreak havoc on your machine-learning model interpretations. To overcome this, we could rely on good feature selection. But there are still cases when a feature, although highly correlated, will provide some unique information leading to a more accurate model. So we need a method that can provide clear interpretations, even with multicollinearity. Thankfully we can...
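To make the idea concrete, here is a minimal first-order ALE sketch under my own assumptions (quantile bins, a NumPy feature matrix, and a fitted model with `predict`); it is not the code from the video:

```python
import numpy as np

def ale_1d(model, X, feature, n_bins=10):
    """First-order ALE: accumulate average local prediction changes per bin."""
    x = X[:, feature]
    # Quantile bin edges so each bin holds roughly the same number of rows
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    effects = []
    for b, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        in_bin = (x >= lo if b == 0 else x > lo) & (x <= hi)
        if not in_bin.any():
            effects.append(0.0)
            continue
        X_lo, X_hi = X[in_bin].copy(), X[in_bin].copy()
        X_lo[:, feature], X_hi[:, feature] = lo, hi
        # Local effect: prediction change across the bin, using only rows in the bin
        effects.append((model.predict(X_hi) - model.predict(X_lo)).mean())
    ale = np.concatenate([[0.0], np.cumsum(effects)])  # accumulate
    return edges, ale - ale.mean()                     # centre around zero
```

Because only rows already inside each bin are shifted, and only to the bin edges, ALE avoids the unrealistic extrapolation that makes PDPs unreliable under multicollinearity.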
PDPs and ICE Plots | Python Code | scikit-learn Package
163 views · 21 days ago
Both Partial Dependence Plots (PDPs) and Individual Conditional Expectation (ICE) plots are popular explainable AI (XAI) methods. They can visualise the relationships used by a machine learning model to make predictions. In this video, we will see how to apply the methods using Python. We will use the scikit-learn package and the PartialDependenceDisplay & partial_dependence functions. We will...
Partial Dependence (PDPs) and Individual Conditional Expectation (ICE) Plots | Intuition and Math
360 views · 1 month ago
Both Partial Dependence (PDPs) and Individual Conditional Expectation (ICE) Plots are used to understand and explain machine learning models. PDPs can tell us if a relationship between a model feature and target variable is linear, non-linear or if there is no relationship. Similarly, ICE plots are used to visualise interactions. Now, at first glance, these plots may look complicated. But you w...
Permutation Feature Importance from Scratch | Explanation & Python Code
433 views · 1 month ago
Feature importance scores are a collection of methods all used to answer one question: which machine learning model features have contributed the most to predictions in general? Amongst all these methods, permutation feature importance is the most popular. This is due to its intuitive calculation and because it can be applied to any machine learning model. Understanding PFI is also an importan...
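The calculation described above can be sketched in a few lines (the helper name is mine, not the video's code; it assumes a score where higher is better, such as R²):

```python
import numpy as np

def pfi_scores(model, X, y, score, n_repeats=5, seed=0):
    """Permutation feature importance: drop in score when a column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = score(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])  # break the link between feature j and y
            drops.append(baseline - score(y, model.predict(X_perm)))
        importances[j] = np.mean(drops)  # average over repeats to reduce noise
    return importances
```

A large drop means the model relied heavily on that feature; a drop near zero means shuffling it barely hurt.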
Model Agnostic Methods for XAI | Global v.s. Local | Permutation v.s. Surrogate Models
252 views · 1 month ago
Model agnostic methods can be used with any model. In Explainable AI (XAI), this means we can use them to interpret models without looking at their inner workings. This gives us a powerful way to interpret and explain complex black-box machine learning models. We will elaborate on this definition. We will also discuss the taxonomy of model agnostic methods for interpretability. They can be classi...
8 Plots for Explaining Linear Regression | Residuals, Weight, Effect & SHAP
675 views · 1 month ago
For data scientists, a regression summary might be all that's needed to understand a linear model. However, when explaining these models to a non-technical audience, it’s crucial to employ more digestible visual explanations. These 8 methods not only make linear regression more accessible but also enrich your analytical storytelling, making your findings resonate with any audience. We understan...
Feature Selection using Hierarchical Clustering | Python Tutorial
1.1K views · 2 months ago
In this comprehensive Python tutorial, we delve into feature selection for machine learning with hierarchical clustering. We guide you through the essentials of partitioning features into cohesive groups to minimize redundancy in model training. This technique is particularly important as your dataset expands, offering a structured alternative to manual grouping. What you'll learn: - The import...
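A minimal sketch of the grouping step with SciPy (the distance threshold and the synthetic data are my assumptions, not the video's code):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_features(X, threshold=0.3):
    """Cluster columns of X so highly correlated features share a label."""
    corr = np.corrcoef(X, rowvar=False)  # feature-by-feature correlation
    dist = 1 - np.abs(corr)              # correlated features -> small distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=threshold, criterion="distance")

rng = np.random.default_rng(0)
a = rng.normal(size=500)
# Two near-duplicate columns plus one independent column
X = np.column_stack([a, a + 0.01 * rng.normal(size=500), rng.normal(size=500)])
labels = cluster_features(X)
# The near-duplicate pair shares a cluster; the independent feature does not
```

You would then keep one representative feature per cluster, e.g. the one with the highest importance score.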
8 Characteristics of a Good Machine Learning Feature | Predictive, Variety, Interpretability, Ethics
180 views · 2 months ago
Feature selection is hard! So, I explain how you can use a combination of variable clustering and feature importance to help create a shortlist. I will also explain the key factors you need to consider when selecting features. The most important are predictive power and predictor variety. But there are also other considerations including data quality and availability, feature stability, interpr...
Interpretable Feature Engineering | How to Build Intuitive Machine Learning Features
382 views · 2 months ago
There are many ways to capture underlying relationships in your data. Some will be easier to explain as they align with the intuition of your audience. So we should really be doing feature engineering not just for predictability but also for interpretability. We’re going to discuss how to reformulate features with the goal of interpretability. At the same time, we’re going to understand how to ...
Modelling Non-linear Relationships with Regression
386 views · 2 months ago
This video is an advocacy for linear models. Its goal is to convince you that they should always be your first choice, especially if you care about model interpretability. This is because they are easier to explain, widely understood and accepted in many industries. Building them also requires you to think more critically about your problem and data. Most importantly, a well-structured linear ...
Explaining Machine Learning to a Non-technical Audience
443 views · 2 months ago
An important part of a data scientist’s job is to explain machine learning model predictions. Often, the person receiving the explanation will be non-technical. If you start talking about cost functions, hyperparameters or p-values you will be met with blank stares. We need to translate these technical concepts into layman’s terms. This process can be more challenging than building the model it...
Get more out of Explainable AI (XAI): 10 Tips
507 views · 3 months ago
Explainable Artificial Intelligence (XAI), also known as Interpretable Machine Learning (IML), can explain complex machine learning models. But the methods are not a silver bullet. You can’t simply fire them at black-box models and expect reasonable explanations for their inner workings. Yet, they can still provide incredible insight if used correctly. So, I give 10 tips for getting the most o...
The 6 Benefits of Explainable AI (XAI) | Improve accuracy, decrease harm and tell better stories
578 views · 3 months ago
Explainable AI (XAI), also known as interpretable machine learning (IML), can help you understand and explain your model. This has many benefits. It can help decrease harm and increase trust in machine learning. You can also gain knowledge of your dataset and tell better stories about your results. You can even improve the accuracy of your models and performance in production. We will discuss t...
Introduction to Explainable AI (XAI) | Interpretable models, agnostic methods, counterfactuals
1.9K views · 3 months ago
Artificial intelligence (AI) and machine learning (ML) impact our lives in many ways. From mundane tasks to critical decision-making processes, AI's role is becoming more central. As a result, the need for transparency and interpretability of these systems is growing. This is why we need the field of Explainable AI (XAI), also known as interpretable machine learning (IML). We will take a brief loo...
Data Science vs Science | Differences & Bridging the Gap
295 views · 8 months ago
About the Channel and my Background | ML, XAI and Remote Sensing
806 views · 8 months ago
SHAP for Binary and Multiclass Target Variables | Code and Explanations for Classification Problems
7K views · 8 months ago
Introduction to Algorithm Fairness | Causes, Measuring & Preventing Unfairness in Machine Learning
1.2K views · 9 months ago
SHAP Violin and Heatmap Plots | Interpretations and New Insights
3.8K views · 9 months ago
Correcting Unfairness in Machine Learning | Pre-processing, In-processing, Post-processing
649 views · 9 months ago
Definitions of Fairness in Machine Learning | Equal Opportunity, Equalized Odds & Disparate Impact
2K views · 10 months ago
Exploratory Fairness Analysis | Quantifying Unfairness in Data
749 views · 10 months ago
5 Reasons for Unfair Models | Proxy Variables, Unbalanced Samples & Negative Feedback Loops
583 views · 10 months ago
Feature Engineering with Image Data | Aims, Techniques & Limitations
721 views · 1 year ago
Image Augmentation for Deep Learning | Benefits, Techniques & Best Practices
1K views · 1 year ago
Interpretable vs Explainable Machine Learning
15K views · 1 year ago
Jumping between what you are explaining and yourself is distracting
Thanks for the feedback!
Liked, and subscribed! Amazing content keep it up ! Can you suggest 2 or 3 data sets I could test this on?
Thanks, Woj! You can try these. They have some interesting interactions. archive.ics.uci.edu/dataset/1/abalone www.kaggle.com/datasets/conorsully1/pdp-and-ice-plots
Hi, I have a question about 5:45: based on which pattern in the plot did you say that "km_driven" is less equally distributed and skewed to the left? 😄
I'm looking at the bars on the x-axis. This is known as a "rug plot". 10% of the dataset falls before the first bar, 20% before the second bar and so on... You can see that the bars are shifted towards the left. This means that most of the dataset has a lower km_driven value. I hope that makes sense?
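For anyone who wants to reproduce the idea, here is a sketch of a decile rug plot on synthetic right-skewed data (a stand-in for km_driven, not the video's dataset):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
km_driven = rng.lognormal(mean=10, sigma=0.8, size=1000)  # skewed stand-in data

# Decile positions: 10% of the data falls before the first tick, 20% before
# the second, and so on. Ticks bunched to the left mean most values are low.
deciles = np.quantile(km_driven, np.linspace(0.1, 0.9, 9))

fig, ax = plt.subplots()
ax.hist(km_driven, bins=40, alpha=0.5)
ax.plot(deciles, np.zeros(len(deciles)), "|", color="black", markersize=20)
fig.savefig("rug_plot.png")
```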
🚀 Free Course 🚀 Signup here: mailchi.mp/40909011987b/signup XAI course: adataodyssey.com/courses/xai-with-python/ SHAP course: adataodyssey.com/courses/shap-with-python/
Hi, I'm struggling with explaining GRU and LSTM models with SHAP. Encouraged by your videos, I am considering buying the course, but does it cover working with 3D data? Is even possible to implement SHAP and obtain reliable plots (without flattening the data) for time-series models?
Hi Sonia, unfortunately, the course focuses on tabular data and models like XGBoost, Random Forest and CatBoost. There is one lesson on SHAP for image data but it doesn't sound like that will help you much. If you are working with PyTorch, these articles might help you get started with applying SHAP: towardsdatascience.com/image-classification-with-pytorch-and-shap-can-you-trust-an-automated-car-4d8d12714eea?sk=b04dcbb8a09f049f605d2110b5c8d851 towardsdatascience.com/using-shap-to-debug-a-pytorch-image-regression-model-4b562ddef30d?sk=7eb3016839186f1ba2a6f1f105f8ff64
Best channel to dig deep into XAI. It would be great a video about the state of art of XAI applied on LLMs.
Thanks Santi! I will consider this however my interests are more in computer vision at the moment
Great recommendation from the youtube algorithm! Loving the content- keep it up!
Thanks Theo! Will do
I am getting an error near model.fit. My data has text and numeric columns, so can you help me resolve it?
Thank you so much for this awesome video. When I use this code in the #Train model section, I encounter this error. What is the solution? [17:50:59] C:\buildkite-agent\builds\buildkite-windows-cpu-autoscaling-group-i-0b3782d1791676daf-1\xgboost\xgboost-ci-windows\src\data\array_interface.h:492: Unicode-7 is not supported.
There could be many things going wrong. You can try creating a Python environment and downloading the XGBoost package and only the other ones necessary to train the model.
what is the article reference for this information i need it for my studies emergency, please
towardsdatascience.com/from-shapley-to-shap-understanding-the-math-e7155414213b?sk=329a1f042a0167162487f7bb3f0ffd46
Would be nice if the pdp had some kind of confidence interval that varied with the feature value.
That's a good idea! You might be able to use the std of the prediction around each point. It would be related to the ICE plot where a point would have a larger std if not all the individual lines follow the same trend.
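A sketch of that idea with scikit-learn's `partial_dependence` (synthetic data; the `grid_values` result key is per recent scikit-learn versions, with a fallback for older ones):

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence

X, y = make_friedman1(n_samples=200, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

res = partial_dependence(model, X, features=[0], kind="individual")
ice = res["individual"][0]  # one ICE line per sample: (n_samples, n_grid)
grid = (res["grid_values"] if "grid_values" in res else res["values"])[0]
pdp = ice.mean(axis=0)   # the PDP is the average of the ICE lines
band = ice.std(axis=0)   # wide where the individual lines disagree
# pdp - band and pdp + band could then be drawn as a shaded interval
```

The band is exactly the quantity described in the reply: small where all ICE lines follow the same trend, large where they diverge.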
I don't understand
Great explanation, thank you very much
Thanks!
Can you make a video on how recruitment decision is made?
Do you mean how automated decisions are made or decisions for data scientists in general?
how do you know which parameter of image manipulation that will be robust for any data will be faced in the future?
This is a difficult question to answer as it will depend on your problem. In general, you will need a robust dataset that includes images taken under all conditions for which the model is expected to operate. Then you can evaluate the models trained using different feature engineering methods on this dataset.
Really useful , thank you
No problem Felice!
Personally, I don't think the distinction is necessary.
I agree :) But I did think it was important when I first got into XAI.
Great content.
Thank you, Grazia!
Great content
just stick to the explanations no need for the jarring adlibs
That's boring...
Appreciate it a lot, Prof Odyssey! Shapley values are now a clearer concept in my mind!
Thanks Ye! I'm glad you found it useful :)
Hello. Thanks for the tutorial. Regarding your XAI and SHAP courses, is there an order to how we should take the courses. Should we take the XAI before SHAP or vice versa. Thanks
No problem! It is better to take XAI first, then SHAP. XAI covers more of the basics in the field and other useful model agnostic methods. But the SHAP course still gives some basics, so it is not necessary to do the entire XAI course (or even any of it) if all you care about is learning SHAP :)
@@adataodysseyAwesome. Thank you.
Excellent explanation, just what I needed. Thank you.
I’m glad you found it useful, Innocent :)
Thanks Bruh! Great content! Would be happy if you upload a video comparing SHAP with LIME and Integrated Gradients. It's a hot topic right now in data science interviews.
Thanks for the suggestion! Would this be w.r.t. computer vision models and deep learning?
Great! always clear
That’s my goal!
This is awesome!
Thanks!
Thanks for the content on XAI and particularly SHAP, it's given me a good overview before I jump into the details. I have a sci-fi book recommendation for you: Hyperion and The Fall of Hyperion by Dan Simmons =) The first book is told from the perspective of 7 characters as they visit/revisit the planet of Hyperion that they've had dealings with in the past. Hyperion is a fringe planet in the Hegemony of Man, not connected via Farcaster, and thus a visit incurs significant time dilation. On the planet are artefacts from another intelligent force: the Time Tombs, a location with whacky time reversal effects, a 3 meter tall metallic creature covered in spikes known as the Shrike (which also has time manipulation abilities), and more. Identified as the only significant anomaly in the AI faction's predictions, everything seems to be converging on Hyperion as the Time Tombs open... Genuinely incredible read
Thanks! I actually just finished a book so this is good timing :)
I have recently joined a course on eXplainable Artificial Intelligence (XAI) of yours and I am interested in applying the concepts of interpretability to image data while ensuring that the model's accuracy is preserved. please do create some videos on that topic.
You're in luck! The next course I want to create will be XAI for computer vision. So expect to see some content soon.
Thanks, I was recently reading a post on LinkedIn about how to eliminate highly correlated features with hierarchical clustering, but it was not clear. This is much better explained.
Thanks Karthikeya! I'm glad you found it useful. I have another video coming out tomorrow about explaining linear models.
I can't access SHAP python course. Could you please give me the access
Hi Mulusew, the SHAP course is no longer free. But you will now get free access to my XAI course if you sign up to the newsletter
Thank you so much! I was stuck at a hierarchical analysis as I did not know that I need to transpose my dataframe. Great video!
I’m glad you found this useful ☺️
This is the best channel so far for XAI content. Keep going!
Thank you! I appreciate that :D
can you give an example of how to plot heatmaps for a PyTorch model?
I will keep this in mind. I am planning to do a few tutorials using different packages: scikit-learn, CatBoost, PyTorch, etc.
Excellent video ❤❤❤❤❤❤
Thank you ☺️ I’m glad it could help
whats the color of your eyes?
Blue :)
Bro that was a nice explanation. thanks so much.
No problem :) I’m glad it was useful
Excellent! You gave me an idea 💡 Great job!
Thanks Jose! I'm glad I could help
Where is the link for the code for the insurance model?
github.com/a-data-odyssey/XAI-tutorial/blob/main/src/intro/human_friendly_explanations.ipynb