Shapley Additive Explanations (SHAP)

  • Added 25 Jun 2024
  • In this video you'll learn a bit more about:
    - A detailed and visual explanation of the mathematical foundations that come from the Shapley Values problem;
    - How SHAP (Shapley Additive Explanations) reframes the Shapley Value problem;
    - What Local Accuracy, Missingness, and Consistency are in the context of explainable models;
    - What the Shapley Kernel is;
    - An example that shows how a prediction can be examined using Kernel SHAP (a code sketch follows below).
    Author:
    Rob Geada
    - e-mail: rgeada@redhat.com
    - LinkedIn: / rob-geada-7a8486111
    00:00 Introduction
    00:23 Shapley Values
    02:27 Shapley Additive Explanations
    04:16 Local Accuracy, Missingness, and Consistency
    05:58 Shapley Kernel
    08:06 Example
  • Science & Technology
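For readers who want to follow along with the example chapter (08:06), here is a minimal sketch of examining one prediction with Kernel SHAP. It assumes the Python "shap" and scikit-learn packages; the dataset, model, and variable names are illustrative stand-ins, not necessarily those used in the video.

```python
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Illustrative setup: any fitted regressor works the same way.
X, y = fetch_california_housing(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(x_train, y_train)

# Kernel SHAP: a background sample of training data defines "feature absence".
explainer = shap.KernelExplainer(model.predict, x_train[:100])
test_point = x_test[0]
shap_values = explainer.shap_values(test_point)  # one value per feature
print(explainer.expected_value, shap_values)
```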

Comments • 81

  • @user-td8vz8cn1h
    @user-td8vz8cn1h 6 months ago +9

    This is literally the best explanation of Shapley values I've found on YouTube, and probably on the entire internet: the voice, the visualizations, everything is top level

  • @cornevanzyl5880
    @cornevanzyl5880 10 days ago

    As my PhD involves understanding how SHAP works for model explainability, this video is by far the most accurate and in-depth explanation of what it is and how it works. You demonstrate a very good grasp of the topic😊

  • @chinameng4636
    @chinameng4636 2 years ago +18

    Really brilliant work! I've seen so many videos, but none of them talk about the background data! Your video goes into this question deeply enough in such a short time! THANKS A LOT!

  • @ssethia86
    @ssethia86 3 years ago +6

    concise, clean, and clear. Nicely delivered!! Bravo!!

  • @GleipnirHoldsFenrir
    @GleipnirHoldsFenrir 3 years ago +2

    Best video on that topic I have seen so far. Thanks for your work.

  • @dianegenereux1264
    @dianegenereux1264 2 years ago +1

    I really appreciated the clear conceptual explanation at the very start of the video. Thank you!

  • @NilayBadavne
    @NilayBadavne 2 years ago +4

    Thank you so much for this video. Really well articulated.
    You start from the basics - which is what many are missing from their blogs/videos.

  • @mohammadsharara3170
    @mohammadsharara3170 1 year ago +2

    Very clear explanation! I've watched several videos; so far this is the best. Thank you

  • @zahrabounik3390
    @zahrabounik3390 1 year ago

    This is a fantastic explanation for SHAP. Thank you so much for sharing your knowledge.

  • @mehulsingh3497
    @mehulsingh3497 3 years ago +4

    Thanks for sharing! It's the best explanation for SHAP. You are an absolute rockstar \m/

  • @SanderJanssenBSc
    @SanderJanssenBSc 8 months ago

    Such an excellent video, very high value and useful! Thanks for taking the time out of your life to produce such value for us!

  • @yedmitry
    @yedmitry 1 year ago

    Great explanation. Thank you very much!

  • @Jorvanius
    @Jorvanius 2 years ago

    Thank you very much for the awesome explanation 👍

  • @murilopalomosebilla2999
    @murilopalomosebilla2999 2 years ago +1

    Excellent work!

  • @hyunkang2090
    @hyunkang2090 1 year ago

    Thank you. It was the best presentation on SHAP

  • @Sam-vi8iw
    @Sam-vi8iw 1 year ago +1

    Awesome video! Love that.

  • @giuliasantai4853
    @giuliasantai4853 2 years ago +1

    This is just great!! Thanks a lot

  • @user-iw8dc5je9s
    @user-iw8dc5je9s 2 years ago

    So clear! Thanks.

  • @user-si2tj6gw2l
    @user-si2tj6gw2l 1 year ago

    Really nice explanation. I thought understanding this concept would be difficult, but it's actually really easy with a good explanation

  • @caiyu538
    @caiyu538 1 year ago

    great to revisit again.

  • @marcelbritsch6233
    @marcelbritsch6233 1 month ago

    brilliant. Thank you!!!!!

  • @captainmarshalliii3304
    @captainmarshalliii3304 2 years ago +4

    Awesome video and explanation! Are you going to release your implementation? If so, where? Thanks.

  • @joshinkumar
    @joshinkumar 1 year ago

    Nicely explained.

  • @tashtanudji4756
    @tashtanudji4756 2 years ago

    Really helpful thanks!

  • @DrJalal90
    @DrJalal90 2 years ago

    Great video indeed!

  • @jamalnuman
    @jamalnuman 4 months ago

    really great

  • @user-uq7ri1pz2c
    @user-uq7ri1pz2c 2 years ago +2

    amazing

  • @arunmohan1211
    @arunmohan1211 2 years ago +1

    Nice. best one

  • @kjlmomjihnugbzvftcrdes

    Nice one.

  • @juanete69
    @juanete69 1 year ago +2

    Hello.
    If we apply SHAP to a linear regression model... are those Phi_i equivalent to the coefficients of the regression model? Do they also take into account the variance as the p-values do?
    How is the SHAP value for a variable different from the partial R^2?

  • @apah
    @apah 11 months ago +2

    Excellent video!
    I'm wondering, however: isn't the difference between the SHAP delta and the actual delta due to possible interactions between the "lower status" feature and the others?
    If I'm understanding it correctly, your computation of the "actual delta" is equivalent to a permutation importance, whereas SHAP takes interactions into account by averaging the score over subsets "excluding" our feature of interest.

  • @gustavhartz6153
    @gustavhartz6153 1 year ago +1

    When you pass the data point back through the model at 10:35, which value do you replace the last feature with? You say "values from the background dataset," but it can't just be a random value. Is it the average?

  • @JK-co3du
    @JK-co3du 1 year ago

    Thank you very much for this informative video. Could you explain why we use the train set as background but the test set to calculate the SHAP values?

    • @robgeada6618
      @robgeada6618 1 year ago +1

      Hi JK; the background simply needs to be taken from a pool of "representative" values that the model expects; in this case, a subset of the data that was used to train the model makes a lot of sense for that. Meanwhile, computing SHAP values for a particular point is simply done to explain how the model behaves given this particular input; there is no requirement that this input be anything similar to what the model has seen before. Basically, the background set needs to come from "representative" data, but we can then compute SHAP values for any arbitrary point. In this case, we pick a point from the test set, as in real-world XAI use cases you are explaining novel points that do not necessarily have corresponding ground-truth values, i.e., the same reason that we use train/test splits when evaluating models.

  • @juanete69
    @juanete69 1 year ago

    What are the advantages of SHAP vs LIME (Local Interpretable Model Agnostic Explanation) and ALE (Accumulated Local Effects)?

  • @avddva1367
    @avddva1367 1 year ago +4

    Hello, I really appreciate the video! I have one question: how is the number of coalitions calculated? I thought it would be 2^(number of features)

    • @kruan2661
      @kruan2661 1 year ago +1

      It depends on whether the order of the features matters. If not, then 2^k. If yes, then sum up all the permutations.

    • @AJMJstudios
      @AJMJstudios 1 year ago +1

      @@kruan2661 Still not getting 64 coalitions for 4 features even if order matters.

    • @andyrobertshaw9120
      @andyrobertshaw9120 1 year ago +1

      @@AJMJstudios You do.
      If all 4 are there, we have 4! = 24.
      If 3 are included, then we have 4x3x2 = 24.
      If 2 are included, we have 4x3 = 12.
      If just 1 is included, we have 4.
      24+24+12+4 = 64
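For anyone who wants to double-check that arithmetic, a quick sketch in plain Python (standard library only):

```python
from math import perm  # Python 3.8+

# Ordered (order-matters) coalitions of 4 features, of sizes 1 through 4:
# 4 + 12 + 24 + 24 = 64, matching the breakdown above.
print(sum(perm(4, k) for k in range(1, 5)))  # 64
```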

  • @cleverclover7
    @cleverclover7 2 years ago

    Great video! I have many questions on this subject but here's one(ish):
    It strikes me that the background sample is not irrelevant, and you must assume it is sufficiently random and i.i.d. There is at least one case, namely where the background sample is the data point being tested, where this is certainly not true. So my question is: if you were to run the experiment again for every possible data point instead of a single background chunk of size 100, and took the average of these, would you get perfect accuracy?

    • @robgeada6618
      @robgeada6618 2 years ago

      Yeah, so the choice of background data is a really interesting question, one that I think about quite a bit! In terms of your idea, choosing every available training data point as your background does represent the distribution of your data well, but that gets pretty expensive: SHAP will need to run num_samples * background_size datapoints through the model. For a larger dataset like those seen in ML work, that could be hundreds of millions of model evaluations. One way to get around this is to use something like k-means clustering on your training data, with k set to something like 100. The center points of your clusters are then a great representation of the training data distribution, which means that when you use them for SHAP you end up with very similar results to using the entire training data as background. The advantage is that it's a lot cheaper, since k ~ 100 is usually much, much smaller than the full training dataset.
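As a concrete sketch of the k-means idea described above, assuming the Python "shap" package (whose shap.kmeans helper performs exactly this summarization); x_train and model are placeholder names:

```python
import shap

# Summarize the training data into 100 weighted cluster centers and use
# those as the background instead of the full training set.
background = shap.kmeans(x_train, 100)
explainer = shap.KernelExplainer(model.predict, background)
```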

  • @joshuadarville1915
    @joshuadarville1915 2 years ago +3

    Love the video. However, I am a bit confused about how the total number of coalitions is calculated. The samples at 1:58 show 15 coalitions for 4 features, but at 5:55 you state we need to sample 64 coalitions for 4 features. I think the discrepancy comes from calculating coalitions using combinations initially vs. permutations later on. Thanks again for the video!

    • @robgeada6618
      @robgeada6618 2 years ago +2

      Yes, you are exactly right, it's an error in the video: 4 features should be 2^4=16 coalitions.

    • @astaragmohapatra9
      @astaragmohapatra9 1 year ago

      @Rob Geada, how is it 2 to the power of the number of features? For 4 features we can have 3, 2, or 1 possible combinations; for each it is 3CN, so it should be around 7 (3C1 + 3C2 + 3C3), and the total is 28 for four features. Am I right?

    • @AJMJstudios
      @AJMJstudios 1 year ago

      @@astaragmohapatra9 It's 4C0 + 4C1 + 4C2 + 4C3 + 4C4 = 16

  • @rusmannlalana8702
    @rusmannlalana8702 2 years ago +2

    "TENTARA ITU HARUS HITAM"
    This video :

  • @caiyu538
    @caiyu538 1 year ago

    I studied it again. SHAP is a brute-force search over features that considers all kinds of feature combinations, and Kernel SHAP is a trick to reduce that complexity. Is my understanding correct? How does it reduce the computational complexity?

  • @arunshankar4845
    @arunshankar4845 6 months ago

    How exactly did you get that 4 features require sampling 64 coalitions?

  • @xaviergonzalez5465
    @xaviergonzalez5465 2 years ago +1

    What does it mean for the original input x and the simplified x' to be approximately equal? Isn't x' a binary vector of features, whereas x presumably lives in Euclidean space?

    • @robgeada6618
      @robgeada6618 2 years ago +1

      Yeah, you're exactly right that x' is binary and x is Euclidean. In the video I'm making a bit of a simplification; in real usage the simplified x' will have some translation function h that converts the binary vector back to the original datapoint x, i.e., h(x') = x. The full definition of local accuracy states that g(x') = f(x) if h(x') = x.
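A tiny sketch of one such translation function h, assuming NumPy, where a background row stands in for "feature absence" (all names are illustrative):

```python
import numpy as np

def h(z, x, background_row):
    # Map a binary coalition vector z back to input space: keep x's value
    # where z == 1, substitute the background row's value where z == 0.
    return np.where(np.asarray(z, dtype=bool), x, background_row)
```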

  • @ea2187
    @ea2187 2 years ago +2

    Thanks for sharing.
    I'm currently developing a multi-class classifier (via XGBClassifier) and would like to know whether SHAP can be used for multi-class classification problems. During my research I could only find that SHAP can be used for classification problems which output probabilities (my model outputs three classes). Can anyone help?

    • @robgeada6618
      @robgeada6618 2 years ago

      I answered this question in a private message, but I'll post the answer here as well:
      Yes, because the XGBClassifier does indeed output probabilities (or more specifically, margins); they're just hidden by default. However, you can use these margins and probabilities to compute SHAP values, which will then indicate how much each feature contributed to the margins or probabilities.
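As a hedged sketch of what that looks like in practice, assuming the Python "shap" and "xgboost" packages (with KernelExplainer and predict_proba, classic versions of "shap" return one array of SHAP values per class); the data names are placeholders:

```python
import shap
from xgboost import XGBClassifier

clf = XGBClassifier().fit(x_train, y_train)  # e.g., a three-class problem

# Explain the class probabilities: one set of SHAP values per output class.
explainer = shap.KernelExplainer(clf.predict_proba, x_train[:100])
shap_values = explainer.shap_values(x_test[:5])
```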

  • @sehaconsulting
    @sehaconsulting 2 years ago +1

    Hi,
    In the video you said that for calculating coalitions, if a model has 4 features it must calculate 64 coalitions, but for 32 features it is 16 billion or so. Can you explain the math behind it? In your example you had 4 features, exemplified by the four dots, and it only amounted to 16 coalitions, didn't it?

    • @robgeada6618
      @robgeada6618 2 years ago +2

      Hi, you're exactly right; as I've said elsewhere in the comments, it's a mistake in the video: 4 features indeed have 16 possible coalitions; it's always 2^(number of features).

    • @sehaconsulting
      @sehaconsulting 1 year ago

      @@robgeada6618 Thank you!

    • @KountayDwivedi
      @KountayDwivedi 1 year ago

      @@robgeada6618 Thanks. I came to the comment section just for clarification. Btw, great video !! 😎
      :-}

  • @yuchenyue1243
    @yuchenyue1243 2 years ago +1

    Thanks for sharing! Can anyone explain why there are 64 coalitions to sample for 4 features? At 5:52

    • @robgeada6618
      @robgeada6618 2 years ago +1

      Hi, looking at it again, that's a mistake on my part. It should be 16 coalitions for 4 features, i.e.:
      (4 choose 4) + (4 choose 3) + (4 choose 2) + (4 choose 1) + (4 choose 0)
      = 1 + 4 + 6 + 4 + 1
      = 16

    • @yuchenyue1243
      @yuchenyue1243 2 years ago

      @@robgeada6618 (4 choose 4) + (4 choose 3) + (4 choose 2) + (4 choose 1) + (4 choose 0) = 2^4, is it generally true that for n features there are 2^n coalitions to sample?

    • @robgeada6618
      @robgeada6618 2 years ago

      @@yuchenyue1243 Yep, exactly. One way to think about it is by writing out each feature combination as a vector, with a 1 if a feature is included in the coalition and a 0 if it isn't. Doing this for 4 features, you'd have something like 0000, 0001, 0010, 0011, ..., all the way to 1111. This means that enumerating every possible feature combination is the same as counting in binary from 0 to 1111. That means that for n features, the number of coalitions to sample is always equivalent to the number of integers that can be represented by n bits in binary: 2^n.
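The same counting argument in a couple of lines of Python (standard library only):

```python
from itertools import product

n = 4
# Every coalition as a binary inclusion vector: (0,0,0,0) up to (1,1,1,1).
coalitions = list(product([0, 1], repeat=n))
print(len(coalitions))  # 2**n == 16
```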

  • @user-yc2hc8xt3y
    @user-yc2hc8xt3y 1 year ago

    Can I have the PPT document of this presentation, please?

  • @caiyu538
    @caiyu538 2 years ago

    I am confused: at 9:55, where does the variable test_point come from? Is it the x_train or y_train from 8:28?

    • @robgeada6618
      @robgeada6618 2 years ago +1

      Should have shown that, sorry! test_point is the first datapoint of x_test: test_point = x_test[0]
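In code form, a sketch assuming the "shap" explainer and the variable names from the video's example:

```python
test_point = x_test[0]  # first row of the test set
shap_values = explainer.shap_values(test_point)
```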

  • @chinuverma5374
    @chinuverma5374 2 years ago +1

    Thanks for the wonderful session, sir. With the help of SHAP we can find top-feature graphs and correlated-feature graphs using PDPs, but simple feature selection and ranking algorithms in machine learning can also give us the top features used in a model, ranked, and we can even plot graphs of correlated features like SHAP does. I am confused about what extra the explainable model is doing here to explain the predictions. Please clear up my doubt; I am currently doing research in this area.

    • @robgeada6618
      @robgeada6618 2 years ago +1

      So if I understand correctly, you're wondering what the explanatory model at 4:00 is doing? Essentially, the explanatory model g(x') is what SHAP builds to produce its explanation of your actual model f(x). By passing a lot of different permutations of features through the actual model f(x), the algorithm creates a huge number of samples of the inputs and outputs of your real model, from which it can then try to build a linear explanation model g(x') that produces the same outputs given the same inputs. Therefore, the linear explanation model should treat the features of this datapoint in the same way as the actual model would, meaning we can use it to explain the actual model's predictions. So in a way, SHAP explanations are actually explaining g(x'), but since the algorithm is designed such that if x' ≈ x, then g(x') ≈ f(x), the explanations of g(x') are equally valid as explanations of f(x). Does that clear it up?

  • @pedrogallego1673
    @pedrogallego1673 1 year ago

    At 05:58, is it possible that the total number of coalitions for 32 features is wrong? I think it is 32 * 2 * 2³¹ = 2³⁷ (and 17.1 billion ≈ 2³⁴)

    • @robgeada6618
      @robgeada6618 1 year ago +1

      Hi Pedro; as I've said elsewhere in the comments, I made a mistake when calculating the total coalition count; it should always be 2^(number of features), so 32 features give 2^32, or ~4.3 billion.

    • @pedrogallego1673
      @pedrogallego1673 1 year ago

      @@robgeada6618 Thanks! It's a really nice video nonetheless!

  • @minhaoling3056
    @minhaoling3056 2 years ago

    Does Kernel SHAP ignore feature dependence?

    • @diaaalmohamad2166
      @diaaalmohamad2166 2 years ago

      I'm also wondering about that. The paper by Lundberg assumes independent features in order to estimate the contributions. Still, the reason for having all possible coalitions is to account for mutual effects!
      On the other hand, a paper that appeared last year (Explaining individual predictions when features are dependent) addresses SHAP under dependence of features (shapr is their R package). They estimate the joint conditional distribution of the features given the current coalition using copulas (and other methods). Still, their implementation has quite some computational limitations.

    • @minhaoling3056
      @minhaoling3056 2 years ago

      @@diaaalmohamad2166 It seems like most explainable AI methods are quite limited for image data. Do you know of any methods that are implemented in R for image data?

    • @diaaalmohamad2166
      @diaaalmohamad2166 2 years ago

      Sorry, I do not know of R packages specific to image analysis. I tried the package "iml"; there you can find different methods to explain feature contributions. I did not check their limitations. Worst case, you may use the Python package "shap" inside an R Markdown code chunk.

  • @Brume_
    @Brume_ 2 years ago +1

    Hi, I'm writing my report.
    I have 2 very important questions to ask you:
    1) How many coalitions are selected when I compute my explainer?
    2) Do the coalitions take all of the values in the background? At 6:38, is y the mean of the N outputs if the background size is N rows?
    Thank you a lot.
    Sorry for my bad English, I'm French.

    • @robgeada6618
      @robgeada6618 2 years ago +3

      Hi Brume!
      1) The number of coalitions is typically the number of samples, usually configurable in the implementation. In our implementation, as in the original Python one by Scott Lundberg, the default value is (2 * num_features) + 2048 coalitions unless the user specifies otherwise.
      2) Correct, the coalition value is the mean value over the N background datapoints.
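For reference, a sketch of where that default appears in the Python "shap" API: the nsamples argument of KernelExplainer.shap_values, whose "auto" setting resolves to 2 * num_features + 2048 (the variable names are placeholders):

```python
# Let shap pick the default sample count...
shap_values = explainer.shap_values(test_point)  # nsamples="auto"

# ...or override it explicitly to trade accuracy for speed.
shap_values = explainer.shap_values(test_point, nsamples=500)
```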

    • @ron769
      @ron769 2 years ago

      Thanks Rob! So, since the number of coalitions sampled does not cover all possible combinations (which would be intractable), how can we be sure that the SHAP values are close enough to the original Shapley values?

  • @navalo2814
    @navalo2814 2 years ago +1

    Shap shap shap

  • @nate4511
    @nate4511 2 years ago

    theme black and white....

  • @blueprint5300
    @blueprint5300 12 days ago

    In response to the discrepancy, which you call a mistake, between the 'SHAP delta' and the 'Actual delta' at 10:57: those two values are not meant to be the same. Shapley values are the average contribution over all subsets of features; the 'Actual delta' would be only one of the terms in this average. The Shapley value of feature X DOES NOT represent the difference in output that you would get when removing feature X from the model.

  • @exmanitor
    @exmanitor 2 years ago +1

    With regards to your last point, that the "SHAP Delta" does not match the "Actual Delta": I think you are misunderstanding what these values represent. The SHAP value of a specific feature does not represent the difference in prediction if we were to exclude/remove that feature from the model. Instead, the SHAP value of a specific feature represents the average contribution of that feature across all coalitions. This is why your "SHAP Delta" and "Actual Delta" do not match; the "Actual Delta" is just the contribution of the feature in a single coalition.
    Other than that, great video!

    • @robgeada6618
      @robgeada6618 2 years ago +4

      Thanks! Two quick points: first, the "actual" delta I showed there is the average model output when that specific feature is replaced with each value from the background while all other features are held fixed. It's what that feature's SHAP value would be if the background dataset only had variance in that one specific feature column and was otherwise identical to the explained datapoint. So yeah, it absolutely was not an accurate measurement of what a SHAP value is really doing mathematically.
      But second, that was deliberate: SHAP is advertised as producing explanations that are linearly independent measurements of each feature's contribution, but as our result showed, the SHAP value wasn't actually reflective of how this particular model behaved when you removed the feature. And of course, that's because of the exact reasons you pointed out: a SHAP value is a measurement of the difference between that feature's presence and absence in every possible coalition of the background, not an indication of the effects of pure removal/exclusion.
      So in essence, that's the exact point I was trying to make: SHAP values do indeed encode all kinds of subtle information about feature dependence and all of the comparisons against the specific background dataset chosen, but they were relatively inaccurate in measuring the effect of replacing a single feature with background values. This difference is what I was trying to show, but I definitely should have been clearer about it: for models with a lot of feature interaction, SHAP will sacrifice single-feature effect accuracy in order to accurately represent all feature interactions against the background, and whether that is a desirable attribute will depend on the specific use case and user preference.
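For concreteness, here is a sketch of the single-feature "actual delta" measurement described above, assuming NumPy and a scikit-learn-style model; all names are placeholders:

```python
import numpy as np

def single_feature_delta(model, x, background, j):
    # Average model output when feature j is replaced with each background
    # value while every other feature is held fixed at x's values,
    # minus the model's output on the unmodified point x.
    perturbed = np.tile(x, (len(background), 1))
    perturbed[:, j] = background[:, j]
    return model.predict(perturbed).mean() - model.predict(x.reshape(1, -1))[0]
```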