MIT PhD Thesis Defense: Techniques for Interpretability and Transparency of Black-Box Models

  • Published: 1 August 2024
  • Abstract: Recently, black-box models such as neural networks have been increasingly adopted in many tasks. However, their opacity, that is, the inability to understand their inner workings, has hindered their deployment in high-stakes domains such as healthcare and finance. In this talk, I describe my research in interpretability and transparency to address this issue.
    In the interpretability category, I introduce two fundamental properties of good explanations for model predictions: correctness and understandability. Correctness captures the notion that explanations should faithfully represent the model’s decision-making logic, and understandability reflects the requirement that these explanations should be reliably understood by human users. For both properties, I propose evaluation metrics as well as methods that improve upon existing ones, while identifying avenues for future work.
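    To make the correctness property concrete, below is a minimal sketch of one standard way to test explanation faithfulness: delete the features an explanation ranks as most important and measure how quickly the model’s score drops. The `model` interface, the masking-with-a-baseline scheme, and the deletion-curve metric are illustrative assumptions for this sketch, not the specific metrics proposed in the thesis.

    ```python
    # Hypothetical deletion-based faithfulness check (an assumption, not the
    # thesis's metric): a correct explanation should rank features so that
    # removing the top-ranked ones quickly destroys the model's confidence.
    import numpy as np

    def deletion_faithfulness(model, x, saliency, baseline=0.0, steps=10):
        """Delete the most-salient features first and track how fast the
        model's score for its predicted class drops; a lower area under the
        deletion curve indicates a more faithful explanation."""
        p = model(x)                                 # assumed: 1-D class-probability vector
        target = int(np.argmax(p))                   # class being explained
        order = np.argsort(saliency)[::-1]           # feature indices, most important first
        x_masked = np.array(x, dtype=float)
        scores = [p[target]]
        chunk = max(1, len(order) // steps)
        for i in range(0, len(order), chunk):
            x_masked[order[i:i + chunk]] = baseline  # "delete" the next chunk of features
            scores.append(model(x_masked)[target])
        # Normalized area under the deletion curve: lower = more faithful.
        return float(np.trapz(scores, dx=1.0 / (len(scores) - 1)))
    ```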
    In the transparency category, I present the transparency-by-example framework, a Bayesian sampling formulation to inspect models and identify a wide range of model behaviors. I demonstrate the flexibility of this Bayesian approach by applying it to both deep neural networks and non-differentiable robot controllers, revealing hidden and hard-to-find insights in both cases.
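    As a rough illustration of the sampling idea behind transparency-by-example, the sketch below runs a Metropolis-Hastings chain over inputs, with an unnormalized target density that rewards a chosen model behavior; because it needs only forward evaluations, it applies to non-differentiable models such as robot controllers as well as to neural networks. The `behavior` score, Gaussian prior, and random-walk proposal are assumptions for illustration, not necessarily the talk’s exact Bayesian formulation.

    ```python
    # Hypothetical sampler for "show me inputs where the model does X".
    # The prior, proposal, and temperature are illustrative choices.
    import numpy as np

    def sample_behavior(behavior, dim, n_samples=1000, step=0.1, temp=1.0, seed=0):
        """Metropolis-Hastings over inputs x, targeting an unnormalized density
        proportional to exp(behavior(x) / temp) times a standard-normal prior,
        so the chain concentrates on inputs that exhibit the behavior."""
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(dim)
        log_p = behavior(x) / temp - 0.5 * x @ x          # score + Gaussian prior
        samples = []
        for _ in range(n_samples):
            x_new = x + step * rng.standard_normal(dim)   # symmetric random-walk proposal
            log_p_new = behavior(x_new) / temp - 0.5 * x_new @ x_new
            if np.log(rng.random()) < log_p_new - log_p:  # MH accept/reject
                x, log_p = x_new, log_p_new
            samples.append(x.copy())
        return np.asarray(samples)
    ```

    With behavior(x) set, say, to a classifier’s logit for one class, the accepted samples form a gallery of inputs the model maps to that class, which is one way such sampling can surface hidden and hard-to-find behaviors.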
    00:00 Opening remarks
    03:12 Introduction and background
    12:23 Definition-evaluation duality
    18:45 Explanation correctness
    27:21 Explanation understandability
    35:26 Transparency-by-example
    40:52 Conclusion and outlook
    42:49 Acknowledgment
    44:51 Q&A
