When discussing data ethics and how algorithms should be used to make decisions about people, one criterion that often comes up is that models should be “interpretable” or “explainable.” An inscrutable algorithm that produces accurate predictions but whose predictions cannot be examined or explained leaves no avenue for due process; an algorithm that gives predictions based on clear rules can be challenged when one of the rules is obviously wrong or biased.
But how do we produce explainable models, and is explainability enough?
See also Privacy and surveillance for more on the power dynamics of possessing and using data, and Machine learning and law on legal implications.
Andrew D. Selbst and Solon Barocas, “The Intuitive Appeal of Explainable Machines”, Fordham Law Review (2018). https://ssrn.com/abstract=3126971
Argues that calls for explainable decisions are not quite enough. Decision-making models can be inscrutable, meaning they are too complex to be easily understood, but even scrutable models can be non-intuitive: they can pick out relationships we cannot explain and which are not obviously connected to the outcome measure. We can require explanations of individual decisions, but inscrutable models are difficult to explain and non-intuitive explanations are difficult to understand; further, if the goal is to detect disparate outcomes or bias, we need to see the whole method, not just individual decisions. Advocates instead for documentation of model-building decisions, so that the construction of the model can be justified along with its individual decisions, and so that the purposes for which the model is used are scrutinized as well as the ways it makes decisions. (A perfectly scrutable and intuitive model can still be used for hidden nefarious purposes.)
Cynthia Rudin, “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead”, Nature Machine Intelligence 1, 206–215 (2019). doi:10.1038/s42256-019-0048-x
Argues that for high-stakes decisions we should not rely on post-hoc explanations of black-box models, since an explanation that differs from the model’s actual computation cannot be fully faithful to it, and should instead build models that are interpretable in the first place; the presumed trade-off between accuracy and interpretability is often much smaller than assumed, particularly for structured data with meaningful features.
Discusses examples where interpretability proved very useful (such as a group who “noticed that their neural network was picking up on the word ‘portable’ within an x-ray image, representing the type of x-ray equipment rather than the medical content of the image”). Reviews recent work on algorithms that produce high-performance interpretable models, such as optimal rule lists or sparse points-based scoring systems. These remain hard problems, chiefly because the underlying optimization problems are computationally intractable, but there is some progress.
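To make the flavor of these models concrete, here is a toy sketch of a sparse points-based scoring system (not from the paper; the features, point values, and threshold are invented for illustration): each feature contributes a small integer number of points, and the decision is a simple threshold on the total, so the whole model can be audited at a glance.

```python
# Hypothetical sparse points-based scoring system, in the spirit of the
# medical risk scores Rudin describes. All values below are made up.

SCORECARD = {
    "age_over_60": 2,
    "prior_events": 3,
    "abnormal_lab_result": 1,
}

THRESHOLD = 4  # flag as high risk when the total score reaches this value


def risk_score(patient: dict) -> int:
    """Sum the points for every feature that is present (truthy)."""
    return sum(points for feature, points in SCORECARD.items()
               if patient.get(feature))


def is_high_risk(patient: dict) -> bool:
    """The entire decision rule is a threshold on the visible score."""
    return risk_score(patient) >= THRESHOLD


if __name__ == "__main__":
    example = {"age_over_60": True, "prior_events": True,
               "abnormal_lab_result": False}
    print(risk_score(example), is_high_risk(example))  # 5 True
```

The appeal is that a dispute about such a model reduces to a dispute about a handful of visible point values and a threshold, rather than about the behavior of an opaque function; the hard part, which the paper surveys, is finding point values and feature sets like these that are provably near-optimal rather than hand-tuned.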