When discussing data ethics and how algorithms should be used to make decisions about people, one criterion that often comes up is that models should be “interpretable” or “explainable.” An inscrutable algorithm that produces accurate predictions but whose predictions cannot be examined or explained leaves no avenue for due process; an algorithm that gives predictions based on clear rules can be challenged when one of the rules is obviously wrong or biased.
But how do we produce explainable models, and is explainability enough?
See also Privacy and surveillance for more on the power dynamics of possessing and using data, and Machine learning and law on legal implications.
Andrew D. Selbst and Solon Barocas, “The Intuitive Appeal of Explainable Machines”, Fordham Law Review (2018). https://ssrn.com/abstract=3126971
Argues that calls for explainable decisions are not quite enough. Decision-making models can be inscrutable, meaning they are too complex to be easily understood, but even scrutable models can be non-intuitive: they can pick out relationships we cannot explain and which are not obviously connected to the outcome measure. We can require explanations of individual decisions, but inscrutable models are difficult to explain and non-intuitive explanations are difficult to understand; further, if the goal is to detect disparate outcomes or bias, we need to see the whole method, not just individual decisions. Advocates instead for documentation of model-building decisions, so that the construction of the model can be justified along with its individual decisions, and so that the purposes for which the model is used are scrutinized as well as the ways it makes decisions. (A perfectly scrutable and intuitive model can still be used for hidden nefarious purposes.)
Cynthia Rudin, “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead”, Nature Machine Intelligence 1, 206–215 (2019). doi:10.1038/s42256-019-0048-x
Argues that for high-stakes decisions we should not rely on post-hoc explanations of black-box models, since an explanation that differs from the model’s actual computation cannot be fully faithful to it, and should instead build models that are interpretable in the first place; the presumed trade-off between accuracy and interpretability is often much smaller than assumed, particularly for structured data with meaningful features.
Discusses examples where interpretability proved very useful (such as a group who “noticed that their neural network was picking up on the word ‘portable’ within an x-ray image, representing the type of x-ray equipment rather than the medical content of the image”). Reviews recent work on algorithms that produce high-performance interpretable models, such as optimal rule lists or sparse points-based scoring systems. These remain hard problems, chiefly because the underlying optimization problems are computationally intractable, but there is some progress.
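To make the flavor of these models concrete, here is a toy sketch of a sparse points-based scoring system (not from the paper; the features, point values, and threshold are invented for illustration): each feature contributes a small integer number of points, and the decision is a simple threshold on the total, so the whole model can be audited at a glance.

```python
# Hypothetical sparse points-based scoring system, in the spirit of the
# medical risk scores Rudin describes. All values below are made up.

SCORECARD = {
    "age_over_60": 2,
    "prior_events": 3,
    "abnormal_lab_result": 1,
}

THRESHOLD = 4  # flag as high risk when the total score reaches this value


def risk_score(patient: dict) -> int:
    """Sum the points for every feature that is present (truthy)."""
    return sum(points for feature, points in SCORECARD.items()
               if patient.get(feature))


def is_high_risk(patient: dict) -> bool:
    """The entire decision rule is a threshold on the visible score."""
    return risk_score(patient) >= THRESHOLD


if __name__ == "__main__":
    example = {"age_over_60": True, "prior_events": True,
               "abnormal_lab_result": False}
    print(risk_score(example), is_high_risk(example))  # 5 True
```

The appeal is that a dispute about such a model reduces to a dispute about a handful of visible point values and a threshold, rather than about the behavior of an opaque function; the hard part, which the paper surveys, is finding point values and feature sets like these that are provably near-optimal rather than hand-tuned.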