AI transparency: how does the Local Interpretable Model-agnostic Explanations framework work?

As Artificial Intelligence solves increasingly hard problems, its models become more and more complex. This complexity leads to an often overlooked issue: the lack of transparency. This is problematic because, by taking answers at face value from an uninterpretable model (a black box), we’re trading transparency for accuracy. This is bad for a couple of reasons:

Debugging. While it may be possible to figure out what’s wrong with a car just by hearing it squeal and whir, opening the hood and inspecting everything is far more efficient. The same goes for debugging Artificial Intelligence. How can one understand what’s wrong without being able to examine the model? Of course, it’s always possible to make assumptions or use a brute-force approach, but it’s not elegant in our civilized age™.
Trust. If one cannot understand why a decision was made, how can it be trusted? Is the accuracy of a model enough of a justification to accept its answers? More generally, can a model be deemed legitimate solely on its reported performance? This depends on the stakes at play. But if we are using AI to solve real-world problems, knowing how the input data was used adds a lot of confidence.

Because of the huge number of parameters in Machine Learning and Deep Learning models and the way they are connected, it’s hard to summarize how these models behave. But we can approximate that behavior.

LIME

Local Interpretable Model-agnostic Explanations (LIME, available as a Python library) is a framework that identifies the parts of the input that contributed the most to the output. While it doesn’t let you inspect all the parameters and their interactions, it’s still useful and provides a solution to the problems outlined above. It has two compelling advantages. Firstly, as its name states, it’s model-agnostic: it makes no assumptions about the Machine Learning or Deep Learning model, meaning that, to some extent, it will work with models that are yet to be created. Secondly, it works for a wide range of data: LIME provides explanations for text, images and tabular data (continuous or categorical).

How does it work?

According to the published paper, to find how a data point was used by a complex Deep Learning or Machine Learning model, LIME trains a simple, easy-to-interpret model that emulates the complex one around that point. To do so, LIME draws samples that are close to the input point for which we want explanations, according to a given distance measure. In the paper, these samples are generated by perturbing the input itself (for example, by removing words from a sentence).

The goal of restricting the samples to the neighborhood of the input is to simplify the emulation: it’s much easier to emulate a complex model on a restricted input space than on the full one. The distance measure quantifies the similarity between the input point we want explanations for and each sample. This distance is turned into a weight for each sample (the closer the sample, the larger the weight), allowing the simpler model to focus on emulating the complex one for inputs similar to the one we’re interested in.
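
To make the weighting concrete, here is a minimal sketch of how a distance can be turned into a weight with an exponential kernel, which is the form the paper describes. The function names `euclidean` and `proximity`, the kernel width `sigma` and the sample values are illustrative assumptions, not part of the LIME library.

```python
import math

def euclidean(a, b):
    """L2 distance between two equal-length numeric vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def proximity(x, z, sigma=1.0):
    """Exponential kernel: 1.0 when z == x, decaying towards 0 as z moves away.

    sigma (the kernel width) is an assumed value chosen for illustration.
    """
    d = euclidean(x, z)
    return math.exp(-(d ** 2) / sigma ** 2)

x = [1.0, 2.0]                                   # point we want to explain
samples = [[1.0, 2.0], [1.5, 2.0], [4.0, 6.0]]   # made-up nearby and far points
weights = [proximity(x, z) for z in samples]     # closer samples get larger weights
```

A smaller `sigma` makes the explanation more local, since far samples then count for almost nothing when the simple model is fitted.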

LIME finds the best simplified model by minimizing:

$$\mathcal{L}(f, g, \Pi_x) + \Omega(g)$$

Where:

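$f$ is the complex model, already trained
$g$ is the simplified model
$\Pi_x$ is a proximity measure around $x$, derived from the distance to $x$ (the closer a point is to $x$, the larger its value)
$\Omega(g)$ is a measure of the complexity of $g$
$\mathcal{L}$ measures how poorly $g$ approximates $f$ on the sampled inputs, weighted by $\Pi_x$ (lower is better)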

This means the samples are fed to both $f$ and $g$, their outputs are compared, and the differences are weighted by $\Pi_x$. To make $g$ even easier to interpret, it receives a simplified representation of the input (while $f$ still receives the original one).
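
For text, the simplified representation used in the paper is a binary vector marking which words are present. A small sketch of the mapping from a simplified sample back to the raw input could look like this; the helper name `to_text` and the example sentence are made up for illustration.

```python
def to_text(words, mask):
    """Rebuild the raw input z from its simplified form z' (one 0/1 flag per word)."""
    return " ".join(w for w, keep in zip(words, mask) if keep)

words = "the movie was surprisingly good".split()

x_prime = [1, 1, 1, 1, 1]      # simplified form of the original input: every word present
z_prime = [1, 1, 1, 0, 1]      # simplified sample: "surprisingly" switched off

z = to_text(words, z_prime)    # raw text that the complex model f would receive
```

Here $g$ would only ever see the 0/1 vectors, so each of its weights can be read as the effect of one word being present.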

In simple cases, $\mathcal{L}$ can be:

$$\mathcal{L}(f, g, \Pi_x) = \sum_{z, z'} \Pi_x(z)\,(f(z) - g(z'))^2$$

Where:

$z$ is an input point from the samples
$z'$ is the simplified version of $z$
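
As a sanity check, the weighted squared error can be computed by hand on a toy case. In the sketch below, `local_loss` is a hypothetical helper, and `f`, `g` and `weight` are toy stand-ins for the complex model, the simple model and $\Pi_x$; none of them come from the LIME library.

```python
def local_loss(f, g, samples, weight):
    """Weighted squared error between f and g.

    samples: list of (z, z_prime) pairs; weight(z) plays the role of Pi_x(z).
    """
    return sum(weight(z) * (f(z) - g(z_prime)) ** 2 for z, z_prime in samples)

def f(z):
    """Toy "complex" model: a parabola."""
    return z[0] ** 2

def g(z_prime):
    """Toy simple model: the tangent line to f at 1."""
    return 2.0 * z_prime[0] - 1.0

def weight(z):
    """Toy proximity to x = [1.0]: equal to 1 at x and shrinking with distance."""
    return 1.0 / (1.0 + abs(z[0] - 1.0))

# For simplicity, z' is identical to z in this toy example.
samples = [([1.0], [1.0]), ([2.0], [2.0])]
loss = local_loss(f, g, samples, weight)
```

The sample at $x$ contributes nothing because $g$ matches $f$ there, while the farther sample contributes its squared error scaled down by its weight.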

Once the best simplified model is selected (the one that best approximates the complex, uninterpretable model around $x$ while remaining simple), the original data point is fed through it and the parameters of the model that contributed the most to the result are used to provide an explanation. For a linear $g$, these are the features with the largest weights in absolute value.

To sum up, given a trained model $f$, a dataset $D$ and a data point $x$, LIME roughly works as follows:

Sample points $z$ close to $x$ (drawn from $D$ or, as in the paper, generated by perturbing $x$) and measure the distance from $x$ to each $z$
Find the model $g$ that behaves similarly to $f$ on the points $z$, each $z$ being simplified to $z'$ before being seen by $g$
Analyze the behavior of $g$ on $x$ to give explanations
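
The steps above can be sketched end to end in plain Python. This is a simplified illustration under several assumptions, not the library's implementation: nearby points are obtained by adding Gaussian noise to $x$, $g$ is a plain linear model fitted by weighted least squares, the complexity term $\Omega(g)$ and the simplified representation are left out, and `f`, `solve`, `explain` and all default parameter values are made up for the example.

```python
import math
import random

def f(z):
    """Stand-in for the complex model: nonlinear in both features."""
    return math.sin(z[0]) + z[1] ** 2

def solve(A, b):
    """Solve the square system A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def explain(f, x, n_samples=500, scale=0.5, sigma=0.5, seed=0):
    """Return one local slope per feature of x for the model f."""
    rng = random.Random(seed)
    # 1. Sample points z near x and weight each one by its proximity to x.
    zs = [[xi + rng.gauss(0.0, scale) for xi in x] for _ in range(n_samples)]
    ws = [math.exp(-sum((zi - xi) ** 2 for zi, xi in zip(z, x)) / sigma ** 2)
          for z in zs]
    # 2. Fit a linear g(z) = w0 + w1*z1 + ... by weighted least squares,
    #    using the normal equations (X^T W X) w = X^T W y.
    rows = [[1.0] + z for z in zs]
    ys = [f(z) for z in zs]
    k = len(x) + 1
    A = [[sum(w * r[i] * r[j] for w, r in zip(ws, rows)) for j in range(k)]
         for i in range(k)]
    b = [sum(w * r[i] * y for w, r, y in zip(ws, rows, ys)) for i in range(k)]
    coef = solve(A, b)
    # 3. The slopes of g are the explanation: one local importance per feature.
    return coef[1:]

slopes = explain(f, x=[0.0, 1.0])
```

Around $x = (0, 1)$ the slopes should land near the local gradient of this toy `f`, roughly 1 for the first feature and 2 for the second, so the second feature would be reported as the more influential one at that point.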
