As Artificial Intelligence solves increasingly hard problems, it’s becoming more and more complex. This complexity leads to an often overlooked issue: the lack of transparency. This is problematic because, by taking answers at face value from an uninterpretable model (a black box), we’re trading transparency for accuracy. This is bad for a couple of reasons:

Because of the huge number of parameters in Machine Learning and Deep Learning models and the way they are connected together, it’s hard to summarize their behavior. But we can always approximate it.


Local Interpretable Model-agnostic Explanations (LIME, available as a Python library) is a framework which identifies the parts of the input that contributed the most to the output. While it doesn’t allow you to inspect all the parameters and their interactions, it’s still useful and provides a solution to the problems outlined above. It has two compelling advantages. Firstly, as its name states, it’s model-agnostic. It doesn’t make any assumption about the Machine Learning or Deep Learning model, meaning that, to some extent, it will work with models that are yet to be created. Secondly, it works for a wide range of data. LIME provides explanations for text, images and tabular data (continuous or categorical).

How does it work?

According to the published paper, to find how a data point was used by a complex Deep Learning or Machine Learning model, LIME trains a simple model (which is easy to interpret) that emulates the complex one. LIME samples data points from the original dataset. The samples are close to the input point for which we want explanations according to a given distance measure.

The goal of drawing samples from the original dataset is to simplify the emulation: it’s much easier to emulate a complex model on a restricted input space than on the full dataset. The distance measure quantifies the similarity between the input point we want explanations for and the samples drawn from the dataset. This distance is turned into a weight for each sample, so that the simpler model focuses on emulating the complex one for inputs similar to the one we’re interested in.
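
To make the weighting concrete, here is a minimal sketch of how distances could be turned into sample weights. It assumes a Euclidean distance and an exponential kernel; the function name `proximity_weights` and the `kernel_width` parameter are hypothetical names chosen for illustration, not part of the LIME library.

```python
import math


def proximity_weights(x, samples, kernel_width=1.0):
    """Weight each sample by how close it is to x (assumed exponential kernel)."""
    weights = []
    for z in samples:
        # Euclidean distance between the point of interest and the sample.
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))
        # Close samples get a weight near 1, distant samples a weight near 0.
        weights.append(math.exp(-(distance ** 2) / (kernel_width ** 2)))
    return weights


# The sample identical to x gets the largest weight; the farthest one, almost none.
weights = proximity_weights((0.0, 0.0), [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)])
print(weights)
```

A smaller `kernel_width` makes the weights fall off faster, which restricts the simple model to a tighter neighbourhood around the input point.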

LIME finds the best simplified model by minimizing:

$$\mathcal{L}(f, g, \Pi_x) + \Omega(g)$$


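Here $f$ is the complex model, $g$ is the simple one, $\mathcal{L}$ measures how poorly $g$ approximates $f$ around the input $x$, and $\Omega(g)$ penalizes the complexity of $g$. To compute $\mathcal{L}$, samples from the original dataset are fed to both $f$ and $g$, the outputs are compared, and the differences are weighted by the distance function $\Pi_x$. To make $g$ even easier to interpret, a simplified representation of the input is used for it (but not for $f$).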

In simple cases, $\mathcal{L}$ can be:

$$\mathcal{L} = \sum_{z, z'} \Pi_x(z)(f(z) - g(z'))^2$$
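
As a rough illustration, this weighted squared loss could be computed as follows, comparing $f$ on each sample $z$ with $g$ on its simplified form $z'$. The names `local_loss` and `simplify` are hypothetical, and the toy models below are only placeholders for a real complex model and its simple surrogate.

```python
def local_loss(f, g, simplify, samples, weights):
    """Weighted squared difference between f(z) and g(z') over all samples."""
    total = 0.0
    for z, weight in zip(samples, weights):
        z_simplified = simplify(z)  # z' is the simplified representation seen by g
        total += weight * (f(z) - g(z_simplified)) ** 2
    return total


# Toy setup: f uses both features, g only sees the first one.
f = lambda z: z[0] + z[1]
g = lambda z_simplified: z_simplified[0]
simplify = lambda z: (z[0],)

loss = local_loss(f, g, simplify, samples=[(1, 2), (3, 0)], weights=[0.5, 1.0])
print(loss)  # 0.5 * (3 - 1)^2 + 1.0 * (3 - 3)^2 = 2.0
```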


Once the best simplified model is selected (the one that best approximates the complex, uninterpretable model), the original data point is fed through it and the parameters of the model that contributed the most to the result are used to provide an explanation.

To sum up, given a trained model $f$, a dataset $D$ and a data point $x$, LIME roughly works as follows:

  1. Sample points $z$ from $D$ which are close to $x$, and measure the distance from $x$ to each $z$
  2. Find the model $g$ which behaves similarly to $f$ for the points $z$. $z$ is simplified to $z'$ before being seen by $g$
  3. Analyze the behavior of $g$ on $x$ to give explanations
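
The three steps above can be sketched in plain Python. This is a simplified illustration under several assumptions, not the LIME library's implementation: closeness is a Euclidean `radius`, the weights come from an exponential kernel, $g$ is a weighted linear model, and the simplified representation $z'$ is just $z$ itself. The names `explain_locally` and `solve_linear_system` are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def explain_locally(f, dataset, x, radius=1.0, kernel_width=0.5):
    """Fit a weighted linear surrogate g around x; return (intercept, slopes)."""
    # Step 1: keep the points of D close to x and measure their distances.
    neighbours = []
    for z in dataset:
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))
        if distance <= radius:
            neighbours.append((z, distance))
    # Assumed exponential kernel: closer points weigh more.
    weights = [math.exp(-(d ** 2) / (kernel_width ** 2)) for _, d in neighbours]

    # Step 2: find the linear g closest to f on these points
    # (weighted least squares solved through the normal equations).
    rows = [[1.0] + list(z) for z, _ in neighbours]
    targets = [f(z) for z, _ in neighbours]
    p = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
             for j in range(p)] for i in range(p)]
    xtwy = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets))
            for i in range(p)]
    coef = solve_linear_system(xtwx, xtwy)

    # Step 3: the slopes of g indicate how much each feature matters near x.
    return coef[0], coef[1:]


# Toy "complex" model: non-linear in the first feature, linear in the second.
def f(z):
    return z[0] ** 2 + 3 * z[1]


# Toy dataset D: a regular grid of two-feature points.
D = [(i / 10, j / 10) for i in range(21) for j in range(-10, 11)]
intercept, slopes = explain_locally(f, D, x=(1.0, 0.0), radius=0.55)
print(slopes)
```

On this toy example the slopes should come out close to 2 and 3, the local sensitivities of `f` to each feature around `x`, which is the kind of information the explanation is built from.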