AI transparency: how does the Local Interpretable Model-agnostic Explanations (LIME) framework work?

As Artificial Intelligence solves increasingly hard problems, it's becoming more and more complex. This complexity leads to an often overlooked issue: a lack of transparency. This is problematic because, by taking answers at face value from an uninterpretable model (a black box), we're trading transparency for accuracy. This is bad for a couple of reasons:

Read More

A PyTorch RNN with variable sequence lengths

A Recurrent Neural Network (RNN) often takes ordered sequences as input. Real-world sequences have different lengths, especially in Natural Language Processing (NLP): words don't all have the same number of characters, and sentences don't all have the same number of words. In PyTorch, the inputs of a neural network are often managed by a DataLoader, which groups inputs into batches. Batching is better for training a neural network because it's faster and more efficient than sending inputs to the network one by one. The issue with this approach is that it assumes every input has the same shape. As stated above, sequences don't have a consistent shape, so how can one train an RNN in PyTorch on variable-length sequences and still benefit from the DataLoader class?
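Here's a minimal sketch of one common answer, assuming a padding-plus-packing approach (the toy data, sizes, and collate function are my own illustration, not necessarily the article's exact solution):

```python
# Batch variable-length sequences with a DataLoader by padding each batch
# to its longest sequence, then packing so the RNN skips the padding.
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence
from torch.utils.data import DataLoader

# Toy dataset: three sequences of different lengths, 8 features per step.
sequences = [torch.randn(5, 8), torch.randn(3, 8), torch.randn(7, 8)]

def collate(batch):
    lengths = torch.tensor([seq.size(0) for seq in batch])
    padded = pad_sequence(batch, batch_first=True)  # pad to longest in batch
    return padded, lengths

loader = DataLoader(sequences, batch_size=3, collate_fn=collate)
rnn = torch.nn.GRU(input_size=8, hidden_size=16, batch_first=True)

for padded, lengths in loader:
    # Packing tells the RNN the true length of each sequence,
    # so the padded timesteps are never processed.
    packed = pack_padded_sequence(padded, lengths,
                                  batch_first=True, enforce_sorted=False)
    output, hidden = rnn(packed)
```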

Read More

Algorithmic fairness

With software increasingly used to produce metrics and insights in critical areas of our societies, such as healthcare, crime recidivism risk assessment, job application review, and loan approval, the question of algorithmic fairness matters more than ever. Because algorithms learn from human-generated data, they often magnify human bias in decision making, making them prone to judging things unfairly. For example, Amazon's CV review program was found to be unfair to women: because the program learned from previously reviewed resumes (with an unbalanced gender distribution), it learned to penalize resumes from women.

Read More

Classifying Names With a Character Level RNN (GRU-Powered)

Wanting to brush up on my PyTorch skills, I started following this tutorial, which explains how to create a deep learning model that predicts the origin of a name. At the end of the tutorial, there's an invitation to try to improve the model, which I did. Note that the point of the tutorial is not to create the most performant model but rather to demonstrate and explain PyTorch's capabilities. Here's a comparison between the model described in the tutorial and the one I built.
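For reference, a character-level GRU classifier along these lines can be sketched as follows (the layer sizes and names are my own assumptions, not the tutorial's exact model):

```python
import torch
import torch.nn as nn

class NameClassifier(nn.Module):
    """Reads a name character by character and predicts its origin."""
    def __init__(self, n_chars, hidden_size, n_classes):
        super().__init__()
        self.embedding = nn.Embedding(n_chars, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, n_classes)

    def forward(self, x):                   # x: (batch, seq_len) char indices
        embedded = self.embedding(x)
        _, hidden = self.gru(embedded)      # hidden: (1, batch, hidden_size)
        return self.out(hidden.squeeze(0))  # one score per class

# Example: a batch of 4 names, 10 characters each.
model = NameClassifier(n_chars=57, hidden_size=128, n_classes=18)
scores = model(torch.randint(0, 57, (4, 10)))  # -> shape (4, 18)
```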

Read More

Why online privacy is undervalued

There are a lot of guides explaining how to protect your online privacy, but none of them explains why such guides need to exist in the first place. They exist because privacy is undervalued: we simply don't value it enough. Here are the reasons.

Threats to privacy are not obvious

Despite recent attempts to regulate online data processing (e.g., the GDPR in the EU), as well as several privacy breaches, it's still not clear to many people why all of this threatens privacy. By collecting data on individuals, corporations can build an accurate description of who you are. For example, in this study, researchers show that Facebook's profiling (based on likes) represents people's personalities more accurately than their friends' judgments do. On top of that, they explain that this digital profiling is very powerful: it does a great job of predicting offline attributes such as political opinions, health status, and much more. This is one reason why online privacy should not be overlooked. By tracking you and gathering ever more data, private companies can build a model that represents you, and that gives them a lot of power.

Read More

Text representations for Machine Learning and Deep Learning

Despite what sensationalist media claim, computers don't understand human language (yet). We need to turn sentences and words into a format that a Machine Learning or Deep Learning algorithm can effectively manipulate. This is called language modeling. Here I will explain several methods that can turn words into a meaningful representation.

Integer encoding

This approach is the simplest. Once we have the list of tokens composing the vocabulary, we associate each one with an integer. For example, if the vocabulary is "Roses, are, red, Violets, blue", we can create the mapping: Roses: 0, are: 1, red: 2, Violets: 3, blue: 4.
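A quick sketch of what this looks like in code (the variable names are my own):

```python
# Integer encoding: map each vocabulary token to a unique index.
vocabulary = ["Roses", "are", "red", "Violets", "blue"]
token_to_index = {token: index for index, token in enumerate(vocabulary)}
print(token_to_index)  # {'Roses': 0, 'are': 1, 'red': 2, 'Violets': 3, 'blue': 4}

# Encoding a sentence then becomes a simple lookup.
encoded = [token_to_index[token] for token in ["Roses", "are", "red"]]
print(encoded)  # [0, 1, 2]
```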

Read More