A PyTorch RNN with variable sequence lengths

A Recurrent Neural Network (RNN) often takes ordered sequences as input. Real-world sequences have different lengths, especially in Natural Language Processing (NLP): not all words have the same number of characters, and not all sentences have the same number of words. In PyTorch, the inputs of a neural network are often managed by a DataLoader, which groups inputs into batches. Training on batches is faster and more efficient than feeding the inputs to the network one by one.
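
As a rough sketch of how a batch of variable-length sequences can be handled, the snippet below pads a batch to a common length and packs it so the GRU skips the padding steps (the token values, vocabulary size, and layer sizes are arbitrary placeholders, not taken from the post):

    import torch
    from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

    # Hypothetical batch of variable-length token sequences (values are arbitrary)
    sequences = [torch.tensor([1, 2, 3, 4]), torch.tensor([5, 6]), torch.tensor([7, 8, 9])]
    lengths = torch.tensor([len(s) for s in sequences])

    # Pad to the longest sequence in the batch (what a DataLoader collate_fn can do)
    padded = pad_sequence(sequences, batch_first=True, padding_value=0)  # shape (3, 4)

    # Embed the tokens, then pack so the RNN ignores the padded time steps
    embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=8, padding_idx=0)
    packed = pack_padded_sequence(embedding(padded), lengths,
                                  batch_first=True, enforce_sorted=False)

    rnn = torch.nn.GRU(input_size=8, hidden_size=16, batch_first=True)
    output, hidden = rnn(packed)  # hidden holds the last non-padding state of each sequence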
Read More

Algorithmic fairness

As software is increasingly used to produce metrics and insights in critical areas of our societies, such as healthcare, crime recidivism risk assessment, job application review, and loan approval, the question of algorithmic fairness is becoming more important than ever. Because algorithms learn from human-generated data, they often magnify human bias in decision making, making them prone to unfair judgments. For example, Amazon's CV review program was found to be unfair to women.
Read More

Classifying Names With a Character-Level RNN (GRU-Powered)

Wanting to brush up on my PyTorch skills, I started to follow this tutorial. It explains how to create a deep learning model able to predict the origin of a name. At the end of the tutorial, there's an invitation to try to improve the model, which I did. Note that the point of the tutorial is not to create the most performant model, but rather to demonstrate and explain PyTorch's capabilities.
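
For illustration only, here is a minimal sketch of what a GRU-powered character-level classifier can look like (the character set size, hidden size, and number of classes are placeholder values, not necessarily those used in the tutorial):

    import torch
    import torch.nn as nn

    class CharGRUClassifier(nn.Module):
        def __init__(self, n_chars=57, hidden_size=128, n_classes=18):
            super().__init__()
            self.gru = nn.GRU(input_size=n_chars, hidden_size=hidden_size, batch_first=True)
            self.out = nn.Linear(hidden_size, n_classes)

        def forward(self, x):
            # x: (batch, seq_len, n_chars), a one-hot encoded name
            _, hidden = self.gru(x)             # hidden: (1, batch, hidden_size)
            return self.out(hidden.squeeze(0))  # logits over the possible origins

    model = CharGRUClassifier()
    dummy_name = torch.zeros(1, 5, 57)  # a dummy one-hot encoded 5-character name
    logits = model(dummy_name)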
Read More

Why online privacy is understated

There are a lot of guides explaining how to protect your online privacy, but none of them explains why they exist in the first place. They exist because privacy is understated: we don't value it enough. Here are the reasons. Threats to privacy are not obvious. Despite recent attempts to regulate online data processing (e.g. the GDPR in the EU), as well as recent privacy breaches, it's still not clear to most people why all of this threatens privacy.
Read More

Text representations for Machine Learning and Deep Learning

Despite what some media outlets are saying, computers don't understand human language (yet). We need to turn sentences and words into a format that can be effectively manipulated by a Machine Learning or Deep Learning algorithm; this is called language modeling. Here I will explain several methods that can turn words into a meaningful representation. Integer encoding is the simplest approach: once we have a list of the tokens composing the vocabulary, we associate each one with an integer.
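
As a quick illustration of integer encoding (the sentence and the resulting mapping below are made up for the example):

    # Integer encoding: map each distinct token of the vocabulary to an integer
    sentence = "the cat sat on the mat"
    tokens = sentence.split()

    vocab = {token: idx for idx, token in enumerate(sorted(set(tokens)))}
    encoded = [vocab[token] for token in tokens]

    print(vocab)    # {'cat': 0, 'mat': 1, 'on': 2, 'sat': 3, 'the': 4}
    print(encoded)  # [4, 0, 3, 2, 4, 1]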
Read More

How to install CUDA 10.0, cuDNN 7.4, TensorFlow, and PyTorch on Fedora 29

This procedure has been tested on Fedora 29, on an HP laptop with the following graphics card: NVIDIA Corporation GP107M GeForce GTX 1050 Mobile (rev a1). The commands have to be run as the root user, and this tutorial assumes the NVIDIA driver is already working. Install pip: dnf install python3-pip. Install CUDA 10.0: download the installer from the NVIDIA website and run it. Make sure to install the Perl module Term::ReadLine::Gnu beforehand, because the CUDA installer relies on it.
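
Once everything is installed, a quick sanity check like the sketch below (assuming the TensorFlow 1.x API that was current alongside CUDA 10.0) can confirm that both frameworks see the GPU:

    import torch
    print(torch.cuda.is_available())      # True if PyTorch can use the CUDA runtime
    print(torch.cuda.get_device_name(0))  # e.g. the GTX 1050 reported by the driver

    import tensorflow as tf
    print(tf.test.is_gpu_available())     # TF 1.x; use tf.config.list_physical_devices('GPU') in 2.x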
Read More