Deep Learning

Tackling AI obesity with LoRA

2025-09-02 4 minutes read AI Deep Learning

As deep learning models grow, trying to get a finer and finer grasp of reality, the number of parameters composing them increases, making training more and more expensive. Here we delve into Low-Rank Adaptation (LoRA), a method that aims to reduce the dimensionality of the training space within deep learning models during fine-tuning.

Artificial intelligence is not willing to be correct

2023-01-29 3 minutes read AI Deep learning

As deep learning models get better at representing human language, telling whether a text was written by a human being or a deep learning model becomes harder and harder. And because language models reproduce text found online (often without attribution), the risk of mistaking their output for human writing changes the reading experience.

The deep learning obesity crisis

2022-07-03 6 minutes read AI Deep learning

Deep learning has made dramatic improvements over the last decades. Part of this is attributed to improved methods that allowed training wider and deeper neural networks. This can also be attributed to better hardware, as well as the development of techniques to use this hardware efficiently. All of this leads to neural networks that grow exponentially in size. But is continuing down this path the best avenue for success?

How the Integrated Gradients method works?

2021-10-15 4 minutes read AI Deep learning

For artificial intelligence (AI) transparency and to better shape upcoming policies, we need to better understand the AI’s output. In particular, one may want to understand the role attributed to each input. This is hard, because in neural networks input variables don’t have a single weight that could serve as a proxy for determining their importance with regard to the output. Therefore, one has to consider all the neural network’s weights, which may all be interconnected. Here is how Integrated Gradients does this.

What does a transformer?

2020-11-03 4 minutes read Deep Learning AI

Transformers are giant robots from Cybertron. There are two Transformer tribes: the Autobots and the Decepticons. They have been fighting each other over the Allspark, a mythical artifact capable of building worlds and mechanical beings. Well, there is also another kind of Transformer, but those are not about warfare. However, they are pretty good at language understanding. Let’s see how!

A PyTorch RNN with variable sequence lengths

2019-11-23 7 minutes read Deep Learning

A Recurrent Neural Network (RNN) often uses ordered sequences as inputs. Real-world sequences have different lengths, especially in Natural Language Processing (NLP), because not all words have the same number of characters and not all sentences have the same number of words. In PyTorch, the inputs of a neural network are often managed by a DataLoader. A DataLoader groups the inputs into batches. This is better for training a neural network because it’s faster and more efficient than sending the inputs one by one to the neural network. The issue with this approach is that it assumes every input has the same shape. As stated before, sequences don’t have a consistent shape, so how can one train an RNN in PyTorch with variable-length sequences and still benefit from the DataLoader class?

Classifying Names With a Character Level RNN (GRU-Powered)

2019-06-12 5 minutes read Deep Learning

Wanting to brush up on my PyTorch skills, I’ve started to follow this tutorial. It explains how to create a deep learning model able to predict the origin of a name. At the end of the tutorial, there’s an invitation to try to improve the model, which I did. Note that the point of the tutorial is not to create the best-performing model but rather to demonstrate and explain PyTorch’s capabilities. Here’s a comparison between the model described in the tutorial and the one I’ve built.

Text representations for Machine Learning and Deep Learning

2019-04-01 6 minutes read Deep Learning

Despite what the bad media are saying, computers haven’t understood human language (yet). We need to turn sentences and words into a format that can be effectively manipulated by a Machine Learning or Deep Learning algorithm. This is called language modeling. Here I will explain several methods that can turn words into a meaningful representation.

Integer encoding

This approach is the simplest. Once we have a list of the tokens composing the vocabulary, we associate each one with an integer. For example, if the vocabulary is “Roses, are, red, Violets, blue”, we can create a mapping: Roses: 0, are: 1, red: 2, Violets: 3, blue: 4.
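
The mapping described here can be sketched in a few lines of plain Python. This is only a minimal illustration: the helper names `build_vocab` and `encode` are hypothetical, and the sketch assumes the text has already been split into tokens and that integers are assigned in order of first appearance.

```python
def build_vocab(tokens):
    """Map each distinct token to an integer, in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            # The next free integer is simply the current vocabulary size.
            vocab[token] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Replace each token by its integer id (raises KeyError on unknown tokens)."""
    return [vocab[token] for token in tokens]


# Hypothetical pre-tokenized input; "are" appears twice and keeps a single id.
tokens = ["Roses", "are", "red", "Violets", "are", "blue"]
vocab = build_vocab(tokens)
encoded = encode(tokens, vocab)
```

With this input, `vocab` matches the mapping given in the example, and the repeated token "are" is encoded with the same integer both times. A real pipeline would usually also reserve an id for unknown tokens, which this sketch leaves out.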

How to install cuda 10.0, cudnn 7.4, Tensorflow, PyTorch on Fedora 29

2019-02-23 2 minutes read Deep Learning

This procedure has been tested on Fedora 29, on an HP laptop with this graphics card: NVIDIA Corporation GP107M GeForce GTX 1050 Mobile (rev a1)

The commands have to be run as the root user. This tutorial assumes the NVIDIA driver is already working.

Install pip

dnf install python3-pip

Install Cuda 10.0

Download the installer from the NVIDIA website and run it. Make sure to install the Perl module Term::ReadLine::Gnu beforehand, because the CUDA installer relies on it.

Stochastic Gradient Descent and its variants

2018-11-13 3 minutes read Deep Learning

Stochastic Gradient Descent (SGD) is used in many Deep Learning models as an algorithm to optimize the parameters (the weights of each layer). Here is how it works:

At each step in the training process, the goal is to update the weights towards the optimal value. For this, SGD uses the equation: