Posts with the tag Deep Learning:

Artificial intelligence is not willing to be correct

As deep learning models get better at representing human language, telling whether a text was written by a human being or by a deep learning model becomes harder and harder. And because language models reproduce text found online (often without attribution), the risk of mistaking their output for text written by a human changes the reading experience.

The last year has been incredible for natural (and programming) language processing. GitHub’s Copilot has been out of technical preview since June, and ChatGPT was released in November. Copilot is based on OpenAI Codex and acts as a source code generator (which raises several issues of its own). ChatGPT is a language model built for dialogue, where a user can chat with the AI, ask questions and have them answered. Both are trained with data from web scraping, with source code for Copilot and webpages for ChatGPT. Those models work particularly well for their respective purposes, and can thus be used to generate seemingly convincing source code or prose.

The deep learning obesity crisis

Deep learning has made dramatic improvements over the last decades. Part of this is attributed to improved methods that allowed training wider and deeper neural networks. It can also be attributed to better hardware, as well as the development of techniques to use this hardware efficiently. All of this leads to neural networks that grow exponentially in size. But is continuing down this path the best avenue for success?

How does the Integrated Gradients method work?

For artificial intelligence (AI) transparency and to better shape upcoming policies, we need to better understand the AI’s output. In particular, one may want to understand the role attributed to each input. This is hard, because in neural networks input variables don’t have a single weight that could serve as a proxy for determining their importance with regard to the output. Therefore, one has to consider all the neural network’s weights, which may all be interconnected. Here is how Integrated Gradients does this.

What does a transformer do?

Transformers are giant robots coming from Cybertron. There are two Transformer tribes: the Autobots and the Decepticons. They have been fighting each other over the Allspark, a mythical artifact capable of building worlds and mechanical beings. Well, there is also another kind of Transformer, but those are not about warfare. However, they are pretty good at language understanding. Let’s see how!

A PyTorch RNN with variable sequence lengths

A Recurrent Neural Network (RNN) often uses ordered sequences as inputs. Real-world sequences have different lengths, especially in Natural Language Processing (NLP), because not all words have the same number of characters and not all sentences have the same number of words. In PyTorch, the inputs of a neural network are often managed by a DataLoader. A DataLoader groups the inputs in batches. This is better for training a neural network because it’s faster and more efficient than sending the inputs one by one to the neural network. The issue with this approach is that it assumes every input has the same shape. As stated before, sequences don’t have a consistent shape, so how can one train an RNN in PyTorch with variable-length sequences and still benefit from the DataLoader class?
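One common answer, sketched here in plain Python rather than with PyTorch itself, is to pad every sequence in a batch to the length of the longest one and to keep the original lengths alongside. The function name pad_batch and the padding value 0 are assumptions made for illustration. In PyTorch, this role is usually played by a custom collate_fn passed to the DataLoader, together with torch.nn.utils.rnn.pad_sequence and pack_padded_sequence.

```python
def pad_batch(sequences, padding_value=0):
    """Pad variable-length sequences to a common length.

    Returns the padded batch and the original lengths, so that a model
    can later ignore the padded positions. Hypothetical helper, for
    illustration only.
    """
    lengths = [len(seq) for seq in sequences]
    max_length = max(lengths)
    padded = [
        list(seq) + [padding_value] * (max_length - len(seq))
        for seq in sequences
    ]
    return padded, lengths


# Three sequences of token ids with different lengths.
batch = [[5, 2, 9], [7], [3, 4]]
padded, lengths = pad_batch(batch)
print(padded)   # [[5, 2, 9], [7, 0, 0], [3, 4, 0]]
print(lengths)  # [3, 1, 2]
```

Keeping the lengths matters: without them, the network cannot tell real inputs from padding.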

Classifying Names With a Character Level RNN (GRU-Powered)

Wanting to brush up on my PyTorch skills, I’ve started to follow this tutorial. It explains how to create a deep learning model able to predict the origin of a name. At the end of the tutorial, there’s an invitation to try to improve the model. Which I did. Note that the point of the tutorial is not to create the most performant model but rather to demonstrate and explain PyTorch’s capabilities. Here’s a comparison between the model described in the tutorial and the one I’ve built.

Text representations for Machine Learning and Deep Learning

Despite what the bad media are saying, computers haven’t understood human language (yet). We need to turn sentences and words into a format that can be effectively manipulated by a Machine Learning or Deep Learning algorithm. This is called language modeling. Here I will explain several methods that can turn words into a meaningful representation.

Integer encoding

This approach is the simplest. Once we have a list of the tokens composing the vocabulary, we associate each one with an integer. For example, if the vocabulary is “Roses, are, red, Violets, blue”, we can create a mapping: Roses : 0, are: 1, red: 2, Violets: 3, blue: 4.
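The mapping above can be sketched in a few lines of Python. The names token_to_id and encode are chosen here for illustration, and unknown tokens are simply assumed not to occur.

```python
# Vocabulary from the example above, in order of first appearance.
vocabulary = ["Roses", "are", "red", "Violets", "blue"]

# Associate each token with an integer index.
token_to_id = {token: index for index, token in enumerate(vocabulary)}


def encode(tokens):
    """Replace each token with its integer; unknown tokens raise KeyError."""
    return [token_to_id[token] for token in tokens]


encoded = encode(["Violets", "are", "blue"])
print(encoded)  # [3, 1, 4]
```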

How to install cuda 10.0, cudnn 7.4, Tensorflow, PyTorch on Fedora 29

This procedure has been tested on Fedora 29, on an HP laptop with this graphics card: NVIDIA Corporation GP107M GeForce GTX 1050 Mobile (rev a1)

The commands have to be run as the root user. This tutorial assumes the nvidia driver is already working.

Install pip

dnf install python3-pip

Install Cuda 10.0

Download the installer from the NVIDIA website and run it. Make sure to install the Perl module Term::ReadLine::Gnu beforehand because the CUDA installer relies on it.

Stochastic Gradient Descent and its variants

Stochastic Gradient Descent (SGD) is used in many Deep Learning models as an algorithm to optimize the parameters (the weights of each layer). Here is how it works:

At each step in the training process, the goal is to update the weights towards the optimal value. For this, SGD uses the equation:
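In its textbook form, the update moves each weight against its gradient: w ← w − learning_rate × gradient, where the gradient is estimated on a single example or a mini-batch. The sketch below illustrates that rule in plain Python. The function name sgd_step, the learning rate of 0.1 and the toy loss L(w) = w² are assumptions for illustration, and the toy example uses the exact gradient rather than a sampled one.

```python
def sgd_step(weights, gradients, learning_rate=0.1):
    """Apply one gradient descent update to a list of weights."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


# Toy problem: minimise L(w) = w**2, whose gradient is 2 * w.
# The optimal value is w = 0.
weights = [1.0]
for _ in range(50):
    gradients = [2 * w for w in weights]
    weights = sgd_step(weights, gradients)

print(weights)  # a value very close to 0
```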

On Deep Learning and Free Software

As Deep Learning is becoming more and more popular, there is an ongoing debate on whether it’s possible to create Deep Learning applications with a Free Software license. See for example this discussion on the debian-devel mailing list.

The argument we often see is that:

  • It’s impossible to study the inner workings of Deep Learning software (for example, an image classifier or a text generator) or improve it, because one cannot understand how it’s going to make predictions only by looking at the weights of the Deep Learning model
  • Training a Deep Learning model requires specialized and expensive hardware that runs non-Free software

But the first statement misses the point of Deep Learning programs. We should not treat Deep Learning programs like “regular” ones. A regular program contains a set of tasks the computer has to do, and the human knows how those tasks should be completed. This is not true for Deep Learning. The software is not the set of actions that solve the problem; it is the set of instructions used to learn how to solve it. So the Deep Learning program is not the knowledge (the weights) used to perform the mission, but the way to guide computers to that knowledge. In a way, this is similar to the compilation of a large program to assembly. The compilation output is hardly readable or editable, but the program can easily be studied and analyzed. The same goes for Deep Learning if we consider the model weights as the compilation output. They are not meant to be edited by hand.