Posts with the tag Deep Learning:

Artificial intelligence is not willing to be correct

As deep learning models get better at representing human language, telling whether a text was written by a human or by a model becomes harder and harder. And because language models reproduce text found online (often without attribution), the risk of mistaking their output for human-written text changes the reading experience. The last year has been incredible for natural (and programming) language processing.

The deep learning obesity crisis

Deep learning has made dramatic improvements over the last few decades. Part of this is attributable to improved methods that allow training wider and deeper neural networks, and part to better hardware, along with techniques to use that hardware efficiently. All of this leads to neural networks that grow exponentially in size. But is continuing down this path the best avenue for success?

How does the Integrated Gradients method work?

For artificial intelligence (AI) transparency, and to better shape upcoming policies, we need to better understand the AI’s output. In particular, one may want to understand the role attributed to each input. This is hard because, in neural networks, input variables don’t have a single weight that could serve as a proxy for determining their importance with regard to the output. Instead, one has to consider all the neural network’s weights, all of which may be interconnected. Here is how Integrated Gradients does this.
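
As a rough sketch of the idea (not the post’s implementation), here is a minimal PyTorch version; it assumes a `model` that maps a batch of inputs to one scalar score per sample, and a user-chosen `baseline` (often all zeros):

```python
import torch

def integrated_gradients(model, x, baseline, steps=50):
    # Build `steps` points interpolated between the baseline and the input.
    alphas = torch.linspace(0, 1, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)      # shape: (steps, *x.shape)
    path.requires_grad_(True)

    # Summing the per-point scores lets one backward pass give the gradient
    # of the score at every point along the path.
    scores = model(path).sum()
    grads = torch.autograd.grad(scores, path)[0]   # shape: (steps, *x.shape)

    # Riemann approximation of the path integral, scaled by (x - baseline):
    # one attribution value per input variable.
    return (x - baseline) * grads.mean(dim=0)

# Hypothetical usage: attribute a prediction to each of 10 input features.
model = torch.nn.Sequential(torch.nn.Linear(10, 1))
x = torch.randn(10)
attributions = integrated_gradients(model, x, baseline=torch.zeros_like(x))
```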

What does a transformer do?

Transformers are giant robots from Cybertron. There are two Transformer tribes: the Autobots and the Decepticons. They have been fighting each other over the Allspark, a mythical artifact capable of building worlds and mechanical beings. Well, there is also another kind of Transformer, but those are not about warfare. However, they are pretty good at language understanding. Let’s see how!

A PyTorch RNN with variable sequence lengths

A Recurrent Neural Network (RNN) often takes ordered sequences as inputs. Real-world sequences have different lengths, especially in Natural Language Processing (NLP), because not all words have the same number of characters and not all sentences have the same number of words. In PyTorch, the inputs of a neural network are often managed by a DataLoader, which groups the inputs into batches. This is better for training a neural network because it’s faster and more efficient than sending the inputs one by one.
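
As a minimal sketch of the building blocks involved (the sizes and names below are invented for the example), PyTorch’s `pad_sequence` and `pack_padded_sequence` make one batch out of sequences of different lengths:

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Three toy "sentences" of different lengths; each token is a 5-dim vector.
sequences = [torch.randn(n, 5) for n in (7, 3, 5)]
lengths = torch.tensor([len(s) for s in sequences])

# Pad to the longest sequence so the batch fits in one tensor...
padded = pad_sequence(sequences, batch_first=True)        # (3, 7, 5)

# ...then pack it so the RNN skips the padded time steps.
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)

rnn = torch.nn.GRU(input_size=5, hidden_size=8, batch_first=True)
output, hidden = rnn(packed)   # hidden has one final state per sequence
```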

Classifying Names With a Character Level RNN (GRU-Powered)

Wanting to brush up on my PyTorch skills, I’ve started to follow this tutorial. It explains how to create a deep learning model able to predict the origin of a name. At the end of the tutorial, there’s an invitation to try to improve the model, which I did. Note that the point of the tutorial is not to create the most performant model, but rather to demonstrate and explain PyTorch’s capabilities.

Text representations for Machine Learning and Deep Learning

Despite what some media say, computers haven’t understood human language (yet). We need to turn sentences and words into a format that can be effectively manipulated by a Machine Learning or Deep Learning algorithm. This is called language modeling. Here I will explain several methods that can turn words into a meaningful representation.

Integer encoding

This approach is the simplest. Once we have a list of the tokens composing the vocabulary, we associate each one with an integer.
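
For instance, a toy version of this mapping (sentence and vocabulary invented for the example) could look like:

```python
# Toy example: build a vocabulary and encode a sentence as integers.
sentence = "the cat sat on the mat".split()
vocabulary = sorted(set(sentence))
token_to_id = {token: i for i, token in enumerate(vocabulary)}

encoded = [token_to_id[token] for token in sentence]
print(token_to_id)  # {'cat': 0, 'mat': 1, 'on': 2, 'sat': 3, 'the': 4}
print(encoded)      # [4, 0, 3, 2, 4, 1]
```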

How to install CUDA 10.0, cuDNN 7.4, TensorFlow, and PyTorch on Fedora 29

This procedure has been tested on Fedora 29, on an HP laptop with this graphics card: NVIDIA Corporation GP107M GeForce GTX 1050 Mobile (rev a1). The commands have to be run as the root user. This tutorial assumes the NVIDIA driver is already working.

Install pip

dnf install python3-pip

Install CUDA 10.0

Download the installer from the NVIDIA website and run it. Make sure to install the Perl module Term::ReadLine::Gnu beforehand, because the CUDA installer relies on it.

Stochastic Gradient Descent and its variants

Stochastic Gradient Descent (SGD) is used in many Deep Learning models as an algorithm to optimize the parameters (the weights of each layer). Here is how it works:

At each step in the training process, the goal is to update the weights towards the optimal value. For this, SGD uses the equation:
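
In its most common form (writing $w_t$ for the weights at step $t$, $\eta$ for the learning rate, and $L$ for the loss computed on a mini-batch), the update rule is:

$$w_{t+1} = w_t - \eta \, \nabla_w L(w_t)$$

The variants the post covers differ in how they modify this step.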

On Deep Learning and Free Software

As Deep Learning is becoming more and more popular, there is an ongoing debate on whether it’s possible to create Deep Learning applications under a Free Software license. See for example this discussion on the debian-devel mailing list. The arguments we often see are that:

- It’s impossible to study the inner workings of a Deep Learning application (for example, an image classifier or a text generator) or improve it, because one cannot understand how it will make predictions only by looking at the weights of the Deep Learning model.
- Training a Deep Learning model requires specialized and expensive hardware that runs non-Free software.

But the first statement misses the point of Deep Learning programs.