# Deep Learning

• A Recurrent Neural Network (RNN) often takes ordered sequences as inputs. Real-world sequences have different lengths, especially in Natural Language Processing (NLP), because not all words have the same number of characters and not all sentences have the same number of words. In PyTorch, the inputs of a neural network are often managed by a DataLoader. A DataLoader groups the inputs into batches, which makes training faster and more efficient than sending the inputs to the network one by one.
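
One common way to batch sequences of different lengths is to pad each batch to the length of its longest sequence. The following is a minimal pure-Python sketch of that idea, not the code from the post: the names `pad_batch`, `batches` and `PAD`, and the choice of 0 as the padding value, are assumptions made for illustration.

```python
from typing import Iterator, List

PAD = 0  # hypothetical padding value; any integer unused by the real data works


def pad_batch(sequences: List[List[int]], pad_value: int = PAD) -> List[List[int]]:
    """Right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]


def batches(data: List[List[int]], batch_size: int) -> Iterator[List[List[int]]]:
    """Yield consecutive padded batches of at most batch_size sequences."""
    for start in range(0, len(data), batch_size):
        yield pad_batch(data[start:start + batch_size])


# Four integer-encoded sequences of different lengths (made-up data).
names = [[3, 1, 4], [1, 5], [9, 2, 6, 5], [3]]
padded = list(batches(names, batch_size=2))
print(padded)
```

In PyTorch, the padding step is usually supplied to the DataLoader through its `collate_fn` argument, for example with `torch.nn.utils.rnn.pad_sequence`.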

• Wanting to brush up on my PyTorch skills, I’ve started to follow this tutorial. It explains how to create a deep learning model able to predict the origin of a name. At the end of the tutorial, there’s an invitation to try to improve the model, which I did. Note that the point of the tutorial is not to create the best-performing model but rather to demonstrate and explain PyTorch’s capabilities. Here’s a comparison between the model described in the tutorial and the one I’ve built.

• Despite what the bad media are saying, computers haven’t understood human language (yet). We need to turn sentences and words into a format that can be effectively manipulated by a Machine Learning or Deep Learning algorithm. This is called language modeling. Here I will explain several methods that can turn words into a meaningful representation. Integer encoding: this approach is the simplest. Once we have a list of the tokens composing the vocabulary, we associate each one with an integer.
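
Integer encoding as described above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `build_vocabulary` and `encode`, the whitespace tokenization, and the choice to number tokens in order of first appearance are assumptions, not details taken from the post.

```python
from typing import Dict, List


def build_vocabulary(tokens: List[str]) -> Dict[str, int]:
    """Associate each distinct token with an integer, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Replace every token with its integer from the vocabulary."""
    return [vocab[token] for token in tokens]


# Naive whitespace tokenization, for illustration only.
tokens = "the cat sat on the mat".split()
vocab = build_vocabulary(tokens)
encoded = encode(tokens, vocab)
print(vocab)
print(encoded)
```

Note that the integers carry no meaning of their own: two tokens with close numbers are not more similar than two tokens with distant ones.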

• This procedure has been tested on Fedora 29, on an HP laptop with this graphics card: NVIDIA Corporation GP107M GeForce GTX 1050 Mobile (rev a1). The commands have to be run as the root user. This tutorial assumes the nvidia driver is already working. Install pip: `dnf install python3-pip`. Install CUDA 10.0: download the installer from the Nvidia website and run it. Make sure to install the Perl module Term::ReadLine::Gnu beforehand because the CUDA installer relies on it.

• Stochastic Gradient Descent (SGD) is used in many Deep Learning models as an algorithm to optimize the parameters (the weights of each layer). Here is how it works: at each step in the training process, the goal is to update the weights towards the optimal value. For this, SGD uses the equation: $$new\;estimate = current\;estimate - (\nabla \times learning\;rate)$$ In this equation, the gradient ∇ indicates the direction towards the solution (above or below it) and how far we are from it.
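
The update rule above can be tried on a toy problem. The sketch below is an assumption-laden illustration rather than the post's code: it fits a single weight `w` to made-up data by minimizing the squared error `(w - x)**2` on one randomly drawn sample per step, and the names `sgd_step` and `data`, the learning rate and the step count are all hypothetical.

```python
import random


def sgd_step(current_estimate: float, gradient: float, learning_rate: float) -> float:
    """new estimate = current estimate - (gradient * learning rate)"""
    return current_estimate - gradient * learning_rate


data = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up samples; their mean is 3.0
rng = random.Random(0)            # fixed seed so runs are repeatable

w = 0.0                # initial estimate of the weight
learning_rate = 0.01
for _ in range(2000):
    x = rng.choice(data)       # "stochastic": one random sample per step
    gradient = 2.0 * (w - x)   # derivative of (w - x)**2 with respect to w
    w = sgd_step(w, gradient, learning_rate)

print(w)  # ends up near the mean of the data, with some noise
```

The sign of the gradient tells the step whether `w` is above or below the sampled value, and its magnitude grows with the distance, which is why the estimate moves quickly at first and then hovers around the solution.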

• As Deep Learning is becoming more and more popular, there is an ongoing debate on whether it’s possible to create Deep Learning applications with a Free Software license. See for example this discussion on the debian-devel mailing list. The arguments we often see are that: (1) it’s impossible to study the inner workings of Deep Learning software (for example, an image classifier or a text generator) or improve it, because one cannot understand how it’s going to make predictions only by looking at the weights of the Deep Learning model; (2) training a Deep Learning model requires specialized and expensive hardware that runs non-Free software. But the first statement misses the point of Deep Learning programs.

