Updated: 03 September 2023
Deep Learning Fundamentals
Introduction to Deep Learning
What is a Neural Network
For more look at: Michael Nielsen and Andrew Ng
A NN’s main function is to receive an input, perform some calculations, and use the result to solve some sort of problem
NN’s can be used for a few different tasks, such as classification
A NN is made up of a series of classifiers layered one after another: an Input Layer, an Output Layer, and a few Hidden Layers
The process of going from Input to Output is known as Forward Propagation
Neural Nets are also known as Multi-Layer Perceptrons; each node is not necessarily a perceptron but may be a slightly more complex unit
NN’s make use of weights and biases to place different importance on inputs. We train a network by comparing the predicted output to the actual output and modifying the weights and biases so that the predictions become more accurate
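As a toy illustration of these ideas, here is a single sigmoid neuron trained by gradient descent on one input/target pair. The input values, learning rate, and iteration count are arbitrary choices for this sketch, not from the notes:

```python
import numpy as np

# Toy single sigmoid neuron: output = sigmoid(w . x + b)
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 0.25])  # toy input
y = 1.0                          # desired output
w = rng.normal(size=3)           # weights: the importance of each input
b = 0.0                          # bias
lr = 0.5                         # learning rate (arbitrary)

for _ in range(500):
    out = sigmoid(w @ x + b)            # forward propagation
    grad = (out - y) * out * (1 - out)  # compare prediction to target
    w -= lr * grad * x                  # nudge the weights ...
    b -= lr * grad                      # ... and the bias toward the target
```

After training, the prediction has moved close to the target, which is all "training" means here: repeatedly adjusting weights and biases to shrink the error.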
Why Deep Learning?
NN’s are exceptionally good at finding complex patterns, and GPUs make it practical to train them
When the data has many different inputs, NN’s tend to outperform other classifiers
When we have many features and combinations we need a Deep Net in order to properly classify the data due to the complexity in patterns
Deep nets break down complex patterns into many simpler patterns and combine these
The problem is that Deep Nets take a very long time to train; we can, however, use high speed GPUs to train NN’s faster
Different Deep Nets
- Unlabelled Data
  - RBM
  - Autoencoder
- Labelled Data
  - Text Processing
    - RNTN
    - Recurrent Net
  - Image Recognition
    - DBN
    - Convolutional Net
  - Object Recognition
    - Convolutional Net
    - RNTN
  - Speech Recognition
    - Recurrent Net
- General
  - Classification
    - MLP / ReLU
  - Time Series
    - Recurrent Net
The Vanishing Gradient
Deep Nets have been around for a long time, but they are really difficult to train with backpropagation due to a problem known as the Vanishing Gradient
The gradient is the rate at which the cost changes given a change in the weights or biases
When a gradient is large, the net will train faster than when the gradient is small
Early layers have the smallest gradients but are the most important part: if they are poorly trained, all the later layers build on their faulty output
Backpropagation computes an early layer’s gradient as a product of terms from all the later layers. Because these terms are typically less than one, the product keeps shrinking as it is carried backwards, leading to a smaller and smaller gradient in the early layers
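A quick numeric sketch of why this happens: the backpropagated gradient contains one derivative factor per layer, and for the sigmoid that factor is at most 0.25, so the gradient’s scale shrinks exponentially with depth (the depths chosen here are arbitrary):

```python
# Toy illustration: in backprop, an early layer's gradient contains a
# product of one derivative term per later layer. The sigmoid's
# derivative is at most 0.25, so the product shrinks exponentially.
sigmoid_deriv_max = 0.25
for depth in (1, 5, 10):
    grad_scale = sigmoid_deriv_max ** depth
    print(depth, grad_scale)
```

At depth 10 the scale is already below one millionth, which is why early layers learn far more slowly than later ones.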
Deep Learning Models
The major breakthrough came after three papers by Hinton, LeCun and Bengio in 2006 and 2007
Restricted Boltzmann Machines
The RBM is a shallow, two layer net. Each node is connected to every node in the other layer, but not to any node in its own layer
An RBM is trained to reconstruct the input data through a series of forward and backward passes
RBM’s are trained using KL Divergence as the measure of how far the reconstruction is from the input
The data for an RBM does not need to be labelled. An RBM makes decisions about which features are important and how they should be combined. The RBM is part of a family of NN’s known as Autoencoders, which are able to extract features
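A minimal sketch of the forward/backward reconstruction idea, using a mean-field CD-1 style update. Bias terms and stochastic sampling are omitted for brevity, and the layer sizes and learning rate are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
v0 = rng.integers(0, 2, size=n_visible).astype(float)  # one binary sample

err_before = np.mean((v0 - sigmoid(sigmoid(v0 @ W) @ W.T)) ** 2)
for _ in range(200):
    h0 = sigmoid(v0 @ W)       # forward pass: visible -> hidden
    v1 = sigmoid(h0 @ W.T)     # backward pass: reconstruct the input
    h1 = sigmoid(v1 @ W)       # re-encode the reconstruction
    # Push reconstruction statistics toward the data statistics
    W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
err_after = np.mean((v0 - sigmoid(sigmoid(v0 @ W) @ W.T)) ** 2)
```

After a few hundred forward/backward passes the reconstruction error shrinks, which is the sense in which the RBM learns which features matter without any labels.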
Deep Belief Nets
A DBN is a stack of RBM’s. A DBN is structurally identical to an MLP but is trained in a different way, which is the differentiating factor
Each pair of adjacent layers is trained as an RBM in turn; this layer-by-layer process pretrains the entire model
To finish the training process we take a small set of labelled samples, which slightly adjusts the weights and biases in the net but noticeably increases the accuracy
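The stacking idea can be sketched as greedy layer-wise pretraining. The `train_rbm` helper below is a toy mean-field CD-1 trainer written just for this illustration (biases omitted), not a library function, and all sizes are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, lr=0.1, epochs=100, seed=0):
    """Train one RBM layer with a toy mean-field CD-1 update."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(data.shape[1], n_hidden))
    for _ in range(epochs):
        h0 = sigmoid(data @ W)
        v1 = sigmoid(h0 @ W.T)
        h1 = sigmoid(v1 @ W)
        W += lr * (data.T @ h0 - v1.T @ h1) / len(data)
    return W

rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(20, 8)).astype(float)  # toy binary data
layer_sizes = [5, 3]                 # arbitrary hidden-layer sizes
weights, layer_input = [], data
for n_hidden in layer_sizes:
    W = train_rbm(layer_input, n_hidden)   # train this pair as an RBM
    weights.append(W)
    layer_input = sigmoid(layer_input @ W)  # feed activities upward
# `weights` now initialises an MLP, ready for supervised fine-tuning
# with a small labelled set.
```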
Convolutional Nets
The CNN has dominated the Image Recognition space. CNN’s were developed by Yann LeCun
For more detailed information look at Andrej Karpathy’s CS231n notes
A CNN consists of many components
The first component is the Convolutional Layer. Each filter in this layer is used to identify a specific pattern, such as an edge; we use multiple filters simultaneously to look for different patterns
The net uses Convolution to search for a specific pattern
In the Convolutional Layer the neurons perform convolution. Each neuron is connected only to a patch of input neurons, not to all of them
The next two layers are ReLU and Pooling. CNN’s combine multiple Convolutional, ReLU, and Pooling layers. The Pooling layers help to reduce the complexity between layers
At the end there is a Fully Connected net which helps to classify the output data from the Pooling Layer
CNN’s are supervised models, which means they require a lot of labelled data, which can be difficult to come across
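A minimal sketch of one conv → ReLU → pool → fully connected pass in plain numpy; the image, the edge-detecting filter, and all the sizes are arbitrary illustrations:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution: slide the filter over every position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Keep the strongest response in each size x size block."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((8, 8))                # toy greyscale image
edge_filter = np.array([[1.0, -1.0]])     # responds to vertical edges

feature_map = conv2d(image, edge_filter)  # convolutional layer
activated = np.maximum(feature_map, 0)    # ReLU layer
pooled = max_pool(activated)              # pooling shrinks the map

fc_weights = rng.normal(size=(pooled.size, 2))
scores = pooled.flatten() @ fc_weights    # fully connected classifier
```

Note how each neuron in `conv2d` only ever sees a small patch of the image, and how pooling reduces an 8×7 feature map to 4×3 before the fully connected layer classifies it.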
Recurrent Nets
Jürgen Schmidhuber, Sepp Hochreiter and Alex Graves
These can be applied to anything from speech recognition to driverless cars
These networks have a feedback loop in which a layer’s output is fed back to it as input at the next time step
A recurrent net can receive a sequence and output a sequence
RNN’s can be stacked to perform more complex operations
RNN’s are difficult to train and suffer from an extreme vanishing gradient
There are multiple solutions to this problem; the most popular is to use gating units such as the LSTM and GRU
Gating helps the net figure out when to remember and when to forget a specific input
GPUs are the usual tool for training an RNN
Feed Forward Nets output a single value, whereas an RNN can output a sequence of values, such as in the case of forecasting
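A minimal sketch of the feedback loop: the hidden state from the previous time step is fed back in at every step, so a sequence in produces a sequence out. All sizes and weights here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 4, 2
W_in = rng.normal(scale=0.5, size=(n_in, n_hidden))
W_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))  # feedback loop
W_out = rng.normal(scale=0.5, size=(n_hidden, n_out))

sequence = rng.random((5, n_in))   # 5 time steps of toy input
h = np.zeros(n_hidden)             # hidden state starts empty
outputs = []
for x_t in sequence:
    h = np.tanh(x_t @ W_in + h @ W_rec)  # previous state feeds back in
    outputs.append(h @ W_out)
outputs = np.array(outputs)        # one output per time step
```

Training this loop means backpropagating through every time step, which is exactly why the vanishing gradient hits RNN’s so hard and why gating units were introduced.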
Autoencoders
Autoencoders help us to figure out the underlying structure of a data set; they are a family of NN’s that help us to extract features
AE’s are typically very shallow; an RBM is an AE with only two layers
Autoencoders are trained with backpropagation using a loss value that measures how much information is lost in the reconstruction
Deep AE’s are useful for maintaining information while reducing the dimensionality of data
Deep AE’s tend to work better than PCA, their predecessor
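A sketch of the idea with a tiny linear autoencoder: 4-dimensional data is squeezed through a 2-unit bottleneck, and the weights are trained by backpropagating the reconstruction loss. The sizes, data, and learning rate are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((50, 4))                    # toy 4-D data set
W_enc = rng.normal(scale=0.1, size=(4, 2))    # encoder: 4 -> 2 bottleneck
W_dec = rng.normal(scale=0.1, size=(2, 4))    # decoder: 2 -> 4

lr = 0.05
loss_start = np.mean((data - data @ W_enc @ W_dec) ** 2)
for _ in range(500):
    code = data @ W_enc                       # compressed representation
    recon = code @ W_dec                      # attempted reconstruction
    err = recon - data                        # the information loss signal
    W_dec -= lr * code.T @ err / len(data)    # backpropagate the loss ...
    W_enc -= lr * data.T @ (err @ W_dec.T) / len(data)  # ... to both layers
loss_end = np.mean((data - data @ W_enc @ W_dec) ** 2)
```

The reconstruction loss falls during training, showing the 2-D bottleneck retaining most of the information, which is the dimensionality-reduction role the notes compare to PCA.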
Recursive Neural Tensor Nets
Richard Socher
It may be useful to discover the hierarchy in a data set’s structure; this is something that RNTN’s can be helpful for
These were initially designed to solve sentiment analysis problems
These networks consist of root and leaf nodes which together form a binary tree
The leaves receive input and the Root outputs a class and a score
Data moves recursively within these networks
These nets work best with specific vector representations that are able to encode the similarity between inputs best
The net will typically look at different combinations of parsing methods and use a scoring system to select the optimal tree structure
RNTN’s are trained by backpropagation and are used for syntactic parsing, sentiment analysis, and image parsing with many components, among other things
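The recursive structure can be sketched as follows; the tensor term that gives the RNTN its name is omitted for brevity, leaving only the basic bottom-up composition, and all sizes and vectors are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
W = rng.normal(scale=0.5, size=(2 * dim, dim))   # composition weights
W_score = rng.normal(scale=0.5, size=(dim, 3))   # e.g. 3 sentiment classes

def compose(node):
    """Recursively turn a (left, right) binary tree of vectors into one vector."""
    if isinstance(node, np.ndarray):             # a leaf: a word vector
        return node
    left, right = node
    children = np.concatenate([compose(left), compose(right)])
    return np.tanh(children @ W)                 # parent vector

# A toy binary parse tree of three random "word" vectors
tree = ((rng.random(dim), rng.random(dim)), rng.random(dim))
root = compose(tree)                 # data moves recursively up the tree
class_scores = root @ W_score        # the root outputs a class and score
```

Scoring candidate trees with `W_score` and keeping the best one is how the net chooses among parsing methods.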
Uses for Deep Learning
Some of the biggest use cases for Deep Learning are in spaces such as
- Machine Vision
- Object Recognition -> Clarifai uses a CNN
- Speech Recognition
- Machine Translation
- Fact Extraction
- Sentiment Analysis -> Metamind
- Cancer Detection
- Drug Discovery
- Radiology
- Finance
- Digital Advertising
- Fraud Detection
- Customer Intelligence
Deep Learning Platforms
Deep learning platforms provide users with a set of tools for training custom nets
- Platform
  - No need to know how to code
  - Constrained by the selection of nets and configuration options
- Library
  - More flexibility
  - A set of functions that we can use from our code
H2O.ai
H2O is a software platform that provides one Deep Net (an MLP) and a few other Machine Learning algorithms; the platform also offers data preprocessing and model management
H2O allows easy integration with external data sources while also allowing you to plug into other services for data processing
H2O is downloadable and can be deployed and managed on your own
Dato GraphLab
GraphLab offers two Deep Nets as well as Machine Learning and Graph models
GraphLab has a CNN and an MLP, alongside tools for Classification, Regression, Text Analysis, and Clustering
Built in integration for external data sources as well as tools for visualizations
GraphLab is downloadable and needs to be locally managed
Deep Learning Libraries
A Library is a premade set of tools that can be used by our code
Some libraries that are suitable for commercial use are
- Deeplearning4j
- Torch
- Caffe
For educational uses libraries like Theano can be useful
Theano
Theano provides functions for building deep nets that can train quickly
Developed by the Machine Learning Group at the University of Montreal
Theano is a Python library that represents the network structure as matrix operations. Theano therefore allows for fast training due to the parallel optimizations available for training deep nets
Theano requires you to build your NN’s from the ground up, specifying everything from layers to activation functions
The Blocks library lets us build on top of Theano, and the Lasagne/Keras libraries let us build on top of Theano by providing the net’s hyperparameters layer by layer
Libraries like Passage are also useful for training RNN’s for textual analysis
Caffe
Caffe is a deep learning library for machine vision and forecasting which allows you to train custom nets as well as use prebuilt nets shared by the community
Caffe is well suited for CNN’s among other types of nets. It is written in C++ and can be accessed from Matlab and Python. It gives the user the ability to define nets and net parameters very flexibly. Trained Caffe models are uploaded to what is called the Model Zoo
Tensorflow
TensorFlow is a library built by Google for building commercial Deep Learning applications. The goal was to build Machine Learning models that can be deployed on a variety of end devices
TensorFlow is based on the concept of a Computational Graph in which Tensors flow along graph connections
Hyperparameter interfaces are available by way of the Keras library
TensorBoard allows you to view visualizations of the network, such as the network architecture as well as the model’s training progression
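A toy illustration of the computational-graph concept itself (not the TensorFlow API): operations are nodes, and values flow along the edges when the graph is evaluated:

```python
# Build-then-run: the graph is constructed first as data,
# and values "flow" through it only when eval() is called.
class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def eval(self):
        return self.op(*(n.eval() for n in self.inputs))

class Const(Node):
    def __init__(self, value):
        self.value = value

    def eval(self):
        return self.value

# The graph for (2 + 3) * 4, built before any arithmetic happens
a, b, c = Const(2.0), Const(3.0), Const(4.0)
graph = Node(lambda x, y: x * y, Node(lambda x, y: x + y, a, b), c)
print(graph.eval())  # 20.0
```

Separating graph construction from execution is what lets a framework optimize the graph and deploy the same model on different end devices.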