Preface
1 Introduction to R
2 Linear Algebra
2.1 Linear Algebra with R
2.1.1 Introduction
2.1.2 Matrix Notation
3 Introduction to Machine Learning and Deep Learning
3.1 Training, Validation and Test Data
3.2 Bias and Variance
3.3 Underfitting and Overfitting
3.3.1 Bayes Error
3.4 Maximum Likelihood Estimation
3.5 Quantifying Loss
3.5.1 The Cross-Entropy Loss
3.5.2 Negative Log-Likelihood
3.5.3 Entropy
3.5.4 Cross-Entropy
3.5.5 Kullback-Leibler Divergence
3.5.6 Summarizing the Measurement of Loss
4 Introduction to Neural Networks
4.1 Types of Neural Network Architectures
4.1.1 Feedforward Neural Networks (FFNNs)
4.1.2 Convolutional Neural Networks (Convnets)
4.1.3 Recurrent Neural Networks (RNNs)
4.2 Forward Propagation
4.2.1 Notations
4.2.2 Input Matrix
4.2.3 Bias matrix
4.2.4 Weight matrix for Layer-1
4.2.5 Activation function at Layer-1
4.2.6 Weights matrix of Layer-2
4.2.7 Activation function at Layer-2
4.3 Activation Functions
4.3.1 Sigmoid
4.3.2 Hyperbolic tangent (tanh)
4.3.3 Rectified Linear Unit (ReLU)
4.3.4 leakyReLU
4.3.5 Softmax
4.4 Derivatives of Activation Functions
4.4.1 Derivative of the Sigmoid
4.4.2 Derivative of the tanh
4.4.3 Derivative of the ReLU
4.4.4 Derivative of the lReLU
4.4.5 Derivative of the Softmax
4.5 Loss Functions
4.6 Derivative of the Cost Function
4.6.1 Derivative of Cross Entropy Loss with Sigmoid
4.6.2 Derivative of Cross Entropy Loss with Softmax
4.7 Back Propagation
4.7.1 Backpropagate to the output layer
4.7.2 Backpropagate to the second hidden layer
4.7.3 Backpropagate to the first hidden layer
4.7.4 Vectorization of backprop equations
4.8 Writing a Simple Neural Network Application
4.8.1 Image Classification using Sigmoid Activation Neural Network
4.8.2 Importance of Normalization
5 Deep Neural Networks
5.1 Writing a Deep Neural Network (DNN) algorithm
5.2 Implementing a DNN using Keras
6 Regularization and Hyperparameter Tuning
6.1 Initialization
6.1.1 Zero initialization
6.1.2 Random initialization
6.1.3 Xavier initialization
6.1.4 He initialization
6.2 Gradient Descent
6.2.1 Gradient Descent or Batch Gradient Descent
6.2.2 Stochastic Gradient Descent
6.2.3 Mini Batch Gradient Descent
6.3 Dealing with NaNs
6.3.1 Hyperparameters and Weight Initialization
6.3.2 Normalization
6.3.3 Using different Activation functions
6.3.4 Use of NanGuardMode, DebugMode, or MonitorMode
6.3.5 Numerical Stability
6.3.6 Algorithm Related
6.3.7 NaN Introduced by AllocEmpty
6.4 Optimization Algorithms
6.4.1 Simple Update
6.4.2 Momentum based Optimization Update
6.4.3 Nesterov Momentum Optimization Update
6.4.4 Adagrad (Adaptive Gradient Algorithm) Optimization Update
6.4.5 RMSProp (Root Mean Square Propagation) with Momentum Optimization Update
6.4.6 Adam Optimization (Adaptive Moment Estimation) with Momentum Update
6.4.7 Vanishing Gradient and Numerical stability
6.5 Gradient Checking
6.6 Second order methods
6.7 Per-parameter adaptive learning rate methods
6.8 Annealing the learning rate
6.9 Regularization
6.9.1 Dropout Regularization
6.9.2 ℓ2 Regularization
6.9.3 Combining dropout and ℓ2 regularization?
6.10
About the Author:
Abhijit Ghatak is a Data Scientist and holds an M.E. in Engineering and an M.S. in Data Science from Stevens Institute of Technology, USA. He began his career as a submarine engineer officer in the Indian Navy, where he worked on various data-intensive projects involving submarine operations and construction. He has since worked in academia, at technology companies, and as a research scientist in the area of Internet of Things (IoT) and pattern recognition for the European Union (EU). He has published several papers in engineering and machine learning and is currently a consultant in machine learning and deep learning. His research interests include IoT, stream analytics and the design of deep learning systems.