Neural Networks & Deep Learning

CIS1902 Python Programming

Reminders

  1. HW1 is due tonight!
  2. HW2 will be released soon. Expect a data science portion (pandas manipulation), training a model using scikit-learn, then (maybe) fiddling with a neural network.

Agenda

  1. What is a neural network?
  2. Why are neural networks good?
  3. How do we train a neural network?
  4. MNIST Lab

Neural Networks: Inspiration

  • What if we used a model that was representative of how our brain works?
  • Roughly, our brains have neurons that activate based on various inputs
  • The output of neurons is fed to other neurons in a network and eventually leads to an output

[Figure: brain neurons vs. an artificial neural network]

Artificial Neurons

How do we model a neuron?

  1. Inputs
  2. Weights
  3. Bias
  4. Activation Function

[Figure: perceptron]

Artificial Neurons

Generally, an artificial neuron can be described as

$y = f(\mathbf{w} \cdot \mathbf{x} + b)$

where $\mathbf{x}$ is the vector of inputs, $\mathbf{w}$ is a vector of weights, $b$ is the bias, and $f$ is typically some sort of threshold (activation) function.
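
As a minimal sketch, a single artificial neuron in Python, assuming NumPy and an activation function f (the example values are illustrative):

```python
import numpy as np

def neuron(x, w, b, f):
    # Weighted sum of inputs plus bias, passed through activation f.
    return f(np.dot(w, x) + b)

# Example: two inputs, hand-picked weights, and an identity "activation".
print(neuron(np.array([1.0, 2.0]), np.array([0.5, -0.25]), 0.1, lambda z: z))  # 0.1
```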

Perceptron

The perceptron [Rosenblatt 1958] is one of the earliest artificial neuron models, building on the original threshold neuron of [McCulloch & Pitts 1943]; it uses a simple unit step as its threshold function.

[Figure: unit step function]
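
A minimal sketch of a perceptron in Python (the AND example and weights are illustrative, chosen by hand rather than learned):

```python
import numpy as np

def unit_step(z):
    # Fires (outputs 1) once the weighted sum crosses the threshold.
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    return unit_step(np.dot(w, x) + b)

# Example: a perceptron computing logical AND.
print(perceptron(np.array([1, 1]), np.array([1.0, 1.0]), -1.5))  # 1
print(perceptron(np.array([1, 0]), np.array([1.0, 1.0]), -1.5))  # 0
```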

Sigmoid Neuron

The sigmoid neuron uses a continuous threshold function instead (this is the same function used in logistic regression!)

[Figure: sigmoid function]
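
For reference, the sigmoid function is $\sigma(z) = \frac{1}{1 + e^{-z}}$. A minimal NumPy sketch:

```python
import numpy as np

def sigmoid(z):
    # Smoothly squashes any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))   # 0.5
print(sigmoid(10.0))  # ~1.0; behaves like the unit step for large inputs
```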

Neural Networks

A neural network can be thought of as a directed graph of artificial neurons.

Typically, neurons are organized in layers. The last layer is considered the output.

Outputs might represent things like the probability or likelihood that an input belongs to a category, e.g. cat vs. dog in an image.

[Figure: layered neural network]
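
A minimal sketch of a forward pass through such a network, reusing the sigmoid neuron from earlier (layer sizes and random weights are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=4)                          # 4 input features

# Each layer applies its weights, a bias, and the activation function.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)   # hidden layer: 4 -> 3
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)   # output layer: 3 -> 2

hidden = sigmoid(W1 @ x + b1)
output = sigmoid(W2 @ hidden + b2)              # e.g. scores for "cat" vs. "dog"
print(output)
```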

Neural Networks

Why are neural networks so powerful?

Roughly, the Universal Approximation Theorem states that for any continuous function $f$, there exists a neural network that can approximate $f$ to any degree of accuracy.

Note that it does not say what that neural network is! However, the property of existence alone suggests that neural networks can learn complex patterns and relationships.

Training Neural Networks

How do we train our neural network?

First, we must define a loss function, i.e. a measure of how far the output is from what we want. We want to minimize the loss function, so we typically pick functions that are differentiable (have a derivative).
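
For example, a minimal sketch of one common loss function, mean squared error (classification tasks often use cross-entropy instead):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    # Average squared distance between predictions and targets.
    return np.mean((y_pred - y_true) ** 2)

print(mse_loss(np.array([0.9, 0.1]), np.array([1.0, 0.0])))  # 0.01
```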

Then, we simply aim to pick the weights for each neuron that minimize loss. But how do we do this efficiently?

Backpropagation & Gradient Descent

Backpropagation is a method to determine the gradient of the loss function with respect to the weights of all neurons.

Once the gradient is determined, a technique called gradient descent uses it to move the weights toward a (possibly local) minimum of the loss function.
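
Concretely, backpropagation is repeated application of the chain rule, working backwards from the output layer. For a weight $w$ feeding a neuron with pre-activation $z = wx + b$ and output $a = f(z)$:

$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w} = \frac{\partial L}{\partial a} \, f'(z) \, x$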

Backpropagation & Gradient Descent

Think of rolling a ball from a random point on the loss function: eventually, it should reach some sort of "bottom". This is the key idea behind gradient descent.

Each "step" that we take tells us the direction we should adjust our weights, until we eventually reach a minimum!

[Figure: gradient descent]
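
Each step is the update $w \leftarrow w - \eta \nabla L(w)$, where $\eta$ is the learning rate. A minimal sketch on a one-dimensional loss (the gradient is hand-coded here; in a real network, backpropagation computes it):

```python
def loss(w):
    return (w - 3.0) ** 2        # minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)       # derivative of the loss

w, lr = 10.0, 0.1                # random starting point, learning rate
for _ in range(100):
    w -= lr * grad(w)            # step downhill along the gradient
print(w)                         # ~3.0
```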

Recap

To train a neural network:

  1. Pick a neuron model and create a network shape
  2. Pick random weights
  3. Use backpropagation and gradient descent to update weights
  4. Repeat until model converges

Bad news: steps 3-4 are computationally expensive and may even require specialized hardware (lots of matrix multiplication = need for GPUs)

Keras

Good news: you don't have to implement any of this!

Keras is an API for deep learning that plugs into various backends. Backends are codebases that implement the aforementioned heavy lifting for you, e.g. PyTorch, JAX, or TensorFlow.

Keras provides a coding framework to easily define your neural nets, loss functions, etc.
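
As a preview of the lab, a minimal (hypothetical) Keras model for classifying MNIST digits might look like the following; the exact architecture in the lab may differ:

```python
import keras
from keras import layers

# A small fully connected network for MNIST digit classification.
model = keras.Sequential([
    keras.Input(shape=(784,)),               # 28x28 images, flattened
    layers.Dense(128, activation="relu"),    # hidden layer
    layers.Dense(10, activation="softmax"),  # one output per digit class
])

# The loss function and the gradient descent variant are picked here;
# backpropagation happens inside fit().
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical usage, assuming x_train/y_train are loaded elsewhere:
# model.fit(x_train, y_train, epochs=5, batch_size=32)
```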

MNIST Lab