Machine Learning Introduction

CIS1902 Python Programming

Agenda

  1. Types of Machine Learning
  2. Defining a Model
  3. Diamond Pricing Lab
  4. Titanic Lab

Machine Learning

There are a few different types of machine learning you may have heard of:

  • Supervised
    • Labeled data, e.g. past exams with solutions
  • Unsupervised
    • Unlabeled data, e.g. past exams without solutions
  • Reinforcement
    • Learning from feedback, e.g. getting graded as you take the exam

Defining a Model

  • Problem type: What is the goal of the output? Classification or value prediction (regression)?
  • Feature selection: Which insights about the data are useful for predicting the output?
  • Model selection: Given what I know above, what is the best class of model to pick for my problem?

Decision Tree

  • One of the simplest and most intuitive models.

  • Machine learning "magic" determines the best splits, i.e. which feature and threshold to test at each node.

[Figure: decision tree]
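One way to make the split-picking "magic" concrete is to score every candidate threshold and keep the best one. The sketch below assumes Gini impurity as the score and a single numeric feature. The slides do not say which criterion is used, and the ages and labels are made-up toy data, not the Titanic dataset.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 when all labels agree, larger when they are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value <= threshold' split."""
    best = (None, float("inf"))
    n = len(values)
    # The largest value is skipped: splitting there leaves the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical toy data: ages and a 0/1 label
ages = [4, 8, 30, 45, 60, 70]
survived = [1, 1, 0, 0, 0, 0]
threshold, impurity = best_split(ages, survived)
print(threshold, impurity)  # 8 0.0
```

A full decision tree would repeat this search on each side of the split until the groups are pure or a depth limit is reached.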

Random Forest

  • Decision trees are simple but can easily "overfit" to their training set
  • What if instead we had a bunch of decision trees, not necessarily the same?
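  • This is a random forest! Typically, the majority vote (classification) or mean value (regression) of the trees' predictions is taken
  • Machine learning "magic" helps generate lots of different trees, e.g. by training each one on a random sample of the data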
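The idea of many slightly different trees voting can be sketched in a few lines. This is a simplified illustration under stated assumptions: each "tree" is a one-split stump, the trees differ only because each is trained on a bootstrap sample (rows drawn with replacement), and the data is made up. Real random forests usually also randomize which features each split may use. The names `train_stump`, `train_forest`, and `predict` are hypothetical helpers.

```python
import random
from collections import Counter

def train_stump(rows):
    """Fit a one-split tree on (value, label) rows and return a predict function."""
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    best = None  # (errors, threshold, left_label, right_label)
    for threshold in sorted({v for v, _ in rows})[:-1]:
        left = [l for v, l in rows if v <= threshold]
        right = [l for v, l in rows if v > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # only one distinct value in the sample, so no split exists
        return lambda value: majority
    _, threshold, left_label, right_label = best
    return lambda value: left_label if value <= threshold else right_label

def train_forest(rows, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sample with replacement
        forest.append(train_stump(sample))
    return forest

def predict(forest, value):
    """Classification: take the majority vote across all trees."""
    votes = Counter(tree(value) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (age, label)
rows = [(4, 1), (8, 1), (12, 1), (30, 0), (45, 0), (60, 0), (70, 0)]
forest = train_forest(rows)
print(predict(forest, 3), predict(forest, 50))
```

Individual stumps may disagree because they saw different samples, but the vote smooths out those differences, which is what makes the forest less prone to overfitting than a single tree.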

Linear Regression

A linear regression assumes that the dependent variable (y) has a linear relationship with one or more independent variables (x). The independent variables are also called regressors or predictors.

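Specifically, if we have $p$ regressors, for the $i$th label $y_i$, we model

$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i$$

which we can represent succinctly in matrix form as

$$y = X\beta + \varepsilon$$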
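To see what the matrix form buys us, here is a minimal sketch that computes one prediction per observation as a row of the data matrix times a coefficient vector. The numbers and the names `X`, `beta`, and `predict_all` are illustrative assumptions. Each row starts with a 1 so that the first coefficient acts as the intercept.

```python
def predict_all(X, beta):
    """Compute the matrix-vector product X * beta, one prediction per row."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

# Each row is one observation: a leading 1 for the intercept, then 2 regressors.
X = [
    [1.0, 2.0, 3.0],
    [1.0, 0.0, 1.0],
]
beta = [0.5, 2.0, -1.0]  # hypothetical coefficients: intercept, then one per regressor
y_hat = predict_all(X, beta)
print(y_hat)  # [1.5, -0.5]
```

The difference between these predictions and the actual labels is the error term of the model.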

Linear Regression

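  • The goal of a linear regression is to determine the coefficient vector $\beta$ that minimizes the error $\varepsilon$, usually measured as the sum of squared residuals $\lVert y - X\beta \rVert^2$.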

  • Visually, machine learning "magic" is picking the line that minimizes the total (squared) length of the green lines, which connect each data point to the fitted line.

[Figure: linear regression]
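For a single regressor, the line that minimizes the sum of squared errors has a closed-form solution, so no "magic" is needed. The sketch below uses the standard ordinary least squares formulas. The carat and price values are made up for illustration and are not taken from the diamond lab.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_error(xs, ys, intercept, slope):
    """Total squared vertical distance between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: diamond carat vs. price (in thousands)
carats = [0.5, 1.0, 1.5, 2.0]
prices = [1.1, 2.9, 5.2, 6.8]

intercept, slope = fit_line(carats, prices)
best = sum_squared_error(carats, prices, intercept, slope)
worse = sum_squared_error(carats, prices, intercept, slope + 0.5)
print(intercept, slope)
print(best < worse)  # True: nudging the slope away from the fit increases the error
```

With several regressors the same idea applies, but the solution is usually computed with linear algebra routines from a numerical library rather than by hand.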

Logistic Regression

A logistic regression is similar to a linear regression, but instead predicts a probability, e.g. given this much rain, probability of a flood?

You can think of the problem as: which parameters maximize the likelihood of our (training) data occurring?
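As a small illustration of "predicting a probability", the sketch below squashes a linear score through the logistic (sigmoid) function. The function name `flood_probability` and the parameter values are hypothetical, chosen only so that 60 mm of rain maps to a 50% chance. In practice the parameters would be learned from data.

```python
import math

def flood_probability(rain_mm, intercept=-6.0, slope=0.1):
    """Logistic model: turn a linear score into a probability between 0 and 1."""
    score = intercept + slope * rain_mm  # same form as a linear regression
    return 1.0 / (1.0 + math.exp(-score))

for rain in (10, 60, 100):
    print(rain, round(flood_probability(rain), 3))
```

More rain always gives a higher probability here because the assumed slope is positive, but the output never leaves the range (0, 1), which a plain linear regression cannot guarantee.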

Logistic Regression

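Pick parameters $\beta_0, \beta_1$ for the function

$$p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

such that the following is maximized (the likelihood):

$$L(\beta_0, \beta_1) = \prod_{i} p(x_i)^{y_i} \, (1 - p(x_i))^{1 - y_i}$$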

[Figure: logistic regression]
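One simple way to search for likelihood-maximizing parameters is gradient ascent on the log of the likelihood, which has the same maximizer and is easier to work with numerically. This is a minimal sketch under stated assumptions: a single feature, a fixed learning rate and step count picked by hand, and made-up data. Library implementations typically use more sophisticated optimizers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(xs, ys, b0, b1):
    """Log of the probability the model assigns to the observed 0/1 labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_logistic(xs, ys, learning_rate=0.05, steps=5000):
    """Gradient ascent: repeatedly step in the direction that raises the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        # Gradient of the log-likelihood with respect to b0 and b1
        grad0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
        grad1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
        b0 += learning_rate * grad0
        b1 += learning_rate * grad1
    return b0, b1

# Hypothetical data: rainfall (in units of 10 mm) and whether a flood occurred
rain = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
flood = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(rain, flood)
before = log_likelihood(rain, flood, 0.0, 0.0)
after = log_likelihood(rain, flood, b0, b1)
print(after > before)  # True: the fitted parameters make the data more likely
```

The labels deliberately overlap (some low-rain floods, some high-rain non-floods). With perfectly separable data the likelihood keeps improving as the slope grows, so the parameters would never settle.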

Lab: Titanic Dataset