The goal of a linear regression is to determine the parameter vector that minimizes the error.
Visually, the machine learning "magic" is picking the line that minimizes the total length of the green lines.
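As a minimal sketch of "picking the line", the snippet below fits a straight line to one input feature using the closed-form least-squares solution. It assumes the error is the sum of squared vertical distances between each point and the line, which is the usual choice but is not stated on the slide. The function names and the toy data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution that minimizes the sum of squared vertical distances.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def squared_error(xs, ys, slope, intercept):
    """Total squared vertical distance between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data (assumption, not from the slides).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
slope, intercept = fit_line(xs, ys)
best_error = squared_error(xs, ys, slope, intercept)
```

Any other line, for example one with a slightly different slope, gives a larger `squared_error` on the same data.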
Logistic Regression
A logistic regression predicts a probability
Given amount of rain, what's the probability of a flood?
You can think of the problem as "What parameters maximize the likelihood of our (training) data occurring?"
As a caveat, it is only good for tasks where there is a binary outcome.
Pick parameters for the function
such that the likelihood is maximized
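One way to make "pick parameters that maximize the likelihood" concrete is gradient ascent on the log-likelihood. The sketch below assumes a single input feature (amount of rain), the standard logistic (sigmoid) function, and a fixed learning rate and step count. The names `w`, `b`, `fit_logistic` and the rain/flood data are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(w, b, xs, ys):
    """Log of the probability the model assigns to the observed labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Gradient ascent on the mean log-likelihood for one input feature."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient terms are the average of (label - predicted probability),
        # weighted by the input for w and unweighted for b.
        grad_w = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((y - sigmoid(w * x + b)) for x, y in zip(xs, ys)) / n
        w += lr * grad_w
        b += lr * grad_b
    return w, b


# Hypothetical data (assumption): rainfall in cm and whether a flood occurred.
rain = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
flood = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(rain, flood)
p_flood_after_heavy_rain = sigmoid(w * 7.0 + b)
```

The fitted parameters give the training data a higher likelihood than the starting point `w = 0, b = 0`, and the predicted flood probability rises with the amount of rain.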
K-Means
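Given $n$ points, how do we "optimally" group them into $k$ sets? Formally, we want the sets $S_1, \dots, S_k$ that minimize
$$\sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2,$$
where $\mu_i$ is the mean of the points in $S_i$.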
Conceptually, the best split is the one where the total squared distance between each point and the center of its set is minimized.
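To illustrate what "best split" means, the sketch below scores a given grouping by summing the squared Euclidean distance from each point to the center (mean) of its set. The helper names and the two example splits are hypothetical.

```python
def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def clustering_cost(groups):
    """Sum, over all groups, of squared distances from each point to its group's centroid."""
    total = 0.0
    for group in groups:
        center = centroid(group)
        for p in group:
            total += sum((a - c) ** 2 for a, c in zip(p, center))
    return total


# Two ways of splitting the same four points into two sets (made-up data).
good_split = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
bad_split = [[(0, 0), (10, 10)], [(0, 1), (10, 11)]]

good_cost = clustering_cost(good_split)
bad_cost = clustering_cost(bad_split)
```

Here `good_cost` is far smaller than `bad_cost`, because the good split keeps nearby points together.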
This is actually an NP-hard problem! Machine learning instead typically uses an approximation algorithm. It's quite good in practice, but is prone to finding local minima. To get around this, we repeat the process a few times from different starting points and keep the "best" result.
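The slides do not say which approximation algorithm is meant. A common choice is the iterative heuristic often called Lloyd's algorithm, sketched below under that assumption: assign each point to its nearest center, move each center to the mean of its points, and repeat until nothing changes. The restart loop then keeps the run with the lowest cost. The function names, the default of 10 restarts, and the sample points are all hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Mean of a list of equal-length coordinate tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def lloyd(points, k, rng, iterations=100):
    """One run of the heuristic from a random start; returns (cost, centers)."""
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [mean_point(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break  # converged, possibly only to a local minimum
        centers = new_centers
    cost = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return cost, centers


def kmeans(points, k, restarts=10, seed=0):
    """Run the heuristic several times and keep the lowest-cost result."""
    rng = random.Random(seed)
    return min((lloyd(points, k, rng) for _ in range(restarts)), key=lambda r: r[0])


# Made-up data: two visually obvious groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cost, centers = kmeans(points, k=2)
```

A single run can get stuck depending on where the random centers start, which is why `kmeans` compares several runs by cost rather than trusting the first one.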
Recap
Decision Forest
Good for classification tasks
Linear Regression
Good for value prediction tasks
Logistic Regression
Good for binary outcomes (e.g. True/False, pass/fail, etc.)
K-Means Clustering
Good for unsupervised grouping (clustering) tasks
Note: most tasks can be modeled in multiple ways; picking the "right" model is usually trial and error.