Spring 2017
Quiz 4
Note: answers are bolded
-
Stochastic gradient descent, when used with the hinge loss, leads to which update rule?
- Winnow
- Widrow's Adaline
- **Perceptron**
- AdaGrad
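To see why: the subgradient of the hinge loss max(0, -y w·x) is -y x on a mistake and 0 otherwise, so an SGD step adds y x to w exactly when the example is misclassified. (With the margin form max(0, 1 - y w·x) the step is the same but fires whenever the margin is below 1.) A minimal sketch, assuming ±1 labels; the function name is illustrative:

```python
import numpy as np

def sgd_hinge_step(w, x, y, lr=1.0):
    """One SGD step on the hinge loss max(0, -y * (w @ x)).

    The subgradient is -y * x when y * (w @ x) <= 0 and 0 otherwise,
    so the step fires only on mistakes -- exactly the Perceptron update.
    """
    if y * (w @ x) <= 0:        # mistake (or zero margin)
        w = w + lr * y * x      # identical to the Perceptron rule
    return w
```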
-
In a mistake-driven algorithm, if we make a mistake on example x_i with label y_i, we update the weights w so that we now predict y_i correctly.
- True
- **False**
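Why this is false: a single Perceptron update moves w toward classifying (x_i, y_i) correctly, but need not get it all the way there. A minimal numeric sketch; the values are illustrative:

```python
import numpy as np

w = np.array([-3.0, 0.0])
x = np.array([1.0, 0.0])
y = 1                       # true label

print(np.sign(w @ x))       # -1.0: we make a mistake on (x, y)
w = w + y * x               # Perceptron update with learning rate 1
print(np.sign(w @ x))       # -1.0: (x, y) is STILL misclassified
```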
-
Which of the following properties is true about the (original) Perceptron algorithm?
- The Perceptron always converges to the best linear separator for a given dataset.
- The convergence criteria for Perceptron depends on the initial value of the weight vector.
- **If the dataset is not linearly separable, the Perceptron algorithm does not converge and keeps cycling between some sets of weights.**
- If the dataset is not linearly separable, the Perceptron algorithm learns the linear separator with the fewest misclassifications.
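The cycling behavior is easy to observe on the (non-separable) XOR points with a bias feature appended. A minimal sketch; the dataset and variable names are illustrative:

```python
import numpy as np

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])        # XOR labels: not linearly separable

w = np.zeros(3)
history = []
for epoch in range(3):
    for xi, yi in zip(X, y):
        if yi * (w @ xi) <= 0:      # mistake: apply the Perceptron update
            w = w + yi * xi
        history.append(tuple(w))

# The same sequence of weight vectors repeats every epoch:
# (0,0,-1), (0,1,0), (1,1,1), (0,0,0), (0,0,-1), ...
print(history)
```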
-
Suppose we are using the standard Averaged Perceptron algorithm for training and testing (prediction), and that it makes k mistakes on the training data. How many weight vectors do we need to store in order to predict the label for a test instance?
- **O(1)**
- O(k)
- O(k^2)
- Not enough information.
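The reason one vector suffices: the average of all intermediate weight vectors can be maintained as a single running sum during training, so test-time prediction needs one dot product with one stored vector, regardless of k. A minimal sketch, with illustrative names:

```python
import numpy as np

def averaged_perceptron(X, y, epochs=10):
    """Train and return the single averaged weight vector.

    The average over all per-step weight vectors is accumulated in
    w_sum during training, so prediction stores O(1) weight vectors
    no matter how many mistakes k were made.
    """
    w = np.zeros(X.shape[1])
    w_sum = np.zeros_like(w)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:
                w = w + yi * xi
            w_sum += w
    return w_sum / (epochs * len(X))

def predict(w_avg, x):
    return 1 if w_avg @ x > 0 else -1   # a single dot product at test time
```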
-
Winnow has a better mistake bound than Perceptron when only k of n features are relevant to the prediction and k << n.
- **True**
- False
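For reference, Winnow's mistake bound for learning a k-of-n function is O(k log n), while the Perceptron's bound can scale with n, which is why the multiplicative update wins when k << n. A minimal sketch of one update in the promote/demote (Winnow2-style) variant; the names are illustrative:

```python
import numpy as np

def winnow_step(w, x, y, theta, alpha=2.0):
    """One multiplicative Winnow update for x in {0,1}^n, y in {-1,+1}.

    On a false negative, active weights are promoted (multiplied by
    alpha); on a false positive, they are demoted (divided by alpha).
    This yields an O(k log n) mistake bound for k relevant features.
    """
    y_hat = 1 if w @ x >= theta else -1
    if y_hat != y:
        w = w * np.power(alpha, y * x)   # alpha**(+x) or alpha**(-x)
    return w
```

Weights are typically initialized to all ones, with threshold theta = n.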
Dan Roth