Spring 2017
Quiz 7
Note: answers are bolded
-
Which of the following is able to approximate any continuous function to an arbitrary accuracy?
- A two-layer neural network (input layer, output layer) using a linear activation function.
- A two-layer neural network (input layer, output layer) using a non-linear activation function.
- A three-layer neural network (input layer, hidden layer, output layer) using a linear activation function.
- **A three-layer neural network (input layer, hidden layer, output layer) using a non-linear activation function.**
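A quick way to see why the linear-activation options fall short: without a non-linearity, a three-layer network (one hidden layer) collapses into a single linear map, so it can only represent linear functions, whereas a non-linear activation between the layers is exactly what the universal approximation result relies on. A minimal NumPy sketch of the collapse (the shapes and the choice of tanh are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))           # 5 inputs of dimension 3
W1 = rng.normal(size=(3, 4))          # input -> hidden weights
W2 = rng.normal(size=(4, 2))          # hidden -> output weights

# With a linear (identity) activation, two weight layers ...
three_layer_linear = (x @ W1) @ W2
# ... are exactly one linear layer with weights W1 @ W2:
collapsed = x @ (W1 @ W2)
print(np.allclose(three_layer_linear, collapsed))    # True: the hidden layer adds nothing

# A non-linearity between the layers breaks the collapse.
three_layer_nonlinear = np.tanh(x @ W1) @ W2
print(np.allclose(three_layer_nonlinear, collapsed))  # False
```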
-
The use of the sigmoid function makes back-propagation possible because it is continuous and differentiable. Besides enabling back-propagation, the sigmoid function also makes the neural network a:
- linear classifier
- **non-linear classifier**
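The point of the sigmoid is that it is a smooth squashing function whose derivative has a simple closed form, so gradients can be propagated backwards, and applying it between layers is what makes the learned decision boundary non-linear. A minimal sketch of the function and the derivative used in back-propagation (NumPy, purely illustrative):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z}): continuous and bounded in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z)): the term back-propagation multiplies by
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-5, 5, 11)
print(sigmoid(z))
print(sigmoid_grad(z))   # differentiable everywhere, so gradient descent can be applied
```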
-
When training neural networks with at least one hidden layer (and a non-linear activation function), does the initialization of the weight vectors have an impact on the performance of the neural network?
- No, because back-propagation using gradient descent would always find the best weights.
- **Yes, neural networks in the given configuration optimize a non-convex objective function.**
- No, neural networks in the given configuration always optimize a convex objective function and will reach the minimum eventually.
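With a hidden layer and a non-linear activation, the training objective is non-convex, so where gradient descent ends up depends on where it starts; in particular, hidden units that are initialized identically receive identical gradients and never separate. A small illustrative experiment (plain NumPy; the XOR data, learning rate, and step count are arbitrary choices):

```python
import numpy as np

# XOR inputs with a constant bias feature appended, and XOR targets.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(W1, W2, lr=0.5, steps=2000):
    """Batch gradient descent on squared error for a 3-2-1 sigmoid network."""
    for _ in range(steps):
        h = sigmoid(X @ W1)                   # two hidden units
        out = sigmoid(h @ W2)
        d_out = (out - y) * out * (1 - out)   # back-prop through the output sigmoid
        d_h = (d_out @ W2.T) * h * (1 - h)    # back-prop through the hidden sigmoid
        W2 -= lr * h.T @ d_out
        W1 -= lr * X.T @ d_h
    return W1, W2

# Constant initialization: both hidden units start out identical ...
W1, W2 = train(np.full((3, 2), 0.3), np.full((2, 1), 0.3))
print(np.allclose(W1[:, 0], W1[:, 1]))   # True: they get identical gradients and stay identical

# ... whereas random initialization breaks the symmetry.
rng = np.random.default_rng(0)
W1, W2 = train(rng.normal(size=(3, 2)), rng.normal(size=(2, 1)))
print(np.allclose(W1[:, 0], W1[:, 1]))   # False
```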
-
Which of the following are reasons why one may prefer using one-vs-all (OvA) over all-vs-all (AvA) in the multiclass classification setting? (Multiple choices may be correct.)
- **OvA involves learning fewer classifiers than AvA**
- OvA is able to learn problems that AvA cannot
- **Each individual classifier for OvA receives a larger set of examples for training than for AvA (assuming a uniform label distribution)**
- OvA makes weaker assumptions regarding the separability of the data than AvA does
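The classifier counts behind the first and third options are easy to check: OvA trains one classifier per label (k in total), each on the full training set, while AvA trains one classifier per pair of labels (k(k-1)/2 in total), each only on the examples of its two classes, roughly 2m/k examples under a uniform label distribution. A quick sketch of the bookkeeping (Python; the values of k and m are arbitrary):

```python
from math import comb

k, m = 10, 1000                      # 10 labels, 1000 training examples, uniform label distribution

ova_classifiers = k                  # one "label i vs. the rest" classifier per label
ava_classifiers = comb(k, 2)         # one classifier per pair of labels: k*(k-1)/2

ova_examples_each = m                # every OvA classifier is trained on all m examples
ava_examples_each = 2 * m // k       # an AvA classifier only sees its two classes' examples

print(ova_classifiers, ava_classifiers)       # 10 vs 45
print(ova_examples_each, ava_examples_each)   # 1000 vs 200
```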
-
In k-class classification, one-vs-all requires at least k classifiers for k different labels.
- **True**
- False
Dan Roth