All Tags

All problem types.


Problem #01

Tags: nearest neighbor

Consider the data set shown below:

What is the predicted value of \(y\) at \(x = 3\) if the 3-nearest neighbor rule is used?

Solution

6

Problem #02

Tags: linear prediction functions

Suppose a linear prediction rule \(H(\vec x; \vec w) = \Aug(\vec x) \cdot\vec w\) is parameterized by the weight vector \(\vec w = (3, -2, 5, 2)^T\). Let \(\vec z = (1, -1, -2)^T\). What is \(H(\vec z)\)?

Solution

-8
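
One way to check this: augment \(\vec z\) with a leading 1 and take the dot product with \(\vec w\):

\[ H(\vec z) = \Aug(\vec z) \cdot\vec w = (1, 1, -1, -2)^T \cdot(3, -2, 5, 2)^T = 3 - 2 - 5 - 4 = -8. \]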

Problem #03

Tags: linear regression, least squares

Suppose a line of the form \(H(x) = w_0 + w_1 x\) is fit to a data set of points \(\{(x_i, y_i)\}\) in \(\mathbb R^2\) by minimizing the mean squared error. Let the mean squared error of this predictor with respect to this data set be \(E_1\).

Next, create a new data set by adding a single new point to the original data set with the property that the new point lies exactly on the line \(H(x) = w_0 + w_1 x\) that was fit above. Let the mean squared error of \(H\) on this new data set be \(E_2\).

Which of the following is true?

Solution

\(E_1 > E_2\)
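
One way to see this: the new point lies exactly on \(H\), so it contributes zero squared error while the number of points grows by one. Assuming \(E_1 > 0\),

\[ E_2 = \frac{n E_1 + 0}{n + 1} = \frac{n}{n+1} E_1 < E_1. \]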

Problem #04

Tags: linear prediction functions

Suppose a linear predictor \(H_1\) is fit on a data set \(X = \{\nvec{x}{i}, y_i\}\) of \(n\) points by minimizing the mean squared error, where each \(\nvec{x}{i}\in\mathbb R^d\).

Let \(Z = \{\nvec{z}{i}, y_i\}\) be the data set obtained from the original by standardizing each feature. That is, if a matrix were created with the \(i\)th row being \(\nvec{z}{i}\), then the mean of each column would be zero, and the variance would be one.

Suppose a linear predictor \(H_2\) is fit on this standardized data by minimizing the mean squared error.

True or False: \(H_1(\nvec{x}{i}) = H_2(\nvec{z}{i})\) for each \(i = 1, \ldots, n\).

Solution

True.

Problem #05

Tags: object type

Part 1)

Let \(\Phi\) be an \(n \times d\) design matrix, let \(\lambda\) be a real number, and let \(I\) be a \(d \times d\) identity matrix.

What type of object is \((\Phi^T \Phi + n \lambda I)^{-1}\)?

Solution

A \(d \times d\) matrix

Part 2)

Let \(\Phi\) be an \(n \times d\) design matrix, and let \(\vec y \in\mathbb R^n\). What type of object is \(\Phi^T \vec y\)?

Solution

A vector in \(\mathbb R^d\)

Part 3)

Let \(\vec w \in\mathbb R^{d+1}\), and for each \(i \in\{1, 2, \ldots, n\}\) let \(\nvec{x}{i}\in\mathbb R^d\) and \(y_i \in\mathbb R\).

What type of object is:

\[\sum_{i = 1}^n \left(\vec w \cdot\Aug(\nvec{x}{i}) - y_i\right)^2? \]
Solution

A scalar

Part 4)

Let \(\vec w \in\mathbb R^{d+1}\), and for each \(i \in\{1, 2, \ldots, n\}\) let \(\nvec{x}{i}\in\mathbb R^d\) and \(y_i \in\mathbb R\). Consider the empirical risk with respect to the square loss of a linear predictor on a data set of \(n\) points:

\[ R(\vec w) = \frac 1n \sum_{i=1}^n (\vec w \cdot\Aug(\nvec{x}{i}) - y_i)^2 \]

What type of object is \(\nabla R(\vec w)\); that is, the gradient of the risk with respect to the parameter vector \(\vec w\)?

Solution

A vector in \(\mathbb R^{d+1}\)

Problem #06

Tags: linear prediction functions

Let \(\nvec{x}{1} = (-1, -1)^T\), \(\nvec{x}{2} = (1, 1)^T\), and \(\nvec{x}{3} = (-1, 1)^T\). Suppose \(H\) is a linear prediction function, and that \(H(\nvec{x}{1}) = 2\) while \(H(\nvec{x}{2}) = -2\) and \(H(\nvec{x}{3}) = 0\).

Let \(\nvec{x}{4} = (1,-1)^T\). What is \(H(\nvec{x}{4})\)?

Solution

0
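
One way to get this: writing \(H(\vec x) = w_0 + w_1 x_1 + w_2 x_2\), the three given values determine the weights:

\[\begin{align*} w_0 - w_1 - w_2 &= 2,\\ w_0 + w_1 + w_2 &= -2,\\ w_0 - w_1 + w_2 &= 0. \end{align*}\]

Adding the first two equations gives \(w_0 = 0\), and the remaining equations then give \(w_1 = w_2 = -1\). Therefore \(H(\nvec{x}{4}) = 0 + (-1)(1) + (-1)(-1) = 0\).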

Problem #07

Tags: computing loss

Consider the data set shown below. The ``\(\times\)'' points have label +1, while the ``\(\circ\)'' points have label -1. Shown are the places where a linear prediction function \(H\) is equal to zero, 1, and -1.

For each of the below subproblems, calculate the total loss with respect to the given loss function. That is, you should calculate \(\sum_{i=1}^n L(\nvec{x}{i}, y_i, \vec w)\) using the appropriate loss function in place of \(L\). Note that we have most often calculated the mean loss, but here we calculate the total so that we encounter fewer fractions.

Part 1)

What is the total square loss of \(H\) on this data set?

Solution

17

Part 2)

What is the total perceptron loss of \(H\) on this data set?

Solution

3

Part 3)

What is the total hinge loss of \(H\) on this data set?

Solution

7

Problem #08

Tags: subgradients

Consider the function \(f(x,y) = x^2 + |y|\). Plots of this function's surface and contours are shown below.

Which of the following are subgradients of \(f\) at the point \((0, 0)\)? Check all that apply.

Solution

\((0, 0)^T\), \((0, 1)^T\), and \((0, -1)^T\) are subgradients of \(f\) at \((0, 0)\).

Problem #09

Tags: gradient descent, perceptrons, gradients

Suppose gradient descent is to be used to train a perceptron classifier \(H(\vec x; \vec w)\) on a data set of \(n\) points, \(\{\nvec{x}{i}, y_i \}\). Recall that each iteration of gradient descent takes a step in the opposite direction of the ``gradient''.

Which gradient is being referred to here?

Solution

The gradient of the empirical risk with respect to \(\vec w\)

Problem #10

Tags: convexity

Let \(\{\nvec{x}{i}\}\) be a set of \(n\) vectors in \(\mathbb R^d\). Consider the function \(f(\vec w) = \sum_{i=1}^n \vec w \cdot\nvec{x}{i}\), where \(\vec w \in\mathbb R^d\).

True or False: \(f\) is convex as a function of \(\vec w\).

Solution

True.
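
One way to see this: the function can be written as

\[ f(\vec w) = \vec w \cdot\left(\sum_{i=1}^n \nvec{x}{i}\right), \]

which is a linear function of \(\vec w\), and every linear function is convex.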

Problem #11

Tags: SVMs

Let \(X = \{\nvec{x}{i}, y_i\}\) be a data set of \(n\) points where each \(\nvec{x}{i}\in\mathbb R^d\).

Let \(Z = \{\nvec{z}{i}, y_i\}\) be the data set obtained from the original by standardizing each feature. That is, if a matrix were created with the \(i\)th row being \(\nvec{z}{i}\), then the mean of each column would be zero, and the variance would be one.

Suppose that \(X\) and \(Z\) are both linearly-separable. Suppose Hard-SVMs \(H_1\) and \(H_2\) are trained on \(X\) and \(Z\), respectively.

True or False: \(H_1(\nvec{x}{i}) = H_2(\nvec{z}{i})\) for each \(i = 1, \ldots, n\).

Solution

False.

This is a tricky problem! For an example demonstrating why this is False, see: https://youtu.be/LSr55vyvfb4

Problem #12

Tags: least squares

Suppose a data set \(\{\nvec{x}{i}, y_i\}\) is linearly-separable.

True or false: a least squares classifier trained on this data set is guaranteed to achieve a training error of zero.

True False
Solution

False

Problem #13

Tags: SVMs

The image below shows a linear prediction function \(H\) along with a data set; the ``\(\times\)'' points have label +1 while the ``\(\circ\)'' points have label -1. Also shown are the places where the output of \(H\) is 0, 1, and -1.

True or False: \(H\) could have been learned by training a Hard-SVM on this data set.

True False
Solution

False.

The solution to the Hard-SVM problem is a hyperplane that separates the two classes with the maximum margin.

In this solution, the margin is not maximized: There is room for the ``exclusion zone'' between the two classes to grow, and for the margin to increase.

Problem #14

Tags: kernel ridge regression

Let \(\nvec{x}{1} = (1, 2, 0)^T\), \(\nvec{x}{2} = (-1, -1, -1)^T\), \(\nvec{x}{3} = (2, 2, 0)^T\), and \(\nvec{x}{4} = (0, 2, 0)^T\).

Suppose a prediction function \(H(\vec x)\) is learned using kernel ridge regression on the above data set using the kernel \(\kappa(\vec x, \vec x') = (1 + \vec x \cdot\vec x')^2\) and regularization parameter \(\lambda = 3\). Suppose that \(\vec\alpha = (1, 0, -1, 2)^T\) is the solution of the dual problem.

Let \(\vec x = (0, 1, 0)^T\) be a new point. What is \(H(\vec x)\)?

Solution

18
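
One way to check this: in kernel ridge regression, the prediction at a new point is \(H(\vec x) = \sum_{i=1}^n \alpha_i \kappa(\nvec{x}{i}, \vec x)\). With \(\vec x = (0, 1, 0)^T\):

\[\begin{align*} \kappa(\nvec{x}{1}, \vec x) &= (1 + 2)^2 = 9, & \kappa(\nvec{x}{2}, \vec x) &= (1 - 1)^2 = 0,\\ \kappa(\nvec{x}{3}, \vec x) &= (1 + 2)^2 = 9, & \kappa(\nvec{x}{4}, \vec x) &= (1 + 2)^2 = 9, \end{align*}\]

so \(H(\vec x) = (1)(9) + (0)(0) + (-1)(9) + (2)(9) = 18\).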

Problem #15

Tags: regularization

Let \(R(\vec w)\) be the unregularized empirical risk with respect to the square loss (that is, the mean squared error) on a data set.

The image below shows the contours of \(R(\vec w)\). The dashed lines show places where \(\|\vec w\|_2\) is 2, 4, 6, etc.

Part 1)

Assuming that one of the points below is the minimizer of the unregularized risk, \(R(\vec w)\), which could it possibly be?

Solution

A

Part 2)

Let the regularized risk \(\tilde R(\vec w) = R(\vec w) + \lambda\|\vec w \|_2^2\), where \(\lambda > 0\).

Assuming that one of the points below is the minimizer of the regularized risk, \(\tilde R(\vec w)\), which could it possibly be?

Solution

B

Problem #16

Tags: kernel ridge regression

Let \(\{\nvec{x}{i}, y_i\}\) be a data set of \(n\) points, with each \(\nvec{x}{i}\in\mathbb R^d\). Recall that the solution to the kernel ridge regression problem is \(\vec\alpha = (K + n \lambda I)^{-1}\vec y\), where \(K\) is the kernel matrix, \(I\) is the identity matrix, \(\lambda > 0\) is a regularization parameter, and \(\vec y = (y_1, \ldots, y_n)^T\).

Suppose kernel ridge regression is performed with a kernel \(\kappa\) that is a kernel for a feature map \(\vec\phi : \mathbb R^d \to\mathbb R^k\).

What is the size of the kernel matrix, \(K\)?

Problem #17

Tags: gradients

Let \(f(\vec w) = \vec a \cdot\vec w + \lambda\|\vec w \|^2\), where \(\vec w \in\mathbb R^d\), \(\vec a \in\mathbb R^d\), and \(\lambda > 0\).

What is the minimizer of \(f\)? State your answer in terms of \(\vec a\) and \(\lambda\).

Solution

\(\vec w^* = -\frac{\vec{a}}{2\lambda}\)
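
One way to derive this: \(f\) is convex, so its minimizer is where the gradient vanishes:

\[ \frac{d}{d\vec w} f(\vec w) = \vec a + 2\lambda\vec w = \vec 0 \implies\vec w^* = -\frac{\vec a}{2\lambda}. \]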

Problem #18

Tags: nearest neighbor

Consider the data set of diamonds and circles shown below. Suppose a \(k\)-nn classifier is used to predict the label of the new point marked by \(\times\), with \(k = 3\). What will be the prediction? You may assume that the Euclidean distance is used.

Problem #19

Tags: linear prediction functions

Suppose a linear prediction rule \(H(\vec x; \vec w) = \Aug(\vec x) \cdot\vec w\) is parameterized by the weight vector \(\vec w = (2, 1, -3, 4)^T\). Let \(\vec z = (3, -2, 1)^T\). What is \(H(\vec z)\)?

Solution

15
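
One way to check this:

\[ H(\vec z) = \Aug(\vec z) \cdot\vec w = (1, 3, -2, 1)^T \cdot(2, 1, -3, 4)^T = 2 + 3 + 6 + 4 = 15. \]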

Problem #20

Tags: least squares

In the following, let \(\mathcal D_1\) be a set of points \((x_1, y_1), \ldots, (x_n, y_n)\) in \(\mathbb R^2\). Suppose a straight line \(H_1(x) = a_1 x + a_0\) is fit to this data set by minimizing the mean squared error, and let \(R_1\) the be mean squared error of this line.

Now create a second data set, \(\mathcal D_2\), by doubling the \(x\)-coordinate of each of the original points, but leaving the \(y\)-coordinate unchanged. That is, \(\mathcal D_2\) consists of points \((2x_1, y_1), \ldots, (2x_n, y_n)\). Suppose a straight line \(H_2(x) = b_1 x + b_0\) is fit to this data set by minimizing the mean squared error, and let \(R_2\) be the mean squared error of this line.

You may assume that all of the \(x_i\) are unique, as are all of the \(y_i\).

Part 1)

Which one of the following is true about \(R_1\) and \(R_2\)?

Solution

\(R_1 = R_2\)

Part 2)

Which one of the following is true about \(a_0\) and \(b_0\) (the intercepts)?

Solution

\(a_0 = b_0\)

Part 3)

Which one of the following is true about \(a_1\) and \(b_1\) (the slopes)?

Solution

\(a_1 > b_1\)

Problem #21

Tags: empirical risk minimization

Let \(\mathcal D\) be a data set of points in \(\mathbb R^d\). Suppose a linear prediction rule \(H_1(\vec x)\) is fit to the data by minimizing the risk with respect to the square loss, and suppose another linear prediction rule \(H_2(\vec x)\) is fit to the data by minimizing the risk with respect to the absolute loss.

True or False: \(H_1\) and \(H_2\) are guaranteed to be the same; that is, \(H_1(\vec x) = H_2(\vec x)\) for all inputs \(\vec x\).

True False
Solution

False.

Problem #22

Tags: object type

Part 1)

Let \(\vec x \in\mathbb R^d\) and let \(A\) be an \(d \times d\) matrix. What type of object is \(\vec x^T A \vec x\)?

Solution

A scalar

Part 2)

Let \(A\) be an \(n \times n\) matrix, and let \(\vec x \in\mathbb R^n\). What type of object is: \((A + A^T)^{-1}\vec x\)?

Solution

A vector in \(\mathbb R^n\)

Part 3)

Suppose we train a support vector machine \(H(\vec x) = \Aug(\vec x) \cdot\vec w\) on a data set of \(n\) points in \(\mathbb R^d\). What type of object is the resulting parameter vector, \(\vec w\)?

Solution

A vector in \(\mathbb R^{d+1}\)

Problem #23

Tags: linear prediction functions

Let \(\nvec{x}{1} = (-1, -1)^T\), \(\nvec{x}{2} = (1, 1)^T\), and \(\nvec{x}{3} = (-1,1)^T\). Suppose \(H\) is a linear prediction function, and that \(H(\nvec{x}{1}) = 2\), \(H(\nvec{x}{2}) = -2\), and \(H(\nvec{x}{3}) = 2\).

Let \(\vec z = (2,-1)^T\). What is \(H(\vec z)\)?

Solution

-4
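
One way to get this: solving for the weights of \(H(\vec x) = w_0 + w_1 x_1 + w_2 x_2\) from the three given values,

\[\begin{align*} w_0 - w_1 - w_2 &= 2,\\ w_0 + w_1 + w_2 &= -2,\\ w_0 - w_1 + w_2 &= 2, \end{align*}\]

gives \(w_0 = 0\), \(w_1 = -2\), and \(w_2 = 0\). Then \(H(\vec z) = 0 + (-2)(2) + (0)(-1) = -4\).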

Problem #24

Tags: computing loss

Consider the data set shown below. The ``\(\times\)'' points have label -1, while the ``\(\circ\)'' points have label +1. Shown are the places where a linear prediction function \(H\) is equal to zero (the thick solid line), 1, and -1 (the dotted lines). Each cell of the grid is 1 unit by 1 unit.

For each of the below subproblems, calculate the total loss with respect to the given loss function. That is, you should calculate \(\sum_{i=1}^n L(\nvec{x}{i}, y_i, \vec w)\) using the appropriate loss function in place of \(L\). Note that we have most often calculated the mean loss, but here we calculate the total so that we encounter fewer fractions.

Part 1)

What is the total square loss of \(H\) on this data set?

Solution

49

Part 2)

What is the total perceptron loss of \(H\) on this data set?

Solution

3

Part 3)

What is the total hinge loss of \(H\) on this data set?

Solution

6

Problem #25

Tags: subgradients

Consider the function \(f(x,y) = |x| + y^4\). Plots of this function's surface and contours are shown below.

Write a valid subgradient of \(f\) at the point \((0, 1)\).

There are many possibilities, but you need only write one. For the simplicity of grading, please pick a subgradient whose coordinates are both integers, and write your answer in the form \((a, b)\), where \(a\) and \(b\) are numbers.

Solution

Any vector of the form \((a, 4)\) with \(a \in[-1,1]\) is a valid subgradient; for example, \((0, 4)\) or \((1, 4)\). The second coordinate is the derivative of \(y^4\) at \(y = 1\), which is 4, while the subdifferential of \(|x|\) at \(x = 0\) is the interval \([-1, 1]\).

Problem #26

Tags: convexity

Consider the function \(f(x) = x^4 - x^2\). True or False: \(f\) is convex as a function of \(x\).

True False
Solution

False.

Problem #27

Tags: SVMs

Suppose a data set \(\mathcal D\) is linearly separable. True or False: a Hard-SVM trained on \(\mathcal D\) is guaranteed to achieve 100\% training accuracy.

True False
Solution

True. The Hard-SVM constraints require that \(y_i \left(\vec w \cdot\Aug(\nvec{x}{i})\right) \geq 1\) for every training point, so every training point is classified correctly.

Problem #28

Tags: gradient descent

Consider the function \(f(x, y) = x^2 + xy + y^2\). Suppose that a single iteration of gradient descent is run on this function with a starting location of \((1, 2)^T\) and a learning rate of \(\eta = 1/10\). What will be the \(x\) and \(y\) coordinates after this iteration?

Solution

\(x = 0.6\), \(y = 1.5\)
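
One way to verify this: the gradient is \(\nabla f(x, y) = (2x + y, \; x + 2y)^T\), which at \((1, 2)^T\) equals \((4, 5)^T\). One step of gradient descent gives

\[ \begin{bmatrix} 1 \\ 2 \end{bmatrix} - \frac{1}{10}\begin{bmatrix} 4 \\ 5 \end{bmatrix} = \begin{bmatrix} 0.6 \\ 1.5 \end{bmatrix}. \]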

Problem #29

Tags: SVMs

The image below shows a linear prediction function \(H\) along with a data set; the ``\(\times\)'' points have label +1 while the ``\(\circ\)'' points have label -1. Also shown are the places where the output of \(H\) is 0, 1, and -1.

True or False: \(H\) could have been learned by training a Hard-SVM on this data set.

True False
Solution

False.

Problem #30

Tags: kernel ridge regression

Consider the data set: \(\nvec{x}{1} = (1, 0, 2)^T\), \(\nvec{x}{2} = (-1, 0, -1)^T\), \(\nvec{x}{3} = (1, 2, 1)^T\), \(\nvec{x}{4} = (1, 1, 0)^T\). Suppose a prediction function \(H(\vec x)\) is learned using kernel ridge regression on the above data set using the kernel \(\kappa(\vec x, \vec x') = (1 + \vec x \cdot\vec x')^2\) and regularization parameter \(\lambda = 3\). Suppose that \(\vec\alpha = (-1, -2, 0, 2)^T\) is the solution of the dual problem.

Part 1)

What is the (2,3) entry of the kernel matrix?

\(K_{23} = \)

Part 2)

Let \(\vec x = (1, 1, 0)^T\) be a new point. What is \(H(\vec x)\)?

Problem #31

Tags: regularization

Let \(R(\vec w)\) be an unregularized empirical risk function with respect to some data set.

The image below shows the contours of \(R(\vec w)\). The dashed lines show places where \(\|\vec w\|_2\) is 1, 2, 3, etc.

Part 1)

Assuming that one of the points below is the minimizer of the unregularized risk, \(R(\vec w)\), which could it possibly be?

Solution

A minimizer of the unregularized risk could be point A.

Part 2)

Let the regularized risk \(\tilde R(\vec w) = R(\vec w) + \lambda\|\vec w \|_2^2\), where \(\lambda > 0\).

Assuming that one of the points below is the minimizer of the regularized risk, \(\tilde R(\vec w)\), which could it possibly be?

Solution

A minimizer of the regularized risk could be point D.

Problem #32

Tags: bayes error, bayes classifier

Shown below are two conditional densities, \(p_1(x \given Y = 1)\) and \(p_0(x \given Y = 0)\), describing the distribution of a continuous random variable \(X\) for two classes: \(Y = 0\) (the solid line) and \(Y = 1\) (the dashed line). You may assume that both densities are piecewise constant.

Part 1)

Suppose \(\pr(Y = 1) = 0.5\) and \(\pr(Y = 0) = 0.5\). What is the prediction of the Bayes classifier at \(x = 1.5\)?

Solution

Class 0

Part 2)

Suppose \(\pr(Y = 1) = 0.5\) and \(\pr(Y = 0) = 0.5\). What is the Bayes error with respect to this distribution?

Part 3)

Now suppose \(\pr(Y = 1) = 0.7\) and \(\pr(Y = 0) = 0.3\). What is the prediction of the Bayes classifier at \(x = 1.5\)?

Solution

Class 1

Part 4)

Now suppose \(\pr(Y = 1) = 0.7\) and \(\pr(Y = 0) = 0.3\). What is the Bayes error with respect to this distribution?

Problem #33

Tags: bayes classifier

Suppose the Bayes classifier achieves an error rate of 15\% on a particular data distribution. True or False: It is impossible for any classifier trained on data drawn from this distribution to achieve better than 85\% accuracy on a finite test set that is drawn from this distribution.

True False
Solution

False.

Problem #34

Tags: histogram estimators

Consider the data set of ten points shown below:

Suppose this data is used to build a histogram density estimator, \(f\), with bins: \([0,2), [2, 6), [6, 10)\). Note that the bins are not evenly sized.

Part 1)

What is \(f(1.5)\)?

Part 2)

What is \(f(7)\)?

Problem #35

Tags: histogram estimators

Consider this data set of points \(x\) from two classes \(Y = 1\) and \(Y = 0\).

Suppose a histogram estimator with bins \([0,1)\), \([1, 2)\), \([2, 3)\) is used to estimate the densities \(p_1(x \given Y = 1)\) and \(p_0(x \given Y = 0)\), and these estimates are used in the Bayes classifier to make a prediction.

What will be the predicted class of a new point, \(x = 2.2\)?

Solution

Class 0.

Problem #36

Tags: density estimation, histogram estimators

Suppose a density estimate \(f : \mathbb R^3 \to\mathbb R^1\) is made using histogram estimators with bins having a length of 2 units, a width of 3 units, and a height of 1 unit.

What is the largest value that \(f(\vec x)\) can possibly have?

Problem #37

Tags: maximum likelihood

Suppose a discrete random variable \(X\) takes on values of either 0 or 1 and has the distribution:

\[\pr(X = x) = \theta^x (1 - \theta)^{1 - x}\]

where \(\theta\in[0, 1]\) is a parameter.

Given a data set \(x_1, \ldots, x_n\), what is the maximum likelihood estimate for the parameter \(\theta\)? Show your work.

Problem #38

Tags: covariance

Consider a data set of \(n\) points in \(\mathbb R^d\), \(\nvec{x}{1}, \ldots, \nvec{x}{n}\). Suppose the data are standardized, creating a set of new points \(\nvec{z}{1}, \ldots, \nvec{z}{n}\). That is, if the new points are stacked into an \(n \times d\) matrix, \(Z\), the mean and variance of each column of \(Z\) would be zero and one, respectively.

True or False: the covariance matrix of the standardized data must be the \(d\times d\) identity matrix; that is, the \(d \times d\) matrix with ones along the diagonal and zeros off the diagonal.

True False
Solution

False.

Problem #39

Tags: density estimation, maximum likelihood

Suppose data points \(\nvec{x}{1}, \ldots, \nvec{x}{n}\) are drawn from an arbitrary, unknown distribution with density \(f\).

True or False: it is guaranteed that, given enough data (that is, \(n\) large enough), a Gaussian fit to the data using the method of maximum likelihood will approximate the true underlying density \(f\) arbitrarily closely.

True False
Solution

False.

Problem #40

Tags: Gaussians, maximum likelihood

Suppose a Gaussian with a diagonal covariance matrix is fit to 200 points in \(\mathbb R^4\) using the maximum likelihood estimators. How many parameters are estimated? Count each entry of \(\mu\) and the covariance matrix that must be estimated as its own parameter.

Problem #41

Tags: covariance

Suppose a data set consists of the following three measurements for each Saturday last year:

\(X_1\): The day's high temperature

\(X_2\): The number of people at Pacific Beach on that day

\(X_3\): The number of people wearing coats on that day

Suppose the covariance between these features is calculated and placed into a \(3 \times 3\) sample covariance matrix, \(C\). Which of the below options most likely shows the sign of each entry of the sample covariance matrix?

Solution

The second option.

Problem #42

Tags: covariance

Suppose we have two data sets, \(\mathcal{D}_1\) and \(\mathcal{D}_2\), each containing \(n/2\) points in \(\mathbb R^d\). Let \(\nvec{\mu}{1}\) and \(C^{(1)}\) be the mean and sample covariance matrix of \(\mathcal{D}_1\), and let \(\nvec{\mu}{2}\) and \(C^{(2)}\) be the mean and sample covariance matrix of \(\mathcal{D}_2\).

Suppose the two data sets are combined into a single data set \(\mathcal D\) containing \(n\) points.

Part 1)

True or False: the mean of the combined data \(\mathcal{D}\) is equal to \(\displaystyle\frac{\nvec{\mu}{1} + \nvec{\mu}{2}}{2}\).

True False
Solution

True.

Part 2)

True or False: the sample covariance matrix of the combined data \(\mathcal{D}\) is equal to \(\displaystyle\frac{C^{(1)} + C^{(2)}}{2}\).

True False
Solution

False

Problem #43

Tags: covariance

Suppose a random vector \(\vec X = (X_1, X_2)\) has a multivariate Gaussian distribution. Suppose it is known that \(X_1\) and \(X_2\) are independent.

Let \(C\) be the Gaussian distribution's covariance matrix.

Part 1)

True or False: \(C\) must be diagonal.

True False
Solution

True.

Part 2)

True or False: each entry of \(C\) must be the same.

True False
Solution

False.

Problem #44

Tags: naive bayes

Consider the below data set which collects information on 10 pets:

Suppose a new pet is friendly and sheds fur. What does a Na\"ive Bayes classifier predict for its species?

Solution

Cat.

Problem #45

Tags: naive bayes

Suppose a data set of 100 points in 10 dimensions is used in a binary classification task; that is, each point is labeled as either a 1 or a 0.

If Gaussian Naive Bayes is trained on this data, how many univariate Gaussians will be fit?

Problem #46

Tags: conditional independence

Recall that a deck of 52 cards contains:

Hearts: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A

Diamonds: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A

Clubs: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A

Spades: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A

Also recall that Hearts and Diamonds are red, while Clubs and Spades are black.

Part 1)

Suppose a single card is drawn at random.

Let \(A\) be the event that the card is a heart. Let \(B\) be the event that the card is a 5.

Are \(A\) and \(B\) independent events?

Solution

Yes, they are independent.
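
A quick check using the definition of independence:

\[ \pr(A) = \frac{13}{52} = \frac14, \qquad\pr(B) = \frac{4}{52} = \frac{1}{13}, \qquad\pr(A \cap B) = \frac{1}{52} = \pr(A)\,\pr(B), \]

so \(A\) and \(B\) are independent.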

Part 2)

Suppose two cards are drawn at random (without replacing them into the deck).

Let \(A\) be the event that the second card is a heart. Let \(B\) be the event that the first card is red.

Are \(A\) and \(B\) independent events?

Solution

No, they are not.

Part 3)

Suppose two cards are drawn at random (without replacing them into the deck).

Let \(A\) be the event that the second card is a heart. Let \(B\) be the event that the second card is a diamond. Let \(C\) be the event that the first card is a face card.

Are \(A\) and \(B\) conditionally independent given \(C\)?

Solution

No, they are not.

Problem #47

Tags: bayes error, bayes classifier

Part 1)

Suppose a particular probability distribution has the property that, whenever data are sampled from the distribution, the sampled data are guaranteed to be linearly separable. True or False: the Bayes error with respect to this distribution is 0\%.

True False
Solution

True.

Part 2)

Now consider a different probability distribution. Suppose the Bayes classifier achieves an error rate of 0\% on this distribution. True or False: given a finite data set sampled from this distribution, the data must be linearly separable.

True False
Solution

False.

Problem #48

Tags: histogram estimators

Consider this data set of points \(x\) from two classes \(Y = 1\) and \(Y = 0\).

Suppose a histogram estimator with bins \([0,2)\), \([2, 4)\), \([4, 6)\) is used to estimate the densities \(p_1(x \given Y = 1)\) and \(p_0(x \given Y = 0)\).

What will be the predicted class-conditional density for class 0 at a new point, \(x = 2.2\)? That is, what is the estimated \(p_0(2.2 \given Y = 0)\)?

Solution

1/6.

When estimating the conditional density, we look only at the six points in class zero. Two of these fall into the bin, and the bin width is 2, so the estimated density is:

\[\frac{2}{6 \times 2} = \frac{1}{6}. \]

Problem #49

Tags: histogram estimators

Suppose \(\mathcal D\) is a data set of 100 points. Suppose a density estimate \(f : \mathbb R^3 \to\mathbb R^1\) is constructed from \(\mathcal D\) using histogram estimators with bins having a length of 2 units, a width of 2 units, and a height of 2 units.

The density estimate within a particular bin of the histogram is 0.2. How many data points from \(\mathcal D\) fall within that histogram bin?

Problem #50

Tags: Gaussians

Suppose data points \(x_1, \ldots, x_n\) are independently drawn from a univariate Gaussian distribution with unknown parameters \(\mu\) and \(\sigma\).

True or False: it is guaranteed that, given enough data (that is, \(n\) large enough), a univariate Gaussian fit to the data using the method of maximum likelihood will approximate the true underlying density arbitrarily closely.

True False
Solution

True.

Problem #51

Tags: maximum likelihood

Suppose a continuous random variable \(X\) has the density:

\[ p(x) = \theta e^{-\theta x}\]

where \(\theta\in(0, \infty)\) is a parameter, and where \(x > 0\).

Given a data set \(x_1, \ldots, x_n\), what is the maximum likelihood estimate for the parameter \(\theta\)? Show your work.

Problem #52

Tags: covariance

Let \(\mathcal D\) be a set of data points in \(\mathbb R^d\), and let \(C\) be the sample covariance matrix of \(\mathcal D\). Suppose each point in the data set is shifted in the same direction and by the same amount. That is, suppose there is a vector \(\vec\delta\) such that if \(\nvec{x}{i}\in\mathcal D\), then \(\nvec{x}{i} + \vec\delta\) is in the new data set.

True or False: the sample covariance matrix of the new data set is equal to \(C\)(the sample covariance matrix of the original data set).

True False
Solution

True.
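
One way to see this: the sample covariance depends only on the deviations of the points from the sample mean. Shifting every point by \(\vec\delta\) also shifts the sample mean \(\bar{\vec x}\) by \(\vec\delta\), so the deviations are unchanged:

\[ \left(\nvec{x}{i} + \vec\delta\right) - \left(\bar{\vec x} + \vec\delta\right) = \nvec{x}{i} - \bar{\vec x}. \]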

Problem #53

Tags: Gaussians, maximum likelihood

Suppose a Gaussian with a diagonal covariance matrix is fit to 200 points in \(\mathbb R^4\) using the maximum likelihood estimators. How many parameters are estimated? Count each entry of \(\vec\mu\) and the covariance matrix that must be estimated as its own parameter (the off-diagonal elements of the covariance are zero, and shouldn't be included in your count).

Problem #54

Tags: Gaussians

Let \(f_1\) be a univariate Gaussian density with parameters \(\mu\) and \(\sigma_1\). And let \(f_2\) be a univariate Gaussian density with parameters \(\mu\) and \(\sigma_2 \neq\sigma_1\). That is, \(f_2\) is centered at the same place as \(f_1\), but with a different variance.

Consider the density \(f(x) = \frac{1}{2}(f_1(x) + f_2(x))\); the factor of \(1/2\) is a normalization factor which ensures that \(f\) integrates to one.

True or False: \(f\) must also be a Gaussian density.

True False
Solution

False. The sum of two Gaussian densities is not necessarily a Gaussian density, even if the two Gaussians have the same mean.

If you try adding two Gaussian densities with different variances, you will get:

\[ f(x) = \frac{1}{2}\left(\frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu)^2}{2\sigma_1^2}} + \frac{1}{\sqrt{2\pi}\sigma_2} e^{-\frac{(x-\mu)^2}{2\sigma_2^2}}\right)\]

For this to be a Gaussian, we'd need to be able to write it in the form:

\[ f(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \]

but this is not possible when \(\sigma_1 \neq\sigma_2\).

Problem #55

Tags: covariance

Consider the data set \(\mathcal D\) shown below.

What will be the sign of the \((1,2)\) entry of the data's sample covariance matrix?

Solution

The sign will be negative.

Problem #56

Tags: linear and quadratic discriminant analysis

Suppose a data set of points in \(\mathbb R^2\) consists of points from two classes: Class 1 and Class 0. The mean of the points in Class 1 is \((3,0)^T\), and the mean of points in Class 0 is \((7,0)^T\). Suppose Linear Discriminant Analysis is performed using the same covariance matrix \(C = \sigma^2 I\) for both classes, where \(\sigma\) is some constant.

Suppose there were 50 points in Class 1 and 100 points in Class 0.

Consider a new point, \((5, 0)^T\), exactly halfway between the class means. What will LDA predict its label to be?

Solution

Class 0.
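
One way to see this (assuming the class priors are estimated from the class counts): LDA compares \(\pr(Y = k)\, p(\vec x \given Y = k)\) for the two classes. The point \((5, 0)^T\) is equidistant from the two class means, so with the shared covariance \(\sigma^2 I\) the two Gaussian likelihoods are equal, and the comparison reduces to the priors:

\[ \pr(Y = 0) = \frac{100}{150} > \frac{50}{150} = \pr(Y = 1), \]

so Class 0 is predicted.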

Problem #57

Tags: naive bayes

Consider the below data set which collects information on the weather on 10 days:

Suppose a new day is not sunny but is warm. What does a Na\"ive Bayes classifier predict for whether it rained?

Solution

Yes, it rained.

Problem #58

Tags: conditional independence

Suppose that a deck of cards has some cards missing. Namely, both the Ace of Spades and Ace of Clubs are missing, leaving 50 cards remaining.

Hearts: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A

Diamonds: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A

Clubs: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K

Spades: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K

Also recall that Hearts and Diamonds are red, while Clubs and Spades are black.

Part 1)

Suppose a single card is drawn at random.

Let \(A\) be the event that the card is a heart. Let \(B\) be the event that the card is an Ace.

Are \(A\) and \(B\) independent events?

Solution

No, they are not.
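
A quick check: with the two black aces removed,

\[ \pr(A) = \frac{13}{50}, \qquad\pr(B) = \frac{2}{50}, \qquad\pr(A \cap B) = \frac{1}{50} \neq\frac{13}{50}\cdot\frac{2}{50}, \]

so \(A\) and \(B\) are not independent.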

Part 2)

Suppose a single card is drawn at random.

Let \(A\) be the event that the card is a red. Let \(B\) be the event that the card is a heart. Let \(C\) be the event that the card is an ace.

Are \(A\) and \(B\) conditionally independent given \(C\)?

Solution

Yes, they are.

Part 3)

Suppose a single card is drawn at random.

Let \(A\) be the event that the card is a King. Let \(B\) be the event that the card is red. Let \(C\) be the event that the card is not a numbered card (that is, it is a J, Q, K, or A).

Are \(A\) and \(B\) conditionally independent given \(C\)?

Solution

No, they are not conditionally independent.

Problem #59

Tags: nearest neighbor

Let \(\mathcal X = \{(\nvec{x}{1}, y_1), \ldots, (\nvec{x}{n}, y_n)\}\) be a labeled dataset, where \(\nvec{x}{i}\in\mathbb R^d\) is a feature vector and \(y_i \in\{-1, 1\}\) is a binary label. Let \(\vec x\) be a new point that is not in the data set. Suppose a nearest neighbor classifier is used to predict the label of \(\vec x\), and the resulting prediction is \(-1\). (You may assume that there is a unique nearest neighbor of \(\vec x\).)

Now let \(\mathcal Z\) be a new dataset obtained from \(\mathcal X\) by scaling each feature by a factor of 2. That is, \(\mathcal Z = \{(\nvec{z}{1}, y_1), \ldots, (\nvec{z}{n}, y_n)\}\), where \(\nvec{z}{i} = 2 \nvec{x}{i}\) for each \(i\). Let \(\vec z = 2 \vec x\). Suppose a nearest neighbor classifier trained on \(\mathcal Z\) is used to predict the label of \(\vec z\).

True or False: the predicted label of \(\vec z\) must also be \(-1\).

True False
Solution

True.

This question is essentially asking: if we scale a data set by a constant factor, does the nearest neighbor of a point change? The answer is no: therefore, the predicted label cannot change.

When we multiply each feature of a point by the same constant factor, it has the effect of scaling the data set. That is, if \(\mathcal X\) is the data shown in the top of the image shown below, then \(\mathcal Z\) is the data shown in the bottom of the image (not drawn to scale):

You can see that the nearest neighbor of \(\vec x\) in \(\mathcal X\) is point #4, and the nearest neighbor of \(\vec z\) in \(\mathcal Z\) remains point #4. As such, the predicted label does not change.

Problem #60

Tags: linear prediction functions

Suppose a linear prediction function \(H(\vec x) = w_0 + w_1 x_1 + w_2 x_2\) is trained to perform classification, and the weight vector is found to be \(\vec w = (2, -1, 2)^T\).

The figure below shows four possible decision boundaries: \(A\), \(B\), \(C\), and \(D\). Which of them is the decision boundary of the prediction function \(H\)?

You may assume that each grid square is 1 unit on each side, and the axes are drawn to show you where the origin is.

Solution

B.

Here are four ways to solve this problem:

Method 1: Guess and check, using the fact that \(H(\vec x)\) must equal zero at every point along the decision boundary. For any of the proposed decision boundaries, we can take a point on the boundary, compute \(H(\vec x)\), and see if it equals 0.

Consider option \(A\), and take an arbitrary point on that boundary -- for example, the point \((1, 1)^T\). Computing \(H(\vec x)\), we get \(H(\vec x) = \vec w \cdot\operatorname{Aug}(\vec x) = (2, -1, 2) \cdot(1, 1, 1) = 3 \neq 0\). Therefore, this cannot be the decision boundary.

Considering option \(B\), it looks like the point \((0, -1)^T\) is on the boundary. Computing the value of \(H\) here, we get \(H(\vec x) = \vec w \cdot\operatorname{Aug}(\vec x) = (2, -1, 2) \cdot(1, 0, -1) = 0\). That's a good sign, but it's not sufficient to say that this must be the decision boundary (at this point, all we can say is that whatever the decision boundary is, it must go through \((0, -1)\)). To make sure, we pick a second point on the boundary, say, \((2, 0)^T\). Computing \(H\) at this point, we get \(H(\vec x) = \vec w \cdot\operatorname{Aug}(\vec x) = (2, -1, 2) \cdot(1, 2, 0) = 0\). With this, we can conclude that \(B\) must be the decision boundary.

Method 2: Use the fact that \((w_1, w_2)\) is orthogonal to the decision boundary. In Homework 02, we learned a useful way of relating the decision boundary to the weight vector \(\vec w = (w_0, w_1, w_2)^T\): it is orthogonal to the vector \(\vec w' = (w_1, w_2)^T\).

In this case, \(\vec w' = (-1, 2)^T\); this is a vector that points up and to the left (more up than left). This rules out options \(A\) and \(D\), since they can't be orthogonal to that vector. There are several ways to decide between \(B\) and \(C\), including the logic from Method 1.

Method 3: Find the equation of the decision boundary in \(y = mx + b\) form. Another way to solve this problem is to find the equation of the decision boundary in familiar ``\(y = mx + b\)'' form. We can do this by setting \(H(\vec x) = 0\) and solving for \(x_2\) in terms of \(x_1\). We get:

\[\begin{align*} &H(\vec x) =0\\ &\implies w_0 + w_1 x_1 + w_2 x_2 = 0\\ &\implies w_2 x_2 = -w_0 - w_1 x_1\\ &\implies x_2 = -\frac{w_0}{w_2} - \frac{w_1}{w_2} x_1 \end{align*}\]

Plugging in \(w_0 = 2\), \(w_1 = -1\), and \(w_2 = 2\), we get

\[\begin{align*} &x_2 = -\frac{2}{2} - \frac{-1}{2} x_1\\ &\implies x_2 = -1 + \frac{1}{2} x_1 \end{align*}\]

So the decision boundary is a line with slope \(\frac{1}{2}\) and \(y\)-intercept \(-1\). This is exactly the equation of the line in option \(B\).

Method 4: reasoning about the slope in each coordinate. Similar to Method 2, we can rule out options \(A\) and \(D\) by reasoning about the slope of the decision boundary in each coordinate.

Imagine this scenario: you're standing on the surface of \(H\) at a point where \(H = 0\) (that is, you're on the decision boundary). You will take one step to the right (i.e., one step in the \(x_1\) direction), which will (in general) cause you to leave the decision boundary and either increase or decrease in height. If you want to get back to the decision boundary, and your next step must be either up or down, which should it be? The answer will tell you whether the decision boundary looks like \(/\) or like \(\backslash\).

In this case, taking a step to the right (in the \(x_1\) direction) will cause you to decrease in height. This is because the \(w_1\) component of \(\vec w\) is negative, and so increasing \(x_1\) will decrease \(H\). Now, to get back to the decision boundary, you must increase your height to get back to \(H = 0\). Since the slope in the \(x_2\) direction is positive (2, to be exact), you must increase \(x_2\) in order to increase \(H\). Therefore, the decision boundary is up and to the right: it looks like \(/\).

Problem #61

Tags: feature maps, linear prediction functions

Suppose a linear prediction function \(H(\vec x) = w_1 \phi_1(\vec x) + w_2 \phi_2(\vec x) + w_3 \phi_3(\vec x) + w_4 \phi_4(\vec x)\) is fit using the basis functions

\[\phi_1(\vec x) = 1, \quad\phi_2(\vec x) = x_1^2, \quad\phi_3(\vec x) = x_2^2, \quad\phi_4(\vec x) = x_1 x_2, \]

where \(\vec x = (x_1, x_2)^T\) is a feature vector in \(\mathbb R^2\). The weight vector \(\vec w\) is found to be \(\vec w = (1, -2, 3, -4)^T\).

Let \(\vec x = (1, 2)^T\). What is \(H(\vec x)\)?

Solution

We have:

\[\begin{align*} H(\vec x) &= w_1 \phi_1(\vec x) + w_2 \phi_2(\vec x) + w_3 \phi_3(\vec x) + w_4 \phi_4(\vec x) \\ &= 1 \times 1 + (-2) \times x_1^2 + 3 \times x_2^2 + (-4) \times x_1 \times x_2 \\ &= 1 \times 1 + (-2) \times 1^2 + 3 \times 2^2 + (-4) \times 1 \times 2 \\ &= 1 - 2 + 12 - 8 = 3. \end{align*}\]

Problem #62

Tags: linear regression, least squares

Suppose a linear regression model \(H(\vec x) = w_0 + w_1 x_1 + w_2 x_2\) is trained by minimizing the mean squared error on the data shown below, where \(x_1\) and \(x_2\) are the features and \(y\) is the target:

True or False: there is a unique weight vector \(\vec w = (w_0, w_1, w_2)^T\) minimizing the mean squared error for this data set.

True False
Solution

False.

\(y\) can be predicted exactly using only \(x_1\). In particular, \(y = 4x_1\). So the weight vector \(\vec w = (0, 4, 0)^T\) is one that minimizes the mean squared error.

On the other hand, \(y\) can also be predicted exactly using only \(x_2\). In particular, \(y = 2x_2\). So the weight vector \(\vec w = (0, 0, 2)^T\) is another that minimizes the mean squared error.

In fact, there are infinitely many weight vectors that minimize the mean squared error in this case. Geometrically, this is because the data are all along a 2-dimensional line in 3-dimensional space, and by doing regression we are fitting a plane to the data. There are infinitely many planes that pass through a given line, each of them with a different corresponding weight vector, so there are infinitely many weight vectors that minimize the mean squared error.

Now you might be thinking: the mean squared error is convex, so there can be only one minimizer. It is true that the MSE is convex, but that doesn't imply that there is only one minimizer! More specifically, any local minimum of a convex function is also a global minimum, but there can be many minimizers. Take, for example, the function \(f(x, y) = x^2\). The point \((0, 0)\) is a minimizer, but so are \((0, 2)\) and \((0, 100)\), and so on. There are infinitely many minimizers!

Problem #63

Tags: linear prediction functions

Suppose a linear prediction function \(H(\vec x) = w_0 + w_1 x_1 + w_2 x_2\) is fit to data and the following are observed:

At a point, \(\nvec{x}{1} = (1, 1)^T\), the value of \(H\) is 4.

At a point, \(\nvec{x}{2} = (1, 0)^T\), the value of \(H\) is 0.

At a point, \(\nvec{x}{3} = (0, 0)^T\), the value of \(H\) is -4.

What is the value of \(H\) at the point \(\nvec{x}{4} = (0, 1)^T\)?

Solution

The four points make a square:

Since \(H = 0\) at \(\nvec{x}{2}\), the decision boundary must pass through that point. And since \(H(\nvec{x}{1}) = 4 = -H(\nvec{x}{3})\), the points \(\nvec{x}{1}\) and \(\nvec{x}{3}\) are equidistant from the decision boundary. Therefore, the decision boundary must pass exactly halfway between \(\nvec{x}{1}\) and \(\nvec{x}{3}\). This makes a line going through \(\nvec{x}{4}\), meaning that \(\nvec{x}{4}\) is on the decision boundary, so \(H(\nvec{x}{4}) = 0\).

Problem #64

Tags: object type

Choose the option which best completes the following sentence: In least squares regression, we can fit a linear prediction function \(H\) by computing the gradient of the _________ with respect to ________ and solving.

Solution

risk, the weights

Problem #65

Tags: object type

Part 1)

For each \(i = 1, \ldots, n\), let \(\nvec{x}{i}\) be a vector in \(\mathbb R^d\) and let \(\alpha_i\) be a scalar. What type of object is:

\[\sum_{i = 1}^n \alpha_i \nvec{x}{i}? \]
Solution

A vector in \(\mathbb R^d\)

Part 2)

Let \(\Phi\) be an \(n \times d\) matrix, let \(\vec y\) be a vector in \(\mathbb R^n\), and let \(\vec\alpha\) be a vector in \(\mathbb R^n\). What type of object is:

\[\frac1n \|\Phi\Phi^T \vec\alpha - \vec y\|^2 + \vec\alpha^T \Phi\Phi^T \vec\alpha\]
Solution

A scalar

Part 3)

Let \(\vec x\) be a vector in \(\mathbb R^d\), and let \(A\) be a \(d \times d\) matrix. What type of object is:

\[\frac{\vec x^T A \vec x}{\vec x^T \vec x}? \]
Solution

A scalar

Part 4)

Let \(A\) be a \(d \times n\) matrix. What type of object is \((A A^T)^{-1}\)?

Solution

A \(d \times d\) matrix

Part 5)

For each \(i = 1, \ldots, n\), let \(\nvec{x}{i}\) be a vector in \(\mathbb R^d\). What type of object is:

\[\sum_{i = 1}^n \nvec{x}{i}(\nvec{x}{i})^T? \]
Solution

A \(d \times d\) matrix

Problem #66

Tags: computing loss

Consider the data set shown below. The 4 points marked with ``+'' have label +1, while the 3 ``\(\circ\)'' points have label -1. Shown are the places where a linear prediction function \(H\) is equal to zero (the thick solid line), 1, and -1 (the dotted lines). Each cell of the grid is 1 unit by 1 unit. The origin is not specified; it is not necessary to know the absolute coordinates of the data to complete this problem.

For each of the below subproblems, calculate the total loss with respect to the given loss function. That is, you should calculate \(\sum_{i=1}^n L(\nvec{x}{i}, y_i, \vec w)\) using the appropriate loss function in place of \(L\). Note that we have most often calculated the mean loss, but here we calculate the total so that we encounter fewer fractions.

Part 1)

What is the total square loss of \(H\) on this data set?

Part 2)

What is the total perceptron loss of \(H\) on this data set?

Part 3)

What is the total hinge loss of \(H\) on this data set?

Problem #67

Tags: subgradients

Consider the piecewise function:

\[ f(x_1, x_2) = \begin{cases} 2x_1 + x_2 + 8, & \text{if } x_1 < 2,\\ 6x_1 + x_2, & \text{if } x_1 \geq 2. \end{cases}\]

For convenience, a plot of this function is shown below:

Note that the plot isn't intended to be precise enough to read off exact values, but it might help give you a sense of the function's behavior. In principle, you can do this question using only the piecewise definition of \(f\) given above.

Part 1)

What is the gradient of this function at the point \((0, 0)\)?

Part 2)

Which of the below are valid subgradients of the function at the point \((2,0)\)? Select all that apply.

Solution

\((3,1)^T\) and \((2,1)^T\) are valid subgradients of the function at \((2,0)\).

Problem #68

Tags: gradient descent

Recall that a subgradient of the absolute loss is:

\[\begin{cases} \operatorname{Aug}(\vec x), & \text{if } \operatorname{Aug}(\vec x) \cdot \vec w - y > 0,\\ -\operatorname{Aug}(\vec x), & \text{if } \operatorname{Aug}(\vec x) \cdot \vec w - y < 0,\\ \vec 0, & \text{otherwise}. \end{cases}\]

Suppose you are running subgradient descent to minimize the risk with respect to the absolute loss in order to train a function \(H(x) = w_0 + w_1 x\) on the following data set:

Suppose that the initial weight vector is \(\vec w = (0, 0)^T\) and that the learning rate \(\eta = 1\). What will be the weight vector after one iteration of subgradient descent?

Solution

To perform subgradient descent, we need to compute the subgradient of the risk. The main thing to remember is that the subgradient of the risk is the average of the subgradient of the loss on each data point.

\[\newcommand{\Aug}{\operatorname{Aug}}\]

So to start this problem, calculate the subgradient of the loss for each of the three points. Our formula for the subgradient of the absolute loss tells us to compute \(\Aug(\vec x) \cdot\vec w - y\) for each point and see if this is positive or negative. If it is positive, the subgradient is \(\Aug(\vec x)\); if it is negative, the subgradient is \(-\Aug(\vec x)\).

Now, the initial weight vector \(\vec w\) was conveniently chosen to be \(\vec 0\), meaning that \(\operatorname{Aug}(\vec x) \cdot\vec w = 0\) for all of our data points. Therefore, when we compute \(\operatorname{Aug}(\vec x) \cdot\vec w - y\), we get \(-y\) for every data point, and so we fall into the second case of the subgradient formula for every data point. This means that the subgradient of the loss at each data point is \(-\operatorname{Aug}(\vec x)\). Or, more concretely, the subgradient of the loss at each of the three data points is:

\((-1, -1)^T\), \((-1, -2)^T\), and \((-1, -3)^T\). This means that the subgradient of the risk is the average of these three:

\[\frac{1}{3}\left(\begin{bmatrix}-1\\-1\end{bmatrix} + \begin{bmatrix}-1\\-2\end{bmatrix} + \begin{bmatrix}-1\\-3\end{bmatrix}\right)= \begin{bmatrix}-1\\ -2 \end{bmatrix}\]

The subgradient descent update rule says that \(\vec w^{(1)} = \vec w^{(0)} - \eta\vec g\), where \(\vec g\) is the subgradient of the risk. The learning rate \(\eta\) was given as 1, so we have \(\vec w^{(1)} = \vec w^{(0)} - \vec g = \vec 0 - \begin{bmatrix}-1\\ -2 \end{bmatrix} = \begin{bmatrix}1\\ 2 \end{bmatrix}\).

Problem #69

Tags: convexity

Let \(\mathcal X = \{(\nvec{x}{1}, y_1), \ldots, (\nvec{x}{n}, y_n)\}\) be a data set for regression, with each \(\nvec{x}{i}\in\mathbb R^d\) and each \(y_i \in\mathbb R\). Consider the following function:

\[ R(\vec w) = \sum_{i=1}^n |y_i - \vec w \cdot\nvec{x}{i}| + \|\vec w\|^2. \]

True or False: \(R(\vec w)\) is a convex function of \(\vec w\).

True False
Solution

True. Each term \(|y_i - \vec w \cdot\nvec{x}{i}|\) is a convex function of \(\vec w\) (the absolute value of an affine function), \(\|\vec w\|^2\) is convex, and a sum of convex functions is convex.

Problem #70

Tags: regularization

Recall that in ridge regression, we solve the following optimization problem:

\[\operatorname{arg\,min}_{\vec w}\sum_{i=1}^n (y_i - \vec w \cdot\operatorname{Aug}(\nvec{x}{i}))^2 + \lambda\|\vec w\|^2. \]

where \(\lambda > 0\) is a hyperparameter controlling the strength of regularization.

Suppose you solve the ridge regression problem with \(\lambda = 2\), and the resulting solution is the weight vector \(\vec w_\text{old}\). You then solve the ridge regression problem with \(\lambda = 4\) and find a weight vector \(\vec w_\text{new}\).

True or False: each component of the new solution, \(\vec w_\text{new}\), must be less than or equal to the corresponding component of the old solution, \(\vec w_\text{old}\).

True False
Solution

False.

While it is true that \(\|\vec w_\text{new}\|\leq\|\vec w_\text{old}\|\), this does not imply that each component of \(\vec w_\text{new}\) is less than or equal to the corresponding component of \(\vec w_\text{old}\).

The picture to have in mind is that of the contour lines of the mean squared error (which are ovals), along with the circles representing where \(\|\vec w\| = c\) for some constant \(c\). The question asked us to consider going from \(\lambda = 2\) to \(\lambda = 4\), but to gain an intuition we can think of going from no regularization (\(\lambda = 0\)) to some regularization (\(\lambda > 0\)); this won't affect the outcome, but will make the story easier to tell.

Consider the situation shown below:

When we had no regularization, the solution was \(\vec w_\text{old}\), as marked. Suppose we add regularization, and we're told that the regularization is such that when we solve the ridge regression problem, the norm of \(\vec w_\text{new}\) will be equal to \(c\), and that the radius of the circle we've drawn is \(c\). Then the solution \(\vec w_\text{new}\) will be the point marked, since that is the point on the circle that is on the lowest contour.

Notice that the point \(\vec w_\text{new}\) is closer to the origin, and its first component is much smaller than the first component of \(\vec w_\text{old}\). However, the second component of \(\vec w_\text{new}\) is actually larger than the second component of \(\vec w_\text{old}\).

Problem #71

Tags: SVMs

In lecture, we saw that SVMs minimize the (regularized) risk with respect to the hinge loss. Now recall, the 0-1 loss, which was defined in lecture as:

\[ L(\nvec{x}{i}, y_i, \vec w) = \begin{cases} 0, & \text{if } y_i = \operatorname{sign}(\vec w \cdot \operatorname{Aug}(\nvec{x}{i})),\\ 1, & \text{otherwise}. \end{cases}\]

Part 1)

Suppose a Hard SVM is trained on linearly-separable data. True or False: the solution is guaranteed to have the smallest 0-1 risk of any possible linear prediction function \(H(\vec x) = \vec w \cdot\operatorname{Aug}(\vec x)\).

True False
Solution

True. If the data is linearly separable, then a Hard-SVM will draw a decision boundary that goes between the two classes, perfectly classifying the training data and achieving a 0-1 risk of zero.

Part 2)

Suppose a Soft SVM is trained on linearly-separable data using an unknown slack parameter \(C\). True or False: the solution is guaranteed to have the smallest 0-1 risk of any possible linear prediction function \(H(\vec x) = \vec w \cdot\operatorname{Aug}(\vec x)\).

True False
Solution

False.

Part 3)

Suppose a Soft SVM is trained on data that is not linearly separable using an unknown slack parameter \(C\). True or False: the solution is guaranteed to have the smallest 0-1 risk of any possible linear prediction function \(H(\vec x) = \vec w \cdot\operatorname{Aug}(\vec x)\).

True False
Solution

False.

Problem #72

Tags: perceptrons, gradients

Suppose a perceptron classifier \(H(\vec x) = \vec w \cdot\vec x\) is trained on the data shown below. The points marked with ``+'' have label +1, while the ``\(\circ\)'' points have label -1. The perceptron's decision boundary is plotted as a thick solid line.

Let \(R(\vec w)\) be the risk with respect to the perceptron loss function, and let \(\vec w^*\) be the weight vector whose corresponding decision boundary is shown above.

True or False: the gradient of \(R\) evaluated at \(\vec w^*\) is \(\vec 0\).

True False
Solution

True.

Problem #73

Tags: SVMs

Consider the data set shown below. The points marked with ``+'' have label +1, while the ``\(\circ\)'' points have label -1. Shown are the places where a linear prediction function \(H\) is equal to zero (the thick solid line), 1, and -1 (the dotted lines). Each cell of the grid is 1 unit by 1 unit. The origin is not specified (it is not necessary to know the absolute coordinates of any points for this problem).

Suppose that the weight vector \(\vec w\) of the linear prediction function \(w_0 + w_1 x_1 + w_2 x_2\) shown above is \((6, 0, \frac{1}{2})^T\). This weight vector is not the solution to the Hard SVM problem.

What weight vector is the solution of the Hard SVM problem for this data set?

Solution

Although this decision boundary is in the right place, it can't be the solution to the Hard-SVM problem because its margin isn't maximized. Remember that the surface of \(H\) is a plane, and in this case the plane is too steep; we need to decrease its slope. We do this by scaling the weight vector by a constant factor; in this case, we want to double the margin, so we need to halve the slope. Therefore, the right answer is \(\frac{1}{2}(6, 0, \frac{1}{2})^T = (3, 0, \frac{1}{4})^T\).

Problem #74

Tags: gradients

Let \(\nvec{x}{1}, \ldots, \nvec{x}{n}\) be vectors in \(\mathbb R^d\) and let \(\vec w\) be a vector in \(\mathbb R^d\). Consider the function:

\[ f(\vec w) = \sum_{i=1}^n \vec w \cdot\nvec{x}{i} + \|\vec w \|^2, \]

What is \(\frac{d}{d \vec w} f(\vec w)\)?

Solution
\[\begin{align*} \frac{d}{d \vec w} \sum_{i=1}^n \vec w \cdot \nvec{x}{i} + \| \vec w \|^2 &= \frac{d}{d \vec w} \sum_{i=1}^n \vec w \cdot \nvec{x}{i} + \frac{d}{d \vec w} \| \vec w \|^2 \\ &= \sum_{i=1}^n \frac{d}{d \vec w} \vec w \cdot \nvec{x}{i} + \frac{d}{d \vec w} \| \vec w \|^2 \\ &= \sum_{i=1}^n \nvec{x}{i} + \frac{d}{d \vec w} \vec w^T \vec w \\ &= \sum_{i=1}^n \nvec{x}{i} + 2 \vec w \end{align*}\]

Problem #75

Tags: regularization

The ``infinity norm'' of a vector \(\vec w \in\mathbb R^d\), written \(\|\vec w\|_\infty\), is defined as:

\[\|\vec w\|_\infty = \max_{i = 1, \ldots, d} |w_i|. \]

That is, it is the maximum absolute value of any entry of \(\vec w\).

Let \(R(\vec w)\) be an unregularized risk function on a data set. The solid curves in the plot below are the contours of \(R(\vec w)\). The dashed lines show where \(\|\vec w\|_\infty\) is equal to 1, 2, 3, and so on.

Let \(\tilde R(\vec w) = R(\vec w) + \lambda\|\vec w\|_\infty^2\), with \(\lambda > 0\). The point marked \(A\) is the minimizer of the unregularized risk. Suppose that it is known that one of the other points is the minimizer of the regularized risk, \(\tilde R(\vec w)\), for some unknown \(\lambda > 0\). Which point is it?

Solution

D.

Problem #76

Tags: object type

Part 1)

Let \(f : \mathbb R^d \to\mathbb R\) be a function and let \(\nvec{x}{0}\) be a vector in \(\mathbb R^d\). What type of object is \(\frac{d}{d \vec x} f(\nvec{x}{0})\)? In other words, what type of object is the gradient of \(f\) evaluated at \(\nvec{x}{0}\)?

Solution

A vector in \(\mathbb R^d\).

Part 2)

Let \(\Phi\) be an \(n \times d\) matrix and let \(\vec\alpha\) be a vector in \(\mathbb R^n\). What type of object is:

\[\vec\alpha^T \Phi\Phi^T \vec\alpha\]
Solution

A scalar.

Part 3)

For each \(i = 1, \ldots, n\), let \(\nvec{x}{i}\) be a vector in \(\mathbb R^d\) and \(y_i\) be a scalar. Let \(\vec w\) be a vector in \(\mathbb R^d\). What type of object is:

\[\frac1n \sum_{i = 1}^n \left(\operatorname{Aug}(\nvec{x}{i}) \cdot\vec w - y_i \right)^2 \]
Solution

A scalar.

Part 4)

Let \(X\) be an \(n \times d\) matrix, and assume that \(X^T X\) is invertible. What type of object is \(X(X^T X)^{-1} X^T\)?

Solution

An \(n \times n\) matrix.

Problem #77

Tags: nearest neighbor

Let \(\mathcal X = \{(\nvec{x}{1}, y_1), \ldots, (\nvec{x}{n}, y_n)\}\) be a labeled dataset, where \(\nvec{x}{i}\in\mathbb R^d\) is a feature vector and \(y_i \in\{-1, 1\}\) is a binary label. Let \(\vec x\) be a new point that is not in the data set. Suppose a nearest neighbor classifier is used to predict the label of \(\vec x\), and the resulting prediction is \(-1\). (You may assume that there is a unique nearest neighbor of \(\vec x\).)

Now let \(\mathcal Z\) be a new dataset obtained from \(\mathcal X\) by subtracting the same vector \(\vec\delta\) from each training point. That is, \(\mathcal Z = \{(\nvec{z}{1}, y_1), \ldots, (\nvec{z}{n}, y_n)\}\), where \(\nvec{z}{i} = \nvec{x}{i} - \vec\delta\) for each \(i\). Let \(\vec z = \vec x - \vec\delta\). Suppose a nearest neighbor classifier trained on \(\mathcal Z\) is used to predict the label of \(\vec z\).

True or False: the predicted label of \(\vec z\) must also be \(-1\).

True False
Solution

True. Subtracting the same vector \(\vec\delta\) from every point is a translation, which leaves all pairwise distances unchanged, so the nearest neighbor of \(\vec z\) in \(\mathcal Z\) is the translated version of the nearest neighbor of \(\vec x\) in \(\mathcal X\), and the prediction is the same.

Problem #78

Tags: linear prediction functions

Suppose a linear prediction function \(H(\vec x) = w_0 + w_1 x_1 + w_2 x_2\) is trained to perform classification, and the weight vector is found to be \(\vec w = (1,-2,1)^T\).

The figure below shows four possible decision boundaries: \(A\), \(B\), \(C\), and \(D\). Which of them is the decision boundary of the prediction function \(H\)?

You may assume that each grid square is 1 unit on each side, and the axes are drawn to show you where the origin is.

Solution

D.

Problem #79

Tags: feature maps, linear prediction functions

Suppose a linear prediction function \(H(\vec x) = w_1 \phi_1(\vec x) + w_2 \phi_2(\vec x) + w_3 \phi_3(\vec x) + w_4 \phi_4(\vec x)\) is fit using the basis functions

\[\phi_1(\vec x) = 1, \quad\phi_2(\vec x) = x_1 x_2, \quad\phi_3(\vec x) = x_2 x_3, \quad\phi_4(\vec x) = x_1 x_3, \]

where \(\vec x = (x_1, x_2, x_3)^T\) is a feature vector in \(\mathbb R^3\). The weight vector \(\vec w\) is found to be \(\vec w = (2, 1, -2, -3)^T\).

Let \(\vec x = (1, 2, 1)^T\). What is \(H(\vec x)\)?

Problem #80

Tags: least squares, linear prediction functions

Consider the binary classification data set shown below consisting of six points in \(\mathbb R^2\). Each point has an associated label of either +1 or -1.

What is the mean squared error of a least squares classifier trained on this data set (without regularization)? Your answer should be a number.

Problem #81

Tags: convexity

Suppose \(f(x_1, x_2)\) is a convex function. Define the new function \(g(x) = f(x, 0)\). True or False: \(g(x)\) must be convex.

True False
Solution

True.
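One way to see it: for any \(a, b \in \mathbb R\) and \(t \in [0, 1]\), convexity of \(f\) along the segment between \((a, 0)\) and \((b, 0)\) gives

\[ g(ta + (1-t)b) = f\big(t(a, 0) + (1-t)(b, 0)\big) \leq t\, f(a, 0) + (1-t)\, f(b, 0) = t\, g(a) + (1-t)\, g(b), \]

which is exactly the definition of convexity for \(g\).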

Problem #82

Tags: convexity

Let \(f_1(x)\) be a convex function and let \(f_2(x)\) be a non-convex function. Define the new function \(f(x) = f_1(x) + f_2(x)\). True or False: \(f(x)\) must be non-convex.

True False
Solution

False. For a simple counterexample, let \(f_1(x) = 4x^2\) and \(f_2(x) = -x^2\). Then \(f(x) = 4x^2 - x^2 = 3x^2\) is convex.

If that example seems too contrived, consider \(f_1(x) = x^4 + 3 x^2\) and \(f_2(x) = x^3\). Then \(f(x) = x^4 + x^3 + 3x^2\), and the second derivative test gives us \(f''(x) = 12x^2 + 6x + 6\), which is positive for all \(x\).

Problem #83

Tags: computing loss

Consider the data set shown below. The \(\diamond\) points marked with ``+'' have label +1, while the \(\circ\) points marked with ``\(-\)'' have label -1. Shown are the places where a linear prediction function \(H\) is equal to zero (the thick solid line), 1, and -1 (the dotted lines). Each cell of the grid is 1 unit by 1 unit. The origin is not specified; it is not necessary to know the absolute coordinates of the data to complete this problem.

For each of the below subproblems, calculate the total loss with respect to the given loss function. That is, you should calculate \(\sum_{i=1}^n L(\nvec{x}{i}, y_i, \vec w)\) using the appropriate loss function in place of \(L\). Note that we have most often calculated the mean loss, but here we calculate the total so that we encounter fewer fractions.

Note: please double-check your calculations! We are not grading your work, so partial credit is difficult to assign on this problem.

Part 1)

What is the total square loss of \(H\) on this data set?

Part 2)

What is the total perceptron loss of \(H\) on this data set?

Part 3)

What is the total hinge loss of \(H\) on this data set?
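The plotted data set is not reproduced here, so the three totals cannot be recovered in this text. The sketch below only illustrates how each total would be computed from values of \(H(\nvec{x}{i})\) and labels \(y_i\), using made-up numbers and the usual definitions of the square loss \((H(\vec x) - y)^2\), perceptron loss \(\max(0, -y\,H(\vec x))\), and hinge loss \(\max(0, 1 - y\,H(\vec x))\).

```python
import numpy as np

# Hypothetical values of H(x_i) and labels y_i (NOT read off the figure).
H = np.array([2.0, 1.0, 0.5, -1.0, -2.0, 0.5])
y = np.array([1, 1, 1, -1, -1, -1])

total_square     = np.sum((H - y) ** 2)
total_perceptron = np.sum(np.maximum(0, -y * H))      # 0 whenever the sign is correct
total_hinge      = np.sum(np.maximum(0, 1 - y * H))   # 0 only when y * H(x) >= 1
print(total_square, total_perceptron, total_hinge)
```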

Problem #84

Tags: subgradients, gradients

Consider the following piecewise function that is linear in each quadrant:

\[ f(x_1, x_2) = \begin{cases} -2x_1, & \text{if } x_1 < 0 \text{ and } x_2 < 0,\\ -2x_1 + 3x_2, & \text{if } x_1 < 0 \text{ and } x_2 \geq 0,\\ -x_1, & \text{if } x_1 \geq 0 \text{ and } x_2 < 0,\\ -x_1 + 3x_2, & \text{if } x_1 \geq 0 \text{ and } x_2 \geq 0. \end{cases}\]

For convenience, a plot of this function is shown below:

Note that the plot isn't intended to be precise enough to read off exact values, but it might help give you a sense of the function's behavior. In principle, you can do this question using only the piecewise definition of \(f\) given above.

Part 1)

What is the gradient of this function at the point \((1, 1)\)?

Part 2)

What is the gradient of this function at the point \((-1, -1)\)?

Part 3)

Which of the below are valid subgradients of the function at the point \((0,-2)\)? Select all that apply.

Solution

\((-2, 0)^T\) and \((-1.5, 0)^T\) are valid subgradients.

Part 4)

Which of the below are valid subgradients of the function at the point \((0,0)\)? Select all that apply.

Solution

\((-1, 3)^T\), \((-2, 0)^T\), and \((-1, 2)^T\) are valid subgradients.
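In the interior of a quadrant the function is linear, so the gradient is just that of the active piece: \((-1, 3)^T\) at \((1, 1)\) (Part 1) and \((-2, 0)^T\) at \((-1, -1)\) (Part 2). The boundary points can be sanity-checked numerically against the subgradient inequality \(f(\vec y) \geq f(\vec x) + \vec g \cdot (\vec y - \vec x)\). A minimal sketch (random sampling can only falsify the inequality, but it is a useful check):

```python
import numpy as np

def f(x1, x2):
    # The piecewise-linear function from this problem.
    if x1 < 0 and x2 < 0:
        return -2 * x1
    elif x1 < 0 and x2 >= 0:
        return -2 * x1 + 3 * x2
    elif x1 >= 0 and x2 < 0:
        return -x1
    else:
        return -x1 + 3 * x2

def passes_subgradient_check(g, x, trials=10_000, rng=np.random.default_rng(0)):
    """Check f(y) >= f(x) + g . (y - x) at many randomly sampled points y."""
    fx = f(*x)
    for _ in range(trials):
        y = x + rng.uniform(-5, 5, size=2)
        if f(*y) < fx + g @ (y - x) - 1e-9:
            return False
    return True

print(passes_subgradient_check(np.array([-2.0, 0.0]), np.array([0.0, -2.0])))  # True  (Part 3)
print(passes_subgradient_check(np.array([-1.5, 0.0]), np.array([0.0, -2.0])))  # True  (Part 3)
print(passes_subgradient_check(np.array([-1.0, 3.0]), np.array([0.0, 0.0])))   # True  (Part 4)
print(passes_subgradient_check(np.array([0.0, 3.0]), np.array([0.0, 0.0])))    # False (not a subgradient)
```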

Problem #85

Tags: gradient descent, subgradients

Recall that a subgradient of the absolute loss is:

\[\begin{cases} \operatorname{Aug}(\vec x), & \text{if } \operatorname{Aug}(\vec x) \cdot \vec w - y > 0,\\ -\operatorname{Aug}(\vec x), & \text{if } \operatorname{Aug}(\vec x) \cdot \vec w - y < 0,\\ \vec 0, & \text{otherwise}. \end{cases}\]

Suppose you are running subgradient descent to minimize the risk with respect to the absolute loss in order to train a function \(H(x) = w_0 + w_1 x\) on the following data set:

Suppose that the initial weight vector is \(\vec w = (0, 0)^T\) and that the learning rate is \(\eta = 1\). What will be the weight vector after one iteration of subgradient descent?
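The data set itself appears only in the figure, so the answer cannot be reproduced here. As an illustration of the mechanics only, the sketch below runs one iteration of subgradient descent on made-up points, starting from \(\vec w = (0, 0)^T\) with \(\eta = 1\), using the subgradient rule quoted above; a subgradient of the risk (the mean of the absolute losses) is the mean of the per-point subgradients.

```python
import numpy as np

# Hypothetical data (NOT the data set from the figure).
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 1.0, -1.0])
Aug = np.column_stack([np.ones_like(x), x])   # Aug(x) = (1, x)

w = np.zeros(2)
eta = 1.0

residuals = Aug @ w - y
# Per-point subgradient of the absolute loss, following the quoted rule.
g = np.where(residuals[:, None] > 0, Aug,
    np.where(residuals[:, None] < 0, -Aug, 0.0))
w = w - eta * g.mean(axis=0)                  # one subgradient-descent step on the risk
print(w)                                      # [1/3, 0] for this made-up data
```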

Problem #86

Tags: regularization

Recall that in ridge regression, we solve the following optimization problem:

\[\operatorname{arg\,min}_{\vec w}\sum_{i=1}^n (y_i - \vec w \cdot\operatorname{Aug}(\nvec{x}{i}))^2 + \lambda\|\vec w\|^2, \]

where \(\lambda > 0\) is a hyperparameter controlling the strength of regularization.

Suppose you solve the ridge regression problem with \(\lambda = 2\), and the resulting solution has a mean squared error of \(10\).

Now suppose you increase the regularization strength to \(\lambda = 4\) and solve the ridge regression problem again. True or False: it is possible that the mean squared error of the new solution is less than \(10\).

By ``mean squared error,'' we mean \(\frac1n \sum_{i=1}^n (y_i - \vec w \cdot\operatorname{Aug}(\nvec{x}{i}))^2\).

True False
Solution

False. Increasing \(\lambda\) shrinks the solution further toward \(\vec 0\), and the training mean squared error of the ridge solution is non-decreasing in \(\lambda\).
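A quick way to convince yourself numerically, on a made-up one-dimensional data set, is to solve the stated ridge problem in closed form for \(\lambda = 2\) and \(\lambda = 4\) and compare training errors:

```python
import numpy as np

# Made-up 1-d data; solve ridge regression in closed form for two lambdas.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=20)
y = 2 * x - 1 + rng.normal(scale=0.5, size=20)

Phi = np.column_stack([np.ones_like(x), x])          # Aug(x) = (1, x)

def ridge_mse(lam):
    # argmin_w  sum_i (y_i - w . Aug(x_i))^2 + lam * ||w||^2
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(2), Phi.T @ y)
    return np.mean((Phi @ w - y) ** 2)

print(ridge_mse(2.0) <= ridge_mse(4.0))   # True: training MSE cannot decrease as lambda grows
```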

Problem #87

Tags: SVMs

Let \(\mathcal X\) be a binary classification data set \((\nvec{x}{1}, y_1), \ldots, (\nvec{x}{n}, y_n)\), where each \(\nvec{x}{i}\in\mathbb R^d\) and each \(y_i \in\{-1, 1\}\). Suppose that the data in \(\mathcal X\) is linearly separable.

Let \(\vec w_{\text{svm}}\) be the weight vector of the hard-margin support vector machine (SVM) trained on \(\mathcal X\). Let \(\vec w_{\text{lsq}}\) be the weight vector of the least squares classifier trained on \(\mathcal X\).

True or False: it must be the case that \(\vec w_{\text{svm}} = \vec w_{\text{lsq}}\).

True False
Solution

False.
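A hedged illustration on a made-up linearly separable data set: scikit-learn's linear SVC with a very large \(C\) approximates the hard-margin SVM, and ordinary least squares on augmented features gives the least squares classifier. The two weight vectors come out different.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable data (NOT from the problem).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

svm = SVC(kernel="linear", C=1e10).fit(X, y)            # approximately hard-margin
w_svm = np.concatenate([svm.intercept_, svm.coef_.ravel()])

Aug = np.column_stack([np.ones(len(X)), X])
w_lsq, *_ = np.linalg.lstsq(Aug, y, rcond=None)         # least squares classifier

print(w_svm, w_lsq)                                     # the two weight vectors differ
```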

Problem #88

Tags: SVMs

Recall that the Soft-SVM optimization problem is given by

\[\min_{\vec w \in \mathbb R^{d+1}, \vec \xi \in \mathbb R^n}\|\vec w\|^2 + C \sum_{i=1}^n \xi_i \quad\text{subject to } y_i \left(\vec w \cdot\operatorname{Aug}(\nvec{x}{i})\right) \geq 1 - \xi_i \text{ and } \xi_i \geq 0 \text{ for all } i, \]

where \(\vec w\) is the weight vector, \(\vec\xi\) is a vector of slack variables, and \(C\) is a regularization parameter.

The two pictures below show as dashed lines the decision boundaries that result from training a Soft-SVM on the same data set using two different regularization parameters: \(C_1\) and \(C_2\).

True or False: \(C_1 < C_2\).

True False
Solution

True.
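The figures are not reproduced here; the intuition is that a smaller \(C\) makes slack cheaper, so the optimizer accepts more margin violations in exchange for a smaller \(\|\vec w\|\), i.e. a wider margin \(2 / \|\vec w\|\). A rough illustration on made-up overlapping data, using scikit-learn's soft-margin linear SVC (which optimizes an equivalent objective up to scaling):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, slightly overlapping two-class data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0, 0], scale=1.0, size=(20, 2)),
               rng.normal(loc=[2, 2], scale=1.0, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

for C in (0.01, 100.0):
    w = SVC(kernel="linear", C=C).fit(X, y).coef_.ravel()
    print(C, 2 / np.linalg.norm(w))   # margin width; expect it to shrink as C grows
```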

Problem #89

Tags: SVMs, gradients

Consider the data set shown below. The points marked with ``+'' have label +1, while the ``\(\circ\)'' points have label -1. Shown are the places where a linear prediction function \(H\) is equal to zero (the thick solid line), 1, and -1 (the dotted lines). Each cell of the grid is 1 unit by 1 unit. The origin is not specified (it is not necessary to know the absolute coordinates of any points for this problem).

Let \(\vec w'\) be the parameter vector corresponding to the prediction function \(H\) shown above. Let \(R(\vec w)\) be the risk with respect to the hinge loss on this data. True or False: the gradient of \(R\) evaluated at \(\vec w'\) is \(\vec 0\).

True False
Solution

True.

Problem #90

Tags: gradients

Let \(\vec x, \vec\mu,\) and \(\vec b\) be vectors in \(\mathbb R^n\), and let \(A\) be a symmetric \(n \times n\) matrix. Consider the function:

\[ f(\vec x) = \vec x^T A \vec x + (\vec x - \vec\mu) \cdot\vec b \]

What is \(\frac{d}{d \vec x} f(\vec x)\)?
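Using \(\frac{d}{d\vec x}\left(\vec x^T A \vec x\right) = (A + A^T)\vec x = 2A\vec x\) (since \(A\) is symmetric) and \(\frac{d}{d\vec x}\big((\vec x - \vec\mu)\cdot\vec b\big) = \vec b\), the gradient is \(2A\vec x + \vec b\). A finite-difference sanity check on made-up inputs:

```python
import numpy as np

# Numerically verify that the gradient of f(x) = x^T A x + (x - mu) . b
# equals 2 A x + b when A is symmetric (random made-up A, mu, b, x).
rng = np.random.default_rng(0)
n = 4
M = rng.normal(size=(n, n))
A = (M + M.T) / 2                          # symmetric matrix
mu, b, x = rng.normal(size=(3, n))

f = lambda v: v @ A @ v + (v - mu) @ b

eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])   # central differences, one per coordinate
print(np.allclose(numeric, 2 * A @ x + b, atol=1e-5))   # True
```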

Problem #91

Tags: regularization

The ``\(p = \frac12\)'' norm of a vector \(\vec w \in\mathbb R^d\), written \(\|\vec w\|_{\frac12}\), is defined as:

\[\|\vec w\|_{\frac12} = \left(\sum_{i = 1}^d \sqrt{|w_i|}\right)^2 \]

Let \(R(\vec w)\) be an unregularized risk function on a data set. The solid curves in the plot below are the contours of \(R(\vec w)\). The dashed lines show where \(\|\vec w\|_{\frac12}\) is equal to 1, 2, 3, and so on.

Let \(\tilde R(\vec w) = R(\vec w) + \lambda\|\vec w\|_{\frac12}\), with \(\lambda > 0\). The point marked \(A\) is the minimizer of the unregularized risk. Suppose it is known that one of the other points is the minimizer of the regularized risk, \(\tilde R(\vec w)\), for some unknown \(\lambda > 0\). Which point is it?

Solution

D.