Midterm 01

Practice problems for topics on Midterm 01.

See Sample Midterm 01 for an example midterm from a previous quarter. This is a good way to get a sense of the format of the exam. However, note that the topics covered may change slightly from quarter to quarter, and so the sample exam may not be a perfect match for the topics covered in this quarter.

Problem #01

Tags: nearest neighbor

Consider the data set shown below:

What is the predicted value of \(y\) at \(x = 3\) if the 3-nearest neighbor rule is used?

Solution

6

Problem #02

Tags: linear prediction functions

Suppose a linear prediction rule \(H(\vec x; \vec w) = \Aug(\vec x) \cdot\vec w\) is parameterized by the weight vector \(\vec w = (3, -2, 5, 2)^T\). Let \(\vec z = (1, -1, -2)^T\). What is \(H(\vec z)\)?

Solution

-8
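
One way to check the arithmetic: augmenting \(\vec z\) and taking the dot product with \(\vec w\),

\[ H(\vec z) = \operatorname{Aug}(\vec z) \cdot\vec w = (1, 1, -1, -2)^T \cdot(3, -2, 5, 2)^T = 3 - 2 - 5 - 4 = -8. \]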

Problem #03

Tags: linear regression, least squares

Suppose a line of the form \(H(x) = w_0 + w_1 x\) is fit to a data set of points \(\{(x_i, y_i)\}\) in \(\mathbb R^2\) by minimizing the mean squared error. Let the mean squared error of this predictor with respect to this data set be \(E_1\).

Next, create a new data set by adding a single new point to the original data set with the property that the new point lies exactly on the line \(H(x) = w_0 + w_1 x\) that was fit above. Let the mean squared error of \(H\) on this new data set be \(E_2\).

Which of the following is true?

Solution

\(E_1 > E_2\)
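
One way to see this (assuming the original fit has nonzero error): the new point lies exactly on \(H\), so it contributes zero squared error, while the number of points grows from \(n\) to \(n+1\). Therefore

\[ E_2 = \frac{n E_1 + 0}{n + 1} = \frac{n}{n+1}\,E_1 < E_1. \]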

Problem #04

Tags: linear prediction functions

Suppose a linear predictor \(H_1\) is fit on a data set \(X = \{\nvec{x}{i}, y_i\}\) of \(n\) points by minimizing the mean squared error, where each \(\nvec{x}{i}\in\mathbb R^d\).

Let \(Z = \{\nvec{z}{i}, y_i\}\) be the data set obtained from the original by standardizing each feature. That is, if a matrix were created with the \(i\)th row being \(\nvec{z}{i}\), then the mean of each column would be zero, and the variance would be one.

Suppose a linear predictor \(H_2\) is fit on this standardized data by minimizing the mean squared error.

True or False: \(H_1(\nvec{x}{i}) = H_2(\nvec{z}{i})\) for each \(i = 1, \ldots, n\).

Solution

True.

Problem #05

Tags: object type

Part 1)

Let \(\Phi\) be an \(n \times d\) design matrix, let \(\lambda\) be a real number, and let \(I\) be a \(d \times d\) identity matrix.

What type of object is \((\Phi^T \Phi + n \lambda I)^{-1}\)?

Solution

A \(d \times d\) matrix

Part 2)

Let \(\Phi\) be an \(n \times d\) design matrix, and let \(\vec y \in\mathbb R^n\). What type of object is \(\Phi^T \vec y\)?

Solution

A vector in \(\mathbb R^d\)

Part 3)

Let \(\vec w \in\mathbb R^{d+1}\), and for each \(i \in\{1, 2, \ldots, n\}\) let \(\nvec{x}{i}\in\mathbb R^d\) and \(y_i \in\mathbb R\).

What type of object is:

\[\sum_{i = 1}^n \left(\vec w \cdot\Aug(\nvec{x}{i}) - y_i\right)^2? \]
Solution

A scalar

Part 4)

Let \(\vec w \in\mathbb R^{d+1}\), and for each \(i \in\{1, 2, \ldots, n\}\) let \(\nvec{x}{i}\in\mathbb R^d\) and \(y_i \in\mathbb R\). Consider the empirical risk with respect to the square loss of a linear predictor on a data set of \(n\) points:

\[ R(\vec w) = \frac 1n \sum_{i=1}^n (\vec w \cdot\Aug(\nvec{x}{i}) - y_i)^2 \]

What type of object is \(\nabla R(\vec w)\); that is, the gradient of the risk with respect to the parameter vector \(\vec w\)?

Solution

A vector in \(\mathbb R^{d+1}\)

Problem #06

Tags: linear prediction functions

Let \(\nvec{x}{1} = (-1, -1)^T\), \(\nvec{x}{2} = (1, 1)^T\), and \(\nvec{x}{3} = (-1, 1)^T\). Suppose \(H\) is a linear prediction function, and that \(H(\nvec{x}{1}) = 2\) while \(H(\nvec{x}{2}) = -2\) and \(H(\nvec{x}{3}) = 0\).

Let \(\nvec{x}{4} = (1,-1)^T\). What is \(H(\nvec{x}{4})\)?

Solution

0
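
One way to arrive at this: \(\nvec{x}{4} = \nvec{x}{1} + \nvec{x}{2} - \nvec{x}{3}\), and since the coefficients \(1 + 1 - 1\) sum to one, a linear (affine) prediction function preserves this combination:

\[ H(\nvec{x}{4}) = H(\nvec{x}{1}) + H(\nvec{x}{2}) - H(\nvec{x}{3}) = 2 + (-2) - 0 = 0. \]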

Problem #07

Tags: computing loss

Consider the data set shown below. The ``\(\times\)'' points have label +1, while the ``\(\circ\)'' points have label -1. Shown are the places where a linear prediction function \(H\) is equal to zero, 1, and -1.

For each of the below subproblems, calculate the total loss with respect to the given loss function. That is, you should calculate \(\sum_{i=1}^n L(\nvec{x}{i}, y_i, \vec w)\) using the appropriate loss function in place of \(L\). Note that we have most often calculated the mean loss, but here we calculate the total so that we encounter fewer fractions.

Part 1)

What is the total square loss of \(H\) on this data set?

Solution

17

Part 2)

What is the total perceptron loss of \(H\) on this data set?

Solution

3

Part 3)

What is the total hinge loss of \(H\) on this data set?

Solution

7

Problem #08

Tags: subgradients

Consider the function \(f(x,y) = x^2 + |y|\). Plots of this function's surface and contours are shown below.

Which of the following are subgradients of \(f\) at the point \((0, 0)\)? Check all that apply.

Solution

\((0, 0)^T\), \((0, 1)^T\), and \((0, -1)^T\) are subgradients of \(f\) at \((0, 0)\).

Problem #09

Tags: gradient descent, perceptrons, gradients

Suppose gradient descent is to be used to train a perceptron classifier \(H(\vec x; \vec w)\) on a data set of \(n\) points, \(\{\nvec{x}{i}, y_i \}\). Recall that each iteration of gradient descent takes a step in the opposite direction of the ``gradient''.

Which gradient is being referred to here?

Solution

The gradient of the empirical risk with respect to \(\vec w\)

Problem #10

Tags: convexity

Let \(\{\nvec{x}{i}\}\) be a set of \(n\) vectors in \(\mathbb R^d\). Consider the function \(f(\vec w) = \sum_{i=1}^n \vec w \cdot\nvec{x}{i}\), where \(\vec w \in\mathbb R^d\).

True or False: \(f\) is convex as a function of \(\vec w\).

Solution

True.
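
One way to see this: \(f\) is a linear function of \(\vec w\), since

\[ f(\vec w) = \sum_{i=1}^n \vec w \cdot\nvec{x}{i} = \vec w \cdot\left(\sum_{i=1}^n \nvec{x}{i}\right), \]

and linear functions satisfy the convexity inequality with equality, so they are convex (and also concave).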

Problem #11

Tags: SVMs

Let \(X = \{\nvec{x}{i}, y_i\}\) be a data set of \(n\) points where each \(\nvec{x}{i}\in\mathbb R^d\).

Let \(Z = \{\nvec{z}{i}, y_i\}\) be the data set obtained from the original by standardizing each feature. That is, if a matrix were created with the \(i\)th row being \(\nvec{z}{i}\), then the mean of each column would be zero, and the variance would be one.

Suppose that \(X\) and \(Z\) are both linearly-separable. Suppose Hard-SVMs \(H_1\) and \(H_2\) are trained on \(X\) and \(Z\), respectively.

True or False: \(H_1(\nvec{x}{i}) = H_2(\nvec{z}{i})\) for each \(i = 1, \ldots, n\).

Solution

False.

This is a tricky problem! For an example demonstrating why this is False, see: https://youtu.be/LSr55vyvfb4

Problem #12

Tags: least squares

Suppose a data set \(\{\nvec{x}{i}, y_i\}\) is linearly-separable.

True or false: a least squares classifier trained on this data set is guaranteed to achieve a training error of zero.

Solution

False

Problem #13

Tags: SVMs

The image below shows a linear prediction function \(H\) along with a data set; the ``\(\times\)'' points have label +1 while the ``\(\circ\)'' points have label -1. Also shown are the places where the output of \(H\) is 0, 1, and -1.

True or False: \(H\) could have been learned by training a Hard-SVM on this data set.

Solution

False.

The solution to the Hard-SVM problem is a hyperplane that separates the two classes with the maximum margin.

For the \(H\) shown here, the margin is not maximized: there is room for the ``exclusion zone'' between the two classes to grow, and for the margin to increase.

Problem #17

Tags: gradients

Let \(f(\vec w) = \vec a \cdot\vec w + \lambda\|\vec w \|^2\), where \(\vec w \in\mathbb R^d\), \(\vec a \in\mathbb R^d\), and \(\lambda > 0\).

What is the minimizer of \(f\)? State your answer in terms of \(\vec a\) and \(\lambda\).

Solution

\(\vec w^* = -\frac{\vec{a}}{2\lambda}\)
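
A sketch of the derivation: since \(\lambda > 0\), \(f\) is convex, so it is minimized where its gradient is zero:

\[ \frac{d}{d\vec w} f(\vec w) = \vec a + 2 \lambda\vec w = \vec 0 \implies\vec w^* = -\frac{\vec a}{2\lambda}. \]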

Problem #18

Tags: nearest neighbor

Consider the data set of diamonds and circles shown below. Suppose a \(k\)-nn classifier is used to predict the label of the new point marked by \(\times\), with \(k = 3\). What will be the prediction? You may assume that the Euclidean distance is used.

Problem #19

Tags: linear prediction functions

Suppose a linear prediction rule \(H(\vec x; \vec w) = \Aug(\vec x) \cdot\vec w\) is parameterized by the weight vector \(\vec w = (2, 1, -3, 4)^T\). Let \(\vec z = (3, -2, 1)^T\). What is \(H(\vec z)\)?

Solution

15
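
One way to check the arithmetic: augmenting \(\vec z\) and taking the dot product with \(\vec w\),

\[ H(\vec z) = \operatorname{Aug}(\vec z) \cdot\vec w = (1, 3, -2, 1)^T \cdot(2, 1, -3, 4)^T = 2 + 3 + 6 + 4 = 15. \]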

Problem #20

Tags: least squares

In the following, let \(\mathcal D_1\) be a set of points \((x_1, y_1), \ldots, (x_n, y_n)\) in \(\mathbb R^2\). Suppose a straight line \(H_1(x) = a_1 x + a_0\) is fit to this data set by minimizing the mean squared error, and let \(R_1\) be the mean squared error of this line.

Now create a second data set, \(\mathcal D_2\), by doubling the \(x\)-coordinate of each of the original points, but leaving the \(y\)-coordinate unchanged. That is, \(\mathcal D_2\) consists of points \((2x_1, y_1), \ldots, (2x_n, y_n)\). Suppose a straight line \(H_2(x) = b_1 x + b_0\) is fit to this data set by minimizing the mean squared error, and let \(R_2\) be the mean squared error of this line.

You may assume that all of the \(x_i\) are unique, as are all of the \(y_i\).

Part 1)

Which one of the following is true about \(R_1\) and \(R_2\)?

Solution

\(R_1 = R_2\)

Part 2)

Which one of the following is true about \(a_0\) and \(b_0\) (the intercepts)?

Solution

\(a_0 = b_0\)

Part 3)

Which one of the following is true about \(a_1\) and \(b_1\) (the slopes)?

Solution

\(a_1 > b_1\)
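
A sketch covering all three parts: any line on \(\mathcal D_1\) corresponds to a line on \(\mathcal D_2\) with exactly the same errors, obtained by keeping the intercept and halving the slope, since

\[ H_2(2 x_i) = b_0 + b_1 (2 x_i) = a_0 + a_1 x_i = H_1(x_i) \quad\text{when } b_0 = a_0 \text{ and } b_1 = \frac{a_1}{2}. \]

Applying this correspondence in both directions shows that the optimal fits satisfy \(R_1 = R_2\), \(b_0 = a_0\), and \(b_1 = a_1 / 2\); the comparison \(a_1 > b_1\) then follows when the fitted slope \(a_1\) is positive.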

Problem #22

Tags: object type

Part 1)

Let \(\vec x \in\mathbb R^d\) and let \(A\) be a \(d \times d\) matrix. What type of object is \(\vec x^T A \vec x\)?

Solution

A scalar

Part 2)

Let \(A\) be an \(n \times n\) matrix, and let \(\vec x \in\mathbb R^n\). What type of object is: \((A + A^T)^{-1}\vec x\)?

Solution

A vector in \(\mathbb R^n\)

Part 3)

Suppose we train a support vector machine \(H(\vec x) = \Aug(\vec x) \cdot\vec w\) on a data set of \(n\) points in \(\mathbb R^d\). What type of object is the resulting parameter vector, \(\vec w\)?

Solution

A vector in \(\mathbb R^{d+1}\)

Problem #23

Tags: linear prediction functions

Let \(\nvec{x}{1} = (-1, -1)^T\), \(\nvec{x}{2} = (1, 1)^T\), and \(\nvec{x}{3} = (-1,1)^T\). Suppose \(H\) is a linear prediction function, and that \(H(\nvec{x}{1}) = 2\), \(H(\nvec{x}{2}) = -2\), and \(H(\nvec{x}{3}) = 2\).

Let \(\vec z = (2,-1)^T\). What is \(H(\vec z)\)?

Solution

-4
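
One way to compute this: writing \(H(\vec x) = w_0 + w_1 x_1 + w_2 x_2\), the three given values yield

\[\begin{align*} w_0 - w_1 - w_2 &= 2,\\ w_0 + w_1 + w_2 &= -2,\\ w_0 - w_1 + w_2 &= 2, \end{align*}\]

which gives \(w_0 = 0\), \(w_1 = -2\), and \(w_2 = 0\). Therefore \(H(\vec z) = 0 + (-2)(2) + (0)(-1) = -4\).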

Problem #24

Tags: computing loss

Consider the data set shown below. The ``\(\times\)'' points have label -1, while the ``\(\circ\)'' points have label +1. Shown are the places where a linear prediction function \(H\) is equal to zero (the thick solid line), 1, and -1 (the dotted lines). Each cell of the grid is 1 unit by 1 unit.

For each of the below subproblems, calculate the total loss with respect to the given loss function. That is, you should calculate \(\sum_{i=1}^n L(\nvec{x}{i}, y_i, \vec w)\) using the appropriate loss function in place of \(L\). Note that we have most often calculated the mean loss, but here we calculate the total so that we encounter fewer fractions.

Part 1)

What is the total square loss of \(H\) on this data set?

Solution

49

Part 2)

What is the total perceptron loss of \(H\) on this data set?

Solution

3

Part 3)

What is the total hinge loss of \(H\) on this data set?

Solution

6

Problem #25

Tags: subgradients

Consider the function \(f(x,y) = |x| + y^4\). Plots of this function's surface and contours are shown below.

Write a valid subgradient of \(f\) at the point \((0, 1)\).

There are many possibilities, but you need only write one. For the simplicity of grading, please pick a subgradient whose coordinates are both integers, and write your answer in the form \((a, b)\), where \(a\) and \(b\) are numbers.

Solution

\((a, 4)\) for any \(a \in[-1, 1]\); for example, \((0, 4)\), \((1, 4)\), or \((-1, 4)\).

Problem #26

Tags: convexity

Consider the function \(f(x) = x^4 - x^2\). True or False: \(f\) is convex as a function of \(x\).

Solution

False.
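
A quick way to verify this is the second-derivative test:

\[ f''(x) = 12 x^2 - 2 < 0 \quad\text{for } |x| < \tfrac{1}{\sqrt 6}, \]

so \(f\) is not convex (for instance, it curves downward near \(x = 0\)).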

Problem #27

Tags: SVMs

Suppose a data set \(\mathcal D\) is linearly separable. True or False: a Hard-SVM trained on \(\mathcal D\) is guaranteed to achieve 100% training accuracy.

Solution

True.

Problem #28

Tags: gradient descent

Consider the function \(f(x, y) = x^2 + xy + y^2\). Suppose that a single iteration of gradient descent is run on this function with a starting location of \((1, 2)^T\) and a learning rate of \(\eta = 1/10\). What will be the \(x\) and \(y\) coordinates after this iteration?

Solution

\(x = 0.6\), \(y = 1.5\)
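
One way to check this: the gradient is

\[ \nabla f(x, y) = \begin{bmatrix} 2x + y \\ x + 2y \end{bmatrix}, \qquad\nabla f(1, 2) = \begin{bmatrix} 4 \\ 5 \end{bmatrix}, \]

so the update is

\[ \begin{bmatrix} 1 \\ 2 \end{bmatrix} - \frac{1}{10} \begin{bmatrix} 4 \\ 5 \end{bmatrix} = \begin{bmatrix} 0.6 \\ 1.5 \end{bmatrix}. \]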

Problem #29

Tags: SVMs

The image below shows a linear prediction function \(H\) along with a data set; the ``\(\times\)'' points have label +1 while the ``\(\circ\)'' points have label -1. Also shown are the places where the output of \(H\) is 0, 1, and -1.

True or False: \(H\) could have been learned by training a Hard-SVM on this data set.

Solution

False.

Problem #59

Tags: nearest neighbor

Let \(\mathcal X = \{(\nvec{x}{1}, y_1), \ldots, (\nvec{x}{n}, y_n)\}\) be a labeled dataset, where \(\nvec{x}{i}\in\mathbb R^d\) is a feature vector and \(y_i \in\{-1, 1\}\) is a binary label. Let \(\vec x\) be a new point that is not in the data set. Suppose a nearest neighbor classifier is used to predict the label of \(\vec x\), and the resulting prediction is \(-1\). (You may assume that there is a unique nearest neighbor of \(\vec x\).)

Now let \(\mathcal Z\) be a new dataset obtained from \(\mathcal X\) by scaling each feature by a factor of 2. That is, \(\mathcal Z = \{(\nvec{z}{1}, y_1), \ldots, (\nvec{z}{n}, y_n)\}\), where \(\nvec{z}{i} = 2 \nvec{x}{i}\) for each \(i\). Let \(\vec z = 2 \vec x\). Suppose a nearest neighbor classifier trained on \(\mathcal Z\) is used to predict the label of \(\vec z\).

True or False: the predicted label of \(\vec z\) must also be \(-1\).

Solution

True.

This question is essentially asking: if we scale a data set by a constant factor, does the nearest neighbor of a point change? The answer is no: therefore, the predicted label cannot change.

When we multiply each feature of a point by the same constant factor, it has the effect of scaling the data set. That is, if \(\mathcal X\) is the data shown in the top of the image shown below, then \(\mathcal Z\) is the data shown in the bottom of the image (not drawn to scale):

You can see that the nearest neighbor of \(\vec x\) in \(\mathcal X\) is point #4, and the nearest neighbor of \(\vec z\) in \(\mathcal Z\) remains point #4. As such, the predicted label does not change.

Problem #60

Tags: linear prediction functions

Suppose a linear prediction function \(H(\vec x) = w_0 + w_1 x_1 + w_2 x_2\) is trained to perform classification, and the weight vector is found to be \(\vec w = (2, -1, 2)^T\).

The figure below shows four possible decision boundaries: \(A\), \(B\), \(C\), and \(D\). Which of them is the decision boundary of the prediction function \(H\)?

You may assume that each grid square is 1 unit on each side, and the axes are drawn to show you where the origin is.

Solution

B.

Here are four ways to solve this problem:

Method 1: Guess and check, using the fact that \(H(\vec x)\) must equal zero at every point along the decision boundary. For any of the proposed decision boundaries, we can take a point on the boundary, compute \(H(\vec x)\), and see if it equals 0.

Consider option \(A\), and take an arbitrary point on that boundary, for example, the point \((1, 1)^T\). Computing \(H(\vec x)\), we get \(H(\vec x) = \vec w \cdot\operatorname{Aug}(\vec x) = (2, -1, 2) \cdot(1, 1, 1) = 3 \neq 0\). Therefore, this cannot be the decision boundary.

Considering option \(B\), it looks like the point \((0, -1)^T\) is on the boundary. Computing the value of \(H\) here, we get \(H(\vec x) = \vec w \cdot\operatorname{Aug}(\vec x) = (2, -1, 2) \cdot(1, 0, -1) = 0\). That's a good sign, but it's not sufficient to say that this must be the decision boundary (at this point, all we can say is that whatever the decision boundary is, it must go through \((0, -1)\).) To make sure, we pick a second point on the boundary, say, \((2, 0)^T\). Computing \(H\) at this point, we get \(H(\vec x) = \vec w \cdot\operatorname{Aug}(\vec x) = (2, -1, 2) \cdot(1, 2, 0) = 0\). With this, we can conclude that \(B\) must be the decision boundary.

Method 2: Use the fact that \((w_1, w_2)^T\) is orthogonal to the decision boundary. In Homework 02, we learned a useful way of relating the decision boundary to the weight vector \(\vec w = (w_0, w_1, w_2)^T\): it is orthogonal to the vector \(\vec w' = (w_1, w_2)^T\).

In this case, \(\vec w' = (-1, 2)^T\); this is a vector that points up and to the left (more up than left). This rules out options \(A\) and \(D\), since they can't be orthogonal to that vector. There are several ways to decide between \(B\) and \(C\), including the logic from Method 1.

Method 3: Find the equation of the decision boundary in \(y = mx + b\) form. Another way to solve this problem is to find the equation of the decision boundary in familiar ``\(y = mx + b\)'' form. We can do this by setting \(H(\vec x) = 0\) and solving for \(x_2\) in terms of \(x_1\). We get:

\[\begin{align*} &H(\vec x) =0\\ &\implies w_0 + w_1 x_1 + w_2 x_2 = 0\\ &\implies w_2 x_2 = -w_0 - w_1 x_1\\ &\implies x_2 = -\frac{w_0}{w_2} - \frac{w_1}{w_2} x_1 \end{align*}\]

Plugging in \(w_0 = 2\), \(w_1 = -1\), and \(w_2 = 2\), we get

\[\begin{align*} &x_2 = -\frac{2}{2} - \frac{-1}{2} x_1\\ &\implies x_2 = -1 + \frac{1}{2} x_1 \end{align*}\]

So the decision boundary is a line with slope \(\frac{1}{2}\) and \(y\)-intercept \(-1\). This is exactly the equation of the line in option \(B\).

Method 4: reasoning about the slope in each coordinate. Similar to Method 2, we can rule out options \(A\) and \(D\) by reasoning about the slope of the decision boundary in each coordinate.

Imagine this scenario: you're standing on the surface of \(H\) at a point where \(H = 0\) (that is, you're on the decision boundary). You will take one step to the right (i.e., one step in the \(x_1\) direction), which will (in general) cause you to leave the decision boundary and either increase or decrease in height. If you want to get back to the decision boundary, and your next step must be either up or down, which should it be? The answer will tell you whether the decision boundary looks like \(/\) or like \(\backslash\).

In this case, taking a step to the right (in the \(x_1\) direction) will cause you to decrease in height. This is because the \(w_1\) component of \(\vec w\) is negative, and so increasing \(x_1\) will decrease \(H\). Now, to get back to the decision boundary, you must increase your height to get back to \(H = 0\). Since the slope in the \(x_2\) direction is positive (2, to be exact), you must increase \(x_2\) in order to increase \(H\). Therefore, the decision boundary is up and to the right: it looks like \(/\).

Problem #61

Tags: feature maps, linear prediction functions

Suppose a linear prediction function \(H(\vec x) = w_1 \phi_1(\vec x) + w_2 \phi_2(\vec x) + w_3 \phi_3(\vec x) + w_4 \phi_4(\vec x)\) is fit using the basis functions

\[\phi_1(\vec x) = 1, \quad\phi_2(\vec x) = x_1^2, \quad\phi_3(\vec x) = x_2^2, \quad\phi_4(\vec x) = x_1 x_2, \]

where \(\vec x = (x_1, x_2)^T\) is a feature vector in \(\mathbb R^2\). The weight vector \(\vec w\) is found to be \(\vec w = (1, -2, 3, -4)^T\).

Let \(\vec x = (1, 2)^T\). What is \(H(\vec x)\)?

Solution

We have:

\[\begin{align*} H(\vec x) &= w_1 \phi_1(\vec x) + w_2 \phi_2(\vec x) + w_3 \phi_3(\vec x) + w_4 \phi_4(\vec x) \\ &= 1 \times 1 + (-2) \times x_1^2 + 3 \times x_2^2 + (-4) \times x_1 \times x_2 \\ &= 1 \times 1 + (-2) \times 1^2 + 3 \times 2^2 + (-4) \times 1 \times 2 \\ &= 1 - 2 + 12 - 8 = 3. \end{align*}\]

Problem #62

Tags: linear regression, least squares

Suppose a linear regression model \(H(\vec x) = w_0 + w_1 x_1 + w_2 x_2\) is trained by minimizing the mean squared error on the data shown below, where \(x_1\) and \(x_2\) are the features and \(y\) is the target:

True or False: there is a unique weight vector \(\vec w = (w_0, w_1, w_2)^T\) minimizing the mean squared error for this data set.

Solution

False.

\(y\) can be predicted exactly using only \(x_1\). In particular, \(y = 4x_1\). So the weight vector \(\vec w = (0, 4, 0)^T\) is one that minimizes the mean squared error.

On the other hand, \(y\) can also be predicted exactly using only \(x_2\). In particular, \(y = 2x_2\). So the weight vector \(\vec w = (0, 0, 2)^T\) is another that minimizes the mean squared error.

In fact, there are infinitely many weight vectors that minimize the mean squared error in this case. Geometrically, this is because the data all lie along a line in 3-dimensional space, and by doing regression we are fitting a plane to the data. There are infinitely many planes that pass through a given line, each of them with a different corresponding weight vector, so there are infinitely many weight vectors that minimize the mean squared error.

Now you might be thinking: the mean squared error is convex, so there can be only one minimizer. It is true that the MSE is convex, but that doesn't imply that there is only one minimizer! More specifically, any local minimum of a convex function is also a global minimum, but there can be many such minimizers. Take, for example, the function \(f(x, y) = x^2\). The point \((0, 0)\) is a minimizer, but so are \((0, 2)\) and \((0, 100)\), and so on. There are infinitely many minimizers!

Problem #63

Tags: linear prediction functions

Suppose a linear prediction function \(H(\vec x) = w_0 + w_1 x_1 + w_2 x_2\) is fit to data and the following are observed:

At a point, \(\nvec{x}{1} = (1, 1)^T\), the value of \(H\) is 4.

At a point, \(\nvec{x}{2} = (1, 0)^T\), the value of \(H\) is 0.

At a point, \(\nvec{x}{3} = (0, 0)^T\), the value of \(H\) is -4.

What is the value of \(H\) at the point \(\nvec{x}{4} = (0, 1)^T\)?

Solution

The four points make a square:

Since \(H = 0\) at \(\nvec{x}{2}\), the decision boundary must pass through that point. And since \(H(\nvec{x}{1}) = 4 = -H(\nvec{x}{3})\), the points \(\nvec{x}{1}\) and \(\nvec{x}{3}\) are equidistant from the decision boundary. Therefore, the decision boundary must pass exactly halfway between \(\nvec{x}{1}\) and \(\nvec{x}{3}\). This makes a line going through \(\nvec{x}{4}\), meaning that \(\nvec{x}{4}\) is on the decision boundary, so \(H(\nvec{x}{4}) = 0\).
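
Alternatively, the three given values determine the weights directly: \(H(\nvec{x}{3}) = w_0 = -4\); \(H(\nvec{x}{2}) = w_0 + w_1 = 0\) gives \(w_1 = 4\); and \(H(\nvec{x}{1}) = w_0 + w_1 + w_2 = 4\) gives \(w_2 = 4\). Then

\[ H(\nvec{x}{4}) = w_0 + w_2 = -4 + 4 = 0. \]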

Problem #64

Tags: object type

Choose the option which best completes the following sentence: In least squares regression, we can fit a linear prediction function \(H\) by computing the gradient of the _________ with respect to ________ and solving.

Solution

risk, the weights

Problem #65

Tags: object type

Part 1)

For each \(i = 1, \ldots, n\), let \(\nvec{x}{i}\) be a vector in \(\mathbb R^d\) and let \(\alpha_i\) be a scalar. What type of object is:

\[\sum_{i = 1}^n \alpha_i \nvec{x}{i}? \]
Solution

A vector in \(\mathbb R^d\)

Part 2)

Let \(\Phi\) be an \(n \times d\) matrix, let \(\vec y\) be a vector in \(\mathbb R^n\), and let \(\vec\alpha\) be a vector in \(\mathbb R^n\). What type of object is:

\[\frac1n \|\Phi\Phi^T \vec\alpha - \vec y\|^2 + \vec\alpha^T \Phi\Phi^T \vec\alpha\]
Solution

A scalar

Part 3)

Let \(\vec x\) be a vector in \(\mathbb R^d\), and let \(A\) be a \(d \times d\) matrix. What type of object is:

\[\frac{\vec x^T A \vec x}{\vec x^T \vec x}? \]
Solution

A scalar

Part 4)

Let \(A\) be a \(d \times n\) matrix. What type of object is \((A A^T)^{-1}\)?

Solution

A \(d \times d\) matrix

Part 5)

For each \(i = 1, \ldots, n\), let \(\nvec{x}{i}\) be a vector in \(\mathbb R^d\). What type of object is:

\[\sum_{i = 1}^n \nvec{x}{i}(\nvec{x}{i})^T? \]
Solution

A \(d \times d\) matrix

Problem #66

Tags: computing loss

Consider the data set shown below. The 4 points marked with ``+'' have label +1, while the 3 ``\(\circ\)'' points have label -1. Shown are the places where a linear prediction function \(H\) is equal to zero (the thick solid line), 1, and -1 (the dotted lines). Each cell of the grid is 1 unit by 1 unit. The origin is not specified; it is not necessary to know the absolute coordinates of the data to complete this problem.

For each of the below subproblems, calculate the total loss with respect to the given loss function. That is, you should calculate \(\sum_{i=1}^n L(\nvec{x}{i}, y_i, \vec w)\) using the appropriate loss function in place of \(L\). Note that we have most often calculated the mean loss, but here we calculate the total so that we encounter fewer fractions.

Part 1)

What is the total square loss of \(H\) on this data set?

Part 2)

What is the total perceptron loss of \(H\) on this data set?

Part 3)

What is the total hinge loss of \(H\) on this data set?

Problem #67

Tags: subgradients

Consider the piecewise function:

\[ f(x_1, x_2) = \begin{cases} 2x_1 + x_2 + 8, & \text{if } x_1 < 2,\\ 6x_1 + x_2, & \text{if } x_1 \geq 2. \end{cases}\]

For convenience, a plot of this function is shown below:

Note that the plot isn't intended to be precise enough to read off exact values, but it might help give you a sense of the function's behavior. In principle, you can do this question using only the piecewise definition of \(f\) given above.

Part 1)

What is the gradient of this function at the point \((0, 0)\)?

Part 2)

Which of the below are valid subgradients of the function at the point \((2,0)\)? Select all that apply.

Solution

\((3,1)^T\) and \((2,1)^T\) are valid subgradients of the function at \((2,0)\).

Problem #68

Tags: gradient descent

Recall that a subgradient of the absolute loss is:

\[\begin{cases} \operatorname{Aug}(\vec x), & \text{if } \operatorname{Aug}(\vec x) \cdot \vec w - y > 0,\\ -\operatorname{Aug}(\vec x), & \text{if } \operatorname{Aug}(\vec x) \cdot \vec w - y < 0,\\ \vec 0, & \text{otherwise}. \end{cases}\]

Suppose you are running subgradient descent to minimize the risk with respect to the absolute loss in order to train a function \(H(x) = w_0 + w_1 x\) on the following data set:

Suppose that the initial weight vector is \(\vec w = (0, 0)^T\) and that the learning rate \(\eta = 1\). What will be the weight vector after one iteration of subgradient descent?

Solution

To perform subgradient descent, we need to compute the subgradient of the risk. The main thing to remember is that the subgradient of the risk is the average of the subgradient of the loss on each data point.


So to start this problem, calculate the subgradient of the loss for each of the three points. Our formula for the subgradient of the absolute loss tells us to compute \(\operatorname{Aug}(\vec x) \cdot\vec w - y\) for each point and see if this is positive or negative. If it is positive, the subgradient is \(\operatorname{Aug}(\vec x)\); if it is negative, the subgradient is \(-\operatorname{Aug}(\vec x)\).

Now, the initial weight vector \(\vec w\) was conveniently chosen to be \(\vec 0\), meaning that \(\operatorname{Aug}(\vec x) \cdot\vec w = 0\) for all of our data points. Therefore, when we compute \(\operatorname{Aug}(\vec x) \cdot\vec w - y\), we get \(-y\) for every data point, and so we fall into the second case of the subgradient formula for every data point. This means that the subgradient of the loss at each data point is \(-\operatorname{Aug}(\vec x)\). Or, more concretely, the subgradient of the loss at each of the three data points is:

\((-1, -1)^T\), \((-1, -2)^T\), and \((-1, -3)^T\). This means that the subgradient of the risk is the average of these three:

\[\frac{1}{3}\left(\begin{bmatrix}-1\\-1\end{bmatrix} + \begin{bmatrix}-1\\-2\end{bmatrix} + \begin{bmatrix}-1\\-3\end{bmatrix}\right)= \begin{bmatrix}-1\\ -2 \end{bmatrix}\]

The subgradient descent update rule says that \(\vec w^{(1)} = \vec w^{(0)} - \eta\vec g\), where \(\vec g\) is the subgradient of the risk. The learning rate \(\eta\) was given as 1, so we have \(\vec w^{(1)} = \vec w^{(0)} - \vec g = \vec 0 - \begin{bmatrix}-1\\ -2 \end{bmatrix} = \begin{bmatrix}1\\ 2 \end{bmatrix}\).

Problem #69

Tags: convexity

Let \(\mathcal X = \{(\nvec{x}{1}, y_1), \ldots, (\nvec{x}{n}, y_n)\}\) be a data set for regression, with each \(\nvec{x}{i}\in\mathbb R^d\) and each \(y_i \in\mathbb R\). Consider the following function:

\[ R(\vec w) = \sum_{i=1}^n |y_i - \vec w \cdot\nvec{x}{i}| + \|\vec w\|^2. \]

True or False: \(R(\vec w)\) is a convex function of \(\vec w\).

Solution

True.
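
A sketch of why: each term \(|y_i - \vec w \cdot\nvec{x}{i}|\) is the absolute value (a convex function) composed with an affine function of \(\vec w\), and is therefore convex; \(\|\vec w\|^2\) is convex; and a sum of convex functions is convex:

\[ R(\vec w) = \underbrace{\sum_{i=1}^n \left|y_i - \vec w \cdot\nvec{x}{i}\right|}_{\text{sum of convex functions}} + \underbrace{\|\vec w\|^2}_{\text{convex}}. \]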

Problem #71

Tags: SVMs

In lecture, we saw that SVMs minimize the (regularized) risk with respect to the hinge loss. Now recall the 0-1 loss, which was defined in lecture as:

\[ L(\nvec{x}{i}, y_i, \vec w) = \begin{cases} 0, & \text{if } y_i = \operatorname{sign}(\vec w \cdot \operatorname{Aug}(\nvec{x}{i})),\\ 1, & \text{otherwise}. \end{cases}\]

Part 1)

Suppose a Hard SVM is trained on linearly-separable data. True or False: the solution is guaranteed to have the smallest 0-1 risk of any possible linear prediction function \(H(\vec x) = \vec w \cdot\operatorname{Aug}(\vec x)\).

Solution

True. If the data is linearly separable, then a Hard-SVM will draw a decision boundary that goes between the two classes, perfectly classifying the training data and achieving a 0-1 risk of zero.

Part 2)

Suppose a Soft SVM is trained on linearly-separable data using an unknown slack parameter \(C\). True or False: the solution is guaranteed to have the smallest 0-1 risk of any possible linear prediction function \(H(\vec x) = \vec w \cdot\operatorname{Aug}(\vec x)\).

Solution

False.

Part 3)

Suppose a Soft SVM is trained on data that is not linearly separable using an unknown slack parameter \(C\). True or False: the solution is guaranteed to have the smallest 0-1 risk of any possible linear prediction function \(H(\vec x) = \vec w \cdot\operatorname{Aug}(\vec x)\).

Solution

False.

Problem #72

Tags: perceptrons, gradients

Suppose a perceptron classifier \(H(\vec x) = \vec w \cdot\vec x\) is trained on the data shown below. The points marked with ``+'' have label +1, while the ``\(\circ\)'' points have label -1. The perceptron's decision boundary is plotted as a thick solid line.

Let \(R(\vec w)\) be the risk with respect to the perceptron loss function, and let \(\vec w^*\) be the weight vector whose corresponding decision boundary is shown above.

True or False: the gradient of \(R\) evaluated at \(\vec w^*\) is \(\vec 0\).

Solution

True.

Problem #73

Tags: SVMs

Consider the data set shown below. The points marked with ``+'' have label +1, while the ``\(\circ\)'' points have label -1. Shown are the places where a linear prediction function \(H\) is equal to zero (the thick solid line), 1, and -1 (the dotted lines). Each cell of the grid is 1 unit by 1 unit. The origin is not specified (it is not necessary to know the absolute coordinates of any points for this problem).

Suppose that the weight vector \(\vec w\) of the linear prediction function \(w_0 + w_1 x_1 + w_2 x_2\) shown above is \((6, 0, \frac{1}{2})^T\). This weight vector is not the solution to the Hard SVM problem.

What weight vector is the solution of the Hard SVM problem for this data set?

Solution

Although this decision boundary is in the right place, it can't be the solution to the Hard-SVM problem because its margin isn't maximized. Remember that the surface of \(H\) is a plane, and in this case the plane is too steep; we need to decrease its slope. We do this by scaling the weight vector by a constant factor; in this case, we want to double the margin, so we need to halve the slope. Therefore, the right answer is \(\frac{1}{2}(6, 0, \frac{1}{2})^T = (3, 0, \frac{1}{4})^T\).

Problem #74

Tags: gradients

Let \(\nvec{x}{1}, \ldots, \nvec{x}{n}\) be vectors in \(\mathbb R^d\) and let \(\vec w\) be a vector in \(\mathbb R^d\). Consider the function:

\[ f(\vec w) = \sum_{i=1}^n \vec w \cdot\nvec{x}{i} + \|\vec w \|^2, \]

What is \(\frac{d}{d \vec w} f(\vec w)\)?

Solution
\[\begin{align*} \frac{d}{d \vec w} \sum_{i=1}^n \vec w \cdot \nvec{x}{i} + \| \vec w \|^2 &= \frac{d}{d \vec w} \sum_{i=1}^n \vec w \cdot \nvec{x}{i} + \frac{d}{d \vec w} \| \vec w \|^2 \\ &= \sum_{i=1}^n \frac{d}{d \vec w} \vec w \cdot \nvec{x}{i} + \frac{d}{d \vec w} \| \vec w \|^2 \\ &= \sum_{i=1}^n \nvec{x}{i} + \frac{d}{d \vec w} \vec w^T \vec w \\ &= \sum_{i=1}^n \nvec{x}{i} + 2 \vec w \end{align*}\]

Problem #76

Tags: object type

Part 1)

Let \(f : \mathbb R^d \to\mathbb R\) be a function and let \(\nvec{x}{0}\) be a vector in \(\mathbb R^d\). What type of object is \(\frac{d}{d \vec x} f(\nvec{x}{0})\)? In other words, what type of object is the gradient of \(f\) evaluated at \(\nvec{x}{0}\)?

Solution

A vector in \(\mathbb R^d\).

Part 2)

Let \(\Phi\) be an \(n \times d\) matrix and let \(\vec\alpha\) be a vector in \(\mathbb R^n\). What type of object is:

\[\vec\alpha^T \Phi\Phi^T \vec\alpha\]
Solution

A scalar.

Part 3)

For each \(i = 1, \ldots, n\), let \(\nvec{x}{i}\) be a vector in \(\mathbb R^d\) and \(y_i\) be a scalar. Let \(\vec w\) be a vector in \(\mathbb R^d\). What type of object is:

\[\frac1n \sum_{i = 1}^n \left(\operatorname{Aug}(\nvec{x}{i}) \cdot\vec w - y_i \right)^2 \]
Solution

A scalar.

Part 4)

Let \(X\) be an \(n \times d\) matrix, and assume that \(X^T X\) is invertible. What type of object is \(X(X^T X)^{-1} X^T\)?

Solution

An \(n \times n\) matrix.

Problem #77

Tags: nearest neighbor

Let \(\mathcal X = \{(\nvec{x}{1}, y_1), \ldots, (\nvec{x}{n}, y_n)\}\) be a labeled dataset, where \(\nvec{x}{i}\in\mathbb R^d\) is a feature vector and \(y_i \in\{-1, 1\}\) is a binary label. Let \(\vec x\) be a new point that is not in the data set. Suppose a nearest neighbor classifier is used to predict the label of \(\vec x\), and the resulting prediction is \(-1\). (You may assume that there is a unique nearest neighbor of \(\vec x\).)

Now let \(\mathcal Z\) be a new dataset obtained from \(\mathcal X\) by subtracting the same vector \(\vec\delta\) from each training point. That is, \(\mathcal Z = \{(\nvec{z}{1}, y_1), \ldots, (\nvec{z}{n}, y_n)\}\), where \(\nvec{z}{i} = \nvec{x}{i} - \vec\delta\) for each \(i\). Let \(\vec z = \vec x - \vec\delta\). Suppose a nearest neighbor classifier trained on \(\mathcal Z\) is used to predict the label of \(\vec z\).

True or False: the predicted label of \(\vec z\) must also be \(-1\).

Solution

True.

Problem #78

Tags: linear prediction functions

Suppose a linear prediction function \(H(\vec x) = w_0 + w_1 x_1 + w_2 x_2\) is trained to perform classification, and the weight vector is found to be \(\vec w = (1,-2,1)^T\).

The figure below shows four possible decision boundaries: \(A\), \(B\), \(C\), and \(D\). Which of them is the decision boundary of the prediction function \(H\)?

You may assume that each grid square is 1 unit on each side, and the axes are drawn to show you where the origin is.

Solution

D.

Problem #79

Tags: feature maps, linear prediction functions

Suppose a linear prediction function \(H(\vec x) = w_1 \phi_1(\vec x) + w_2 \phi_2(\vec x) + w_3 \phi_3(\vec x) + w_4 \phi_4(\vec x)\) is fit using the basis functions

\[\phi_1(\vec x) = 1, \quad\phi_2(\vec x) = x_1 x_2, \quad\phi_3(\vec x) = x_2 x_3, \quad\phi_4(\vec x) = x_1 x_3, \]

where \(\vec x = (x_1, x_2, x_3)^T\) is a feature vector in \(\mathbb R^3\). The weight vector \(\vec w\) is found to be \(\vec w = (2, 1, -2, -3)^T\).

Let \(\vec x = (1, 2, 1)^T\). What is \(H(\vec x)\)?

Problem #80

Tags: least squares, linear prediction functions

Consider the binary classification data set shown below consisting of six points in \(\mathbb R^2\). Each point has an associated label of either +1 or -1.

What is the mean squared error of a least squares classifier trained on this data set (without regularization)? Your answer should be a number.

Problem #81

Tags: convexity

Suppose \(f(x_1, x_2)\) is a convex function. Define the new function \(g(x) = f(x, 0)\). True or False: \(g(x)\) must be convex.

Solution

True.
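
A sketch of why: for any \(x\), \(x'\), and \(t \in[0, 1]\),

\[\begin{align*} g(t x + (1-t) x') &= f\big(t x + (1-t) x',\, 0\big)\\ &= f\big(t (x, 0) + (1-t)(x', 0)\big)\\ &\leq t f(x, 0) + (1-t) f(x', 0)\\ &= t g(x) + (1-t) g(x'), \end{align*}\]

so \(g\) satisfies the definition of convexity.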

Problem #82

Tags: convexity

Let \(f_1(x)\) be a convex function and let \(f_2(x)\) be a non-convex function. Define the new function \(f(x) = f_1(x) + f_2(x)\). True or False: \(f(x)\) must be non-convex.

Solution

False.

For a simple counterexample, let \(f_1(x) = 4x^2\) and \(f_2(x) = -x^2\). Then \(f(x) = 4x^2 - x^2 = 3x^2\) is convex.

If that example seems too contrived, consider \(f_1(x) = x^4 + 3 x^2\) and \(f_2(x) = x^3\). Then \(f(x) = x^4 + x^3 + 3x^2\), and the second derivative test gives us \(f''(x) = 12x^2 + 6x + 6\), which is positive for all \(x\).

Problem #83

Tags: computing loss

Consider the data set shown below. The \(\diamond\) points marked with ``+'' have label +1, while the \(\circ\) points marked with ``\(-\)'' have label -1. Shown are the places where a linear prediction function \(H\) is equal to zero (the thick solid line), 1, and -1 (the dotted lines). Each cell of the grid is 1 unit by 1 unit. The origin is not specified; it is not necessary to know the absolute coordinates of the data to complete this problem.

For each of the below subproblems, calculate the total loss with respect to the given loss function. That is, you should calculate \(\sum_{i=1}^n L(\nvec{x}{i}, y_i, \vec w)\) using the appropriate loss function in place of \(L\). Note that we have most often calculated the mean loss, but here we calculate the total so that we encounter fewer fractions.

Note: please double-check your calculations! We are not grading your work, so partial credit is difficult to assign on this problem.

Part 1)

What is the total square loss of \(H\) on this data set?

Part 2)

What is the total perceptron loss of \(H\) on this data set?

Part 3)

What is the total hinge loss of \(H\) on this data set?

Problem #84

Tags: subgradients, gradients

Consider the following piecewise function that is linear in each quadrant:

\[ f(x_1, x_2) = \begin{cases} -2x_1, & \text{if } x_1 < 0 \text{ and } x_2 < 0,\\ -2x_1 + 3x_2, & \text{if } x_1 < 0 \text{ and } x_2 \geq 0,\\ -x_1, & \text{if } x_1 \geq 0 \text{ and } x_2 < 0,\\ -x_1 + 3x_2, & \text{if } x_1 \geq 0 \text{ and } x_2 \geq 0. \end{cases}\]

For convenience, a plot of this function is shown below:

Note that the plot isn't intended to be precise enough to read off exact values, but it might help give you a sense of the function's behavior. In principle, you can do this question using only the piecewise definition of \(f\) given above.

Part 1)

What is the gradient of this function at the point \((1, 1)\)?

Part 2)

What is the gradient of this function at the point \((-1, -1)\)?

Part 3)

Which of the below are valid subgradients of the function at the point \((0,-2)\)? Select all that apply.

Solution

\((-2, 0)^T\) and \((-1.5, 0)^T\) are valid subgradients.

Part 4)

Which of the below are valid subgradients of the function at the point \((0,0)\)? Select all that apply.

Solution

\((-1, 3)^T\), \((-2, 0)^T\), and \((-1, 2)^T\) are valid subgradients.

Problem #85

Tags: gradient descent, subgradients

Recall that a subgradient of the absolute loss is:

\[\begin{cases} \operatorname{Aug}(\vec x), & \text{if } \operatorname{Aug}(\vec x) \cdot \vec w - y > 0,\\ -\operatorname{Aug}(\vec x), & \text{if } \operatorname{Aug}(\vec x) \cdot \vec w - y < 0,\\ \vec 0, & \text{otherwise}. \end{cases}\]

Suppose you are running subgradient descent to minimize the risk with respect to the absolute loss in order to train a function \(H(x) = w_0 + w_1 x\) on the following data set:

Suppose that the initial weight vector is \(\vec w = (0, 0)^T\) and that the learning rate is \(\eta = 1\). What will be the weight vector after one iteration of subgradient descent?

Problem #87

Tags: SVMs

Let \(\mathcal X\) be a binary classification data set \((\nvec{x}{1}, y_1), \ldots, (\nvec{x}{n}, y_n)\), where each \(\nvec{x}{i}\in\mathbb R^d\) and each \(y_i \in\{-1, 1\}\). Suppose that the data in \(\mathcal X\) is linearly separable.

Let \(\vec w_{\text{svm}}\) be the weight vector of the hard-margin support vector machine (SVM) trained on \(\mathcal X\). Let \(\vec w_{\text{lsq}}\) be the weight vector of the least squares classifier trained on \(\mathcal X\).

True or False: it must be the case that \(\vec w_{\text{svm}} = \vec w_{\text{lsq}}\).

Solution

False.

Problem #88

Tags: SVMs

Recall that the Soft-SVM optimization problem is given by

\[\min_{\vec w \in \mathbb R^{d+1}, \vec \xi \in \mathbb R^n}\|\vec w\|^2 + C \sum_{i=1}^n \xi_i, \]

where \(\vec w\) is the weight vector, \(\vec\xi\) is a vector of slack variables, and \(C\) is a regularization parameter.

The two pictures below show as dashed lines the decision boundaries that result from training a Soft-SVM on the same data set using two different regularization parameters: \(C_1\) and \(C_2\).

True or False: \(C_1 < C_2\).

Solution

True.

Problem #89

Tags: SVMs, gradients

Consider the data set shown below. The points marked with ``+'' have label +1, while the ``\(\circ\)'' points have label -1. Shown are the places where a linear prediction function \(H\) is equal to zero (the thick solid line), 1, and -1 (the dotted lines). Each cell of the grid is 1 unit by 1 unit. The origin is not specified (it is not necessary to know the absolute coordinates of any points for this problem).

Let \(\vec w'\) be the parameter vector corresponding to prediction function \(H\) shown above. Let \(R(\vec w)\) be the risk with respect to the hinge loss on this data. True or False: the gradient of \(R\) evaluated at \(\vec w'\) is \(\vec 0\).

Solution

True.

Problem #90

Tags: gradients

Let \(\vec x, \vec\mu,\) and \(\vec b\) be vectors in \(\mathbb R^n\), and let \(A\) be a symmetric \(n \times n\) matrix. Consider the function:

\[ f(\vec x) = \vec x^T A \vec x + (\vec x - \vec\mu) \cdot\vec b \]

What is \(\frac{d}{d \vec x} f(\vec x)\)?