Maximum Likelihood

The data points shown above were drawn from a Gaussian. What were the Gaussian's parameters? We can estimate them using the principle of Maximum Likelihood.

Move the sliders above to adjust the parameter estimates. As you do, the blue bar labeled "current likelihood" shows the likelihood of the current choice of parameters given the data, and the gray bar shows the largest likelihood seen so far. Try to make the likelihood as large as possible.
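
Moving the sliders while watching the gray bar amounts to a search over $(\mu, \sigma)$ that remembers the best likelihood seen so far. Below is a minimal Python sketch of that search as a coarse grid. The article's actual data points are not listed in the text, so `data` is hypothetical, and the grid of "slider positions" is an arbitrary choice. The helper names (`gaussian_pdf`, `likelihood`, `slider_search`) are illustrative only. `math.prod` requires Python 3.8 or later.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density p(x; mu, sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    """Product of the densities of all data points."""
    return math.prod(gaussian_pdf(x, mu, sigma) for x in data)

def slider_search(data, mus, sigmas):
    """Try every (mu, sigma) pair, keeping the largest likelihood seen so far."""
    best = (None, None, 0.0)  # (mu, sigma, likelihood), playing the role of the gray bar
    for mu in mus:
        for sigma in sigmas:
            current = likelihood(data, mu, sigma)  # playing the role of the blue bar
            if current > best[2]:
                best = (mu, sigma, current)
    return best

# Hypothetical data; these are not the points shown in the visualization.
data = [1.2, 2.3, 1.9, 3.1, 2.5]
mus = [i / 10 for i in range(0, 51)]     # mu slider positions 0.0 .. 5.0
sigmas = [i / 10 for i in range(1, 31)]  # sigma slider positions 0.1 .. 3.0

mu_best, sigma_best, best_likelihood = slider_search(data, mus, sigmas)
print(mu_best, sigma_best, best_likelihood)
```

A grid only finds the best point among the positions it tries, much as the sliders only reach the values you happen to move them to.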

Recall that, when the data points are drawn independently, the likelihood of the parameters $\mu, \sigma$ is the product of the densities of the individual points: $$ \mathcal{L}(\mu, \sigma; x^{(1)}, \ldots, x^{(n)}) = \prod_{i=1}^n p(x^{(i)}; \mu, \sigma). $$ In this case, $p$ is the Gaussian probability density function: $$ p(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2 \pi}}e^{-(x - \mu)^2/(2\sigma^2)}. $$ In terms of the above visualization:

$ x^{(i)} $ is the $i$th data point, represented by a black circle.
$ p(x^{(i)}; \mu, \sigma) $ is the density at the $i$th data point, represented by the height of the $i$th dashed blue line. The taller this line, the "more likely" a Gaussian with this choice of $ \mu $ and $ \sigma $ was to generate that data point.
The likelihood, $ \mathcal{L}(\mu, \sigma; x^{(1)}, \ldots, x^{(n)}) $, is the product of the heights of all the dashed blue lines, and is represented by the length of the blue bar labeled "current likelihood".
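
The correspondence above can be sketched in a few lines of Python: one function gives the height of a dashed line (the density at a point), and another multiplies those heights to get the "current likelihood". The data values and the parameter choice $\mu = 2$, $\sigma = 1$ are hypothetical, and the function names are illustrative; `math.prod` requires Python 3.8 or later.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Height of the dashed line at data point x: the Gaussian density p(x; mu, sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    """Length of the "current likelihood" bar: the product of the densities."""
    return math.prod(gaussian_pdf(x, mu, sigma) for x in data)

data = [1.2, 2.3, 1.9, 3.1, 2.5]  # hypothetical data points
heights = [gaussian_pdf(x, mu=2.0, sigma=1.0) for x in data]  # one height per point
print(heights)
print(likelihood(data, mu=2.0, sigma=1.0))  # equals the product of the heights
```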

Because the likelihood is a product, if any one of the lines is close to zero in height (that is, the current parameters make the corresponding data point very unlikely), the overall likelihood is also close to zero, no matter how well the parameters fit the other points.
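
To see this numerically, here is a small Python sketch that adds one hypothetical point far from the rest. The parameters $\mu = 2.2$, $\sigma = 0.6$ fit the other points well, but the density at the far point is tiny, and multiplying by it drags the whole product toward zero. All data values are made up for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density p(x; mu, sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    """Product of the densities of all data points."""
    return math.prod(gaussian_pdf(x, mu, sigma) for x in data)

data = [1.2, 2.3, 1.9, 3.1, 2.5]    # hypothetical points clustered near 2
data_with_far_point = data + [9.0]  # the same points plus one far away

mu, sigma = 2.2, 0.6                # parameters that fit the cluster well
print(gaussian_pdf(9.0, mu, sigma))  # the height of the far point's line: nearly zero
print(likelihood(data, mu, sigma))
print(likelihood(data_with_far_point, mu, sigma))  # collapses toward zero
```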

When we assume that the data came from a Gaussian, we do not need an iterative approach (such as adjusting the sliders) to find the maximum likelihood estimates of $ \mu $ and $ \sigma $; we can derive closed-form formulas for them that work on any set of data. Namely: $$ \mu_\text{MLE} = \frac{1}{n} \sum_{i=1}^n x^{(i)} \qquad \sigma_\text{MLE} = \sqrt{\frac{1}{n} \sum_{i=1}^n (x^{(i)} - \mu_\text{MLE})^2} $$ That is, the maximum likelihood estimates are the sample mean and the sample standard deviation, respectively, where the standard deviation divides by $n$ rather than by $n - 1$ as in the common unbiased estimator of the variance. Closed-form maximum likelihood estimates exist for some other distributions as well, though not for all.
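
A sketch of the standard derivation behind these formulas: because the logarithm is increasing, maximizing $\log \mathcal{L}$ gives the same $\mu$ and $\sigma$ as maximizing $\mathcal{L}$, and the logarithm turns the product of Gaussian densities into a sum:

$$ \log \mathcal{L}(\mu, \sigma; x^{(1)}, \ldots, x^{(n)}) = -n \log \sigma - \frac{n}{2} \log(2\pi) - \frac{1}{2\sigma^2} \sum_{i=1}^n (x^{(i)} - \mu)^2 $$

Setting the partial derivative with respect to $\mu$ to zero gives the mean:

$$ \frac{\partial \log \mathcal{L}}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^n (x^{(i)} - \mu) = 0 \quad\Longrightarrow\quad \mu_\text{MLE} = \frac{1}{n} \sum_{i=1}^n x^{(i)} $$

Setting the partial derivative with respect to $\sigma$ to zero, with $\mu = \mu_\text{MLE}$, gives the standard deviation:

$$ \frac{\partial \log \mathcal{L}}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^n (x^{(i)} - \mu)^2 = 0 \quad\Longrightarrow\quad \sigma_\text{MLE}^2 = \frac{1}{n} \sum_{i=1}^n (x^{(i)} - \mu_\text{MLE})^2 $$

This only locates the stationary point; a complete argument would also confirm that it is a maximum.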
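
As a quick numerical check of the closed-form estimates, the Python sketch below computes $\mu_\text{MLE}$ and $\sigma_\text{MLE}$ directly from the formulas and confirms that nudging either parameter lowers the likelihood. The data are hypothetical and `gaussian_mle` is an illustrative name. It also compares against the standard library: `statistics.pstdev` divides by $n$, like $\sigma_\text{MLE}$, while `statistics.stdev` divides by $n - 1$.

```python
import math
import statistics

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density p(x; mu, sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    """Product of the densities of all data points."""
    return math.prod(gaussian_pdf(x, mu, sigma) for x in data)

def gaussian_mle(data):
    """Closed-form maximum likelihood estimates (mu_MLE, sigma_MLE)."""
    n = len(data)
    mu = sum(data) / n
    # Divide by n, not n - 1: this is the MLE, not the unbiased variance estimator.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return mu, sigma

data = [1.2, 2.3, 1.9, 3.1, 2.5]  # hypothetical data points
mu_hat, sigma_hat = gaussian_mle(data)
best = likelihood(data, mu_hat, sigma_hat)

# Nudging either parameter away from the closed-form estimate lowers the likelihood.
for d_mu, d_sigma in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert likelihood(data, mu_hat + d_mu, sigma_hat + d_sigma) < best

# pstdev (divides by n) matches sigma_MLE; stdev (divides by n - 1) is slightly larger.
print(mu_hat, sigma_hat, statistics.pstdev(data), statistics.stdev(data))
```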