Step 1: The Log-Likelihood
Since the observations \(x_1, \dots, x_n \in \mathbb{R}^d\) are i.i.d., the likelihood is the product of their individual densities: \[
L(\mu, \Sigma) = \prod_{i=1}^n \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp \left( -\frac{1}{2} (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) \right)
\]
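Because \(\ln\) is strictly increasing, maximizing \(L\) is equivalent to maximizing the log-likelihood \(\mathcal{L}(\mu, \Sigma) = \ln L(\mu, \Sigma)\), which turns the product into a sum: \[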
\begin{aligned}
\mathcal{L}(\mu, \Sigma) &= \sum_{i=1}^n \left( -\frac{d}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}(x_i - \mu)^T \Sigma^{-1} (x_i - \mu) \right) \\
&= -\frac{nd}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2} \sum_{i=1}^n (x_i - \mu)^T \Sigma^{-1} (x_i - \mu)
\end{aligned}
\]
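As a numerical sanity check, the log-likelihood can be evaluated directly. The sketch below is a minimal illustration that assumes NumPy, uses hypothetical helper names (`gaussian_log_likelihood`, `density`) and uses randomly generated data. It compares the closed-form sum with the log of the product of densities from Step 1.

```python
import numpy as np

def gaussian_log_likelihood(X, mu, Sigma):
    """Sum of log N(x_i | mu, Sigma) over the rows x_i of X (closed form above)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    diff = X - mu                              # row i is (x_i - mu)^T
    sign, logdet = np.linalg.slogdet(Sigma)    # numerically stable ln|Sigma|
    if sign <= 0:
        # a positive determinant is necessary (not sufficient) for positive definiteness
        raise ValueError("Sigma must have a positive determinant")
    z = np.linalg.solve(Sigma, diff.T)         # column i is Sigma^{-1} (x_i - mu)
    quad = np.sum(diff.T * z)                  # sum_i (x_i - mu)^T Sigma^{-1} (x_i - mu)
    return -0.5 * n * d * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad

def density(x, mu, Sigma):
    """One multivariate normal density, written as a single factor of the product."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
mu = np.array([0.1, -0.2])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])

ll = gaussian_log_likelihood(X, mu, Sigma)
direct = np.log(np.prod([density(x, mu, Sigma) for x in X]))  # fine for n = 5, underflows for large n
print(ll, direct)
```

Working in log space matters in practice. For large \(n\) the raw product underflows to 0, while the sum stays finite. Using `slogdet` and `solve` also avoids forming \(|\Sigma|\) and \(\Sigma^{-1}\) explicitly.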
Step 2: MLE for \(\mu\)
To find \(\hat{\mu}\), take the derivative with respect to \(\mu\) and set it to 0. Because the covariance matrix \(\Sigma\) (and hence \(\Sigma^{-1}\)) is symmetric, \(\nabla_\mu (x - \mu)^T \Sigma^{-1} (x - \mu) = -2\Sigma^{-1}(x - \mu)\), so:
\[
\frac{\partial \mathcal{L}}{\partial \mu} = -\frac{1}{2} \sum_{i=1}^n \left( -2 \Sigma^{-1} (x_i - \mu) \right) = \Sigma^{-1} \sum_{i=1}^n (x_i - \mu) = 0
\]
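The gradient identity can be checked with central finite differences. This is a minimal sketch that assumes NumPy. `quad_form` and `numerical_grad_mu` are hypothetical helpers, and the positive definite \(\Sigma\) is built at random. The sketch also shows why symmetry matters. For a non-symmetric weight matrix \(M\), the gradient is \(-(M + M^T)(x - \mu)\), which equals \(-2M(x - \mu)\) only when \(M = M^T\).

```python
import numpy as np

def quad_form(x, mu, M):
    """(x - mu)^T M (x - mu), viewed as a function of mu."""
    r = x - mu
    return r @ M @ r

def numerical_grad_mu(x, mu, M, eps=1e-6):
    """Central-difference gradient of quad_form with respect to mu."""
    g = np.zeros_like(mu)
    for k in range(len(mu)):
        e = np.zeros_like(mu)
        e[k] = eps
        g[k] = (quad_form(x, mu + e, M) - quad_form(x, mu - e, M)) / (2 * eps)
    return g

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 3 * np.eye(3)            # symmetric positive definite by construction
Sigma_inv = np.linalg.inv(Sigma)
x = rng.normal(size=3)
mu = rng.normal(size=3)

analytic = -2 * Sigma_inv @ (x - mu)       # the identity used in Step 2
numeric = numerical_grad_mu(x, mu, Sigma_inv)

# Non-symmetric M: the true gradient is -(M + M^T)(x - mu), not -2 M (x - mu)
M = Sigma_inv + np.triu(rng.normal(size=(3, 3)), k=1)
numeric_M = numerical_grad_mu(x, mu, M)
print(np.abs(analytic - numeric).max())
```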
Since \(\Sigma^{-1}\) is positive definite, and therefore invertible, the sum \(\sum_{i=1}^n (x_i - \mu)\) must itself be zero: \[
\sum_{i=1}^n x_i - n\mu = 0 \implies \hat{\mu}_{\text{MLE}} = \frac{1}{n} \sum_{i=1}^n x_i
\]
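The following quick check assumes NumPy and synthetic data. It confirms that the sample mean makes the score \(\Sigma^{-1}\sum_i (x_i - \hat{\mu})\) vanish for an arbitrary positive definite \(\Sigma\), while a shifted \(\mu\) does not. The mean and covariance used to generate the data are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[1.0, -2.0],
                            cov=[[2.0, 0.5], [0.5, 1.0]], size=200)

mu_hat = X.mean(axis=0)                                   # (1/n) sum_i x_i
Sigma = np.array([[1.5, 0.2], [0.2, 0.8]])                # any positive definite matrix works here
score = np.linalg.solve(Sigma, (X - mu_hat).sum(axis=0))  # Sigma^{-1} sum_i (x_i - mu_hat)
score_off = np.linalg.solve(Sigma, (X - (mu_hat + 0.1)).sum(axis=0))  # shifted mu, for contrast
print(score, score_off)
```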
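Step 3: MLE for \(\Sigma\)
To find \(\hat{\Sigma}\), we differentiate with respect to \(\Sigma\). First, rewrite the sum of quadratic forms using the trace trick. A scalar equals its own trace, and the trace is invariant under cyclic permutations, so \(x^T A x = \text{tr}(x^T A x) = \text{tr}(A x x^T)\). By linearity of the trace: \[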
\sum_{i=1}^n (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) = \text{tr}\left( \Sigma^{-1} \sum_{i=1}^n (x_i - \mu)(x_i - \mu)^T \right)
\]
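The trace rewrite can be verified numerically. This small sketch assumes NumPy and random data, and it computes both sides. When the centered observations are stacked as rows, \(\sum_i (x_i - \mu)(x_i - \mu)^T\) reduces to the single matrix product `diff.T @ diff`.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 50, 3
X = rng.normal(size=(n, d))
mu = X.mean(axis=0)
A = rng.normal(size=(d, d))
Sigma_inv = np.linalg.inv(A @ A.T + d * np.eye(d))   # inverse of a random SPD matrix

diff = X - mu                                    # rows are (x_i - mu)^T
lhs = sum(r @ Sigma_inv @ r for r in diff)       # sum of n scalar quadratic forms
S = diff.T @ diff                                # sum_i (x_i - mu)(x_i - mu)^T, shape (d, d)
rhs = np.trace(Sigma_inv @ S)                    # one trace, as in the rewrite above
print(lhs, rhs)
```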
Let \(S = \sum_{i=1}^n (x_i - \mu)(x_i - \mu)^T\). For symmetric \(\Sigma\) and \(S\), the matrix calculus identities \(\frac{\partial \ln|\Sigma|}{\partial \Sigma} = \Sigma^{-1}\) and \(\frac{\partial\, \text{tr}(\Sigma^{-1} S)}{\partial \Sigma} = -\Sigma^{-1} S \Sigma^{-1}\) give:
\[
\frac{\partial \mathcal{L}}{\partial \Sigma} = -\frac{n}{2}\Sigma^{-1} + \frac{1}{2} \Sigma^{-1} \left( \sum_{i=1}^n (x_i - \mu)(x_i - \mu)^T \right) \Sigma^{-1} = 0
\]
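The matrix derivative can also be checked by finite differences, perturbing one entry of \(\Sigma\) at a time. This sketch assumes NumPy, and the helper names are hypothetical. The constant \(-\frac{nd}{2}\ln(2\pi)\) is dropped because it does not depend on \(\Sigma\). Entrywise perturbations make \(\Sigma\) briefly non-symmetric. At a symmetric \(\Sigma\) with symmetric \(S\), the unconstrained gradient still equals the expression above. The last lines evaluate the gradient at \(\Sigma = S/n\), where it should vanish.

```python
import numpy as np

def sigma_part_of_loglik(Sigma, S, n):
    """-n/2 ln|Sigma| - 1/2 tr(Sigma^{-1} S); the -nd/2 ln(2 pi) constant is dropped."""
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * n * logdet - 0.5 * np.trace(np.linalg.solve(Sigma, S))

def analytic_grad_Sigma(Sigma, S, n):
    """-n/2 Sigma^{-1} + 1/2 Sigma^{-1} S Sigma^{-1}, the derivative derived above."""
    Sigma_inv = np.linalg.inv(Sigma)
    return -0.5 * n * Sigma_inv + 0.5 * Sigma_inv @ S @ Sigma_inv

def numerical_grad_Sigma(Sigma, S, n, eps=1e-6):
    """Central differences, perturbing one entry (i, j) of Sigma at a time."""
    G = np.zeros_like(Sigma)
    d = Sigma.shape[0]
    for i in range(d):
        for j in range(d):
            E = np.zeros_like(Sigma)
            E[i, j] = eps
            G[i, j] = (sigma_part_of_loglik(Sigma + E, S, n)
                       - sigma_part_of_loglik(Sigma - E, S, n)) / (2 * eps)
    return G

rng = np.random.default_rng(4)
n, d = 40, 3
X = rng.normal(size=(n, d))
diff = X - X.mean(axis=0)
S = diff.T @ diff                                # symmetric scatter matrix
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)                  # symmetric positive definite test point

analytic = analytic_grad_Sigma(Sigma, S, n)
numeric = numerical_grad_Sigma(Sigma, S, n)
grad_at_mle = analytic_grad_Sigma(S / n, S, n)   # should be numerically zero
print(np.abs(analytic - numeric).max(), np.abs(grad_at_mle).max())
```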
Pre- and post-multiplying this stationarity condition by \(\Sigma\), then multiplying through by 2, gives: \[
-n \Sigma + \sum_{i=1}^n (x_i - \mu)(x_i - \mu)^T = 0
\]
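The stationarity condition for \(\mu\) yields \(\hat{\mu}_{\text{MLE}}\) regardless of the value of \(\Sigma\). Substituting \(\hat{\mu}\) for \(\mu\) and solving for \(\Sigma\) therefore gives: \[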
\hat{\Sigma}_{\text{MLE}} = \frac{1}{n} \sum_{i=1}^n (x_i - \hat{\mu})(x_i - \hat{\mu})^T
\]
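Combining the two estimators gives the following minimal sketch, which assumes NumPy and synthetic data (`gaussian_mle` and `log_lik` are hypothetical names). Note the divisor \(n\). The MLE matches `np.cov(X, rowvar=False, bias=True)`, not the default `np.cov`, which divides by \(n - 1\).

```python
import numpy as np

def gaussian_mle(X):
    """Closed-form MLEs: the sample mean and the 1/n (biased) sample covariance."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mu_hat = X.mean(axis=0)
    diff = X - mu_hat
    Sigma_hat = diff.T @ diff / n               # divide by n, not n - 1
    return mu_hat, Sigma_hat

def log_lik(X, mu, Sigma):
    """Gaussian log-likelihood of the rows of X, as in Step 1."""
    n, d = X.shape
    diff = X - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.sum(diff.T * np.linalg.solve(Sigma, diff.T))
    return -0.5 * n * d * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad

rng = np.random.default_rng(5)
X = rng.multivariate_normal(mean=[0.0, 1.0, -1.0],
                            cov=[[1.0, 0.4, 0.0],
                                 [0.4, 2.0, 0.3],
                                 [0.0, 0.3, 0.5]],
                            size=500)

mu_hat, Sigma_hat = gaussian_mle(X)
best = log_lik(X, mu_hat, Sigma_hat)
unbiased = np.cov(X, rowvar=False)              # default divisor is n - 1
print(best, log_lik(X, mu_hat, unbiased))
```

Plugging \(\hat{\Sigma}\) back in gives \(\text{tr}(\hat{\Sigma}^{-1} \cdot n\hat{\Sigma}) = nd\). The maximized log-likelihood therefore simplifies to \(-\frac{n}{2}\left(d\ln(2\pi) + \ln|\hat{\Sigma}| + d\right)\). The unbiased \((n-1)\) estimator is a different matrix, so it scores strictly lower.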