Homework07


  1. GitHub

    1. Please create your own GitHub account.

    2. Create a repository with the same name as your GitHub profile username. For example, my GitHub username is tomcm39 and I created a repository called tomcm39 (tomcm39/tomcm39).

    3. Please add a README.md to your repository.

    4. Add professional information about yourself to the README.md file: your major, courses that you would like to highlight, and research projects (if applicable).

    5. Please include your GitHub username in the homework so that I can review your repository.

  2. Linear combinations of the Normal

Given \(\lambda\), please write down the corresponding normal distribution of the linear combination \(\lambda^{'} X\), where

\[\begin{align} X \sim \text{MVN}\left( \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix} , \begin{bmatrix} 4 & 1 & -2 \\ 1 & 2 & 0.5 \\ -2 & 0.5 & 3 \end{bmatrix} \right) \end{align}\]
  1. \(\lambda = [0,1,10]\)

  2. \(\lambda = [-1,1,2]\)
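
The rule behind this exercise is that a linear combination of a multivariate normal vector is itself normal: if \(X \sim \text{MVN}(\mu, \Sigma)\), then \(\lambda^{'} X \sim \mathcal{N}(\lambda^{'}\mu, \; \lambda^{'}\Sigma\lambda)\). The sketch below shows one way to compute those two numbers with NumPy. The helper name `linear_combination_params` and the two-dimensional values are made up for illustration and are not the homework values.

```python
import numpy as np

def linear_combination_params(lam, mu, Sigma):
    """Return the mean and variance of lam' X when X ~ MVN(mu, Sigma)."""
    lam = np.asarray(lam, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    mean = lam @ mu               # lam' mu
    variance = lam @ Sigma @ lam  # lam' Sigma lam
    return mean, variance

# Hypothetical 2-D example (not the homework values).
mu_demo = [1.0, 2.0]
Sigma_demo = [[2.0, 0.5],
              [0.5, 1.0]]
lam_demo = [1.0, -1.0]

mean_demo, var_demo = linear_combination_params(lam_demo, mu_demo, Sigma_demo)
print(mean_demo, var_demo)
```

The result is a univariate normal because \(\lambda^{'} X\) is a single number; the same pattern applies to a three-dimensional \(\mu\) and \(\Sigma\).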

  3. Gaussian Graphical Model

Given a precision matrix

\[\begin{align} \Lambda = \begin{bmatrix} 2 & -0.5 & 0 & 0 \\ -0.5 & 2 & -0.5 & 0 \\ 0 & -0.5 & 2 & -0.5 \\ 0 & 0 & -0.5 & 2 \end{bmatrix} \end{align}\]

please draw a circle for each variable \(X_1, \; X_2, \; X_3, \; X_4\). In addition, draw an edge between \(X_i\) and \(X_j\) if they are conditionally dependent given the remaining variables, and no edge between \(X_i\) and \(X_j\) if they are conditionally independent given the remaining variables.

This representation of conditional dependence and independence is often called a Gaussian Graphical Model.
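
In a Gaussian Graphical Model, a zero off-diagonal entry of the precision matrix means that the corresponding pair of variables is conditionally independent given all the others, so the edges of the graph are the nonzero off-diagonal entries. The sketch below lists those edges. The function name `edges_from_precision`, the tolerance `tol`, and the 3-by-3 matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def edges_from_precision(Lambda, tol=1e-12):
    """List the pairs (i, j), i < j, whose precision entry is nonzero."""
    Lambda = np.asarray(Lambda, dtype=float)
    d = Lambda.shape[0]
    edges = []
    for i in range(d):
        for j in range(i + 1, d):
            if abs(Lambda[i, j]) > tol:
                edges.append((i + 1, j + 1))  # 1-based labels, as in X_1, X_2, ...
    return edges

# Hypothetical precision matrix (not the homework matrix).
Lambda_demo = [[1.0, 0.3, 0.0],
               [0.3, 1.0, 0.0],
               [0.0, 0.0, 1.0]]

edges_demo = edges_from_precision(Lambda_demo)
print(edges_demo)
```

In this toy example only \(X_1\) and \(X_2\) are joined by an edge, and \(X_3\) is an isolated circle.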

  4. Marginal distribution

Given \(\mu\) and \(\Sigma\), please write down the marginal probability distribution for \(X_3\).

\[\begin{align} \mu = [0,0,0,0]^{'} ; \; \Sigma = \begin{bmatrix} 2.0 & -0.8 & 0.6 & 0.3 \\ -0.8 & 3.0 & -1.2 & 0.5 \\ 0.6 & -1.2 & 2.5 & -0.9 \\ 0.3 & 0.5 & -0.9 & 1.8 \end{bmatrix} \end{align}\]
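
The marginal distribution of any subset of components of a multivariate normal is again normal, with the matching entries of \(\mu\) and the matching rows and columns of \(\Sigma\). The sketch below shows that selection in NumPy. The helper `marginal_params` and the three-dimensional values are hypothetical, and indices are 0-based as usual in Python.

```python
import numpy as np

def marginal_params(mu, Sigma, keep):
    """Mean vector and covariance matrix of the components listed in `keep`."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    keep = np.atleast_1d(keep)
    # np.ix_ selects the rows and columns in `keep` as a sub-block of Sigma.
    return mu[keep], Sigma[np.ix_(keep, keep)]

# Hypothetical 3-D example (not the homework values).
mu_demo = [0.0, 1.0, -1.0]
Sigma_demo = [[1.0, 0.2, 0.1],
              [0.2, 2.0, 0.4],
              [0.1, 0.4, 3.0]]

# Marginal of the first and third components (0-based indices 0 and 2).
mu_marg, Sigma_marg = marginal_params(mu_demo, Sigma_demo, [0, 2])
print(mu_marg)
print(Sigma_marg)
```
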
  5. Conditional distribution

Given \(\mu\) and \(\Sigma\), please write down the conditional probability distribution for \(X_3, X_{4} | X_{1}, X_{2}\).

\[\begin{align} \mu = [1,-1,2,0]^{'} ; \; \Sigma = \begin{bmatrix} 4.0 & 1.2 & -0.6 & 0.8 \\ 1.2 & 2.5 & 0.7 & -0.4 \\ -0.6 & 0.7 & 3.0 & 1.1 \\ 0.8 & -0.4 & 1.1 & 2.2 \end{bmatrix} \end{align}\]
  6. Conditional distribution

Given the above distribution, please write down the conditional probability distribution for \(X_3 | X_{1}, X_{2}\).
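
For both conditional questions, the standard result is that if \(X\) is split into blocks \(X_a\) (conditioned on) and \(X_b\) (of interest), then \(X_b | X_a = x_a\) is normal with mean \(\mu_b + \Sigma_{ba}\Sigma_{aa}^{-1}(x_a - \mu_a)\) and covariance \(\Sigma_{bb} - \Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab}\). The sketch below is one possible NumPy version of that formula. The helper `conditional_params`, its argument names, and the two-dimensional values are assumptions made for illustration only.

```python
import numpy as np

def conditional_params(mu, Sigma, idx_b, idx_a, x_a):
    """Mean and covariance of X_b | X_a = x_a when X ~ MVN(mu, Sigma)."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    x_a = np.asarray(x_a, dtype=float)
    S_aa = Sigma[np.ix_(idx_a, idx_a)]
    S_ba = Sigma[np.ix_(idx_b, idx_a)]
    S_bb = Sigma[np.ix_(idx_b, idx_b)]
    # W = S_ba S_aa^{-1}, computed with a linear solve instead of an explicit inverse.
    W = np.linalg.solve(S_aa, S_ba.T).T
    cond_mean = mu[idx_b] + W @ (x_a - mu[idx_a])
    cond_cov = S_bb - W @ S_ba.T
    return cond_mean, cond_cov

# Hypothetical 2-D example (not the homework values): X_2 | X_1 = 2.
mu_demo = [0.0, 0.0]
Sigma_demo = [[1.0, 0.5],
              [0.5, 2.0]]

cond_mean_demo, cond_cov_demo = conditional_params(
    mu_demo, Sigma_demo, idx_b=[1], idx_a=[0], x_a=[2.0]
)
print(cond_mean_demo, cond_cov_demo)
```

Note that the conditional mean depends on the observed value \(x_a\), while the conditional covariance does not.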

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Mixing weights: P(group 0) = 0.3, P(group 1) = 0.7
weights = [0.3, 0.7]

# Mean vector and covariance matrix used for group 0
mu1    = [-1, 1]
Sigma1 = 0.5 * np.eye(2)

# Mean vector and covariance matrix used for group 1
mu2    = [2, -1]
Sigma2 = np.array([[1, 0.2], [0.2, 1]])

# Draw N observations: pick a group at random, then sample from that group's MVN
N = 200
dataset = {"group": [], "x1": [], "x2": []}
for _ in range(N):
    if np.random.random() < weights[0]:
        sample = np.random.multivariate_normal(mu1, Sigma1)
        group  = 0
    else:
        sample = np.random.multivariate_normal(mu2, Sigma2)
        group  = 1
    dataset["group"].append(group)
    dataset["x1"].append(sample[0])
    dataset["x2"].append(sample[1])
dataset = pd.DataFrame(dataset)
  7. 2D Gaussian Mixture model

The above dataset has three columns and 200 observations. The first column is called “group” and equals the value 0 when the observation is generated by the first group and equals the value 1 when the observation is generated by the second group. The observations that are generated are length 2 vectors \(x = \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}\) where the value \(x_{1}\) is in the column labeled “x1” and \(x_{2}\) is in the column labeled “x2”.

The model used to generate this dataset is called a 2D Gaussian Mixture model, and assumes that observations are generated by a random variable \(X_{i}\) such that

\[\begin{align} G_{i} &\sim \text{Bernoulli}( \theta ) \\ X_{i} \mid G_{i} = 0 &\sim \text{MVN}( \mu_{0}, \Sigma_{0} )\\ X_{i} \mid G_{i} = 1 &\sim \text{MVN}( \mu_{1}, \Sigma_{1} ) \end{align}\]

(a) Please plot the dataset using one color for observations in group 0 and a different color for observations in group 1.
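
One possible way to color a scatter plot by group is to draw each group with its own call to `scatter`. The sketch below assumes the data are available as three equal-length arrays. The helper `plot_by_group`, the colors, and the toy data are hypothetical and are not the homework dataset.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_by_group(x1, x2, group):
    """Scatter x2 against x1, using one color per group label (0 or 1)."""
    x1, x2, group = np.asarray(x1), np.asarray(x2), np.asarray(group)
    fig, ax = plt.subplots()
    for g, color in [(0, "tab:blue"), (1, "tab:orange")]:
        mask = group == g  # boolean mask selecting the observations in group g
        ax.scatter(x1[mask], x2[mask], color=color, alpha=0.7, label=f"group {g}")
    ax.set_xlabel("x1")
    ax.set_ylabel("x2")
    ax.legend()
    return fig, ax

# Hypothetical toy data, only to exercise the function.
rng = np.random.default_rng(0)
group_demo = rng.integers(0, 2, size=50)
x1_demo = rng.normal(size=50)
x2_demo = rng.normal(size=50)

fig_demo, ax_demo = plot_by_group(x1_demo, x2_demo, group_demo)
```

With a DataFrame that has columns "x1", "x2", and "group", the call would look like `plot_by_group(dataset["x1"], dataset["x2"], dataset["group"])`.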

(b) Estimate the mixing probability \(\theta\) from the dataset.

Recall that \(G_i \in \{0,1\}\) indicates group membership. An estimate of \(\theta\) is given by the sample proportion:

\[\begin{align} \widehat{\theta} = \frac{1}{N} \sum_{i=1}^{N} G_i \end{align}\]

Compute \(\widehat{\theta}\) using the “group” column in the dataset and compare it to the true value used to simulate the data.
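
Because \(G_i\) takes only the values 0 and 1, the sample proportion is the mean of the group labels. A minimal sketch follows, using a made-up list of ten labels rather than the homework dataset; the helper name `estimate_theta` is hypothetical.

```python
import numpy as np

def estimate_theta(group):
    """Sample proportion of observations with group label 1."""
    group = np.asarray(group)
    return group.mean()

# Hypothetical labels (7 ones out of 10), not the homework data.
group_demo = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]

theta_hat_demo = estimate_theta(group_demo)
print(theta_hat_demo)
```

With the DataFrame, the same idea would be `estimate_theta(dataset["group"])`. In the simulation code, an observation lands in group 1 with probability `weights[1]`, which is the value to compare against.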

(c) Estimate the mean vector and covariance matrix for each group.

For group \(k \in \{0,1\}\), let \(N_k\) be the number of observations in that group. Then:

Step 1: Estimate the mean vector

\[\begin{align} \widehat{\mu}_k = \frac{1}{N_k} \sum_{i : G_i = k} X_i \end{align}\]

Step 2: Estimate the covariance matrix

\[\begin{align} \widehat{\Sigma}_k = \frac{1}{N_k - 1} \sum_{i : G_i = k} (X_i - \widehat{\mu}_k)(X_i - \widehat{\mu}_k)^T \end{align}\]

Compute \(\widehat{\mu}_0, \widehat{\mu}_1\) and \(\widehat{\Sigma}_0, \widehat{\Sigma}_1\) using the dataset.

Compare your estimates to the true parameters used to generate the data.
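
The two steps above can be sketched with NumPy as follows. The helper `estimate_group_params` and the four toy observations are hypothetical. The sketch relies on `np.cov` dividing by \(N_k - 1\) by default, which matches the covariance formula in Step 2.

```python
import numpy as np

def estimate_group_params(X, group, k):
    """Sample mean vector and covariance matrix of the rows of X in group k."""
    X = np.asarray(X, dtype=float)
    group = np.asarray(group)
    Xk = X[group == k]                    # keep only the observations in group k
    mu_hat = Xk.mean(axis=0)              # Step 1: average each column
    Sigma_hat = np.cov(Xk, rowvar=False)  # Step 2: rows are observations, divisor N_k - 1
    return mu_hat, Sigma_hat

# Hypothetical toy data (not the homework dataset): three points in group 0, one in group 1.
X_demo = [[0.0, 0.0], [2.0, 2.0], [4.0, 1.0], [10.0, 10.0]]
group_demo = [0, 0, 0, 1]

mu_hat_demo, Sigma_hat_demo = estimate_group_params(X_demo, group_demo, 0)
print(mu_hat_demo)
print(Sigma_hat_demo)
```

With the DataFrame, `X` could be obtained as `dataset[["x1", "x2"]].to_numpy()` and `group` as `dataset["group"]`.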