Homework07
GitHub
Please create your own GitHub account.
Create a repository with the same name as your GitHub profile username. For example, my GitHub username is tomcm39, and I created a repository called tomcm39 (tomcm39/tomcm39).
Please add a README.md to your repository.
Add professional information about yourself to the README.md file: major, courses that you would like to highlight, and research projects (if applicable).
Please include your GitHub username in the homework so that I can review it.
Linear combinations of the Normal
Given \(\lambda\), please write down the corresponding multivariate normal distribution for
\(\lambda = [0,1,10]\)
\(\lambda = [-1,1,2]\)
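As a rough numerical check for this kind of question, the sketch below uses the standard fact that if \(X \sim \mathcal{N}(\mu, \Sigma)\) then \(\lambda^{T} X \sim \mathcal{N}(\lambda^{T}\mu, \; \lambda^{T}\Sigma\lambda)\). The values of `mu` and `Sigma` are hypothetical placeholders, not the ones from the assignment, and the helper name `linear_combination_params` is made up for illustration.

```python
import numpy as np

# Hypothetical parameters for a 3-dimensional normal X ~ N(mu, Sigma).
# These are placeholders, NOT the values given in the assignment.
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

def linear_combination_params(lam, mu, Sigma):
    """Return the mean and variance of lam^T X when X ~ N(mu, Sigma)."""
    lam = np.asarray(lam, dtype=float)
    mean = lam @ mu            # lam^T mu
    var = lam @ Sigma @ lam    # lam^T Sigma lam
    return mean, var

mean_a, var_a = linear_combination_params([0, 1, 10], mu, Sigma)
mean_b, var_b = linear_combination_params([-1, 1, 2], mu, Sigma)
print(mean_a, var_a)
print(mean_b, var_b)
```

Swapping in the assignment's own \(\mu\) and \(\Sigma\) should give the parameters to report for each \(\lambda\).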
Gaussian Graphical Model
Given a precision matrix
please draw a circle for each variable \(X_1, \; X_2, \; X_3, \; X_4\). In addition, draw an edge between \(X_i\) and \(X_j\) if they are conditionally dependent given the remaining variables, and no edge between \(X_i\) and \(X_j\) if they are conditionally independent.
This representation of dependence and independence is often called a Gaussian Graphical Model.
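One way to check a hand-drawn graph is to read the edges directly off the precision matrix: for a multivariate normal, a zero off-diagonal entry means the two variables are conditionally independent given the others, so no edge is drawn. The matrix `Omega` below is a hypothetical example, not the one from the assignment, and `ggm_edges` is an illustrative helper name.

```python
import numpy as np

# Hypothetical 4x4 precision matrix (NOT the one from the assignment).
Omega = np.array([[ 2.0, -0.5,  0.0,  0.0],
                  [-0.5,  2.0,  0.7,  0.0],
                  [ 0.0,  0.7,  2.0, -0.3],
                  [ 0.0,  0.0, -0.3,  2.0]])

def ggm_edges(precision, tol=1e-12):
    """List edges (i, j), 1-indexed, where the off-diagonal entry is nonzero."""
    p = precision.shape[0]
    return [(i + 1, j + 1)
            for i in range(p)
            for j in range(i + 1, p)
            if abs(precision[i, j]) > tol]

edges = ggm_edges(Omega)
print(edges)
```

For this placeholder matrix the graph is a chain: \(X_1\) is connected to \(X_2\), \(X_2\) to \(X_3\), and \(X_3\) to \(X_4\).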
Marginal distribution
Given \(\mu\) and \(\Sigma\), please write down the marginal probability distribution for \(X_3\).
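For a multivariate normal, the marginal distribution of a subset of variables is obtained by keeping the matching entries of \(\mu\) and the matching rows and columns of \(\Sigma\). The sketch below illustrates this with hypothetical `mu` and `Sigma` values (not those of the assignment).

```python
import numpy as np

# Hypothetical parameters for (X1, X2, X3, X4); NOT the assignment's values.
mu = np.array([0.0, 1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.5, 0.3, 0.0],
                  [0.5, 1.5, 0.2, 0.1],
                  [0.3, 0.2, 1.0, 0.4],
                  [0.0, 0.1, 0.4, 2.0]])

# X3 is the third variable, so its zero-based index is 2.
keep = [2]

# Marginal parameters: select the entries of mu and the block of Sigma.
mu_marginal = mu[keep]
Sigma_marginal = Sigma[np.ix_(keep, keep)]

print(mu_marginal, Sigma_marginal)
```

With these placeholder values, \(X_3\) is marginally normal with mean 2 and variance 1.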
Conditional distribution
Given \(\mu\) and \(\Sigma\), please write down the conditional probability distribution for \(X_3, X_{4} | X_{1}, X_{2}\).
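The standard conditioning formulas for a multivariate normal can be sketched in code. Writing \(a\) for the conditioning variables and \(b\) for the variables of interest, the conditional mean is \(\mu_b + \Sigma_{ba}\Sigma_{aa}^{-1}(x_a - \mu_a)\) and the conditional covariance is \(\Sigma_{bb} - \Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab}\). The parameter values, the observed `x_a`, and the helper name `conditional_normal` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical parameters for (X1, X2, X3, X4); NOT the assignment's values.
mu = np.array([0.0, 1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.5, 0.3, 0.0],
                  [0.5, 1.5, 0.2, 0.1],
                  [0.3, 0.2, 1.0, 0.4],
                  [0.0, 0.1, 0.4, 2.0]])

def conditional_normal(mu, Sigma, a, b, x_a):
    """Parameters of X_b | X_a = x_a for X ~ N(mu, Sigma).

    a and b are lists of zero-based indices.
    """
    S_aa = Sigma[np.ix_(a, a)]
    S_ba = Sigma[np.ix_(b, a)]
    S_bb = Sigma[np.ix_(b, b)]
    # K = S_ba @ inv(S_aa), computed with a linear solve instead of an inverse.
    K = np.linalg.solve(S_aa, S_ba.T).T
    cond_mean = mu[b] + K @ (np.asarray(x_a, dtype=float) - mu[a])
    cond_cov = S_bb - K @ S_ba.T
    return cond_mean, cond_cov

a = [0, 1]                   # condition on X1, X2
b = [2, 3]                   # distribution of X3, X4
x_a = np.array([0.5, 1.2])   # hypothetical observed values of X1, X2

cond_mean, cond_cov = conditional_normal(mu, Sigma, a, b, x_a)
print(cond_mean)
print(cond_cov)
```

Note that the conditional covariance does not depend on the observed values `x_a`; only the conditional mean does.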
Conditional distribution
Given the above distribution, please write down the conditional probability distribution for \(X_3 | X_{1}, X_{2}\).
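One way to approach this is to marginalize the conditional distribution of \((X_3, X_4)\) given \(X_1, X_2\). The symbols \(m\) and \(S\) below are notation introduced here for illustration; they stand for the conditional mean vector and conditional covariance matrix of \((X_3, X_4)\) given \(X_1 = x_1, X_2 = x_2\).

```latex
\[
\begin{bmatrix} X_3 \\ X_4 \end{bmatrix} \Big| \; X_1 = x_1, X_2 = x_2
\;\sim\; \mathcal{N}\left(
  \begin{bmatrix} m_3 \\ m_4 \end{bmatrix},
  \begin{bmatrix} s_{33} & s_{34} \\ s_{34} & s_{44} \end{bmatrix}
\right)
\quad \Longrightarrow \quad
X_3 \; | \; X_1 = x_1, X_2 = x_2 \;\sim\; \mathcal{N}\left( m_3, \; s_{33} \right)
\]
```

In words: keep the first entry of the conditional mean and the top-left entry of the conditional covariance.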
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Mixture weights for group 0 and group 1
weights = [0.3, 0.7]

# Mean vector and covariance matrix of each component
mu1 = [-1, 1]
Sigma1 = 1./2*np.eye(2)
mu2 = [2, -1]
Sigma2 = np.array([[1, 0.2], [0.2, 1]])

N = 200
dataset = {"group": [], "x1": [], "x2": []}
for _ in range(N):
    # Choose group 0 with probability weights[0], otherwise group 1
    if np.random.random() < weights[0]:
        sample = np.random.multivariate_normal(mu1, Sigma1)
        group = 0
    else:
        sample = np.random.multivariate_normal(mu2, Sigma2)
        group = 1
    dataset["group"].append(group)
    dataset["x1"].append(sample[0])
    dataset["x2"].append(sample[1])
dataset = pd.DataFrame(dataset)
2D Gaussian Mixture model
The above dataset has three columns and 200 observations. The first column is called “group” and equals the value 0 when the observation is generated by the first group and equals the value 1 when the observation is generated by the second group. The observations that are generated are length 2 vectors \(x = \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}\) where the value \(x_{1}\) is in the column labeled “x1” and \(x_{2}\) is in the column labeled “x2”.
The model used to generate this dataset is called a 2D Gaussian Mixture model, and assumes that observations are generated by a random variable \(X_{i}\) such that
(a) Please plot the dataset using one color for observations in group 0 and a different color for observations in group 1.
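A minimal sketch of one way to produce such a plot with matplotlib is shown below. To keep the example self-contained it builds a stand-in `dataset` with the same column names and the same simulation parameters; the seed, colors, and transparency are arbitrary choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in dataset with the same columns as the homework dataset.
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
N = 200
group = (rng.random(N) >= 0.3).astype(int)  # group 0 with probability 0.3
x_g0 = rng.multivariate_normal([-1, 1], 0.5 * np.eye(2), size=N)
x_g1 = rng.multivariate_normal([2, -1], [[1, 0.2], [0.2, 1]], size=N)
x = np.where(group[:, None] == 0, x_g0, x_g1)
dataset = pd.DataFrame({"group": group, "x1": x[:, 0], "x2": x[:, 1]})

# One scatter call per group so that each group gets its own color and label.
fig, ax = plt.subplots()
for g, color in [(0, "tab:blue"), (1, "tab:orange")]:
    subset = dataset[dataset["group"] == g]
    ax.scatter(subset["x1"], subset["x2"], color=color,
               label=f"group {g}", alpha=0.7)
ax.set_xlabel("x1")
ax.set_ylabel("x2")
ax.legend()
```

In a notebook the figure renders at the end of the cell; in a script, a call to `plt.show()` or `fig.savefig(...)` would be needed.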
(b) Estimate the mixing probability \(\theta\) from the dataset.
Recall that \(G_i \in \{0,1\}\) indicates group membership. An estimate of \(\theta\) is given by the sample proportion:
Compute \(\widehat{\theta}\) using the “group” column in the dataset and compare it to the true value used to simulate the data.
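A sample proportion can be computed directly from the "group" column. The sketch below uses a tiny hand-written stand-in for that column and reports both proportions, since which one corresponds to \(\theta\) depends on how \(\theta\) is defined in the model; treating the mean of the column as the share of group 1 relies only on the 0/1 coding.

```python
import pandas as pd

# Tiny stand-in for the "group" column (NOT the homework data).
dataset = pd.DataFrame({"group": [0, 1, 1, 1, 0, 1, 1, 0, 1, 1]})

# With 0/1 coding, the column mean is the fraction of observations in group 1.
prop_group1 = dataset["group"].mean()
prop_group0 = 1.0 - prop_group1

print(prop_group0, prop_group1)
```

On the simulated data these proportions can be compared against the entries of `weights` used to generate it.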
(c) Estimate the mean vector and covariance matrix for each group.
For group \(k \in \{0,1\}\), let \(N_k\) be the number of observations in that group. Then:
Step 1: Estimate the mean vector
Step 2: Estimate the covariance matrix
Compute \(\widehat{\mu}_0, \widehat{\mu}_1\) and \(\widehat{\Sigma}_0, \widehat{\Sigma}_1\) using the dataset.
Compare your estimates to the true parameters used to generate the data.
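The two steps above can be sketched with a `groupby` over the "group" column. The example regenerates a stand-in dataset so that it runs on its own. It uses `np.cov`, which divides by \(N_k - 1\) by default; if the intended estimator divides by \(N_k\) instead, passing `bias=True` would match it. That choice is an assumption here.

```python
import numpy as np
import pandas as pd

# Stand-in dataset with the same columns and simulation parameters.
rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only
N = 200
group = (rng.random(N) >= 0.3).astype(int)
x_g0 = rng.multivariate_normal([-1, 1], 0.5 * np.eye(2), size=N)
x_g1 = rng.multivariate_normal([2, -1], [[1, 0.2], [0.2, 1]], size=N)
x = np.where(group[:, None] == 0, x_g0, x_g1)
dataset = pd.DataFrame({"group": group, "x1": x[:, 0], "x2": x[:, 1]})

estimates = {}
for k, rows in dataset.groupby("group"):
    X = rows[["x1", "x2"]].to_numpy()      # N_k x 2 matrix of observations
    mu_hat = X.mean(axis=0)                # Step 1: sample mean vector
    Sigma_hat = np.cov(X, rowvar=False)    # Step 2: sample covariance (N_k - 1)
    estimates[int(k)] = (mu_hat, Sigma_hat)

for k, (mu_hat, Sigma_hat) in estimates.items():
    print(f"group {k}: mean = {mu_hat}")
    print(Sigma_hat)
```

The printed values can then be set side by side with `mu1`, `Sigma1`, `mu2`, and `Sigma2` from the simulation code.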