Multivariate Normal Distribution#

Probability defined over vectors and the MVN#

Up until this point we have discussed assigning probabilities to a single random variable, or to a sequence of random variables in which every random variable is independent of the others. We have not yet discussed the very real case in which two or more random variables are related to one another, nor how to express these relationships.

Given a set of random variables \(X_{1}, X_{2}, \cdots, X_{n}\), a succinct approach for assigning probabilities to these random variables is to collect them into a vector

(215)#\[\begin{align} X = \begin{bmatrix} X_{1} \\ X_{2} \\ \vdots \\ X_{n} \\ \end{bmatrix} \end{align}\]

and assigning probabilities on the space \(\mathbb{R}^{n}\). In other words, we will begin to explore mathematical statements like

(216)#\[\begin{align} P(X = [1,2,3]^{T}) \end{align}\]

or “What is the probability that the first random variable \(X_{1}\) equals the value 1, the second random variable \(X_{2}\) equals the value 2, and the third random variable \(X_{3}\) equals the value 3?”
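As a preview of where this unit is headed, the sketch below evaluates a multivariate normal density at the point \([1,2,3]^{T}\) with `scipy.stats.multivariate_normal`; the zero mean and identity covariance are placeholder choices, and for a continuous random vector we will end up working with densities rather than probabilities of exact points:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Placeholder choices: a standard MVN on R^3 (zero mean, identity covariance).
mean = np.zeros(3)
cov = np.eye(3)

# For a continuous random vector, P(X = [1,2,3]^T) is zero exactly;
# instead we evaluate the density at that point.
density = multivariate_normal(mean=mean, cov=cov).pdf(np.array([1.0, 2.0, 3.0]))
print(density)
```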

Before we begin a serious discussion of probability assignments like this, we need mathematical tools from linear algebra.

Review of vector and matrix algebra#

Vectors#

Vectors are the fundamental mathematical objects that we will work with in this unit. A vector is a list of items where each item is an element of the real line. For example

(217)#\[\begin{align} v_{1} = \begin{bmatrix} 1 \\ -1\\ 1/2\\ \end{bmatrix} \end{align}\]

is the vector \(v_{1}\) that contains the value one in the first position, the value negative one in the second position, and the value one-half in the third position.

Vectors can interact with elements of \(\mathbb{R}\), called scalars, as well as with one another.

\(\alpha x\) and \(\alpha + x\)#

Given a scalar \(\alpha\) and a vector \(x \in \mathbb{R}^{n}\), the new vector \(\alpha x\) is defined as

(218)#\[\begin{align} \alpha x = \begin{bmatrix} \alpha x_{1}\\ \alpha x_{2}\\ \vdots \\ \alpha x_{n} \end{bmatrix} \end{align}\]

and the new vector \(\alpha + x\) is defined as

(219)#\[\begin{align} \alpha+x = \begin{bmatrix} \alpha + x_{1}\\ \alpha + x_{2}\\ \vdots \\ \alpha + x_{n} \end{bmatrix} \end{align}\]
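These definitions match NumPy's elementwise broadcasting; a minimal sketch using the vector \(v_{1}\) from above:

```python
import numpy as np

x = np.array([1.0, -1.0, 0.5])  # the vector v1 from above
alpha = 2.0

print(alpha * x)  # [ 2.  -2.   1. ]  -> the vector alpha x
print(alpha + x)  # [ 3.   1.   2.5]  -> the vector alpha + x
```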

\(x+y\) and \(xy\)#

Given vectors \(x\) and \(y\), both in \(\mathbb{R}^{n}\), the new vector \(x+y\) is defined as

(220)#\[\begin{align} x+y = \begin{bmatrix} x_{1} + y_{1}\\ x_{2} + y_{2}\\ \vdots \\ x_{n} + y_{n} \end{bmatrix} \end{align}\]

and the new vector \(xy\) is defined as

(221)#\[\begin{align} xy = \begin{bmatrix} x_{1} y_{1}\\ x_{2} y_{2}\\ \vdots \\ x_{n} y_{n} \end{bmatrix} \end{align}\]
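Both operations are again elementwise in NumPy; a short sketch with two hypothetical vectors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

print(x + y)  # [5. 7. 9.]     -> the vector x + y
print(x * y)  # [ 4. 10. 18.]  -> the elementwise product xy
```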

An approach to defining a set of vectors#

On occasion it is important to describe a set of vectors, called a vector space. The canonical approach is to generate all vectors in a vector space \(V\) as multiplicative and additive combinations of some core set of vectors. In other words, we can generate a set of vectors by picking a small number of vectors \(y_{1}, y_{2}, y_{3}\) and building a vector space \(V\) as

(222)#\[\begin{align} V = \left \{ \alpha_{1} y_{1} + \alpha_{2} y_{2} + \alpha_{3} y_{3} | \alpha_{1}, \alpha_{2}, \alpha_{3} \in \mathbb{R} ; y_{1}, y_{2} ,y_{3} \in \mathbb{R}^{n} \right \} \end{align}\]

For example, we may decide to generate the vector space \(V\) using these two vectors:

(223)#\[\begin{align} y_{1} = \begin{bmatrix} 1 \\ 2 \\ \end{bmatrix} \; \; y_{2} = \begin{bmatrix} -1 \\ 3 \\ \end{bmatrix} \end{align}\]

Then the space we created contains vectors like

(224)#\[\begin{align} \begin{bmatrix} -1 \\ 3 \end{bmatrix} &= 0\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 1\begin{bmatrix} -1 \\ 3 \end{bmatrix} \\ \end{align}\]

or the vector

(225)#\[\begin{align} \begin{bmatrix} 1 \\ 2 \end{bmatrix} &= 1\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0\begin{bmatrix} -1 \\ 3 \end{bmatrix} \end{align}\]

or

(226)#\[\begin{align} \begin{bmatrix} 2 \\ 4 \end{bmatrix} & = 2\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0\begin{bmatrix} -1 \\ 3 \end{bmatrix} \end{align}\]
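These combinations are easy to check numerically; a small sketch with the two generating vectors above:

```python
import numpy as np

y1 = np.array([1.0, 2.0])
y2 = np.array([-1.0, 3.0])

# A few members of V, each built as alpha1*y1 + alpha2*y2.
print(0 * y1 + 1 * y2)  # [-1.  3.]
print(1 * y1 + 0 * y2)  # [ 1.  2.]
print(2 * y1 + 0 * y2)  # [ 2.  4.]
```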

One can imagine defining many different vector spaces, and so it is natural to ask about functions from one vector space to a second.

In other words, we can define functions over vectors like

(227)#\[\begin{align} f(v) = \sin(v_{1}/2) + \left(v_{2}\right)^{2} - \sqrt{v_{3}} \end{align}\]

However, an important class of functions is the class of linear functions. A function \(f\) is linear if, for any vectors \(x, y\) and scalar \(\alpha\),

(228)#\[\begin{align} f(\alpha x) &= \alpha f(x) \\ f(x + y) &= f(x) + f(y) \end{align}\]
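A quick numerical check of these two properties: `f_linear` below (a hypothetical weighted sum of the entries) satisfies both, while the nonlinear function from the display above does not:

```python
import numpy as np

def f_linear(v):
    # A hypothetical linear function: a fixed weighted sum of the entries.
    return 3 * v[0] - 2 * v[1]

def f_nonlinear(v):
    # The nonlinear example from above (defined for v3 >= 0).
    return np.sin(v[0] / 2) + v[1] ** 2 - np.sqrt(v[2])

x = np.array([1.0, 2.0, 4.0])
y = np.array([0.5, -1.0, 9.0])
alpha = 3.0

print(np.isclose(f_linear(alpha * x), alpha * f_linear(x)))        # True
print(np.isclose(f_linear(x + y), f_linear(x) + f_linear(y)))      # True
print(np.isclose(f_nonlinear(alpha * x), alpha * f_nonlinear(x)))  # False
```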

As it turns out, linear functions have a convenient representation. Consider a vector space \(V\) generated by the two vectors \(x_{1}\) and \(x_{2}\). In addition, consider a function \(g\) that is linear and maps the vector space \(V\) to the new space \(Q\).

Then for a vector \(x\) in the space \(V\), we can write

(229)#\[\begin{align} x = \alpha_{1} x_{1} + \alpha_{2} x_{2} \end{align}\]

If we wanted to apply the function \(g\) to the vector \(x\) then

(230)#\[\begin{align} g(x) = g\left(\alpha_{1} x_{1} + \alpha_{2} x_{2}\right) \end{align}\]

However, remember that \(g\) is a linear function. This means that we can simplify the above as

(231)#\[\begin{align} g(x) &= g\left(\alpha_{1} x_{1} + \alpha_{2} x_{2}\right) \\ &= g\left(\alpha_{1} x_{1}\right) + g\left(\alpha_{2} x_{2}\right) \\ &= \alpha_{1} g\left(x_{1}\right) + \alpha_{2} g\left(x_{2}\right) \\ \end{align}\]

But, because \(g\) maps vectors from \(V\) to \(Q\), these new vectors \(g\left(x_{1}\right)\) and \(g\left(x_{2}\right)\) are members of \(Q\). If we suppose that every vector in \(Q\) can be generated from (say) the three vectors \(q_{1}, q_{2}, q_{3}\), then we can rewrite both \(g\left(x_{1}\right)\) and \(g\left(x_{2}\right)\) in terms of these three \(q\)s.

In other words,

(232)#\[\begin{align} g(x) &= \alpha_{1} g\left(x_{1}\right) + \alpha_{2} g\left(x_{2}\right) \\ &= \alpha_{1} \left( \beta^{1}_{1}q_{1} +\beta^{2}_{1}q_{2} +\beta^{3}_{1}q_{3} \right) + \alpha_{2} \left( \beta^{1}_{2}q_{1} +\beta^{2}_{2}q_{2} +\beta^{3}_{2}q_{3} \right) \\ &= \left( \alpha_{1} \beta^{1}_{1} + \alpha_{2} \beta^{1}_{2} \right) q_{1} + \left( \alpha_{1} \beta^{2}_{1} + \alpha_{2} \beta^{2}_{2} \right) q_{2} + \left( \alpha_{1} \beta^{3}_{1} + \alpha_{2} \beta^{3}_{2} \right) q_{3} \end{align}\]

or \(g(x)\) can be represented as the vector

(233)#\[\begin{align} g(x) = \begin{bmatrix} \alpha_{1} \beta^{1}_{1} + \alpha_{2} \beta^{1}_{2} \\ \alpha_{1} \beta^{2}_{1} + \alpha_{2} \beta^{2}_{2} \\ \alpha_{1} \beta^{3}_{1} + \alpha_{2} \beta^{3}_{2} \end{bmatrix} \end{align}\]

Ideally, we would like to be able to represent this linear transformation as an interaction between two mathematical objects: the vector \(x\) (this can be any vector in \(V\)) represented by the coordinate vector \([\alpha_{1} , \alpha_{2}]^{T}\), and a stack of the values that were used to represent \(g(x_{1})\) and \(g(x_{2})\). That is, we would like some rules to move from the pair

(234)#\[\begin{align} A = \begin{bmatrix} \beta^{1}_{1} & \beta^{1}_{2} \\ \beta^{2}_{1} & \beta^{2}_{2} \\ \beta^{3}_{1} & \beta^{3}_{2} \\ \end{bmatrix} \; \; x = \begin{bmatrix} \alpha_{1} \\ \alpha_{2} \end{bmatrix} \end{align}\]

to the vector

(235)#\[\begin{align} g(x) = \begin{bmatrix} \alpha_{1} \beta^{1}_{1} + \alpha_{2} \beta^{1}_{2} \\ \alpha_{1} \beta^{2}_{1} + \alpha_{2} \beta^{2}_{2} \\ \alpha_{1} \beta^{3}_{1} + \alpha_{2} \beta^{3}_{2} \end{bmatrix} \end{align}\]

Cue the matrix as an object, and matrix multiplication.

Matrices#

A matrix is defined as a stack, or ordered collection, of vectors. Matrices are written as a rectangular grid of values such as

(236)#\[\begin{align} A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\\ a_{41} & a_{42} & a_{43}\\ \end{bmatrix} \end{align}\]

We can refer to individual elements inside the matrix using the notation \(A_{ij}\), which refers to the element located in row \(i\) and column \(j\). For example, \(A_{23}\) is the value \(a_{23}\) above. We can refer to an entire row of \(A\) using the notation \(A_{(i,:)}\) and an entire column of \(A\) using the notation \(A_{(:,j)}\).
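NumPy uses the same row/column lookups but with zero-based indices, so the math notation \(A_{ij}\) corresponds to `A[i-1, j-1]`; a short sketch:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])  # a 4x3 matrix, shaped like the one above

print(A[1, 2])  # A_{23} in math notation (row 2, column 3)
print(A[1, :])  # the second row, A_{(2,:)}
print(A[:, 2])  # the third column, A_{(:,3)}
```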

A natural (function) approach to defining the matrix/vector product Ax#

Based on our discussion of \(g(x)\), the product between a matrix \(A\) and a vector \(x\) should represent a linear transformation. Then a natural definition is to take

(237)#\[\begin{align} A &= \begin{bmatrix} \beta^{1}_{1} & \beta^{1}_{2} \\ \beta^{2}_{1} & \beta^{2}_{2} \\ \beta^{3}_{1} & \beta^{3}_{2} \\ \end{bmatrix} \; \; x = \begin{bmatrix} \alpha_{1} \\ \alpha_{2} \end{bmatrix} \end{align}\]

and define matrix multiplication between a matrix \(A\) and vector \(x\) as

(238)#\[\begin{align} Ax &= \alpha_{1} A_{(:,1)} + \alpha_{2} A_{(:,2)} \\ &= \alpha_{1} \begin{bmatrix} \beta^{1}_{1} \\ \beta^{2}_{1} \\ \beta^{3}_{1} \\ \end{bmatrix} + \alpha_{2} \begin{bmatrix} \beta^{1}_{2} \\ \beta^{2}_{2} \\ \beta^{3}_{2} \\ \end{bmatrix} \\ &= \begin{bmatrix} \alpha_{1} \beta^{1}_{1} + \alpha_{2} \beta^{1}_{2} \\ \alpha_{1} \beta^{2}_{1} + \alpha_{2} \beta^{2}_{2} \\ \alpha_{1} \beta^{3}_{1} + \alpha_{2} \beta^{3}_{2} \end{bmatrix} \end{align}\]

That is, the \(i^{\text{th}}\) entry of the vector \(Ax\), written \((Ax)_{i}\), is defined as \( \sum_{k} A_{i,k} x_{k} \)
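A sketch verifying that this column-combination definition agrees with NumPy's built-in matrix/vector product `A @ x`, for a hypothetical \(3 \times 2\) matrix:

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])  # a 3x2 matrix
x = np.array([2.0, -1.0])   # coordinates [alpha1, alpha2]

# The definition above: a weighted sum of the columns of A.
by_columns = x[0] * A[:, 0] + x[1] * A[:, 1]

print(by_columns)                      # [-2. -1.  0.]
print(np.allclose(by_columns, A @ x))  # True
```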

A natural (function composition) approach to defining the matrix product AB#

We have now dealt with applying a single function \(g\) to the vector \(x\). However, in typical algebra we can map a number with the function \(g\), producing \(g(x)\), and then map that new number with a second function \(f\), producing \(f( g(x) )\). We should also strive to define function composition for vectors.

Let's consider two functions now, \(g\) and \(f\), as well as the vector \(x\). When we applied the function \(g\) to the vector \(x\) we recovered

(239)#\[\begin{align} g(x) = \begin{bmatrix} \alpha_{1} \beta^{1}_{1} + \alpha_{2} \beta^{1}_{2} \\ \alpha_{1} \beta^{2}_{1} + \alpha_{2} \beta^{2}_{2} \\ \alpha_{1} \beta^{3}_{1} + \alpha_{2} \beta^{3}_{2} \end{bmatrix} \end{align}\]

Now let's consider a function \(f\) that maps vectors to a space that can be represented by the set of vectors \(r_{1}, r_{2}, r_{3} \in \mathbb{R}^{2}\). Let's suppose that

(240)#\[\begin{align} r_{1} = \begin{bmatrix} r_{1}^{1}\\ r_{1}^{2}\\ \end{bmatrix}\;\; r_{2} = \begin{bmatrix} r_{2}^{1}\\ r_{2}^{2}\\ \end{bmatrix}\;\; r_{3} = \begin{bmatrix} r_{3}^{1}\\ r_{3}^{2}\\ \end{bmatrix}\;\; \end{align}\]

Based on our above definition of matrix/vector multiplication, we should define this product as

(241)#\[\begin{align} \begin{bmatrix} r_{1}^{1} & r_{2}^{1} & r_{3}^{1} \\ r_{1}^{2} & r_{2}^{2} & r_{3}^{2} \\ \end{bmatrix} \; \begin{bmatrix} \alpha_{1} \beta^{1}_{1} + \alpha_{2} \beta^{1}_{2} \\ \alpha_{1} \beta^{2}_{1} + \alpha_{2} \beta^{2}_{2} \\ \alpha_{1} \beta^{3}_{1} + \alpha_{2} \beta^{3}_{2} \end{bmatrix} &= \begin{bmatrix} r_{1}^{1}\\ r_{1}^{2}\\ \end{bmatrix} (\alpha_{1} \beta^{1}_{1} + \alpha_{2} \beta^{1}_{2}) + \begin{bmatrix} r_{2}^{1}\\ r_{2}^{2}\\ \end{bmatrix} (\alpha_{1} \beta^{2}_{1} + \alpha_{2} \beta^{2}_{2} ) + \begin{bmatrix} r_{3}^{1}\\ r_{3}^{2}\\ \end{bmatrix} (\alpha_{1} \beta^{3}_{1} + \alpha_{2} \beta^{3}_{2}) \end{align}\]

Further, we can attempt again to write this as some algebraic manipulation of our original vector \(x\), which is represented by the coordinates \([\alpha_{1} , \alpha_{2}]^{T}\). Let's first group by these two coordinates.

(242)#\[\begin{align} \left( \begin{bmatrix} r_{1}^{1} \\ r_{1}^{2} \end{bmatrix} \beta_{1}^{1} + \begin{bmatrix} r_{2}^{1} \\ r_{2}^{2} \end{bmatrix} \beta_{1}^{2} + \begin{bmatrix} r_{3}^{1} \\ r_{3}^{2} \end{bmatrix} \beta_{1}^{3} \right) \alpha_{1} + \left( \begin{bmatrix} r_{1}^{1} \\ r_{1}^{2} \end{bmatrix} \beta_{2}^{1} + \begin{bmatrix} r_{2}^{1} \\ r_{2}^{2} \end{bmatrix} \beta_{2}^{2} + \begin{bmatrix} r_{3}^{1} \\ r_{3}^{2} \end{bmatrix} \beta_{2}^{3} \right) \alpha_{2} \end{align}\]

We can simplify this expression even further by recognizing that inside each set of parentheses is a matrix/vector multiplication.

(243)#\[\begin{align} \left( \begin{bmatrix} r_{1}^{1} & r_{2}^{1} & r_{3}^{1} \\ r_{1}^{2} & r_{2}^{2} & r_{3}^{2} \end{bmatrix} \begin{bmatrix} \beta_{1}^{1} \\ \beta_{1}^{2} \\ \beta_{1}^{3} \\ \end{bmatrix} \right) \alpha_{1} + \left( \begin{bmatrix} r_{1}^{1} & r_{2}^{1} & r_{3}^{1} \\ r_{1}^{2} & r_{2}^{2} & r_{3}^{2} \end{bmatrix} \begin{bmatrix} \beta_{2}^{1} \\ \beta_{2}^{2} \\ \beta_{2}^{3} \\ \end{bmatrix}\right) \alpha_{2} \end{align}\]

It may be natural then to simplify this expression by grouping together the “\(\beta\)” vectors and defining multiplication between the “\(r\)” matrix and “\(\beta\)” matrix such that it equals the above.

Define

(244)#\[\begin{align} R = \begin{bmatrix} r_{1}^{1} & r_{2}^{1} & r_{3}^{1} \\ r_{1}^{2} & r_{2}^{2} & r_{3}^{2} \\ \end{bmatrix} \end{align}\]

and

(245)#\[\begin{align} B = \begin{bmatrix} \beta_{1}^{1} & \beta_{2}^{1} \\ \beta_{1}^{2} & \beta_{2}^{2} \\ \beta_{1}^{3} & \beta_{2}^{3} \\ \end{bmatrix} \end{align}\]

and the product \(RB\) as the matrix whose \(j^{\text{th}}\) column is \(R\) applied to the \(j^{\text{th}}\) column of \(B\):

(246)#\[\begin{align} RB = \begin{bmatrix} R B_{(:,1)} & R B_{(:,2)} \end{bmatrix} \end{align}\]

In other words, we recover the natural “matrix multiplication rule” and see that it is a result of function composition between two functions, one represented by \(R\) and a second represented by \(B\). Entrywise,

(247)#\[\begin{align} (RB)_{ij} = \sum_{k} R_{i,k} B_{k,j} \end{align}\]
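A sketch checking this composition numerically with hypothetical matrices \(R\) and \(B\): applying \(B\) and then \(R\) to a vector agrees with applying the single matrix \(RB\):

```python
import numpy as np

R = np.array([[1.0, 0.0, 2.0],
              [-1.0, 3.0, 1.0]])  # 2x3: maps R^3 -> R^2
B = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])       # 3x2: maps R^2 -> R^3
x = np.array([2.0, 5.0])

# Composition: first apply B, then apply R ...
step_by_step = R @ (B @ x)
# ... which equals the single linear map RB.
composed = (R @ B) @ x

print(np.allclose(step_by_step, composed))  # True
```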

The identity function \(f(x) = x\) for vectors can be easily derived now that we know how to represent functions as matrices. We are looking for a matrix \(M\) such that \(Mx = M_{:,1} x_{1} + M_{:,2} x_{2} + \cdots = x\), or

(248)#\[\begin{align} M_{:,1} x_{1} + M_{:,2} x_{2} + \cdots = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \end{bmatrix} \end{align}\]

If we let \(M_{:,1}\) equal the vector with a one in the first position and zeros after, the column \(M_{:,2}\) equal the vector with a one in the second position and zeros otherwise, and so on, then we recover \(x\).

(249)#\[\begin{align} \begin{bmatrix} 1 \\ 0 \\ \vdots \end{bmatrix} x_{1} + \begin{bmatrix} 0 \\ 1 \\ \vdots \end{bmatrix} x_{2} = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \end{bmatrix} \end{align}\]

and so the matrix

(250)#\[\begin{align} I_{nn} = \begin{bmatrix} 1 & 0 & 0 &\cdots \\ 0 & 1 & 0 &\cdots \\ 0 & 0 & 1 &\cdots \\ \vdots & \vdots & & \ddots \end{bmatrix} \end{align}\]

with ones on the diagonal and zeros otherwise is called the identity matrix.
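In NumPy the identity matrix is constructed with `np.eye`; a quick check that it implements the identity function:

```python
import numpy as np

I = np.eye(3)  # 3x3 identity matrix
x = np.array([1.0, -2.0, 0.5])

print(I @ x)                  # [ 1.  -2.   0.5]
print(np.allclose(I @ x, x))  # True
```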

\(A A^{-1} = I\)#

When given a function \(f\), we can attempt to find an inverse function \(f^{-1}\) such that \(f^{-1} (f(x)) = x\). In the same way, if we are given the mapping \(Ax\) then we can attempt to find an inverse mapping \(B\) such that

(251)#\[\begin{align} B A x = A B x = x \end{align}\]

If there does exist a matrix with this property for every \(x\) (that is, \(BA = AB = I\)), then we call that matrix the inverse of \(A\) and denote it \(A^{-1}\). These are the tools from vector and matrix algebra that we need to understand the MVN.
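Before moving on, here is a minimal sketch of matrix inversion in NumPy, using a small hypothetical matrix; note that not every matrix has an inverse, and `np.linalg.inv` raises an error when one does not exist:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])  # an invertible 2x2 matrix

A_inv = np.linalg.inv(A)

# A^{-1} undoes the mapping A, and A A^{-1} = A^{-1} A = I.
print(np.allclose(A @ A_inv, np.eye(2)))  # True
x = np.array([3.0, -1.0])
print(np.allclose(A_inv @ (A @ x), x))    # True
```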

Expectation algebra#

In addition to vector and matrix algebra, to understand the MVN we need to know how we can manipulate the expected value, variance, and covariance. In other words, there is an algebra to these functions that is important to know.

Consider two random variables \(X\) and \(Y\) and the constant \(a\). Then

(252)#\[\begin{align} \mathbb{E}(X+Y) &= \mathbb{E}(X) + \mathbb{E}(Y) \\ \mathbb{E}(a X ) &= a\mathbb{E}(X) \end{align}\]

In other words, the expected value is a linear function.
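A small simulation sketch of these two rules, using draws from two arbitrarily chosen normal distributions; the sample means approximately obey the same algebra:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two arbitrary random variables (their joint distribution does not matter here).
X = rng.normal(loc=1.0, scale=2.0, size=n)
Y = rng.normal(loc=-3.0, scale=1.0, size=n)
a = 5.0

print(np.mean(X + Y), np.mean(X) + np.mean(Y))  # approximately equal
print(np.mean(a * X), a * np.mean(X))           # equal up to floating-point error
```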

Normal + Normal = Normal#

An inner product viewpoint for the MVN#

Marginal#

Conditional#