Forward problem: (Inversion sampling)

There are many different definitions for what is commonly called “The forward problem”. We will define the forward problem as the following task:

Given (1) a model and (2) parameter values, produce a feasible set of observations.

Definition of a model#

What do we mean when we say “a model”? A model, in the most abstract terminology, is a collection of probability distributions \(\mathcal{P}\) over a sample space \(\mathcal{G}\) of potential measurements. In other words, we can define a model as the following two objects:

\[\begin{align} \mathcal{P} &= \{ F_{\theta} \mid \theta \in \Theta \}\\ \mathcal{G} &= \{ (x_{1},x_{2}, \cdots, x_{k}) \mid x_{i} \in \mathbb{R},\; i = 1, \ldots, k \} \tag{179} \end{align}\]

where each \(F_{\theta}\) is a cumulative distribution function (cdf) whose input is any point in the sample space \(\mathcal{G}\). That is, we assume that \(F_{\theta}\) is a cdf that accepts points such as \((x_{1},x_{2}, \cdots, x_{k})\). Note that every point \(\theta \in \Theta\) specifies exactly one assignment of probabilities to all points in the sample space.

This definition of a model is abstract, but encompasses almost all feasible types of models that we can specify.

Example

Suppose we work for a public health office and are asked to begin modeling the incidence of influenza during the typical 32-week influenza season. We assume that, for the person requesting this model, it is sufficient to provide a probability distribution over the potential number of weekly lab-confirmed cases of influenza in the public health office’s jurisdiction. The number of cases that we could observe (could measure) in one week starts at 0 (no cases this week) and ends at the total number of individuals living in the jurisdiction (we will call this value \(N\)).

Then the sample space is \(\mathcal{G} = \{ (x_{1},x_{2}, \cdots, x_{32}) \;| \; x_{k} \in \{0, 1, \ldots, N\} \}\), since weekly case counts are whole numbers.

Further, we will assume that the number of cases each week is drawn from a Poisson distribution with a week-specific parameter \(\lambda_{k}\). That is, for week \(k\), we assume \(x_{k} \sim \text{Poisson}(\lambda_{k})\). If we further assume that the number of cases in week \(k\) is statistically independent of the number of cases in week \(l\), then the probability of measuring at most \(x_{k}\) cases in week \(k\) and at most \(x_{l}\) cases in week \(l\) equals

\[\begin{align} F(x_{k},x_{l}) = F(x_{k} | \lambda_{k}) \cdot F(x_{l} | \lambda_{l}) \tag{180} \end{align}\]

where \((\lambda_{k}, \lambda_{l}) \in \mathbb{R}^{+} \times \mathbb{R}^{+}\).
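The product rule above can be checked numerically. The following is a minimal sketch, not a definitive implementation: the helper names `poisson_cdf` and `joint_cdf` and the rate values are hypothetical choices made for illustration, and the cdf is computed by summing the Poisson probability mass function directly, which is only sensible for moderate values of \(\lambda\).

```python
import math


def poisson_cdf(x, lam):
    """Return P(X <= x) for X ~ Poisson(lam) by summing the pmf term by term."""
    if x < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for j in range(1, int(x) + 1):
        term *= lam / j    # P(X = j) obtained from P(X = j - 1)
        total += term
    return total


def joint_cdf(xs, lams):
    """Joint cdf of independent Poisson counts: the product of the marginal cdfs."""
    return math.prod(poisson_cdf(x, lam) for x, lam in zip(xs, lams))


# Hypothetical rates for two weeks, chosen only for illustration.
lam_k, lam_l = 2.0, 4.0

# Probability of at most 3 cases in week k and at most 5 cases in week l.
p = joint_cdf((3, 5), (lam_k, lam_l))
print(p)
```

Because `joint_cdf` accepts tuples of any length, the same function evaluates the cdf for a full 32-week season when given 32 counts and 32 rates.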

If we can write down our collection of probabilities for two points, then we can write this collection for \(32\) points:

\[\begin{align} \mathcal{P} = \left\{\prod_{k=1}^{32} F(x_{k} | \lambda_{k}) \;|\; (\lambda_{1},\lambda_{2},\cdots,\lambda_{32} ) \in (\mathbb{R}^{+})^{32} \right\} \tag{181} \end{align}\]

We can generate a dataset—a tuple of possible measurements—from the above model if we are given a set of 32 parameter values and a method for drawing values from the Poisson distribution.
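One way to draw values from the Poisson distribution is inversion sampling: draw \(u\) uniformly on \([0, 1)\) and return the smallest count \(x\) whose cdf value \(F(x | \lambda)\) is at least \(u\). The sketch below illustrates this under stated assumptions. The function names `poisson_inverse_cdf` and `simulate_season`, the seed, and the bell-shaped rate curve are all hypothetical and are not taken from the example above, and the sequential search assumes moderate \(\lambda\) values (roughly below 700, so that \(e^{-\lambda}\) does not underflow).

```python
import math
import random


def poisson_inverse_cdf(u, lam):
    """Return the smallest integer x such that F(x | lam) >= u."""
    x = 0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    # Walk up the cdf until it reaches u. The check on `term` stops the
    # search if the pmf underflows to zero before the cdf reaches u.
    while cdf < u and term > 0.0:
        x += 1
        term *= lam / x    # P(X = x) obtained from P(X = x - 1)
        cdf += term
    return x


def simulate_season(lams, rng):
    """Draw one independent Poisson count per week by inversion sampling."""
    return tuple(poisson_inverse_cdf(rng.random(), lam) for lam in lams)


# Hypothetical weekly rates: a baseline of 5 cases rising to a mid-season peak.
lams = [5.0 + 20.0 * math.exp(-((k - 16) / 6.0) ** 2) for k in range(1, 33)]

rng = random.Random(2024)  # fixed seed so the draw is reproducible
season = simulate_season(lams, rng)
print(season)
```

Each call to `simulate_season` produces one point in the sample space \(\mathcal{G}\), that is, one feasible set of observations for the supplied parameter values.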