Homework 4
Bernoulli
Let \(X \sim \text{Bernoulli}(0.2)\)
\(P(X=0) = ?\)
\(P(X=1) = ?\)
Please compute \(\mathbb{E}(X)\)
Please compute \(V(X)\)
Define the support of \(X\), \(\text{supp}(X)\)
Let \(Y \sim \text{Bernoulli}(\theta)\), and show that \(P(Y=1) = \mathbb{E}(Y)\)
Let \(Y \sim \text{Bernoulli}(\theta)\), and show that \(V(Y) \leq \mathbb{E}(Y)\)
A new vaccine for a contagious disease is tested, and the probability that a vaccinated person does not get infected upon exposure is 0.9. If a person is vaccinated and exposed to the disease, what is the probability that they:
Get infected?
Do not get infected?
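As a way to check hand calculations in this section, here is a minimal sketch that computes the Bernoulli probability mass function, expected value, and variance directly from their definitions. The function names and the value `theta = 0.35` are hypothetical and chosen only for illustration; they are not part of the assignment.

```python
def bernoulli_pmf(x, theta):
    """P(X = x) for X ~ Bernoulli(theta); zero outside the support {0, 1}."""
    if x not in (0, 1):
        return 0.0
    return theta if x == 1 else 1.0 - theta


def bernoulli_mean(theta):
    # E(X) is the sum of x * P(X = x) over the support {0, 1}
    return sum(x * bernoulli_pmf(x, theta) for x in (0, 1))


def bernoulli_variance(theta):
    # V(X) is the sum of (x - E(X))^2 * P(X = x) over the support {0, 1}
    mu = bernoulli_mean(theta)
    return sum((x - mu) ** 2 * bernoulli_pmf(x, theta) for x in (0, 1))


theta = 0.35  # hypothetical value, for illustration only
print("P(X=0), P(X=1):", bernoulli_pmf(0, theta), bernoulli_pmf(1, theta))
print("E(X), V(X):", bernoulli_mean(theta), bernoulli_variance(theta))
```

Summing over the support, rather than hard-coding closed forms, keeps the code close to the definitions you are asked to work with by hand.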
Binomial
Define a random variable \(R\) with a binomial distribution \((R \sim \text{Bin}(10,0.2))\).
Compute \(\mathbb{E}(R)\)
Compute \(V(R)\)
Describe to someone who may not have statistical expertise what \(P(R=3)\) means. Be sure to include the assumptions of the Binomial distribution and how the parameters \(N\) and \(\theta\) relate to this probability.
For what value of \(\theta\) is \(V(R)\) the highest? Why does this make sense intuitively?
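Binomial probabilities such as "exactly \(k\)" or "at most \(k\)" can be checked numerically. The sketch below is one possible way to do that using only the Python standard library. The function names and the parameters `n = 8`, `theta = 0.4` are hypothetical and used only for illustration.

```python
from math import comb


def binomial_pmf(k, n, theta):
    """P(R = k) for R ~ Bin(n, theta); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * theta**k * (1.0 - theta) ** (n - k)


def binomial_cdf(k, n, theta):
    """P(R <= k), computed by summing the pmf from 0 up to k."""
    return sum(binomial_pmf(j, n, theta) for j in range(0, k + 1))


n, theta = 8, 0.4  # hypothetical parameters, for illustration only

# Mean and variance computed from the pmf itself, not from a closed form
mean = sum(k * binomial_pmf(k, n, theta) for k in range(n + 1))
variance = sum((k - mean) ** 2 * binomial_pmf(k, n, theta) for k in range(n + 1))

print("P(R = 3):", binomial_pmf(3, n, theta))
print("P(R <= 2):", binomial_cdf(2, n, theta))
print("mean, variance:", mean, variance)
```

Evaluating `variance` over a grid of `theta` values between 0 and 1 is one way to explore how the variance changes with \(\theta\).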
In a town where 10% of the population is susceptible to a new flu strain, 15 people are randomly selected. Let \(X\) be the number of susceptible individuals in this group. From previous analysis, if more than 70% of the population is susceptible to this disease, then there will likely be a massive outbreak. Compute:
a) The probability that exactly 3 people are susceptible.
b) The probability that at most 2 people are susceptible.
c) The expected number of susceptible individuals.
d) Please run another analysis using the binomial distribution and discuss what you would communicate to experts who study this new flu strain.
Suppose you decide to develop a model for how the number of infections propagates over time in a finite population where no travel is allowed into or out of the system. You define a series of random variables \(I_{t}\) that describe the number of infected individuals (called infectors) at time \(t\). The model you decide on is \begin{align} I_{t+1} \sim \text{Binomial}(S_{t}, 1 - (1-p)^{i_{t}}) \end{align} where \(S_{t}\) is the number of individuals susceptible to disease at time \(t\), \(p\) is the probability that an infector passes their pathogen to a susceptible individual upon contact, and \(i_{t}\) is the observed number of infectors at time \(t\).
Please compute an expression for the expected number of infected individuals at time \(t+1\)
We know that for \(X \sim \text{Binomial}(N,\theta)\), when \(N=1\) the Binomial distribution reduces to the Bernoulli distribution. Using this idea, can you please reason about what \(1-(1-p)^{i_{t}}\) is describing?
Let there be 10,000 susceptible individuals and 10 infectors at time \(t\). Let \(p = 0.02\). Please compute \(P( I_{t+1} > 10 )\) and explain what information that probability is communicating to you.
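When \(S_{t}\) is large, multiplying a huge binomial coefficient by a tiny power can overflow or underflow, so a tail probability for the model above is easier to evaluate in log space. The following is a minimal sketch under that assumption. The function names and the values `susceptibles = 200`, `infectors = 3`, `p = 0.01`, and the threshold `5` are hypothetical and are not the values asked for in the problem.

```python
from math import exp, lgamma, log


def infection_probability(p, infectors):
    # Success probability taken from the model: 1 - (1 - p)^(i_t)
    return 1.0 - (1.0 - p) ** infectors


def log_binomial_pmf(k, n, q):
    """log P(I = k) for I ~ Binomial(n, q), assuming 0 < q < 1 and 0 <= k <= n."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(q) + (n - k) * log(1.0 - q)


def tail_probability(threshold, n, q):
    """P(I > threshold) = 1 - P(I <= threshold)."""
    cdf = sum(exp(log_binomial_pmf(k, n, q)) for k in range(threshold + 1))
    return 1.0 - cdf


# Hypothetical values, for illustration only
susceptibles, infectors, p = 200, 3, 0.01
q = infection_probability(p, infectors)
print("success probability q:", q)
print("P(I_{t+1} > 5):", tail_probability(5, susceptibles, q))
```

Working with `lgamma` avoids forming the binomial coefficient explicitly, which is the design choice that keeps the calculation stable for populations in the thousands.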
Poisson
Suppose \(Y \sim \text{Pois}(2)\)
Compute \(P(Y=2)\)
Compute \(P(Y \leq 2)\)
Compute \(P(Y > 2)\)
The number of patients arriving at a hospital’s emergency room follows a Poisson distribution with an average of 5 arrivals per hour.
What is the probability that exactly 3 patients arrive in the next hour?
What is the probability that at least 150 patients arrive in the next day?
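Poisson probabilities can be checked with a few lines of standard-library Python. The sketch below assumes that counts in non-overlapping hours are independent, so that the count over a longer window is again Poisson with the rate multiplied by the window length. The function names and the values `rate_per_hour = 3.0` and `hours = 8` are hypothetical and used only for illustration.

```python
from math import exp, lgamma, log


def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Pois(lam), evaluated in log space for stability."""
    if k < 0:
        return 0.0
    return exp(k * log(lam) - lam - lgamma(k + 1))


def poisson_cdf(k, lam):
    """P(Y <= k), computed by summing the pmf from 0 up to k."""
    return sum(poisson_pmf(j, lam) for j in range(k + 1))


def poisson_at_least(k, lam):
    """P(Y >= k) = 1 - P(Y <= k - 1)."""
    return 1.0 - poisson_cdf(k - 1, lam)


# Hypothetical values, for illustration only
rate_per_hour = 3.0
hours = 8
lam_window = rate_per_hour * hours  # rate for the whole window, assuming independent hours

print("P(exactly 2 in one hour):", poisson_pmf(2, rate_per_hour))
print("P(at least 30 in the window):", poisson_at_least(30, lam_window))
```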
Geometric
We wish to compute the probability of an event on a continuous time interval, rather than a discrete interval. Define a time interval that starts at zero and ends at \(T\), break that interval into pieces of width \(h\), and then define a random variable \(Y \sim \text{Geometric}( p \cdot h )\).
If we divide the interval from 0 to \(t\) into pieces of size \(h\) then please express the number of pieces in terms of \(h\) and of the length of this interval.
Write down the probability mass function for \(Y\) and reason what would happen to this pmf as \(h \to 0\). In other words, evaluate \( \lim_{h \to 0} f_{Y}(y) \).
Given \(p=1/2\) and \(t=10\), use the new probability mass function that you derived above to compute \(P(Y < 1)\).
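One way to build intuition for the limit \(h \to 0\) is to evaluate the discrete model for smaller and smaller \(h\) and watch the numbers settle. The sketch below does this for the probability that no event occurs in any piece of the interval from 0 to \(t\), assuming each piece independently has event probability \(p \cdot h\). The function name and the values `p = 0.3`, `t = 2.0` are hypothetical, and the comparison value \(e^{-pt}\) is offered as a numerical check, not as a substitute for the derivation.

```python
from math import exp


def survival_discrete(t, p, h):
    """P(no event in any of the pieces covering [0, t]) with per-piece probability p*h."""
    n_pieces = round(t / h)  # number of pieces of width h in an interval of length t
    return (1.0 - p * h) ** n_pieces


p, t = 0.3, 2.0  # hypothetical values, for illustration only
for h in (0.5, 0.1, 0.01, 0.001):
    print(f"h = {h:<6} survival = {survival_discrete(t, p, h):.6f}")
print(f"exp(-p*t)       = {exp(-p * t):.6f}")
```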
A clinic tests individuals for a disease where 10% of tests come back positive. If patients are tested one at a time, what is the probability that:
The first positive test occurs on the 3rd test?
The first positive test occurs before the 5th test?
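For questions about the first success in a sequence of independent trials, the geometric distribution can be evaluated directly. The sketch below assumes the convention that \(Y\) counts the trial on which the first success occurs, so the support is \(1, 2, 3, \ldots\); if your course uses the "number of failures before the first success" convention, shift the argument by one. The function names and `theta = 0.25` are hypothetical.

```python
def geometric_pmf(y, theta):
    """P(Y = y): first success on trial y, with support 1, 2, 3, ..."""
    if y < 1:
        return 0.0
    return (1.0 - theta) ** (y - 1) * theta


def geometric_cdf(y, theta):
    """P(Y <= y), computed by summing the pmf from trial 1 up to trial y."""
    return sum(geometric_pmf(j, theta) for j in range(1, y + 1))


theta = 0.25  # hypothetical success probability, for illustration only
print("P(first success on trial 2):", geometric_pmf(2, theta))
print("P(first success before trial 4):", geometric_cdf(3, theta))
```

Under this convention, an event such as "before the \(k\)th trial" corresponds to \(Y \leq k-1\).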
Computing
Given a random variable \(X\), the cumulative distribution function is defined as \(F_{X}(x) = P(X \leq x)\). Let's consider the dataset generated below called \(d\). This is a list of 1000 values.
Let's estimate the cdf at value \(x\) as the number of data points less than or equal to the value \(x\) divided by the number of data points. Create a Python function that inputs the value \(x\) and returns an estimate of the cdf.
Find the minimum \(x\) from \(d\), called \(x_{\text{min}}\) and the maximum \(x\) from \(d\), called \(x_{\text{max}}\) and use the function above to plot \((x, F_{X}(x))\) for 100 values between \(x_{\text{min}}\) and \(x_{\text{max}}\)
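import numpy as np

# 1000 values: scaled standard normal draws plus uniform draws on [0, 10), shifted down by 2
d = 2 * np.random.normal(0, 1, size=1000) + 10 * np.random.random(size=1000) - 2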
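As a starting point, here is a minimal sketch of an empirical cdf and of the grid of evaluation points described above. It uses plain Python so it works on a list or on a NumPy array such as `d`. The names `empirical_cdf`, `cdf_curve`, and the five-value `sample` are hypothetical and used only to illustrate the idea.

```python
def empirical_cdf(x, data):
    """Estimate F(x) as the fraction of data points less than or equal to x."""
    count = sum(1 for value in data if value <= x)
    return count / len(data)


def cdf_curve(data, num_points=100):
    """Evaluate the empirical cdf on evenly spaced points between min and max."""
    x_min, x_max = min(data), max(data)
    step = (x_max - x_min) / (num_points - 1)
    xs = [x_min + i * step for i in range(num_points)]
    xs[-1] = x_max  # guard against floating-point drift at the right endpoint
    return xs, [empirical_cdf(x, data) for x in xs]


sample = [3.1, -0.4, 7.8, 2.2, 5.0]  # hypothetical data, for illustration only
xs, Fs = cdf_curve(sample, num_points=5)
for x, F in zip(xs, Fs):
    print(f"x = {x:6.2f}   F(x) = {F:.2f}")
```

Calling `cdf_curve(d)` on the dataset above returns the 100 pairs to plot, for example with `matplotlib.pyplot.plot(xs, Fs)` if matplotlib is available in your environment.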