Homework03
Let
\[\begin{split} A = \{1,2,3,4,5,6\}, \quad B = \{1,3,6\}\; \\ C = \{7\}, \quad D = \emptyset \end{split}\]
Compute \(A \cap B\)
Compute \(A \cup C\)
Compute \(A \cup D\)
Compute \(A \cap D\)
Compute \((A \cap B) \cup (C \cup D)\)
Let the sample space \(\mathcal{G} = \{1,2,3,4,5,6,7\}\) and
\[\begin{split} A = \{1,2,3,4,5,6\}, \quad B = \{1,3,6\}\; \\ C = \{7\}, \quad D = \emptyset \end{split}\]
Compute \(A^{c}\)
Compute \(B^{c}\)
Compute \(D^{c}\)
Compute \(\mathcal{G} \cap A\)
Is \(A \subset \mathcal{G}\)?
Is \(\emptyset \subset \mathcal{G}\)?
Delfina Szigethy — Class of 2025:
10 students in an Economics class and 10 students in a Biology class are asked to respond to the statement: Pick a number between 0 and 10.
The data set collected for the economics class is:
\(E = \{1, 2, 4, 4, 6, 7, 7, 8, 9, 10\}\)
and the data set collected for the biology class is:
\(B = \{0, 2, 2, 5, 6, 6, 7, 9, 9, 10\}\)
Find the sample space \(\mathcal{G}\).
Form the set \(E\) from the responses from the 10 students in Economics.
Form the set \(B\) from the responses from the 10 students in Biology.
Compute \(E \cup B\).
Compute \(E \cap B\).
Are the sets \(E\) and \(B\) disjoint?
Suppose \(P(E) = 0.3\). Compute \(P(E^{c})\).
Let the sample space \(\mathcal{G} = \{0,1,2,a,b,c,d,e,f,...,z\}\) and let \(A = \{0,1\}\) and \(B = \{x \mid x \text{ is a letter of the English alphabet}\}\).
Compute \(A \cap B\)
Compute \(A \cup B\)
Compute \(A^{c}\)
Is \(A \cup B = \mathcal{G}\)?
If we assigned probabilities to all outcomes, could \(P(A \cup B) = 1\)? Why or why not?
Let \(A = \{0,1,2\}\) for some sample space \(\mathcal{G} = \{0,1,2,3,4,5,6\}\). Further assume \(P(A) = 0.2\).
Are the sets \(A\) and \(A^{c}\) disjoint? Why or why not?
Simplify \(P(A \cup A^{c})\) into an expression that involves \(P(A)\) and \(P(A^{c})\).
Use Kolmogorov’s axioms to show that \(P(A) = 1 - P(A^{c})\).
Let \(\mathcal{G} = \{x \mid x \text{ is a positive integer}\}\).
Are the sets \(\emptyset\) and \(\mathcal{G}\) disjoint?
Simplify \(P(\mathcal{G} \cup \emptyset)\) into an expression that involves \(P(\mathcal{G})\) and \(P(\emptyset)\).
Use Kolmogorov’s axioms to show that \(P(\emptyset) = 0\).
If \(A = \{1,2,3\}\) and \(B = \{2,3,4\}\) and \(C = \{1,3\}\):
Can \(P(A) < P(B)\)? Why or why not?
Can \(P(A) < P(C)\)? Why or why not?
Use what you know about intersection, subsets, and probability to show that \(P(A \cap B) \leq P(A)\).
Hint: How are \(A \cap B\) and \(A\) related?
Chinwe Okezie — Class of 2025:
Let \(A = \{1,2,3,4\}\), \(B = \{4,5,6\}\), and \(C = \{2,4,6\}\), assuming the Principle of Equally Likely Outcomes:
Can \(P(A) > P(B)\)?
Can \(P(B) > P(A)\)?
Can \(P(C) < P(B)\)?
Audrey Vitello — Class of 2025:
Every year at the town fair there is a pie baking contest among the four best bakers. The best-tasting pie wins, and the outcome associated with this event is the baker’s number in the contest. For example, if baker one wins, the outcome is the value 1; if baker two wins, the outcome is the value 2, and so on.
In the past ten years, Mr. Brown, consistently contestant number 4, has won 3 times.
Compute \(P(E)\) where \(E\) is the event that Mr. Brown, contestant 4, wins.
From what you know about the frequentist assignment of probabilities, does the above estimate of \(P(\{4\})\) seem reasonable?
Suppose we wish to study the reemergence of cancer among patients in remission. We collect data on 1,000 patients who are in cancer remission and follow them for 5 years. At five years, we are interested in the probability of a second cancer.
Define a sample space \(\mathcal{G}\) we can use to assign probabilities to a second cancer and no second cancer.
After five years of follow-up, we find that 238 patients experienced a second cancer. Use the frequentist approach to assign probabilities to a second cancer and no second cancer.
If you collected data on 2,000 patients, do you expect the probability of a second cancer to change? How do you expect the probability to be different for 2,000 patients than with 1,000 patients?
A study (link) found that young adults were 32 times more at risk of developing multiple sclerosis (MS) after infection with the Epstein-Barr virus compared to young adults who were not infected by the virus. The experiment enrolled 10 million young adults and observed them for a period of 20 years.
Design a sample space if we wish to study outcomes that describe the number of young adults who develop MS.
What outcomes are in the event \(E_1\): “less than 10% of young adults develop MS”?
What outcomes are in the event \(E_2\): “less than 5% of young adults develop MS”?
Are \(E_1\) and \(E_2\) disjoint? Why or why not?
Can \(P(E_1) < P(E_2)\)?
Kareem Hargrove — Class of 2025:
Suppose we wish to study the five-year incidence of asthma among patients who were infected with SARS-CoV-2. We decide to enroll 10,000 patients who were infected with SARS-CoV-2, follow them for five years, and count the number of patients who were diagnosed with asthma at or before five years of follow-up.
Define a sample space to describe whether a patient is or is not diagnosed with asthma within five years of follow-up.
After the observation period ends, we find 4,973 patients were diagnosed with asthma. Use the frequentist approach to assign a probability of being diagnosed with asthma.
Use the frequentist approach to assign a probability of not being diagnosed with asthma.
Please compute the following:
\(A = \{1,2,3\}\) and \(B = \{4,5,6\}\). Compute \(A \times B\) (Answer should be a set of tuples).
\(A = \{1,2,3\}\). Compute \(A \times A\) (Answer should be a set of tuples).
How many elements are in \(A \times A\)? (Looking for a number)
How many elements are in \(A \times A \times A\)? (Looking for a number)
How many elements are in \(A \times A \times A \times \cdots \times A\) where the Cartesian product is taken \(N\) times? (Looking for a number)
Define a sample space \(\mathcal{G} = \{a, b, c, d, 1, 2, 3, 4, 5\}\) and let:
\(E_1 = \{1,3,5\}\),
\(E_2 = \{a,b,c\}\),
\(E_3 = \{a,d,5\}\)
Assign the following probabilities:
| Outcome | \(P(\{\text{Outcome}\})\) |
|---|---|
| a | 0.10 |
| b | 0.05 |
| c | 0.15 |
| d | 0.02 |
| 1 | 0.14 |
| 2 | 0.25 |
| 3 | 0.08 |
| 4 | 0.04 |
| 5 | 0.17 |
Compute \(P(E_1)\)
Compute \(P(E_2)\)
Compute \(P(E_3)\)
Compute \(P(E_1 \cap E_2)\)
Compute \(P(E_1 \cap E_3)\)
Compute \(P(E_2 \cap E_3)\)
Compute \(P(E_1 | E_2)\)
Compute \(P(E_1 | E_3)\)
Compute \(P(E_2 | E_3)\)
Compute \(P(E_3 | E_2)\)
Define a sample space:
\[\mathcal{G} = \{(a,1), (b,1), (c,1), (a,2), (b,2), (c,2)\}\]
Let:
\(E_1 = \{(a,1), (a,2), (c,2)\}\)
\(E_2 = \{(c,2), (a,1)\}\)
\(E_3 = \{(b,2)\}\)
Assign the following probabilities:
| Outcome | \(P(\{\text{Outcome}\})\) |
|---|---|
| (a,1) | 0.05 |
| (b,1) | 0.22 |
| (c,1) | 0.15 |
| (a,2) | 0.02 |
| (b,2) | 0.13 |
| (c,2) | 0.43 |
Compute \(P(E_1)\)
Compute \(P(E_2)\)
Compute \(P(E_3)\)
Compute \(P(E_1 \cap E_2)\)
Compute \(P(E_1 \cap E_3)\)
Compute \(P(E_2 \cap E_3)\)
Compute \(P(E_1 | E_2)\)
Compute \(P(E_1 | E_3)\)
Compute \(P(E_2 | E_3)\)
Compute \(P(E_3 | E_2)\)
Imani Ashman — Class of 2024:
A college student has been up studying for their midterm and decides to visit a coffee shop before the exam. The probability that they order a coffee (\(C\)) is \(0.2\), the probability they order a bagel (\(B\)) is \(0.4\), and the probability they order a bagel given that they have already ordered a coffee is \(0.3\). There is also the potential that they don’t order a bagel or a coffee before the midterm (\(N\)). The sample space is defined as:
Compute \(P(B \cap C)\)
Compute \(P(C | B)\)
Compute the probability that they order nothing.
Given two events \(A\) and \(B\), show that \(P(A|B) \geq P(A \cap B)\).
(Looking for a short explanation and mathematical argument.)
Suppose we wish to study adverse outcomes among patients who have unprotected left main disease (NEJM Study).
In this experiment (called a clinical trial), we randomize patients to receive percutaneous intervention (PCI) or a coronary artery bypass graft (CABG). We wish to study the number of patients who received a PCI and experienced a myocardial infarction (MI) between the time they had their procedure and 5 years. However, we only know three pieces of information:
The probability a patient was randomized to PCI was \(0.5\)
The probability a patient was randomized to CABG was \(0.5\)
The probability of an MI was \(0.2\) among patients who received a PCI
Define a sample space that will allow us to compute the probability a patient was randomized to PCI and experienced an MI (looking for a description of the sample space and a set).
Compute the probability a patient experiences an MI and was randomized to PCI. (Looking for a number)
Compute the probability a patient does not experience an MI and was randomized to PCI. (Looking for a number)
If two events \(A\) and \(B\) are disjoint, are they also independent?
(Looking for a description and a mathematical argument to back up your description.)
Counting
In class we learned about the Principle of Equally Likely Outcomes (PELO). PELO arises naturally when a sample space contains a large number of equally likely outcomes and we wish to compute the probability of events defined by attributes of those outcomes.
For example, in gambling, we may wish to know the probability that a five-card hand dealt to us contains four cards of the same kind. PELO assumes that we are equally likely to be dealt any one card. We’re uninterested in the individual cards. Instead, we are interested in the event that four of the cards are all the same kind.
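Using PELO, the probability of an event (\(E\)) can be computed as
\[P(E) = \frac{\text{number of outcomes in } E}{\text{number of outcomes in } \mathcal{G}}\]
To use this formula, we must be able to count the number of outcomes in an event and in the sample space efficiently.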
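As a small illustration of counting under PELO, the sketch below divides the number of outcomes in an event by the number of outcomes in the sample space. The function name `pelo_probability` and the fair six-sided die are assumptions made for this example; they do not come from the homework.

```python
from fractions import Fraction


def pelo_probability(event, sample_space):
    """Probability of `event` when every outcome in `sample_space` is equally likely."""
    sample_space = set(sample_space)
    # Keep only outcomes that actually belong to the sample space.
    event = set(event) & sample_space
    return Fraction(len(event), len(sample_space))


# Hypothetical example: a fair six-sided die and the event "roll an even number".
sample_space = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}

p_even = pelo_probability(even, sample_space)
print(p_even)  # 1/2
```

Using `Fraction` keeps the answer exact, which makes it easier to compare against a count done by hand.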
The fundamental theorem of counting (FTC). The FTC states the following: suppose we wish to select two items. For the first item, we can select from \(m\) different items. For the second item, we can select from \(n\) different items. Then the total number of ways to select both item one and item two is \(m \cdot n\).
In general, if there are \(n_{1}\) ways to select a first item, \(n_{2}\) ways to select a second item, and so on up until \(n_{r}\) ways to select an \(r^{\text{th}}\) item, then there are \(n_{1} \cdot n_{2} \cdot n_{3} \cdots n_{r}\) ways to select all \(r\) items.
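One way to check the FTC on a small case is to list every selection and compare the length of the list to the product of the number of choices. The sketch below does this with `itertools.product`; the item lists and the helper `count_selections` are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import prod

# Hypothetical choices: m = 3 options for the first item, n = 2 for the second.
first_items = ["red", "green", "blue"]
second_items = ["cap", "beanie"]

# Enumerate every (first item, second item) pair explicitly.
pairs = list(product(first_items, second_items))
print(len(pairs))  # 6, which equals 3 * 2


def count_selections(*choice_counts):
    """General FTC: multiply the number of choices available at each step."""
    return prod(choice_counts)


# Three steps with 3, 2, and 4 choices respectively.
print(count_selections(3, 2, 4))  # 24
```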
There are two natural extensions to the FTC that will help us efficiently count the number of outcomes in events.
Permutation
From our card example above, suppose we are given five cards, that those cards are the 2, 3, 4, 5, and 6 of spades, and that we wish to count the number of ways in which we could have received those cards. That is, we want to count the number of ways to arrange those 5 cards in a specific order. One arrangement is (2,3,4,5,6). A second, different arrangement is (4,3,2,6,5).
One way to view this problem is that there exist five empty “slots” that will each be filled with a card. We know that the cards we were dealt are the five cards: 2,3,4,5,6 of spades. Place all five cards in a bag and draw from that bag a card that will occupy the first empty slot. Because there are five cards, we have five ways to place a card in the first empty “slot”. Then only four cards remain in the bag, so the number of ways that we can place a card from the bag into the second empty slot is 4. That is, the number of ways to arrange the first two cards is, using the FTC, \(5 \times 4 = 20\).
Continuing in this way, we see that the total number of ways to arrange 5 cards in these five slots is \(5 \times 4 \times 3 \times 2 \times 1 = 120\), and in general the number of ways to arrange \(r\) items selected from a pool of \(n\) possible items is \(n \times (n-1) \times (n-2) \times \cdots \times (n-(r-1))\). The number of ways to arrange \(r\) items from a pool of \(n\) possible items is called the permutation and is written \(_{n}P_{r}\).
A useful tool in counting is the factorial. The factorial is a function that takes a positive integer \(n\) as input and returns the product \(n! = n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1\).
We can use the factorial to rewrite \(_{n}P_{r}\). Dividing \(n!\) by \((n-r)!\) cancels every factor from \((n-r)\) down to 1 and leaves the product above:
\[_{n}P_{r} = \frac{n!}{(n-r)!}\]
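The standard factorial form of the permutation, \(n!/(n-r)!\), can be checked against a brute-force enumeration of the card arrangements described above. This is a minimal sketch: the helper name `n_p_r` is an assumption for this example, and `math.perm` requires Python 3.8 or later.

```python
from itertools import permutations
from math import factorial, perm


def n_p_r(n, r):
    """Number of ordered arrangements of r items chosen from n: n! / (n - r)!."""
    return factorial(n) // factorial(n - r)


cards = [2, 3, 4, 5, 6]  # the five spades from the example

# Enumerate every ordering of all five cards, and of the first two slots only.
all_orders = list(permutations(cards))
two_slots = list(permutations(cards, 2))

print(len(all_orders), n_p_r(5, 5))  # 120 120
print(len(two_slots), n_p_r(5, 2))   # 20 20

# The standard library computes the same quantity directly.
print(perm(5, 2))  # 20
```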
Example
Prof m rents 10 books from FML library. Three books are about statistics. Five books are about biology. Two books are about Victorian literature. How many different ways can Prof m arrange these books such that they are grouped together by discipline, if the order of the individual books within a discipline doesn’t matter?
Because there are just three disciplines, there exist three empty “slots”, one per discipline. The number of ways that we can arrange these three disciplines is \(_{3}P_{3} = \frac{3!}{0!}\). By definition, \(0!=1\) and so \(_{3}P_{3} = 3! = 6\).
Example
Consider the scenario above, but now the order of the individual books matters.
We can break this problem down into two stages: stage one counts the number of ways to order the disciplines, and stage two counts the number of ways to order the individual books within each discipline.
We know that there are \(3!\) ways to order the three disciplines. There are, by the same reasoning, \(3!\) ways to order the stats books, \(5!\) ways to order the biology books, and \(2!\) ways to order the Victorian books.
This means that there are a total of \((3!) \times (3! \cdot 5! \cdot 2!) = 8640\) ways to arrange these ten books such that they are grouped by discipline.
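The arithmetic in the two book examples can be reproduced in a few lines. This is only a sketch of the two-stage count; the dictionary `books` and the variable names are assumptions made for the illustration.

```python
from math import factorial, prod

# Number of books per discipline, taken from the example.
books = {"statistics": 3, "biology": 5, "Victorian literature": 2}

# Stage one: order the disciplines themselves (3! ways).
discipline_orders = factorial(len(books))

# Stage two: order the books inside each discipline (3! * 5! * 2! ways).
within_discipline_orders = prod(factorial(count) for count in books.values())

# FTC: multiply the counts from the two stages.
total = discipline_orders * within_discipline_orders

print(discipline_orders)         # 6
print(within_discipline_orders)  # 1440
print(total)                     # 8640
```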
Combination
Suppose we wish to count the number of ways that we can select \(r\) items from a pool of \(n\) total items. But, we do not care about the arrangement of those \(r\) items. This is called the Combination and is denoted \(_{n}C_{r}\).
To discover a formula for \(_{n}C_{r}\), we can turn to the permutation, \(_{n}P_{r}\), and the FTC. One way to select items for the permutation is: (1) first select \(r\) items from the pool of \(n\), and then (2) arrange those \(r\) selected items. By the FTC, we could take the number of ways from (1) and multiply that by the number of ways in (2) to arrive at \(_{n}P_{r}\). In other words,
\[_{n}P_{r} = (\text{number of ways to select } r \text{ items from } n) \times (\text{number of ways to arrange } r \text{ items})\]
The first expression is exactly \(_{n}C_{r}\). The number of different ways to arrange \(r\) items from a pool of \(r\) total items is \(_{r}P_{r} = r!\). Then
\[_{n}C_{r} = \frac{_{n}P_{r}}{r!} = \frac{n!}{r!\,(n-r)!}\]
Example
How many ways can we create a subset of size \(4\) from a set of \(10\) items? Note here that we don’t care about the arrangement of items because a set has no inherent ordering.
This is then just \(_{10}C_{4} = \frac{10!}{4!(10-4)!} = \frac{10!}{4!6!} = 210\)
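The count of 210 can be confirmed both from the standard relationship between combinations and permutations (divide the permutation count by \(r!\)) and by listing the subsets directly. In this sketch the helper name `n_c_r` and the use of `range(10)` as the 10-item set are assumptions; `math.comb` and `math.perm` require Python 3.8 or later.

```python
from itertools import combinations
from math import comb, factorial, perm


def n_c_r(n, r):
    """Number of unordered selections of r items from n: nPr / r!."""
    return perm(n, r) // factorial(r)


items = range(10)  # a stand-in for any set of 10 distinct items

# Enumerate every subset of size 4; order within a subset is ignored.
subsets = list(combinations(items, 4))

print(len(subsets))  # 210
print(n_c_r(10, 4))  # 210
print(comb(10, 4))   # 210
```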
21a. Assume you are playing cards with a standard 52-card deck. A 52-card deck has four suits (spades, hearts, clubs, diamonds). Within each suit there are thirteen different values: 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King, Ace.
Compute the probability of receiving a hand of five cards in which four cards have the same value.
21b. Write a Python program to draw 10,000 5-card hands and estimate the probability of the event that there are four cards with the same value (i.e., estimate 21a using Frequentism).
21c. Which computation above is more accurate? Why?