Homework 04

The goal for this homework will be to learn more about how you can use groupby and how merging can be used to improve your ability to generate hypotheses.

We will use the FluNet dataset provided by the WHO. This dataset is linked on coursesite and called FluNetData[...]. For each country, it records the number of lab-confirmed cases of influenza from sentinel sites (Sent), non-sentinel sites (Non-Sent), and sites that have not yet been defined as Sent or Non-Sent (Undefined). The WHO has a dashboard for this data here.

A sentinel surveillance site (Sent site) is a single or small number of health facilities that are responsible for collecting data on cases enrolled with the case definition. In our case, these facilities are recording those who meet the case definition for influenza.

We will also use a dataset that has recorded countries and whether or not that country belongs to the Northern or Southern Hemisphere.

Problem one

  1. Read in the FluNet dataset

  2. Read in the “Hemisphere” dataset

  3. Merge the FluNet and Hemisphere datasets so that you have a combined dataset. This combined dataset should include the column “Hemisphere” as well as columns for country, flu tests, and flu positives. Let's call this combined dataset d.
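As a minimal sketch of the merge step, the example below builds two tiny made-up tables in place of the real files (which you would load with something like `pd.read_csv`, if they are CSV files). The column name `Country`, the hemisphere labels, and all counts are assumptions for illustration only; check the actual headers in your files before merging.

```python
import pandas as pd

# Toy stand-in for the FluNet data (column names and values are assumptions).
flunet = pd.DataFrame({
    "Country": ["Australia", "Australia", "United States of America"],
    "Surveillance site type": ["Sentinel", "Non-sentinel", "Sentinel"],
    "Specimen tested": [100, 50, 200],
    "Influenza positive": [10, 5, 40],
})

# Toy stand-in for the Hemisphere data: one row per country (also assumed).
hemisphere = pd.DataFrame({
    "Country": ["Australia", "United States of America"],
    "Hemisphere": ["Southern", "Northern"],
})

# A left merge keeps every FluNet row and attaches the matching Hemisphere value.
# validate="many_to_one" raises an error if a country appears twice in `hemisphere`.
d = flunet.merge(hemisphere, on="Country", how="left", validate="many_to_one")
print(d)
```

A left merge is used here so that no FluNet observation is dropped; a country missing from the Hemisphere table would simply get a missing value in the “Hemisphere” column, which is easy to check for afterwards.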

Problem two

  1. Split this dataset into two: a dataset called us that includes only observations for the “United States of America”, and a second dataset called aus that includes only observations for “Australia”

  2. Create a new dataset called either_one that removes those countries that are in both the Northern and Southern Hemispheres. These countries are denoted with the string “Both” in the “Hemisphere” column.
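Both tasks above come down to boolean filtering. The sketch below shows the pattern on a toy version of d; the column name `Country`, the placeholder country "Exampleland", and the counts are assumptions, not values from the real data.

```python
import pandas as pd

# Toy combined dataset (column names and values are assumptions).
d = pd.DataFrame({
    "Country": ["Australia", "United States of America", "Exampleland"],
    "Hemisphere": ["Southern", "Northern", "Both"],
    "Influenza positive": [10, 40, 3],
})

# A comparison produces a True/False mask; indexing with it keeps the True rows.
aus = d[d["Country"] == "Australia"]
us = d[d["Country"] == "United States of America"]

# Keep every row whose hemisphere is NOT labelled "Both".
either_one = d[d["Hemisphere"] != "Both"]
print(either_one)
```

If you plan to add columns to a filtered dataset later, appending `.copy()` to each filter avoids pandas warnings about modifying a view of d.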

Problem three

  1. Create a new column called pct_positive that equals the number of positive influenza cases (the column “Influenza positive”) divided by the number of tests (the column “Specimen tested”).
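A new column built from two existing ones is an element-wise division in pandas. The sketch below uses a made-up table called `toy`; the spelling of the column names is an assumption, so match it to the headers in your own data.

```python
import pandas as pd

# Toy data (values are made up; the last row has no tests at all).
toy = pd.DataFrame({
    "Specimen tested": [100, 200, 0],
    "Influenza positive": [10, 40, 0],
})

# Element-wise division: one result per row.
toy["pct_positive"] = toy["Influenza positive"] / toy["Specimen tested"]
print(toy)
```

Note that the result is a proportion between 0 and 1 (multiply by 100 for a percentage), and that a week with zero specimens tested gives a missing value (NaN) rather than an error.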

Problem four

  1. Use seaborn lineplot to plot the percent positive influenza by “Week start date (ISO 8601 calendar)” for the either_one dataset. Use the hue keyword to stratify this plot by Northern and Southern Hemisphere.

  2. What do you observe? What might this plot suggest about the interplay between Northern and Southern Hemispheric influenza?

Problem five

  1. Use the split-apply-combine paradigm (i.e., Pandas Groupby) to build a dataset aus_total. aus_total will sum, over Sent, Non-Sent, and Undefined sites, the number of influenza cases and the number of specimens collected for Australia.

  2. Use the split-apply-combine paradigm (i.e., Pandas Groupby) to build a dataset usa_total. usa_total will sum, over Sent, Non-Sent, and Undefined sites, the number of influenza cases and the number of specimens collected for the USA.

  3. For the datasets in (1) and (2), aus_total and usa_total, compute the percent positive as the number of influenza cases divided by the number of specimens collected.

  4. Plot the percent positive influenza for Australia (aus_total).

  5. Plot the percent positive influenza for the USA (usa_total) on the same plot.

  6. Make sure that AUS and USA lines are different colors and that the graph is properly annotated.
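One way to read the steps above: split the rows by week, sum the counts across site types, then divide the summed counts. The sketch below does this on made-up data; `weekly_totals` is a hypothetical helper, and the site-type labels and column spellings are assumptions to be matched against your files.

```python
import matplotlib.pyplot as plt
import pandas as pd

date_col = "Week start date (ISO 8601 calendar)"
count_cols = ["Influenza positive", "Specimen tested"]
weeks = pd.to_datetime(["2023-01-02", "2023-01-02", "2023-01-09", "2023-01-09"])
sites = ["Sentinel", "Non-sentinel", "Sentinel", "Non-sentinel"]

# Toy per-site data for each country (all values are made up).
aus = pd.DataFrame({date_col: weeks, "Surveillance site type": sites,
                    "Influenza positive": [1, 3, 2, 6],
                    "Specimen tested": [10, 30, 20, 20]})
us = pd.DataFrame({date_col: weeks, "Surveillance site type": sites,
                   "Influenza positive": [20, 20, 30, 30],
                   "Specimen tested": [100, 100, 100, 100]})

def weekly_totals(df):
    """Hypothetical helper: sum counts over site types for each week."""
    # Split by week, apply a sum to the count columns, combine into one row per week.
    out = df.groupby(date_col)[count_cols].sum().reset_index()
    # Divide the summed counts (not the same as averaging per-site proportions).
    out["pct_positive"] = out["Influenza positive"] / out["Specimen tested"]
    return out

aus_total = weekly_totals(aus)
usa_total = weekly_totals(us)

# Both countries on one set of axes, with distinct colors and annotations.
fig, ax = plt.subplots()
ax.plot(aus_total[date_col], aus_total["pct_positive"], color="tab:blue", label="Australia")
ax.plot(usa_total[date_col], usa_total["pct_positive"], color="tab:orange", label="USA")
ax.set_xlabel("Week start date")
ax.set_ylabel("Proportion positive")
ax.set_title("Influenza proportion positive (toy data)")
ax.legend()
```

Summing first and dividing afterwards weights each site type by how many specimens it tested, which is usually what "percent positive for the country" is meant to capture.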

Problem six

  1. Read the documentation about rolling windows here: Pandas Rolling Window Docs

  2. For the aus_total dataset create a new column that is a rolling mean of the percent positive influenza with window size 10 time units.

  3. For the usa_total dataset create a new column that is a rolling mean of the percent positive influenza with window size 10 time units.

  4. Plot the rolling means for Australia and for the US.
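A rolling mean replaces each value with the average of itself and the preceding values inside a fixed-size window. The sketch below uses a made-up five-row series and a window of 3 only so the result is easy to check by hand; the homework asks for a window of 10.

```python
import pandas as pd

# Toy series of weekly proportions (values are made up).
aus_total = pd.DataFrame({"pct_positive": [0.1, 0.2, 0.3, 0.4, 0.5]})

# Each entry is the mean of the current row and the two rows before it.
aus_total["rolling_mean"] = aus_total["pct_positive"].rolling(window=3).mean()
print(aus_total)
```

The first `window - 1` rows come out as NaN because a full window is not yet available; passing `min_periods=1` to `rolling` would instead average over however many rows exist so far.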