Homework 2#
import pandas as pd
import numpy as np
disasters = pd.read_csv("https://raw.githubusercontent.com/computationalUncertaintyLab/dexp_book/refs/heads/main/events-US-1980-2024.csv")
#--the first four characters of "End Date" (formatted YYYYMMDD) are the year
disasters = disasters.assign(Year = lambda x: x["End Date"].astype(str).str[:4].astype(int) )
disasters = disasters.loc[ disasters["Unadjusted Cost"] !="TBD" ]
#--change these to floating values
disasters["CPI-Adjusted Cost"] = disasters["CPI-Adjusted Cost"].astype(float)
disasters["Unadjusted Cost"] = disasters["Unadjusted Cost"].astype(float)
Problem one#
The goal for this problem is to add columns to the “disasters” dataset above to make selecting specific types of disasters easier. Use the template code in the notes to add columns for: Floods, Cyclones, Hurricanes, Tornadoes, and Severe Storms.
disasters
 | Name | Disaster | Begin Date | End Date | CPI-Adjusted Cost | Unadjusted Cost | Deaths | Year |
---|---|---|---|---|---|---|---|---|
0 | Southern Severe Storms and Flooding (April 1980) | Flooding | 19800410 | 19800417 | 2742.3 | 706.8 | 7 | 1980 |
1 | Hurricane Allen (August 1980) | Tropical Cyclone | 19800807 | 19800811 | 2230.2 | 590.0 | 13 | 1980 |
2 | Central/Eastern Drought/Heat Wave (Summer-Fall... | Drought | 19800601 | 19801130 | 40480.8 | 10020.0 | 1260 | 1980 |
3 | Florida Freeze (January 1981) | Freeze | 19810112 | 19810114 | 2070.6 | 572.0 | 0 | 1981 |
4 | Severe Storms, Flash Floods, Hail, Tornadoes (... | Severe Storm | 19810505 | 19810510 | 1405.2 | 401.4 | 20 | 1981 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
393 | Central and Northeast Severe Weather (June 2024) | Severe Storm | 20240624 | 20240626 | 1704.0 | 1704.0 | 3 | 2024 |
394 | New Mexico Wildfires (June 2024) | Wildfire | 20240617 | 20240707 | 1700.0 | 1700.0 | 2 | 2024 |
395 | Hurricane Beryl (July 2024) | Tropical Cyclone | 20240708 | 20240708 | 7219.0 | 7219.0 | 45 | 2024 |
396 | Central and Eastern Tornado Outbreak and Sever... | Severe Storm | 20240713 | 20240716 | 2435.0 | 2435.0 | 2 | 2024 |
397 | Hurricane Debby (August 2024) | Tropical Cyclone | 20240805 | 20240809 | 2476.0 | 2476.0 | 10 | 2024 |
398 rows × 8 columns
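As a rough illustration of how an indicator column can be added, here is a minimal sketch on a made-up data frame. The rows in toy are invented, and the template code in the notes may take a different approach; the column names Flood and Hurricane are only placeholders.

```python
import pandas as pd

# Toy data: the column names mirror the table above, but the rows are made up.
toy = pd.DataFrame({
    "Name": [
        "River Flood (May 1990)",
        "Hurricane Example (August 1995)",
        "Plains Tornado Outbreak (April 2001)",
    ],
    "Disaster": ["Flooding", "Tropical Cyclone", "Severe Storm"],
})

# 1 if the "Disaster" label is exactly "Flooding", 0 otherwise.
toy = toy.assign(Flood=lambda x: (x["Disaster"] == "Flooding").astype(int))

# 1 if the event name mentions "hurricane" (ignoring case), 0 otherwise.
toy = toy.assign(
    Hurricane=lambda x: x["Name"].str.contains("hurricane", case=False).astype(int)
)

print(toy[["Name", "Flood", "Hurricane"]])
```

Matching on the “Disaster” label works when a category has its own label; matching on the “Name” text is one option when it does not, as with hurricanes, which appear under “Tropical Cyclone”.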
Problem two#
Compute the mean, median, standard deviation, and interquartile range (along with the 25th and 75th percentiles) for Floods, Cyclones, Hurricanes, Tornadoes, and Severe Storms. From this exploratory data analysis, which type of storm appears to be most costly?
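The summary statistics named above can be sketched with pandas as follows. The values in costs are made up for illustration; in the homework the same calls would presumably be applied to a cost column restricted to one disaster type.

```python
import pandas as pd

# Made-up costs, used only to illustrate the calls.
costs = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])

summary = {
    "mean": costs.mean(),
    "median": costs.median(),
    "std": costs.std(),            # sample standard deviation (ddof=1)
    "q25": costs.quantile(0.25),   # 25th percentile
    "q75": costs.quantile(0.75),   # 75th percentile
}
# The interquartile range is the distance between the two quartiles.
summary["iqr"] = summary["q75"] - summary["q25"]

print(summary)
```

Note that the mean (4.0) sits above the median (3.0) here because of the single large value, which is a common pattern in cost data.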
Problem three#
Dates are difficult to work with on a computer, but Python has many functions to help. Our goal will be to convert the begin date for each disaster to a datetime object and then create a new column that records whether the disaster started between 1980 and 1990, 1991 and 2000, and so on.
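One possible way to build such a period column, once a year is available, is the cut function in pandas. This is only a sketch: the years are made up, and the bin edges and labels (including the final “2021-2030” bin) are assumptions chosen to match the ranges described above.

```python
import pandas as pd

# Made-up years, chosen to sit on and around the bin edges.
years = pd.Series([1980, 1990, 1991, 2000, 2001, 2024])

# Bins are right-inclusive by default: (1979, 1990], (1990, 2000], and so on.
period = pd.cut(
    years,
    bins=[1979, 1990, 2000, 2010, 2020, 2030],
    labels=["1980-1990", "1991-2000", "2001-2010", "2011-2020", "2021-2030"],
)

print(pd.DataFrame({"year": years, "period": period}))
```

Starting the first bin at 1979 is what places 1980 inside the first interval, since the left edge of each bin is excluded.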
Date time objects#
Date time objects, like any other object in Python, have a special set of functions for computing typical tasks with dates.
The most common module for working with dates in Python is datetime, and you can import it like this:
from datetime import datetime, timedelta
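As a small illustration of the kind of tasks datetime objects support, the sketch below builds a date, reads some of its attributes, and shifts it with a timedelta. The dates are arbitrary examples.

```python
from datetime import datetime, timedelta

# Build a datetime directly from year, month, and day.
start = datetime(2020, 3, 20)

# Attributes give access to the individual parts of the date.
print(start.year, start.month, start.day)

# Adding a timedelta shifts the date; subtracting two datetimes gives a timedelta.
one_week_later = start + timedelta(days=7)
elapsed = one_week_later - start

print(one_week_later)
print(elapsed.days)
```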
Parsing Dates#
In the disasters data frame, Begin Date is stored as a number (for example, 19800410) rather than as a date. We want to convert this number into a datetime object. The most common way to do this is to use the strptime function. The name strptime stands for “String Parse into Time”. Because strptime works on strings, the number must first be converted to a string. The inputs to strptime are a string that contains the date you wish to parse and a second string that describes how this date is formatted. There are special symbols used to tell Python how your date is formatted. These symbols are given in the list of format “Directives”.
Example of using strptime
We can convert the following string “2020-03-20” into a datetime object.
from datetime import datetime, timedelta
date_object = datetime.strptime("2020-03-20", "%Y-%m-%d")
One of the attributes of a datetime object is year. This attribute extracts the year from our datetime object.
date_object.year
2020
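The same idea carries over to a date stored as a number. The sketch below assumes an integer in the compact YYYYMMDD layout seen in the Begin Date column; the variable names are placeholders.

```python
from datetime import datetime

# A date stored as an integer, in the same layout as the Begin Date column.
begin_date = 19800410

# strptime expects a string, so convert the number first.
# "%Y%m%d" means: four-digit year, two-digit month, two-digit day, no separators.
parsed = datetime.strptime(str(begin_date), "%Y%m%d")

print(parsed.year, parsed.month, parsed.day)
```

If the value were a float such as 19800410.0, str would produce "19800410.0" and parsing would fail, so casting to int first would be needed in that case.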
Problem 3 Task 1#
Create a function that inputs a string with the format “%Y%m%d” and outputs the year.
You will need to import the datetime module. Call this function from_str_to_dt.
Problem 3 Task 2#
Apply from_str_to_dt using the assign function in pandas to create a new column in the disasters data frame called “Year”.
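The general pattern of applying a function to a column inside assign can be sketched as below. The helper month_of and the column BeginMonth are hypothetical and deliberately extract the month rather than the year; the rows in toy are made up.

```python
import pandas as pd
from datetime import datetime

def month_of(date_string):
    """Hypothetical helper: return the month of a 'YYYYMMDD' string."""
    return datetime.strptime(date_string, "%Y%m%d").month

# Made-up rows with dates stored as integers.
toy = pd.DataFrame({"Begin Date": [19800410, 19950807]})

# assign returns a new data frame; the lambda receives the data frame itself.
# The column is converted to strings before the helper is applied to each value.
toy = toy.assign(BeginMonth=lambda x: x["Begin Date"].astype(str).apply(month_of))

print(toy)
```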
Problem 3 Task 3#
Use the assign function to create a new column called “above2000” in the disasters data frame. This column will equal 1 if the year of the disaster is greater than 2000 and 0 otherwise.
Problem four#
Conduct your own exploratory data analysis of the disasters data frame to determine whether disasters appear to be more or less costly after 2000 (compared to before).
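One possible starting point for such a comparison is to group by an indicator column and summarize cost within each group. The sketch below uses made-up numbers and a placeholder column named cost; it is not meant as the expected analysis.

```python
import pandas as pd

# Made-up data: an indicator for the period and a cost for each event.
toy = pd.DataFrame({
    "above2000": [0, 0, 1, 1],
    "cost": [1.0, 3.0, 2.0, 6.0],
})

# One row per group, with several summaries of cost side by side.
by_period = toy.groupby("above2000")["cost"].agg(["mean", "median", "count"])

print(by_period)
```

Reporting the count alongside the mean and median helps show whether a difference in typical cost rests on many events or only a few.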