Homework 2#

import pandas as pd 
import numpy as np 

disasters = pd.read_csv("https://raw.githubusercontent.com/computationalUncertaintyLab/dexp_book/refs/heads/main/events-US-1980-2024.csv")
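#--extract the four-digit year from End Date
disasters = disasters.assign(Year = lambda x: x["End Date"].astype(str).str[:4].astype(int) )
#--drop rows whose cost is still "TBD"; copy so the column assignments below act on an independent data frame
disasters = disasters.loc[ disasters["Unadjusted Cost"] !="TBD" ].copy()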

#--change these to floating values
disasters["CPI-Adjusted Cost"] = disasters["CPI-Adjusted Cost"].astype(float)
disasters["Unadjusted Cost"]   = disasters["Unadjusted Cost"].astype(float)

Problem one#

The goal for this problem is to add columns to the “disasters” dataset above to make selecting specific types of disasters easier. Use the template code in the notes to add columns for: Floods, Cyclones, Hurricanes, Tornadoes, and Severe Storms.
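The template code from the notes is not reproduced on this page, so the following is only a minimal sketch of one possible approach: it assumes an indicator column can be built by searching the Name column for a keyword. The toy data frame and the column names `is_flood` and `is_hurricane` are hypothetical and chosen for illustration.

```python
import pandas as pd

# Toy stand-in for the disasters data frame; names copied from rows shown in this homework.
toy = pd.DataFrame({
    "Name": [
        "Southern Severe Storms and Flooding (April 1980)",
        "Hurricane Allen (August 1980)",
        "Florida Freeze (January 1981)",
    ],
    "Disaster": ["Flooding", "Tropical Cyclone", "Freeze"],
})

# Hypothetical indicator columns: 1 if the name mentions the keyword
# (case-insensitive), 0 otherwise.
toy = toy.assign(
    is_flood     = lambda x: x["Name"].str.contains("flood", case=False).astype(int),
    is_hurricane = lambda x: x["Name"].str.contains("hurricane", case=False).astype(int),
)
print(toy[["Name", "is_flood", "is_hurricane"]])
```

Whether to search the Name column or the Disaster column is a design choice: Name separates hurricanes from other tropical cyclones, while Disaster gives one clean category per row.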

disasters
     Name                                               Disaster          Begin Date  End Date  CPI-Adjusted Cost  Unadjusted Cost  Deaths  Year
0    Southern Severe Storms and Flooding (April 1980)   Flooding          19800410    19800417  2742.3             706.8            7       1980
1    Hurricane Allen (August 1980)                      Tropical Cyclone  19800807    19800811  2230.2             590.0            13      1980
2    Central/Eastern Drought/Heat Wave (Summer-Fall...  Drought           19800601    19801130  40480.8            10020.0          1260    1980
3    Florida Freeze (January 1981)                      Freeze            19810112    19810114  2070.6             572.0            0       1981
4    Severe Storms, Flash Floods, Hail, Tornadoes (...  Severe Storm      19810505    19810510  1405.2             401.4            20      1981
...  ...                                                ...               ...         ...       ...                ...              ...     ...
393  Central and Northeast Severe Weather (June 2024)   Severe Storm      20240624    20240626  1704.0             1704.0           3       2024
394  New Mexico Wildfires (June 2024)                   Wildfire          20240617    20240707  1700.0             1700.0           2       2024
395  Hurricane Beryl (July 2024)                        Tropical Cyclone  20240708    20240708  7219.0             7219.0           45      2024
396  Central and Eastern Tornado Outbreak and Sever...  Severe Storm      20240713    20240716  2435.0             2435.0           2       2024
397  Hurricane Debby (August 2024)                      Tropical Cyclone  20240805    20240809  2476.0             2476.0           10      2024

398 rows × 8 columns

Problem two#

Compute the mean, median, standard deviation, 25th and 75th percentiles, and interquartile range of cost for each of the following: Floods, Cyclones, Hurricanes, Tornadoes, and Severe Storms. From this exploratory data analysis, which type of storm appears to be most costly?
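As a rough illustration of these summary statistics, here is a sketch on a small made-up series of costs (the numbers are invented, not taken from the disasters data). It assumes the interquartile range is computed as the 75th percentile minus the 25th percentile.

```python
import pandas as pd

# Made-up costs with one very large value, to show how mean and median can differ.
costs = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])

q25 = costs.quantile(0.25)   # 25th percentile
q75 = costs.quantile(0.75)   # 75th percentile

summary = {
    "mean":   costs.mean(),
    "median": costs.median(),
    "std":    costs.std(),    # pandas uses the sample standard deviation (ddof=1)
    "q25":    q25,
    "q75":    q75,
    "iqr":    q75 - q25,      # interquartile range
}
print(summary)
```

In this toy series the single large value pulls the mean (22.0) far above the median (3.0), which is worth keeping in mind when deciding which statistic to use to call a disaster type "most costly".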

Problem three#

Dates are difficult to work with on a computer, but Python has many functions to help. Our goal is to convert the begin date of each disaster to a datetime object and then create a new column that records whether the disaster started between 1980 and 1990, between 1991 and 2000, and so on.
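One possible way to build such a period column, once a year is available, is `pd.cut`. The sketch below uses made-up years; the bin edges and label strings are assumptions chosen to match the "1980 to 1990, 1991 to 2000" pattern described above.

```python
import pandas as pd

# Made-up years, including values that sit on the bin boundaries.
years = pd.Series([1980, 1990, 1991, 2000, 2001, 2024])

# pd.cut bins are right-inclusive by default: (1979, 1990], (1990, 2000], ...
bins   = [1979, 1990, 2000, 2010, 2020, 2030]
labels = ["1980-1990", "1991-2000", "2001-2010", "2011-2020", "2021-2030"]

period = pd.cut(years, bins=bins, labels=labels)
print(period.tolist())
```

Because the bins include their right edge, 1990 falls in "1980-1990" and 2000 falls in "1991-2000".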

Date time objects#

Datetime objects, like any other object in Python, come with a set of functions for common tasks with dates. The most commonly used module for dates in Python is datetime, which you can import with from datetime import datetime, timedelta.

Parsing Dates#

In the disasters data frame, Begin Date is stored as a number (for example, 19800410), not as a date. We want to convert this number into a datetime object. The most common way to do this is with the strptime function, which stands for “String Parse into Time”. Because strptime expects a string, the number must first be converted to a string. The inputs to strptime are a string that contains the date you wish to parse and a format string that describes how this date is laid out. Special symbols tell Python how your date is formatted. A list of these symbols is available here: List of format “Directives”.


Example of using strptime

We can convert the following string “2020-03-20” into a datetime object.


from datetime import datetime, timedelta
date_object = datetime.strptime("2020-03-20", "%Y-%m-%d")

One of the attributes of a datetime object is year. This will extract the year from our datetime object.

date_object.year
2020
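The same idea extends to the compact dates in the disasters table. The sketch below takes the begin and end dates from the first row shown earlier (19800410 and 19800417), converts each integer to a string, parses it with the “%Y%m%d” format, and shows that subtracting two datetime objects yields a timedelta. The variable names are hypothetical.

```python
from datetime import datetime, timedelta

# Begin and end dates from the first row of the disasters table, stored as integers.
begin_raw, end_raw = 19800410, 19800417

# strptime expects a string, so convert each integer first.
begin = datetime.strptime(str(begin_raw), "%Y%m%d")
end   = datetime.strptime(str(end_raw),   "%Y%m%d")

print(begin.year, begin.month, begin.day)   # individual parts of the date

duration = end - begin                      # subtracting datetimes gives a timedelta
print(duration.days)

print(begin + timedelta(days=30))           # a timedelta can also shift a date
```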

Problem 3 Task 1#

Create a function that takes a string in the format “%Y%m%d” and returns the year. You will need to import the datetime module. Call this function from_str_to_dt.

Problem 3 Task 2#

Apply from_str_to_dt using the assign function in pandas to create a new column in the disasters data frame called “Year”.
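For reference, the general pattern of applying a function to a column inside assign might look like the sketch below. It uses a toy data frame and a hypothetical helper, `month_of`, which extracts the month rather than the year, so it only illustrates the mechanics.

```python
import pandas as pd
from datetime import datetime

def month_of(date_str):
    """Hypothetical helper: parse a '%Y%m%d' string and return its month."""
    return datetime.strptime(date_str, "%Y%m%d").month

# Toy data frame with integer dates like those in the disasters table.
toy = pd.DataFrame({"Begin Date": [19800410, 19810112]})

# Convert the integers to strings, then apply the helper to every row.
toy = toy.assign(BeginMonth = lambda x: x["Begin Date"].astype(str).apply(month_of))
print(toy)
```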

Problem 3 Task 3#

Use the assign function to create a new column called “above2000” in the disasters data frame. This column will equal 1 if the year of the disaster was greater than 2000 and 0 otherwise.

Problem four#

Conduct your own exploratory data analysis of the disasters data frame to determine whether disasters appear to be more or less costly after 2000 (compared to before).