Homework 2

import pandas as pd 
import numpy as np 

disasters = pd.read_csv("https://raw.githubusercontent.com/computationalUncertaintyLab/dexp_book/refs/heads/main/events-US-1980-2024.csv")
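#--extract the four-digit year from End Date (stored as YYYYMMDD)
disasters = disasters.assign(Year = lambda x: x["End Date"].astype(str).str[:4].astype(int) )
#--drop rows whose cost is still "TBD"; copy so the column assignments below act on an independent data frame
disasters = disasters.loc[ disasters["Unadjusted Cost"] !="TBD" ].copy()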

#--change these to floating values
disasters["CPI-Adjusted Cost"] = disasters["CPI-Adjusted Cost"].astype(float)
disasters["Unadjusted Cost"]   = disasters["Unadjusted Cost"].astype(float)

Problem one#

The goal for this problem is to add columns to the “disasters” dataset above to make selecting specific types of disasters easier. Use the template code in the notes to add columns for: Floods, Cyclones, Hurricanes, Tornadoes, and Severe Storms.

disasters

	Name	Disaster	Begin Date	End Date	CPI-Adjusted Cost	Unadjusted Cost	Deaths	Year
0	Southern Severe Storms and Flooding (April 1980)	Flooding	19800410	19800417	2742.3	706.8	7	1980
1	Hurricane Allen (August 1980)	Tropical Cyclone	19800807	19800811	2230.2	590.0	13	1980
2	Central/Eastern Drought/Heat Wave (Summer-Fall...	Drought	19800601	19801130	40480.8	10020.0	1260	1980
3	Florida Freeze (January 1981)	Freeze	19810112	19810114	2070.6	572.0	0	1981
4	Severe Storms, Flash Floods, Hail, Tornadoes (...	Severe Storm	19810505	19810510	1405.2	401.4	20	1981
...	...	...	...	...	...	...	...	...
393	Central and Northeast Severe Weather (June 2024)	Severe Storm	20240624	20240626	1704.0	1704.0	3	2024
394	New Mexico Wildfires (June 2024)	Wildfire	20240617	20240707	1700.0	1700.0	2	2024
395	Hurricane Beryl (July 2024)	Tropical Cyclone	20240708	20240708	7219.0	7219.0	45	2024
396	Central and Eastern Tornado Outbreak and Sever...	Severe Storm	20240713	20240716	2435.0	2435.0	2	2024
397	Hurricane Debby (August 2024)	Tropical Cyclone	20240805	20240809	2476.0	2476.0	10	2024

398 rows × 8 columns
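As a rough illustration of what an indicator column looks like (this is not necessarily the template from the notes), the sketch below builds two 0/1 columns on a three-row toy data frame copied from the display above. The column names flood and hurricane are hypothetical, and matching on the “Disaster” category versus the text in “Name” is an assumption about how one might identify each type.

```python
import pandas as pd

# Three rows copied from the display above, keeping only the two text columns.
toy = pd.DataFrame({
    "Name": [
        "Southern Severe Storms and Flooding (April 1980)",
        "Hurricane Allen (August 1980)",
        "Florida Freeze (January 1981)",
    ],
    "Disaster": ["Flooding", "Tropical Cyclone", "Freeze"],
})

# Indicator built from the "Disaster" category: 1 if the category is Flooding, else 0.
toy = toy.assign(flood=lambda d: (d["Disaster"] == "Flooding").astype(int))

# Indicator built from the free-text "Name": 1 if the name mentions "Hurricane", else 0.
toy = toy.assign(
    hurricane=lambda d: d["Name"].str.contains("Hurricane", case=False).astype(int)
)

print(toy[["Disaster", "flood", "hurricane"]])
```

Once such a column exists, rows of one type can be selected with an expression like toy.loc[toy["flood"] == 1].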

Problem two#

Compute the mean, median, standard deviation, and interquartile range (along with the 25th and 75th percentiles) of cost for Floods, Cyclones, Hurricanes, Tornadoes, and Severe Storms. From this exploratory data analysis, which type of storm appears to be most costly?
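The summary statistics named above can each be computed with a single pandas method. The sketch below uses a small made-up series of costs (not values from the disasters data) to show the calls; the interquartile range is the 75th percentile minus the 25th percentile.

```python
import pandas as pd

# Hypothetical costs, chosen so that one large value skews the mean.
costs = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])

summary = {
    "mean": costs.mean(),
    "median": costs.median(),
    "std": costs.std(),            # sample standard deviation (divides by n - 1)
    "q25": costs.quantile(0.25),   # 25th percentile
    "q75": costs.quantile(0.75),   # 75th percentile
}
summary["iqr"] = summary["q75"] - summary["q25"]

print(summary)
```

In this toy series the mean (22.0) is far above the median (3.0) because of the single large value, which is one reason to report both when costs are skewed.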

Problem three#

Dates are difficult to work with on a computer, but Python has many functions to help. Our goal is to convert the begin date for each disaster to a datetime object and then create a new column that records whether the disaster started between 1980 and 1990, between 1991 and 2000, and so on.
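One possible way to turn years into the periods described above is pd.cut, which assigns each value to a labeled bin. This is only a sketch on made-up years; the bin edges and label strings are assumptions chosen to match the “1980 to 1990, 1991 to 2000” pattern.

```python
import pandas as pd

# Hypothetical years, including values that sit on the bin boundaries.
years = pd.Series([1980, 1990, 1991, 2000, 2001, 2024])

# Bins are right-inclusive by default, so (1979, 1990] covers 1980 through 1990.
edges = [1979, 1990, 2000, 2010, 2020, 2030]
labels = ["1980-1990", "1991-2000", "2001-2010", "2011-2020", "2021-2030"]

period = pd.cut(years, bins=edges, labels=labels)

print(pd.DataFrame({"year": years, "period": period}))
```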

Date time objects#

Datetime objects, like any other object in Python, come with a set of functions for common tasks with dates. The most commonly used module for dates in Python is datetime, and you can import it with from datetime import datetime, timedelta.
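As a small illustration of the two imported names, the sketch below builds a datetime directly from a year, month, and day, and uses timedelta to move forward by a number of days. The dates match the begin and end dates of the first row of the disasters data, but they are typed in by hand here.

```python
from datetime import datetime, timedelta

# A datetime built from year, month, day.
start = datetime(1980, 4, 10)

# Adding a timedelta shifts the date; here, seven days later.
end = start + timedelta(days=7)

# Subtracting two datetimes gives a timedelta.
duration = end - start

print(start.month, end, duration.days)
```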

Parsing Dates#

In the disasters data frame, Begin Date is stored as a number (for example, 19800410), not as a date. We want to convert this number into a datetime object. The most common way to do this is the strptime function, whose name stands for “String Parse into Time”. Because strptime works on strings, the number must first be converted to a string. The inputs to strptime are a string that contains the date you wish to parse and a second string that describes how this date is formatted. Special symbols such as %Y (four-digit year), %m (month), and %d (day) tell Python how your date is formatted. The full list of these symbols is in the Python datetime documentation under format “Directives”.

Example of using strptime

We can convert the following string “2020-03-20” into a datetime object.

from datetime import datetime, timedelta
date_object = datetime.strptime("2020-03-20", "%Y-%m-%d")

One of the attributes of a datetime object is year. This will extract the year from our datetime object.

date_object.year
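The names strptime and strftime are easy to confuse: strptime parses a string into a datetime, while strftime formats a datetime back into a string. The sketch below shows both directions, starting from a numeric date written the way Begin Date appears in the data; the variable names are only for illustration.

```python
from datetime import datetime

raw = 19800410  # a date stored as a number, as in the "Begin Date" column

# strptime needs a string, so convert the number first, then parse it.
parsed = datetime.strptime(str(raw), "%Y%m%d")

# strftime goes the other way: datetime -> string in a chosen format.
formatted = parsed.strftime("%Y-%m-%d")

print(parsed.year, formatted)
```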

Problem 3 Task 1#

Create a function that takes as input a string with the format “%Y%m%d” and outputs the year. You will need to import the datetime module. Call this function from_str_to_dt.

Problem 3 Task 2#

Apply from_str_to_dt using the assign function in pandas to create a new column in the disasters data frame called “Year”.

Problem 3 Task 3#

Use the assign function to create a new column called “above2000” in the disasters data frame. This column will equal 1 if the year of the disaster is greater than 2000 and 0 otherwise.
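A 0/1 column of this kind can be built by combining assign with a comparison. The sketch below uses numpy's where function on a toy data frame; the column name after_cutoff and the years are made up for illustration, and np.where is one option among several.

```python
import numpy as np
import pandas as pd

# Hypothetical years around the cutoff.
toy = pd.DataFrame({"Year": [1999, 2000, 2001, 2024]})

# np.where(condition, value_if_true, value_if_false) works element by element.
toy = toy.assign(after_cutoff=lambda d: np.where(d["Year"] > 2000, 1, 0))

print(toy)
```

Because the comparison is strictly greater than, the year 2000 itself maps to 0.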

Problem four#

Conduct your own exploratory data analysis of the disasters data frame to determine whether disasters appear to be more or less costly after 2000 (compared to before).
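One possible starting point for such a comparison is to group by a 0/1 period column and summarize cost within each group. The sketch below does this on made-up numbers, so it says nothing about the actual disasters data; the column names mirror the ones used in this homework but the data frame itself is hypothetical.

```python
import pandas as pd

# Made-up costs for two periods (0 = 2000 or earlier, 1 = after 2000).
toy = pd.DataFrame({
    "above2000": [0, 0, 1, 1, 1],
    "cost": [1.0, 3.0, 2.0, 4.0, 12.0],
})

# One row per period with the number of events and two measures of typical cost.
by_period = toy.groupby("above2000")["cost"].agg(["count", "mean", "median"])

print(by_period)
```

Comparing the mean with the median in each group helps show whether a few very expensive events are driving any difference between periods.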