{ "cells": [ { "cell_type": "markdown", "id": "31fcb45e-dc0a-44d8-ac15-90d9788839be", "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "# Homework 2" ] }, { "cell_type": "code", "execution_count": 16, "id": "b8b3d4c7-5b9a-4726-b0e7-2c2ac6920ee0", "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'Mean': 4578.118181818182,\n", " 'Median': 2542.2,\n", " 'Standard Deviation': 7122.264318774445,\n", " '25th percentile': 1565.425,\n", " '75th percentile': 4383.825000000001,\n", " 'IQR': 2818.4000000000005}" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd \n", "import numpy as np \n", "\n", "disasters = pd.read_csv(\"https://raw.githubusercontent.com/computationalUncertaintyLab/dexp_book/refs/heads/main/events-US-1980-2024.csv\")\n", "disasters = disasters.assign(Year = lambda x: x[\"End Date\"].astype(str).str[:3+1].astype(int) )\n", "disasters = disasters.loc[ disasters[\"Unadjusted Cost\"] !=\"TBD\" ]\n", "\n", "#--change these to floating values\n", "disasters[\"CPI-Adjusted Cost\"] = disasters[\"CPI-Adjusted Cost\"].astype(float)\n", "disasters[\"Unadjusted Cost\"] = disasters[\"Unadjusted Cost\"].astype(float)\n" ] }, { "cell_type": "markdown", "id": "9e01927e-6eda-4e89-aaee-7e7aec7c568c", "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Problem one \n", "\n", "The goal for this problem is to add columns to the \"disasters\" dataset above to make selecting specific types of disasters easier. Use the template code in the notes to add columns for: Floods, Cyclones, Hurricanes, Tornadoes, and Severe Storms." ] }, { "cell_type": "code", "execution_count": 13, "id": "57ca51b2-0121-484f-896d-f90826757d6f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameDisasterBegin DateEnd DateCPI-Adjusted CostUnadjusted CostDeathsYear
0Southern Severe Storms and Flooding (April 1980)Flooding19800410198004172742.3706.871980
1Hurricane Allen (August 1980)Tropical Cyclone19800807198008112230.2590131980
2Central/Eastern Drought/Heat Wave (Summer-Fall...Drought198006011980113040480.81002012601980
3Florida Freeze (January 1981)Freeze19810112198101142070.657201981
4Severe Storms, Flash Floods, Hail, Tornadoes (...Severe Storm19810505198105101405.2401.4201981
...........................
395Hurricane Beryl (July 2024)Tropical Cyclone202407082024070872197219452024
396Central and Eastern Tornado Outbreak and Sever...Severe Storm20240713202407162435243522024
397Hurricane Debby (August 2024)Tropical Cyclone202408052024080924762476102024
398Hurricane Helene (September 2024)Tropical Cyclone2024092420240929TBDTBD2252024
399Hurricane Milton (August 2024)Tropical Cyclone2024100920241010TBDTBD242024
\n", "

400 rows × 8 columns

\n", "
" ], "text/plain": [ " Name Disaster \\\n", "0 Southern Severe Storms and Flooding (April 1980) Flooding \n", "1 Hurricane Allen (August 1980) Tropical Cyclone \n", "2 Central/Eastern Drought/Heat Wave (Summer-Fall... Drought \n", "3 Florida Freeze (January 1981) Freeze \n", "4 Severe Storms, Flash Floods, Hail, Tornadoes (... Severe Storm \n", ".. ... ... \n", "395 Hurricane Beryl (July 2024) Tropical Cyclone \n", "396 Central and Eastern Tornado Outbreak and Sever... Severe Storm \n", "397 Hurricane Debby (August 2024) Tropical Cyclone \n", "398 Hurricane Helene (September 2024) Tropical Cyclone \n", "399 Hurricane Milton (August 2024) Tropical Cyclone \n", "\n", " Begin Date End Date CPI-Adjusted Cost Unadjusted Cost Deaths Year \n", "0 19800410 19800417 2742.3 706.8 7 1980 \n", "1 19800807 19800811 2230.2 590 13 1980 \n", "2 19800601 19801130 40480.8 10020 1260 1980 \n", "3 19810112 19810114 2070.6 572 0 1981 \n", "4 19810505 19810510 1405.2 401.4 20 1981 \n", ".. ... ... ... ... ... ... \n", "395 20240708 20240708 7219 7219 45 2024 \n", "396 20240713 20240716 2435 2435 2 2024 \n", "397 20240805 20240809 2476 2476 10 2024 \n", "398 20240924 20240929 TBD TBD 225 2024 \n", "399 20241009 20241010 TBD TBD 24 2024 \n", "\n", "[400 rows x 8 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "disasters" ] }, { "cell_type": "markdown", "id": "3b04c681-1a9a-478b-90a9-c0df2a60e73e", "metadata": {}, "source": [ "## Problem two \n", "\n", "Compute the mean, median, standard deviation, interquartile range (and also the 25, 75th percentiles) for Floods, Cyclones, Hurricanes, Tornadoes, and Severe Storms. \n", "From this exploratory data analysis, which type of storm appears to be most costly? " ] }, { "cell_type": "code", "execution_count": null, "id": "2eb7fddb-f733-4964-ab3b-12734a42e0d8", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "0e9abdce-5a23-48b3-98d6-d0950e0ac7ee", "metadata": {}, "source": [ "## Problem three" ] }, { "cell_type": "markdown", "id": "d8addb41-a4ed-4cc1-89f8-57c692efdd0e", "metadata": {}, "source": [ "Dates are difficult to work with in the computer, but there are many functions in python to help. \n", "Our goal will be to convert the begin date for each disaster to a datetime object and then create a new column that determines if the disaster started between 1980 and 1990, 1991 to 2000, and so on. \n", "\n", "### Date time objects\n", "Date time objects, like any other object in Python, have a special set of functions for computing typical tasks with dates.\n", "The most common module in python is ```datetime``` and you can import this module like \n", "```from datetime import datetime, timedelta```.\n", "\n", "#### Parsing Dates\n", "\n", "In the disaster data frame, Begin date is considered a floating point number. \n", "We want to convert this number into a datetime object. \n", "The most common way to convert a number into a datetime is to use the **strptime** function. \n", "The **strptime** function stands for \"String Parse into Time\".\n", "The inputs to **strptime** are a string that contains the date you wish to format and the how this date was formatted. There are special symbols used to tell Python how your date was formatted. A list of these formats is here = [List of format \"Directives\"](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).\n", "\n", "--- \n", "\n", "*Example of using strftime*\n", "\n", "We can convert the following string \"2020-03-20\" into a datetime object. \n", "\n", "---" ] }, { "cell_type": "code", "execution_count": 14, "id": "00251455-43cd-441c-ac7a-3585f142405b", "metadata": {}, "outputs": [], "source": [ "from datetime import datetime, timedelta\n", "date_object = datetime.strptime(\"2020-03-20\", \"%Y-%m-%d\")" ] }, { "cell_type": "markdown", "id": "436e0159-f64b-4d05-80f6-77e059c89c2d", "metadata": {}, "source": [ "One of the attributes of datetime objects is the attribute ```year```. \n", "This will extract the date from our datetime object. " ] }, { "cell_type": "code", "execution_count": 15, "id": "755ece99-25d6-49e9-89c4-b27fdb764307", "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "2020" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "date_object.year" ] }, { "cell_type": "markdown", "id": "f689d53d-94a8-4464-9d6a-8493b323aaff", "metadata": {}, "source": [ "### Problem 3 Task 1\n", "\n", "Create a function that inputs a string with the format \"%Y%m%d\" and outputs the year. \n", "You will need to import the datetime module. Call this function ```from_str_to_dt```" ] }, { "cell_type": "markdown", "id": "9f270939-da25-41d0-8b6f-0660d1c5726f", "metadata": {}, "source": [ "### Problem 3 Task 2\n", "\n", "Apply ```from_str_to_dt``` using the asign function in pandas to create a new column in the disasters data frame called \"Year\". " ] }, { "cell_type": "markdown", "id": "5f48df8d-fc5d-407b-855e-84d9d9b4d9e0", "metadata": {}, "source": [ "### Problem 3 Task 3\n", "\n", "Use the assign function to create a new column called \"above2000\" in the disasters dataframe. \n", "This column will equal the value one if the year of the disaster was greater than 2000 and 0 otherwise. " ] }, { "cell_type": "markdown", "id": "2a9e0453-99dc-4e8d-865d-72d24b00bedf", "metadata": {}, "source": [ "## Problem four\n", "\n", "Conduct your own exploratory data analysis of the disasters data frame to determine whether disasters appear to be more or less costly after 2000 (compared to before). " ] }, { "cell_type": "code", "execution_count": null, "id": "8faa2677-df6d-4e2e-99ad-68c24b9eacd3", "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" } }, "nbformat": 4, "nbformat_minor": 5 }