{ "cells": [ { "cell_type": "markdown", "id": "d2a3fe29-64a2-4786-b3dd-c84ae710d009", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "# Chapter 03: Data visualization\n", "\n", "```{contents} Table of Contents\n", ":depth: 3\n", "```\n", "\n", "Data visualization is another key part of exploratory data analysis.\n", "Technically, a visualization is a mapping from a dataset to a graphic.\n", "Visualizations are meant to summarize data, not to tabulate every data point in a data frame exhaustively.\n", "\n", "The foundation for plotting in Python is [matplotlib](https://matplotlib.org/stable/).\n", "We'll also explore the [seaborn visualization library](https://seaborn.pydata.org/), and in the homework you'll learn about another popular visualization library, [altair](https://altair-viz.github.io/).\n", "\n", "\n", "As our running example, we'll use a classic dataset known as the \"heart disease\" dataset.\n", "The abstract for this dataset reads:\n", "\n", "*This data set dates from 1988 and consists of four databases: Cleveland, Hungary, Switzerland, and Long Beach V. It contains 76 attributes, including the predicted attribute, but all published experiments refer to using a subset of 14 of them. The \"target\" field refers to the presence of heart disease in the patient. It is integer valued 0 = no disease and 1 = disease.*" ] }, { "cell_type": "code", "execution_count": 21, "id": "c5796aae-534b-417e-9c8e-6b592f078c7f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | age | \n", "sex | \n", "cp | \n", "trestbps | \n", "chol | \n", "fbs | \n", "restecg | \n", "thalach | \n", "exang | \n", "oldpeak | \n", "slope | \n", "ca | \n", "thal | \n", "target | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "52 | \n", "1 | \n", "0 | \n", "125 | \n", "212 | \n", "0 | \n", "1 | \n", "168 | \n", "0 | \n", "1.0 | \n", "2 | \n", "2 | \n", "3 | \n", "0 | \n", "
1 | \n", "53 | \n", "1 | \n", "0 | \n", "140 | \n", "203 | \n", "1 | \n", "0 | \n", "155 | \n", "1 | \n", "3.1 | \n", "0 | \n", "0 | \n", "3 | \n", "0 | \n", "
2 | \n", "70 | \n", "1 | \n", "0 | \n", "145 | \n", "174 | \n", "0 | \n", "1 | \n", "125 | \n", "1 | \n", "2.6 | \n", "0 | \n", "0 | \n", "3 | \n", "0 | \n", "
3 | \n", "61 | \n", "1 | \n", "0 | \n", "148 | \n", "203 | \n", "0 | \n", "1 | \n", "161 | \n", "0 | \n", "0.0 | \n", "2 | \n", "1 | \n", "3 | \n", "0 | \n", "
4 | \n", "62 | \n", "0 | \n", "0 | \n", "138 | \n", "294 | \n", "1 | \n", "1 | \n", "106 | \n", "0 | \n", "1.9 | \n", "1 | \n", "3 | \n", "2 | \n", "0 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
1020 | \n", "59 | \n", "1 | \n", "1 | \n", "140 | \n", "221 | \n", "0 | \n", "1 | \n", "164 | \n", "1 | \n", "0.0 | \n", "2 | \n", "0 | \n", "2 | \n", "1 | \n", "
1021 | \n", "60 | \n", "1 | \n", "0 | \n", "125 | \n", "258 | \n", "0 | \n", "0 | \n", "141 | \n", "1 | \n", "2.8 | \n", "1 | \n", "1 | \n", "3 | \n", "0 | \n", "
1022 | \n", "47 | \n", "1 | \n", "0 | \n", "110 | \n", "275 | \n", "0 | \n", "0 | \n", "118 | \n", "1 | \n", "1.0 | \n", "1 | \n", "1 | \n", "2 | \n", "0 | \n", "
1023 | \n", "50 | \n", "0 | \n", "0 | \n", "110 | \n", "254 | \n", "0 | \n", "0 | \n", "159 | \n", "0 | \n", "0.0 | \n", "2 | \n", "0 | \n", "2 | \n", "1 | \n", "
1024 | \n", "54 | \n", "1 | \n", "0 | \n", "120 | \n", "188 | \n", "0 | \n", "1 | \n", "113 | \n", "0 | \n", "1.4 | \n", "1 | \n", "1 | \n", "3 | \n", "0 | \n", "
1025 rows × 14 columns
\n", "