The report notes that in 2010 the top five causes of death, namely diseases of the heart, cancer, chronic lower respiratory disease, cerebrovascular diseases (stroke), and unintentional injuries, accounted for approximately 63% of all deaths. The authors used mortality data from the National Vital Statistics System for 2008-2010. Please read their report for the caveats associated with the data and the assumptions underlying the procedures used. The report also discusses implications, and its discussion section is really worth a read.
This section of the R code retrieves data from CDC’s report.
Data Cleaning and Manipulations
Let’s clean the dataset by doing the following:
Change the column names
Remove the top three rows and the bottom two rows
Let’s also check the structure of the data.
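The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on a toy data frame, not the original code: the object `raw`, the column names, and all of the numbers are made up for the example, and the real table has more columns.

```r
# Toy stand-in for the raw table (all values invented for illustration):
# three title/header rows on top, two footnote rows at the bottom,
# and every column read in as a factor.
raw <- data.frame(
  V1 = c("Table title", "", "State", "Alabama", "Alaska", "Arizona",
         "Footnote 1", "Footnote 2"),
  V2 = c("", "Heart disease", "Observed", "12,000", "800", "10,500", "", ""),
  V3 = c("", "Heart disease", "Preventable", "4,500", "150", "2,100", "", ""),
  stringsAsFactors = TRUE
)

# Change the column names (hypothetical names chosen for this sketch).
names(raw) <- c("State", "HeartDiseaseObserved", "HeartDiseasePreventable")

# Remove the top three rows and the bottom two rows.
n <- nrow(raw)
cleaned <- raw[4:(n - 2), ]
rownames(cleaned) <- NULL

# Check the structure; the count columns are still factors at this point.
str(cleaned)
```

Computing the row range from `nrow(raw)` rather than hard-coding it keeps the sketch valid if the number of states in the table changes.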
Let’s convert the columns that hold counts from factor variables to numeric variables and view the data using googleVis’s table. Entries in this table can be sorted by clicking on a column header.
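Here is one way the conversion might look, again on made-up data with hypothetical column names. The thing to watch for is that calling `as.numeric()` directly on a factor returns the internal level codes, so the sketch goes through `as.character()` first. It also assumes the counts contain thousands separators, which may or may not be true of the real table.

```r
# Toy cleaned data: the counts are factors (values invented for illustration).
cleaned <- data.frame(
  State = factor(c("Alabama", "Alaska", "Arizona")),
  HeartDiseaseObserved = factor(c("12,000", "800", "10,500")),
  HeartDiseasePreventable = factor(c("4,500", "150", "2,100"))
)

# Convert via character so we get the printed values, not the level codes;
# gsub() strips any thousands separators before the conversion.
factor_to_numeric <- function(x) as.numeric(gsub(",", "", as.character(x)))

count_cols <- setdiff(names(cleaned), "State")
cleaned[count_cols] <- lapply(cleaned[count_cols], factor_to_numeric)
cleaned$State <- as.character(cleaned$State)

str(cleaned)

# Build the sortable table only if googleVis is installed.
if (requireNamespace("googleVis", quietly = TRUE)) {
  tbl <- googleVis::gvisTable(cleaned)
  # plot(tbl) would open the table in a browser.
}
```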
For each cause of death, instead of working with the raw number of potentially preventable deaths, we compute potentially preventable deaths as a percentage of the deaths observed. We then also compute the average of these percentages across the 5 categories of diseases.
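As a rough sketch of that calculation, the percentage for each cause is 100 * preventable / observed, and the average is taken across the percentage columns. The example below uses only two causes and invented numbers to stay short, and it assumes a hypothetical naming convention of `<Disease>Observed` and `<Disease>Preventable`. It also assumes the average is a simple unweighted mean of the percentages.

```r
# Toy data with two causes of death (values invented for illustration).
deaths <- data.frame(
  State = c("Alabama", "Alaska"),
  HeartObserved = c(1000, 200), HeartPreventable = c(400, 50),
  CancerObserved = c(800, 100), CancerPreventable = c(200, 10),
  stringsAsFactors = FALSE
)
diseases <- c("Heart", "Cancer")

# Percentage of observed deaths that were potentially preventable, per cause.
for (d in diseases) {
  observed <- deaths[[paste0(d, "Observed")]]
  preventable <- deaths[[paste0(d, "Preventable")]]
  deaths[[paste0(d, "Pct")]] <- 100 * preventable / observed
}

# Unweighted mean of the per-cause percentages for each state.
pct_cols <- paste0(diseases, "Pct")
deaths$AveragePct <- rowMeans(deaths[pct_cols])

deaths[c("State", pct_cols, "AveragePct")]
```

An unweighted mean treats every cause equally regardless of how many deaths it accounts for; a weighted version would need the observed counts as weights.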
Let’s now start plotting bar charts and choropleths using googleVis within the shiny server environment. Before we do that, we make the following modifications to the dataset:
Convert it into long form, so that all columns besides State are collapsed into a single column naming the variable, with a second new column holding the corresponding value.
Reorder the levels of the column of different diseases so that the average percentage and the disease percentages come first.
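The two modifications above might look something like this. The sketch uses base R only so that it runs without extra packages (a melt-style helper from a reshaping package would do the same job), and the data frame `wide`, its column names, and its values are all hypothetical.

```r
# Toy wide data: one row per state (values invented for illustration).
wide <- data.frame(
  State = c("Alabama", "Alaska"),
  HeartObserved = c(1000, 200),
  HeartPreventable = c(400, 50),
  HeartPct = c(40, 25),
  AveragePct = c(40, 25),
  stringsAsFactors = FALSE
)

# Every column except State gets collapsed.
value_cols <- setdiff(names(wide), "State")

# Long form: one row per (State, variable) pair.
long <- data.frame(
  State = rep(wide$State, times = length(value_cols)),
  variable = rep(value_cols, each = nrow(wide)),
  value = unlist(wide[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Put the average percentage first, then the per-disease percentages,
# then everything else in its original order.
pct_first <- c("AveragePct",
               grep("Pct$", setdiff(value_cols, "AveragePct"), value = TRUE))
level_order <- c(pct_first, setdiff(value_cols, pct_first))
long$variable <- factor(long$variable, levels = level_order)

levels(long$variable)
```

Ordering the factor levels this way means that any control or chart that lists the levels in order will show the percentages before the raw counts.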