Practice Problem Set 3: Querying and Wrangling Data
1. Run the following code to load the data:
2. Using `tidyverse` syntax, calculate the count and proportion of high BP patients who were diagnosed with diabetes.
3. Calculate the average BMI for smokers.
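One possible starting point for the count, proportion, and average BMI calculations is sketched below. It runs on a small made-up data frame rather than the course data, and the column names `high_bp`, `diabetes`, `smoker`, and `bmi` (coded 1 = yes, 0 = no) are assumptions; check them against the loaded data before adapting it.

```r
library(dplyr)

# Toy stand-in for the course data. The column names and the 0/1 coding
# are assumptions, and the values are made up for illustration.
toy <- tibble(
  high_bp  = c(1, 1, 1, 1, 0, 0),
  diabetes = c(1, 0, 1, 0, 0, 1),
  smoker   = c(1, 0, 1, 1, 0, 0),
  bmi      = c(30, 25, 35, 28, 22, 27)
)

# Keep only high BP patients, count them by diabetes status,
# then turn the counts into proportions.
bp_summary <- toy %>%
  filter(high_bp == 1) %>%
  count(diabetes) %>%
  mutate(prop = n / sum(n))

# Average BMI among smokers only.
smoker_bmi <- toy %>%
  filter(smoker == 1) %>%
  summarise(mean_bmi = mean(bmi, na.rm = TRUE))

bp_summary
smoker_bmi
```

`count()` followed by `mutate(prop = n / sum(n))` is a common pattern for getting counts and proportions in one table; `group_by()` plus `summarise(n = n())` would work as well.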
4. Using the codebook here, create another variable that contains the corresponding label for the `education` variable. For example, for the rows where `education` is 1, the new variable should contain: Never attended school or only kindergarten.
5. For each category of education, calculate the proportion of people who smoke, are physically active, have had a heart attack, eat fruits, and eat vegetables.
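A hedged sketch of the labelling and per-category proportions follows. It uses a toy data frame with only two education codes; the indicator column names `smoker` and `phys_activity` are assumptions, and only the label for code 1 comes from the problem text. The other label is a placeholder to be filled in from the codebook.

```r
library(dplyr)

# Toy stand-in for the course data. `smoker` and `phys_activity` are
# assumed 0/1 indicator columns; the values are made up.
toy <- tibble(
  education     = c(1, 1, 2, 2, 2),
  smoker        = c(1, 0, 1, 1, 0),
  phys_activity = c(0, 1, 1, 1, 0)
)

# Lookup table mapping each code to its label. Only the label for code 1
# is given in the problem; the rest must be copied from the codebook.
edu_labels <- tibble(
  education = c(1, 2),
  education_label = c(
    "Never attended school or only kindergarten",
    "<label for code 2 from the codebook>"
  )
)

# Attach the label to every row.
labelled <- toy %>%
  left_join(edu_labels, by = "education")

# For 0/1 indicators, the mean within a group is the proportion of 1s.
props <- labelled %>%
  group_by(education_label) %>%
  summarise(across(c(smoker, phys_activity), ~ mean(.x, na.rm = TRUE)))

props
```

A lookup table with `left_join()` is used here instead of `case_when()` so that the labels live in one place; either approach should give the same result. The remaining indicators (heart attack, fruits, vegetables) can be added to the `across()` selection once their column names are known.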
6. Create a `ggplot` representing the data from above.
7. For people with and without diabetes, calculate the mean and standard deviation of the following variables: `mental_hlth`, `physical_htlh`, `gen_hlth`, `bmi`, `age`, `income`.
8. Think of different ways to represent the table above. Experiment with the new formats.
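The group-wise mean and standard deviation can be sketched as below, assuming a 0/1 `diabetes` column. Only `bmi` and `age` are included to keep the toy data small, and the values are made up; the same `across()` call extends to the other variables listed in the problem.

```r
library(dplyr)

# Toy stand-in for the course data. The 0/1 coding of `diabetes` is an
# assumption, and the values are made up for illustration.
toy <- tibble(
  diabetes = c(0, 0, 0, 1, 1, 1),
  bmi      = c(22, 24, 26, 30, 32, 34),
  age      = c(5, 6, 7, 9, 10, 11)
)

# One row per diabetes group; across() applies both functions to each
# selected column and names the results <column>_<function>.
trait_summary <- toy %>%
  group_by(diabetes) %>%
  summarise(
    across(
      c(bmi, age),
      list(mean = ~ mean(.x, na.rm = TRUE), sd = ~ sd(.x, na.rm = TRUE))
    ),
    .groups = "drop"
  )

trait_summary
```

The result is a wide table with columns such as `bmi_mean` and `bmi_sd`, which is one of several layouts worth comparing when thinking about how to present the summary.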
9. Restructure the data so that the mean and standard deviation for diabetic vs. non-diabetic people are columns, and all the background traits for which we calculated the mean and standard deviation are rows.
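One way to approach this restructuring is to pivot the summary long and then wide again. The sketch below starts from a hard-coded toy summary with assumed column names of the form `<trait>_mean` and `<trait>_sd`; the values are made up.

```r
library(dplyr)
library(tidyr)

# Toy summary table: one row per diabetes group, one column per
# trait/statistic pair. Names and values are illustrative assumptions.
trait_summary <- tibble(
  diabetes = c(0, 1),
  bmi_mean = c(24, 32), bmi_sd = c(2, 2),
  age_mean = c(6, 10),  age_sd = c(1, 1)
)

restructured <- trait_summary %>%
  # Readable group names that will become part of the column names.
  mutate(diabetes = if_else(diabetes == 1, "diabetic", "non_diabetic")) %>%
  # Split each column name into trait and statistic. A regex is used
  # rather than names_sep because trait names may contain underscores.
  pivot_longer(
    -diabetes,
    names_to = c("trait", "stat"),
    names_pattern = "(.*)_(mean|sd)$"
  ) %>%
  # Spread group and statistic back out into columns, one row per trait.
  pivot_wider(names_from = c(diabetes, stat), values_from = value)

restructured
```

The result has one row per trait and the columns `non_diabetic_mean`, `non_diabetic_sd`, `diabetic_mean`, and `diabetic_sd`.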
10. Challenge Problem: Create a table using `kableExtra` with the data from Question 9.
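A minimal `kableExtra` sketch is shown below as a starting point. It assumes a restructured table with one row per trait and mean/SD columns for each group; the data frame here is hard-coded with made-up values, and the header labels are only suggestions.

```r
library(kableExtra)

# Toy version of the restructured table: one row per trait.
# Column names and values are illustrative assumptions.
restructured <- data.frame(
  trait             = c("bmi", "age"),
  non_diabetic_mean = c(24, 6),
  non_diabetic_sd   = c(2, 1),
  diabetic_mean     = c(32, 10),
  diabetic_sd       = c(2, 1)
)

tbl <- kbl(
  restructured,
  col.names = c("Trait", "Mean", "SD", "Mean", "SD"),
  digits = 2
) %>%
  kable_styling(full_width = FALSE) %>%
  # Grouped header row: one blank cell over "Trait", then a label
  # spanning the two columns that belong to each group.
  add_header_above(c(" " = 1, "Non-diabetic" = 2, "Diabetic" = 2))

tbl
```

`add_header_above()` is what makes the repeated "Mean" and "SD" column labels readable, since each pair sits under its own group label.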