---
title: "Practice Problem Set 3: Querying and Wrangling Data"
format:
  html:
    code-tools: true
    embed-resources: true
engine: knitr
editor: visual
webr:
  packages: ['tidyverse', 'janitor', 'curl', 'kableExtra']
filters:
  - webr
---

1.  Run the following code to load the data:

    ```{webr, message = FALSE, warning = FALSE}
    library(tidyverse)
    library(janitor)

    diabetes_dat <- read_csv("https://raw.githubusercontent.com/meghapsimatrix/Data_Analytics_in_R/refs/heads/main/practice_problems/data/diabetes_uci/diabetes_data.csv") %>%
      clean_names()
      
    glimpse(diabetes_dat)
    ```

2.  Using `tidyverse` syntax, calculate the count and proportion of high BP patients who were diagnosed with diabetes.

    ```{webr}

    ```

3.  Calculate the average BMI for smokers.

    ```{webr}

    ```

4.  Using the [codebook](https://archive.ics.uci.edu/dataset/891/cdc+diabetes+health+indicators), create a new variable that contains the corresponding label for the `education` variable. For example, for the rows where `education` is 1, the new variable should contain: `Never attended school or only kindergarten`.

    ```{webr}

    ```

5.  For each category of education, calculate the proportion of people who smoke, are physically active, have had a heart attack, eat fruits, and eat vegetables.

    ```{webr}

    ```

6.  Create a `ggplot` representing the data from above.

    ```{webr}

    ```

7.  For people with and without diabetes, calculate the mean and standard deviation of the following variables: `mental_hlth`, `physical_hlth`, `gen_hlth`, `bmi`, `age`, `income`.

    ```{webr}

    ```

8.  Think of different ways to represent the table above. Experiment with the new formats.

    ```{webr}

    ```

9.  Restructure the data so that the mean and standard deviation for diabetic and non-diabetic people are columns, and the background traits for which we calculated the mean and standard deviation are rows.

    ```{webr}

    ```

10. Challenge Problem: Create a table using [`kableExtra`](https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html) with the data from Question 9.

    ```{webr}

    ```
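
Hint for Questions 7 and 9: the grouped-summary and reshaping steps can be sketched on a small made-up tibble. Everything in the block below (`toy`, `group`, `height`, `weight`) is hypothetical and is not part of `diabetes_dat`, so treat it as one possible pattern rather than the expected solution. The double underscore in `.names` is an assumption, chosen so that column names that already contain a single underscore can still be split cleanly.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; these column names are illustrative only
toy <- tibble(
  group  = c("a", "a", "b", "b"),
  height = c(1, 3, 2, 6),
  weight = c(10, 20, 30, 50)
)

# Grouped mean and SD for several columns at once.
# Output columns are named like height__mean, height__sd, ...
toy_summary <- toy %>%
  group_by(group) %>%
  summarise(
    across(c(height, weight),
           list(mean = mean, sd = sd),
           .names = "{.col}__{.fn}"),
    .groups = "drop"
  )

# Reshape: one row per variable, one column per group-statistic pair
toy_wide <- toy_summary %>%
  pivot_longer(-group,
               names_to = c("variable", "stat"),
               names_sep = "__") %>%
  pivot_wider(names_from = c(group, stat),
              values_from = value)

toy_wide
```

The same pattern should carry over to `diabetes_dat` once the grouping variable and the columns of interest are swapped in.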