Data Visualization with ggplot

Megha Joshi

Data Visualization

 

A motivational poster with a quote from Kieran Healy: "You should look at your data." The quote is from Data Visualization: A Practical Introduction (2018).

  • Exploration

    • Look at distributions

    • Examine relationships between variables

  • Communication

    • Illustrate your findings in ways that are digestible

    • Make or support an argument

A Grammar of Graphics

  • Leland Wilkinson wrote The Grammar of Graphics (2nd edition, 2005)

    • Developed the grammar of graphics

    • Described the fundamental elements that make up a statistical graphic

  • Hadley Wickham created the R package ggplot2 to build graphs based on Wilkinson's grammar of graphics (the "gg" in ggplot2)

    • Graphs are built up in layers

Loading ggplot2

The ggplot2 package is available as part of the tidyverse set of packages:

library(tidyverse)

You can also load the package separately:

library(ggplot2)
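The `position` argument of `guide_legend()` used in the "Cleaning Up" example is, as far as we know, only available in ggplot2 3.5.0 or newer. If that example errors, checking the installed version is a sensible first step:

```r
# Which version of ggplot2 is installed?
packageVersion("ggplot2")

# TRUE when the installed version is 3.5.0 or newer
packageVersion("ggplot2") >= "3.5.0"
```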

Example Dataset: Capital Bike Sharing Data

Source: UCI Machine Learning Repository

load("../../slides/data/capital_bike_sharing_data/bike_sharing_dat.RData")
glimpse(bike_sharing_dat)
Rows: 731
Columns: 16
$ instant    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
$ dteday     <date> 2011-01-01, 2011-01-02, 2011-01-03, 2011-01-04, 2011-01-05…
$ season     <chr> "winter", "winter", "winter", "winter", "winter", "winter",…
$ yr         <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,…
$ mnth       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ holiday    <chr> "No Holiday", "No Holiday", "No Holiday", "No Holiday", "No…
$ weekday    <chr> "Saturday", "Sunday", "Monday", "Tuesday", "Wednesday", "Th…
$ workingday <chr> "Holiday", "Holiday", "Working Day", "Working Day", "Workin…
$ weathersit <chr> "Misty and Cloudy", "Misty and Cloudy", "Clear", "Clear", "…
$ temp       <dbl> 0.3441670, 0.3634780, 0.1963640, 0.2000000, 0.2269570, 0.20…
$ atemp      <dbl> 0.3636250, 0.3537390, 0.1894050, 0.2121220, 0.2292700, 0.23…
$ hum        <dbl> 0.805833, 0.696087, 0.437273, 0.590435, 0.436957, 0.518261,…
$ windspeed  <dbl> 0.1604460, 0.2485390, 0.2483090, 0.1602960, 0.1869000, 0.08…
$ casual     <dbl> 331, 131, 120, 108, 82, 88, 148, 68, 54, 41, 43, 25, 38, 54…
$ registered <dbl> 654, 670, 1229, 1454, 1518, 1518, 1362, 891, 768, 1280, 122…
$ count      <dbl> 985, 801, 1349, 1562, 1600, 1606, 1510, 959, 822, 1321, 126…

The Grammar


Screenshot from ggplot2 website showing different layers. There is an illustration of squares on top of each other representing the following layers from bottom to top: data, mapping, layers, scales, facets, coordinates, themes.

“In brief, the grammar tells us that a graphic maps the data to the aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars).”

Hadley Wickham

Data

  • Data - information used to create the graphic
ggplot(data = bike_sharing_dat)

Aesthetic Mappings

  • Aesthetic mappings - link variables in the data to aesthetic attributes such as x, y, color, size
ggplot(bike_sharing_dat, 
       aes(x = temp, 
           y = count, 
           color = season))
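A common stumbling block is the difference between *mapping* an aesthetic (inside `aes()`) and *setting* it to a fixed value (outside `aes()`). The sketch below uses ggplot2's built-in `mpg` dataset (swapped in for the bike-sharing data so it runs on its own) and compares the colours ggplot2 actually computed with `layer_data()`:

```r
library(ggplot2)

# Mapping: a constant inside aes() is treated as a data value,
# so it goes through the colour scale (and gets a legend)
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy, color = "blue")) +
  geom_point()

# Setting: a constant outside aes() is applied as-is to every point
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue")

# Compare the colours drawn in each plot
unique(layer_data(p_mapped)$colour)  # a default scale colour, not blue
unique(layer_data(p_set)$colour)     # "blue"
```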

Layers

  • Geoms - geometric objects (points, lines, bars) that represent the data
ggplot(bike_sharing_dat, 
       aes(x = temp, 
           y = count, 
           color = season)) + 
  geom_point()
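Because each geom is a separate layer, several can be stacked on the same mappings. A minimal sketch, again using ggplot2's built-in `mpg` dataset rather than the bike-sharing data:

```r
library(ggplot2)

# Each `+ geom_*()` call adds one layer, drawn on top of the layers before it
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = .5) +                                 # layer 1: raw points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # layer 2: linear trend line

p
```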

Scales

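Scales - control how data values are translated into aesthetic values (positions, colors, sizes); the axes and legends they produce let readers translate what is in the graph back to the data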

ggplot(bike_sharing_dat, 
       aes(x = temp, 
           y = count, 
           color = season)) + 
  geom_point() + 
  scale_color_brewer(palette = "Dark2")
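To see what a scale does, you can inspect the colours it assigned to each group. This sketch uses ggplot2's built-in `mpg` dataset, where `drv` has three values, and calls `RColorBrewer::brewer.pal()` (RColorBrewer is installed alongside ggplot2 as a dependency of the scales package) to show the palette being drawn from:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2")

# Colours the scale assigned to each group (one group per drv value: "4", "f", "r")
unique(layer_data(p)[, c("group", "colour")])

# The first three colours of the Dark2 palette, for comparison
RColorBrewer::brewer.pal(3, "Dark2")
```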

Facets

  • Facets - small multiples: split the data into subsets and draw one panel per subset
ggplot(bike_sharing_dat, 
       aes(x = temp, 
           y = count, 
           color = season)) + 
  geom_point() +
  facet_grid(~ season)
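`facet_grid()` arranges panels in rows and columns defined by a `rows ~ cols` formula, while `facet_wrap()` takes a single variable and wraps its panels onto as many rows as needed. A sketch of both with ggplot2's built-in `mpg` dataset:

```r
library(ggplot2)

# facet_grid(rows ~ cols): drive type in rows, model year in columns
p_grid <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ year)

# facet_wrap(~ var): one panel per car class, three panels per row
p_wrap <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, ncol = 3)
```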

Theme

Theme - sets the overall look of the non-data parts of the graph (background, gridlines, fonts, legend placement)

ggplot(bike_sharing_dat, 
       aes(x = temp, 
           y = count, 
           color = season)) + 
  geom_point() +
  facet_grid(~ season) + 
  theme_bw()
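Order matters when combining themes: a complete theme such as `theme_bw()` replaces every theme setting, so any `theme()` tweaks must be added after it. A minimal sketch that adds theme objects directly, without a plot:

```r
library(ggplot2)

# A complete theme added last overwrites the earlier tweak
th_lost <- theme(legend.position = "bottom") + theme_bw()

# Tweak added after the complete theme survives
th_kept <- theme_bw() + theme(legend.position = "bottom")

th_lost$legend.position  # "right", the theme_bw() default
th_kept$legend.position  # "bottom"
```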

Cleaning Up

ggplot(bike_sharing_dat, 
       aes(x = temp, 
           y = count, 
           color = season)) + 
  geom_point(alpha = .5) +
  facet_grid(~ season) + 
  scale_color_brewer(palette = "Dark2") +
  labs(x = "Temperature", y = "Total Rental Bikes", color = "Season") +
  theme_bw() + 
  guides(color = guide_legend(position = "bottom")) 
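Once a plot looks right, `ggsave()` writes it to a file, choosing the format from the file extension. A sketch using ggplot2's built-in `mpg` dataset; the file name `hwy_vs_displ.pdf` is hypothetical and is written to a temporary directory:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = .5) +
  theme_bw()

# Save as PDF; width and height are in inches by default
out_file <- file.path(tempdir(), "hwy_vs_displ.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```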

Anatomy of a ggplot

ggplot(data = [dat], # the data
       aes(x = [x_var], 
           y = [y_var],
           color = [color_var], 
           fill = [fill_var])) + # aesthetic mappings
  geom_[point]() + # geom
  geom_[smooth]() + # another layer
  scale_[aesthetic]_[type]() + # scaling
  facet_grid([row] ~ [col]) + # faceting
  theme_[theme]() # theme
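The template above with every placeholder filled in might look like this. It is a sketch using ggplot2's built-in `mpg` dataset; the particular variables, palette, and theme are illustrative choices, and the `fill` mapping is dropped because points only need `color`:

```r
library(ggplot2)

p <- ggplot(data = mpg,                                     # the data
            aes(x = displ,
                y = hwy,
                color = drv)) +                             # aesthetic mappings
  geom_point() +                                            # geom
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # another layer
  scale_color_brewer(palette = "Dark2") +                   # scaling
  facet_grid(year ~ .) +                                    # faceting: one row per model year
  theme_minimal()                                           # theme
```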

Working with One Variable

Bar Plot: Categorical Data

ggplot(bike_sharing_dat, aes(x = holiday, fill = holiday)) + 
  geom_bar() +
  theme_minimal()
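`geom_bar()` counts the rows in each category for you. If the counts are already computed, map them to `y` and use `geom_col()` instead. A sketch with ggplot2's built-in `mpg` dataset; `drv_counts` is a hypothetical helper table built with base R:

```r
library(ggplot2)

# geom_bar(): ggplot2 counts the rows per category
p_bar <- ggplot(mpg, aes(x = drv)) +
  geom_bar()

# geom_col(): bar heights come straight from a precomputed column
drv_counts <- as.data.frame(table(drv = mpg$drv), responseName = "n")
p_col <- ggplot(drv_counts, aes(x = drv, y = n)) +
  geom_col()
```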

Histogram: Continuous Data

ggplot(bike_sharing_dat, aes(x = temp)) + 
  geom_histogram(fill = "darkred", alpha = .8, bins = 15) +
  theme_grey()
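The shape of a histogram depends on how the bins are chosen. `bins` fixes the number of bins, while `binwidth` fixes the width of each bin in data units. A sketch with ggplot2's built-in `mpg` dataset:

```r
library(ggplot2)

# 15 bins spread across the range of highway mileage
p_bins <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(bins = 15)

# Bins exactly 2 miles per gallon wide
p_width <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)
```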

Density Plot: Continuous Data

ggplot(bike_sharing_dat, aes(x = count)) +
  geom_density(alpha = .8, fill = "navy") + 
  theme_bw()
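The smoothness of a density curve is controlled by its bandwidth. The `adjust` argument multiplies the default bandwidth: values below 1 follow the data more closely, values above 1 give a smoother curve. A sketch with ggplot2's built-in `mpg` dataset:

```r
library(ggplot2)

# Twice the default bandwidth: a smoother curve
p_smooth <- ggplot(mpg, aes(x = hwy)) +
  geom_density(adjust = 2, fill = "navy", alpha = .8) +
  theme_bw()
```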

Working with Two Variables

Bar Plot: Categorical Data

ggplot(bike_sharing_dat, aes(x = season, fill = weathersit)) +
  geom_bar(position = "fill") +  # each bar is rescaled to proportions
  labs(y = "Proportion") +
  theme_classic()
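The `position` argument decides how bars for the same x value are combined: stacked counts (the default), side by side (`"dodge"`), or stacked and rescaled so each bar sums to 1 (`"fill"`). A sketch with ggplot2's built-in `mpg` dataset:

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = class, fill = drv))

p_stack <- base + geom_bar()                    # stacked counts
p_dodge <- base + geom_bar(position = "dodge")  # bars side by side
p_fill  <- base + geom_bar(position = "fill")   # proportions within each class
```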

Box Plot

ggplot(bike_sharing_dat, 
       aes(x = weathersit, 
           y = count, 
           fill = weathersit)) +
  geom_boxplot(alpha = .7) +
  theme_linedraw() +
  theme(legend.position = "none")  # after the complete theme, or it is overwritten

Scatter Plot: Continuous Data

ggplot(bike_sharing_dat, 
       aes(x = temp, 
           y = count)) + 
  geom_point(alpha = .7) + 
  theme_dark()

Total Rentals by Season

ggplot(bike_sharing_dat, aes(x = count, fill = season)) +
  geom_density(alpha = .8) + 
  theme_light() 

Total Rentals by Season

ggplot(bike_sharing_dat, aes(x = count, fill = season)) +
  geom_density(alpha = .8) + 
  facet_wrap(~ season) +
  theme_void() + 
  guides(fill = "none")

Introducing More Variables

Relationship Between Temperature and Total Rentals by Season, Weather, Weekday, and Holiday

ggplot(bike_sharing_dat, 
       aes(x = temp, 
           y = count,
           color = weekday,
           shape = holiday)) + 
  geom_point(alpha = .7) + 
  facet_grid(weathersit ~ season) +
  theme_test()

Thank you!