Quiz 16 Instructions

Please complete the following questions and submit a file named Quiz16.R to Gradescope for autograding.

Remember:

  • Do not use absolute (global) paths in your script. If you need to change the working directory, call setwd() interactively in the console; if a setwd() call ends up in your script, remove it or comment it out before you submit. The directory structure of your machine is not the same as the one on Gradescope’s virtual machines.
  • Do not destroy or overwrite any variables in your program. I check them only after I have run your entire program from start to finish.
  • Check to make sure you do not have any syntax errors. Code that doesn’t run will get a very bad grade.
  • Make sure to name your submission Quiz16.R

Tip: before submitting, it might help to clear all the objects from your workspace, and then source your file before you submit it. This will often uncover bugs.
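
The tip above can be sketched as follows. This is only an illustration: the temporary file stands in for your own Quiz16.R, and the names script and answer are made up for the example.

```r
# Hypothetical stand-in for Quiz16.R so that this sketch runs anywhere.
script <- tempfile(fileext = ".R")
writeLines("answer <- 1 + 1", script)

# Clear every object from the workspace except the path to the script.
rm(list = setdiff(ls(), "script"))

# Run the whole file from start to finish, as the autograder would.
source(script)

# Objects the script is supposed to create should now exist.
exists("answer")
```

If source() stops with an error, or an expected object is missing afterwards, the script probably depends on something that only existed in your interactive session.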

Install and Load the fivethirtyeight Package

In this quiz, let’s use a motivating example from the fivethirtyeight package.

The fivethirtyeight package (Kim, Ismay, and Chunn 2021) provides access to the datasets used in many articles published by the data journalism website, https://FiveThirtyEight.com. For a complete list of all 129 datasets included in the fivethirtyeight package, check out the package webpage by going to: https://fivethirtyeight-r.netlify.app/articles/fivethirtyeight.html.

if(!require("fivethirtyeight", character.only = TRUE)) install.packages("fivethirtyeight")
library(fivethirtyeight)
library(tidyverse)

Question 1

Use the drinks data frame.

drinks is a data frame containing results from a survey of the average number of servings of beer, spirits, and wine consumed in 193 countries. This data was originally reported on FiveThirtyEight.com in Mona Chalabi’s article: “Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?”.

  1. [2 pts] Prepare the data frame drinks_4ctrs:

Let’s apply some of the data wrangling verbs we learned in the previous lectures on the drinks data frame:

  • filter() the drinks data frame to keep only these 4 countries: the United States, China, Italy, and Saudi Arabia. [Hint: Please use unique(drinks$country) to find the corresponding names in the country column.]
  • select() all columns except total_litres_of_pure_alcohol by using the - sign,
  • rename() the variables beer_servings, spirit_servings, and wine_servings to beer, spirit, and wine, respectively.

Save the resulting tibble as drinks_4ctrs.
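
As a reminder of how the three verbs chain together, here is a minimal sketch on a made-up tibble. The data and all names in it (pets, owner, cat_count, dog_count, total_spend) are hypothetical and are not part of the quiz data.

```r
library(dplyr)

# Hypothetical toy data, not the drinks data frame.
pets <- tibble(
  owner       = c("Ana", "Ben", "Cy", "Dee"),
  cat_count   = c(1L, 0L, 2L, 3L),
  dog_count   = c(0L, 2L, 1L, 0L),
  total_spend = c(120, 300, 410, 95)
)

pets_small <- pets %>%
  filter(owner %in% c("Ana", "Cy")) %>%       # keep only the listed rows
  select(-total_spend) %>%                    # drop one column with the - sign
  rename(cats = cat_count, dogs = dog_count)  # syntax is new_name = old_name

pets_small
```

Note that %in% is used to match against several values at once; == would compare against a single value only.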

  2. [2 pts] Convert drinks_4ctrs to “tidy” format using pivot_longer(). Use
  • names_to = "type"
  • values_to = "servings"

Save the tidy tibble as drinks_4ctrs_tidy.
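
A minimal sketch of pivot_longer() on a made-up tibble may help here. The names pets_wide, species, and n are hypothetical; the quiz asks for different column names.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per owner, one column per species.
pets_wide <- tibble(
  owner = c("Ana", "Cy"),
  cats  = c(1L, 2L),
  dogs  = c(0L, 1L)
)

pets_long <- pets_wide %>%
  pivot_longer(
    cols      = c(cats, dogs),  # columns to stack
    names_to  = "species",      # old column names go here
    values_to = "n"             # old cell values go here
  )

pets_long
```

Each owner now appears once per stacked column, so the 2 rows become 4.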

  3. [2 pts] Visualize the data frame drinks_4ctrs_tidy by creating a bar graph with x = country, y = servings, and color the bars by the variable type. Please use position = "dodge" to place the bars side by side. Save the plot as PlotQ1.png using ggsave("PlotQ1.png", width = 10, height = 8, dpi = 300).
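
The sketch below shows a dodged bar graph on made-up data. The data frame and the output file name are hypothetical, and the plot is written to a temporary directory rather than to the file the quiz asks for. It assumes that "coloring the bars" means mapping the variable to fill; in ggplot2, color only sets the bar outlines.

```r
library(ggplot2)

# Hypothetical tidy data: one row per owner and species.
pets_long <- data.frame(
  owner   = rep(c("Ana", "Cy"), each = 2),
  species = rep(c("cats", "dogs"), times = 2),
  n       = c(1, 0, 2, 1)
)

# geom_col() uses the y values as bar heights; "dodge" places bars side by side.
p <- ggplot(pets_long, aes(x = owner, y = n, fill = species)) +
  geom_col(position = "dodge")

# Save to a temporary location (the file name here is made up).
out_file <- file.path(tempdir(), "toy_plot.png")
ggsave(out_file, plot = p, width = 10, height = 8, dpi = 300)
```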

Question 2

airline_safety is another data frame in the fivethirtyeight package. It contains information on different airline companies’ safety records.

This data was originally reported on the data journalism website, FiveThirtyEight.com, in Nate Silver’s article, “Should Travelers Avoid Flying Airlines That Have Had Crashes in the Past?”.

  1. [2 pts] Create the tibble df_fatalities with the column airline and all columns that start with fatalities. [Hint: You may use starts_with("fatalities").]
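
For reference, starts_with() can be combined with ordinary column names inside select(). The tibble below is made up for illustration; none of its names come from airline_safety.

```r
library(dplyr)

# Hypothetical toy data with two groups of similarly named columns.
scores <- tibble(
  team         = c("A", "B"),
  points_85_99 = c(10L, 4L),
  points_00_14 = c(7L, 9L),
  rank_85_99   = c(1L, 2L)
)

# Keep team plus every column whose name begins with "points".
scores_pts <- scores %>%
  select(team, starts_with("points"))

names(scores_pts)
```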

  2. [2 pts] Convert df_fatalities to tidy format as shown below and save the tidy tibble as df_fatalities_tidy. In this process, you need to:

  • Lengthen the data so that a new column fatalities_year stores the names fatalities_85_99 or fatalities_00_14, and a new column count stores the cell values.
  • Use separate_wider_position() to remove the fatalities_ prefix from the fatalities_year column, and name the resulting column year.
  • Make the year column a factor with levels year_level = c("85_99", "00_14").
#> # A tibble: 112 × 3
#>    airline               year  count
#>    <chr>                 <fct> <int>
#>  1 Aer Lingus            85_99     0
#>  2 Aer Lingus            00_14     0
#>  3 Aeroflot              85_99   128
#>  4 Aeroflot              00_14    88
#>  5 Aerolineas Argentinas 85_99     0
#>  6 Aerolineas Argentinas 00_14     0
#>  7 Aeromexico            85_99    64
#>  8 Aeromexico            00_14     0
#>  9 Air Canada            85_99     0
#> 10 Air Canada            00_14     0
#> # ℹ 102 more rows
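
The three steps can be sketched on a made-up tibble as follows. All names here (scores, score_period, period, value, period_level) are hypothetical, and the widths match the toy prefix "score_", not the quiz data. The sketch assumes tidyr 1.3.0 or later, which is where separate_wider_position() is available.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data with a shared prefix in the column names.
scores <- tibble(
  team        = c("A", "B"),
  score_85_99 = c(10L, 4L),
  score_00_14 = c(7L, 9L)
)

period_level <- c("85_99", "00_14")

scores_tidy <- scores %>%
  # Step 1: stack the score_* columns into name/value pairs.
  pivot_longer(
    cols      = starts_with("score"),
    names_to  = "score_period",
    values_to = "value"
  ) %>%
  # Step 2: split by character position. The unnamed width (6 characters,
  # "score_") is dropped; the named width (5 characters) becomes `period`.
  separate_wider_position(score_period, widths = c(6, period = 5)) %>%
  # Step 3: fix the order of the levels instead of the alphabetical default.
  mutate(period = factor(period, levels = period_level))

scores_tidy
```

Setting the levels explicitly matters for plotting: without it, "00_14" would sort before "85_99".
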
  3. [2 pts] Visualize the df_fatalities_tidy tibble by creating a bar graph with x = airline, y = count, and color the bars by the variable year, for the airlines "United / Continental", "Delta / Northwest", and "American" only. You may use position = "dodge" to place the bars side by side. Save the plot as PlotQ2.png using ggsave("PlotQ2.png", width = 10, height = 8, dpi = 300).