Lab 8 Reaction Time Project

About the Data and Survey

In the Spring 2024 STAT 425 class, we collected reaction time data from around 150 students. The survey used to collect the data is in “Reaction Time Survey Questions.pdf”. There are two datasets for Lab 8:

The real data – “Reaction Time Survey.csv”
The mock data – “Mock Student Information.csv”

In this lab, please clean and explore the datasets.

Lab 8 Instruction

We highly recommend creating an R project for this lab, following the method demonstrated in class for the Female Life Expectancy Project.
As usual, you only need to submit the generated PDF file to Gradescope.

Load Packages and Read Data

Please load the packages you need and import the two data sets:

Import “Reaction Time Survey.csv” as survey.
Import “Mock Student Information.csv” as student.

Question I: Data Cleaning

[40 pts] Please clean the survey data. Some hints:

You may rename the variables as:

c("mockID", "Reaction.time", "Class", "Age", "Avg.sleep.time",
  "last.night.sleep.time", "Awake.hours", "Fatigue.level",
  "Stress.level", "Distraction", "Noise.level", "Temp.level",
  "Game.freq", "Sport.freq", "Avg.hours.exercise",
  "Caffein.intake", "Alcohol.intake", "Visual.acuity",
  "Primary.hand", "Use.primary.hand", "Cautious.level",
  "Input.device", "Device.OS", "WiFi.stable")

A reaction time of 100 milliseconds or less is not realistically possible. Remove the observations with Reaction.time less than or equal to 100. [Note: In practice, we reached out to the students to confirm or update their submissions.]
You may use separate_wider_delim() to clean up Visual.acuity so that it becomes a factor with the following levels: “Very Poor”, “Poor”, “Average”, “Good”, “Excellent”.

(Use levels = c("Very Poor", "Poor", "Average", "Good", "Excellent"))
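As a minimal sketch of the Visual.acuity hint: the raw response format below (a level followed by a parenthetical note) and the " (" delimiter are assumptions made for illustration only, so check the real column before copying anything. The names toy, toy_clean, and acuity_levels are hypothetical.

```r
library(dplyr)
library(tidyr)
library(tibble)

acuity_levels <- c("Very Poor", "Poor", "Average", "Good", "Excellent")

# Hypothetical raw responses; the real survey text may look different.
toy <- tibble(
  Visual.acuity = c("Good (minor issues)", "Excellent (20/20)", "Average")
)

toy_clean <- toy %>%
  separate_wider_delim(
    Visual.acuity,
    delim = " (",                     # assumed delimiter
    names = c("Visual.acuity", NA),   # NA drops the second piece
    too_few = "align_start"           # rows without the delimiter keep their value
  ) %>%
  mutate(Visual.acuity = factor(Visual.acuity, levels = acuity_levels))

toy_clean
```

Note that factor() turns any value not listed in levels into NA, so the cleaned text has to match the level labels exactly, including capitalization.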

You may use separate_wider_delim() twice to clean up Device.OS so that it is a character variable with the possible values: “Desktop computer”, “Laptop”, “iPhone”, “iPad”, “Smartphone”, “Tablet”, “Chromebook”. Use help(separate_wider_delim) to find out how to handle “Chromebook”, which contains neither “-” nor “(” as a delimiter.
Please clean up the columns Avg.sleep.time, last.night.sleep.time, Awake.hours, Noise.level, and Avg.hours.exercise so that they become numeric variables. Be cautious about calling as.numeric() directly, as it turns any response it cannot parse into NA. Instead, handle edge cases like “7h4min”, “7:30”, “1~3” first, then use str_remove_all(..., "[[:alpha:][:space:]]") to remove any letters and whitespace from the responses. Finally, convert the cleaned columns to numeric variables.
Convert Class, Fatigue.level, Stress.level, Temp.level, Game.freq, Sport.freq, and Cautious.level to factors with the following levels:

classlevel =  c("Freshman", "Sophomore", "Junior", "Senior", "Graduate") 
# Even if there are no freshmen in the data.
fatiguelevel = c("Not fatigued at all", "Slightly Fatigued", 
                 "Moderately fatigued", "Very fatigued", "Extremely fatigued")
stresslevel = c("Very Low", "Low", "Moderate", "High" , "Very High")
templevel = c("Very Cold", "Cold", "Neutral", "Warm", "Very Warm")
gamefreqlevel = c("Daily", "Several times a week", 
                  "Once a week", "Several times a month", "Rarely", "Never")
sportfreqlevel =  c("Daily", "Several times a week", "Once a week", 
                    "Several times a month", "Rarely", "Never")
cautiouslevel = c("Not cautious at all", "Slightly cautious", 
                  "Moderately cautious", "Very cautious", "Extremely cautious")

Perform any additional data cleaning you find necessary.
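The numeric-cleaning hint above can be sketched on a toy column as follows. The responses in raw are made up, and the values chosen for the edge cases (for example, taking “1~3” as the midpoint 2) are judgment calls rather than the required answer. The real columns will likely contain other formats that need their own rules.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical responses for illustration only.
raw <- tibble(
  Avg.sleep.time = c("7", "7.5 hours", "7h4min", "7:30", "1~3", "8 hrs")
)

cleaned <- raw %>%
  mutate(
    # Step 1: rewrite the edge cases by hand before stripping characters.
    Avg.sleep.time = case_when(
      Avg.sleep.time == "7h4min" ~ "7.07",  # 7 + 4/60 hours, rounded
      Avg.sleep.time == "7:30"   ~ "7.5",   # 7 hours 30 minutes
      Avg.sleep.time == "1~3"    ~ "2",     # midpoint of the range (assumption)
      TRUE                       ~ Avg.sleep.time
    ),
    # Step 2: drop letters and whitespace, e.g. "7.5 hours" becomes "7.5".
    Avg.sleep.time = str_remove_all(Avg.sleep.time, "[[:alpha:][:space:]]"),
    # Step 3: convert; any NA here points to a format not handled yet.
    Avg.sleep.time = as.numeric(Avg.sleep.time)
  )

cleaned
```

Skipping step 1 would turn “7h4min” into “74” and leave “7:30” and “1~3” unparseable, which is why the edge cases come first.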

After data cleaning, glimpse(survey) should show something like:

glimpse(survey)
#> Rows: 140
#> Columns: 24
#> $ mockID                <chr> "MJJXGR", "JHJZMT", "VQTCNE", "CKJWEX", "MBKMNF"…
#> $ Reaction.time         <dbl> 180, 278, 350, 267, 229, 225, 200, 282, 272, 412…
#> $ Class                 <fct> Sophomore, Sophomore, Senior, Graduate, Sophomor…
#> $ Age                   <dbl> 20, 19, 22, 24, 20, 20, 20, 20, 19, 21, 20, 21, …
#> $ Avg.sleep.time        <dbl> 7.0, 7.0, 8.0, 6.0, 7.0, 7.0, 6.0, 10.0, 8.5, 7.…
#> $ last.night.sleep.time <dbl> 8.0, 7.0, 5.0, 5.0, 6.0, 8.0, 5.0, 8.0, 9.0, 6.0…
#> $ Awake.hours           <dbl> 10.00, 6.00, 4.00, 6.00, 5.00, 5.00, 1.00, 4.00,…
#> $ Fatigue.level         <fct> Not fatigued at all, Slightly Fatigued, Slightly…
#> $ Stress.level          <fct> Very Low, Low, Low, High, Low, Moderate, Moderat…
#> $ Distraction           <chr> "No", "No", "No", "No", "Yes", "No", "Yes", "Yes…
#> $ Noise.level           <dbl> 5, 8, 3, 2, 9, 3, 6, 8, 7, 3, 5, 7, 3, 7, 5, 7, …
#> $ Temp.level            <fct> Neutral, Warm, Neutral, Neutral, Neutral, Neutra…
#> $ Game.freq             <fct> Several times a month, Rarely, Daily, Several ti…
#> $ Sport.freq            <fct> Several times a month, Several times a month, On…
#> $ Avg.hours.exercise    <dbl> 5.0, 6.0, 5.0, 3.0, 1.0, 10.0, 7.0, 7.0, 2.0, 4.…
#> $ Caffein.intake        <chr> "No", "Yes", "No", "No", "No", "No", "No", "No",…
#> $ Alcohol.intake        <chr> "No", "No", "No", "No", "No", "No", "No", "No", …
#> $ Visual.acuity         <fct> Excellent, Excellent, Good, Good, Good, Excellen…
#> $ Primary.hand          <chr> "Ambidextrous (both hands equally)", "Right hand…
#> $ Use.primary.hand      <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes",…
#> $ Cautious.level        <fct> Moderately cautious, Slightly cautious, Slightly…
#> $ Input.device          <chr> "Mouse", "Trackpad", "Mouse", "Touch screen", "T…
#> $ Device.OS             <chr> "Desktop computer", "Laptop", "Laptop", "iPhone"…
#> $ WiFi.stable           <chr> "Stable", "Stable", "Stable", "Stable", "Stable"…

Question II: Join Two Tables

[10 pts] Join the two tables and list the students who did not respond to the survey. [Hint: You may use anti_join() for this task. Look up information about anti_join() online if needed.]
[10 pts] Several students submitted the survey more than once. Remove the duplicate submissions by keeping only each student's last submission (assuming the survey rows are in chronological order). Follow these steps:

Arrange the observations in reverse-time order using desc(row_number()).
Apply distinct(mockID, .keep_all = TRUE). [You can run ?distinct to understand why this approach works.]
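The two steps above can be tried on a toy table first. The IDs and reaction times below are made up, and toy and toy_last are hypothetical names.

```r
library(dplyr)
library(tibble)

# Made-up submissions in time order; AAA and BBB each submitted twice.
toy <- tibble(
  mockID        = c("AAA", "BBB", "AAA", "CCC", "BBB"),
  Reaction.time = c(300, 250, 280, 310, 240)
)

toy_last <- toy %>%
  arrange(desc(row_number())) %>%      # latest submission first
  distinct(mockID, .keep_all = TRUE)   # keep one row per mockID

toy_last
```

Comparing toy and toy_last row by row is a quick way to check which submission survives for each student before applying the same steps to survey.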

[10 pts] Join the two tables, keeping all information from survey and using mockID as the key. Save the new tibble as survey_student (or a name of your choice).
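A minimal sketch of the two joins in this question, using toy tables. The Section column and all values are hypothetical, since the real student table may hold different information. left_join() is one way to keep every row of the survey table.

```r
library(dplyr)
library(tibble)

# Toy stand-ins for the cleaned survey and the student roster.
survey_toy <- tibble(
  mockID        = c("AAA", "CCC"),
  Reaction.time = c(280, 310)
)
student_toy <- tibble(
  mockID  = c("AAA", "BBB", "CCC", "DDD"),
  Section = c("A", "A", "B", "B")   # hypothetical roster column
)

# Students on the roster with no matching survey row.
non_respondents <- anti_join(student_toy, survey_toy, by = "mockID")

# Every survey row, with roster columns attached where mockID matches.
survey_student_toy <- left_join(survey_toy, student_toy, by = "mockID")

non_respondents
survey_student_toy
```

The order of the two tables matters in both calls: anti_join() returns rows of its first argument, and left_join() keeps all rows of its first argument.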

Question III: Explore the Data `survey_student`

[30 pts] Create data visualizations or tables to uncover interesting insights from the data. Please list at least three notable findings.

Hint: You may find it helpful to combine some categories into fewer, broader categories. Below is one example. For more information about the case_when function, you can run ?case_when in the R console.

survey_student %>%
  mutate(Fatigue.3 = case_when(
    Fatigue.level %in% c("Extremely fatigued", "Very fatigued") ~ "H.Fatigue",
    Fatigue.level %in% c("Moderately fatigued") ~ "M.Fatigue",
    Fatigue.level %in% c("Not fatigued at all", "Slightly Fatigued") ~ "L.Fatigue"
    ),
    Fatigue.3 = factor(Fatigue.3, levels = c("H.Fatigue", "M.Fatigue", "L.Fatigue"))
  )
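One possible follow-up, sketched on made-up data, is a summary table of reaction time by the collapsed category. The toy values and the names toy, summary_tbl, and median.rt are hypothetical, and the choice of the median is only one option.

```r
library(dplyr)
library(tibble)

# Made-up rows standing in for survey_student.
toy <- tibble(
  Fatigue.level = c("Very fatigued", "Slightly Fatigued", "Moderately fatigued",
                    "Not fatigued at all", "Extremely fatigued"),
  Reaction.time = c(320, 250, 280, 240, 360)
)

summary_tbl <- toy %>%
  mutate(
    Fatigue.3 = case_when(
      Fatigue.level %in% c("Extremely fatigued", "Very fatigued") ~ "H.Fatigue",
      Fatigue.level %in% c("Moderately fatigued") ~ "M.Fatigue",
      Fatigue.level %in% c("Not fatigued at all", "Slightly Fatigued") ~ "L.Fatigue"
    ),
    Fatigue.3 = factor(Fatigue.3, levels = c("H.Fatigue", "M.Fatigue", "L.Fatigue"))
  ) %>%
  group_by(Fatigue.3) %>%
  summarise(
    n         = n(),                    # group size
    median.rt = median(Reaction.time),  # typical reaction time in ms
    .groups   = "drop"
  )

summary_tbl
```

Reporting the group size next to the summary statistic helps when judging whether a difference between groups rests on only a handful of students.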