c("mockID", "Reaction.time", "Class", "Age", "Avg.sleep.time",
"last.night.sleep.time", "Awake.hours", "Fatigue.level",
"Stress.level", "Distraction", "Noise.level", "Temp.level",
"Game.freq", "Sport.freq", "Avg.hours.exercise",
"Caffein.intake", "Alcohol.intake", "Visual.acuity",
"Primary.hand", "Use.primary.hand", "Cautious.level",
"Input.device", "Device.OS", "WiFi.stable")
Lab 8 Reaction Time Project
About the Data and Survey
In the Spring 2024 STAT 425 class, we collected reaction-time data from around 150 students. The survey used to collect the data is available in Reaction Time Survey Questions.pdf. There are two datasets for Lab 8:
- The real data – “Reaction Time Survey.csv”
- The mock data – “Mock Student Information.csv”
In this lab, please clean and explore the datasets.
Lab 8 Instructions
- It is highly recommended to follow the method demonstrated in class for the Female Life Expectancy Project by creating an R project to complete this lab.
- As usual, you are only required to submit the PDF file generated to Gradescope.
Load Packages and Read Data
Please load the packages you need and import the two data sets:
- Import “Reaction Time survey.csv” as survey.
- Import “Mock Student Information.csv” as student.
Question I: Data Cleaning
[40 pts] Please perform the data cleaning for survey. Some hints are:
- You may rename the variables using the 24 names in the character vector at the top of this page, from mockID to WiFi.stable.
- Technically, it’s impossible to have a Reaction Time less than or equal to 100 milliseconds. Remove the observations with Reaction.time less than or equal to 100. [Note: In practice, we reached out to the students to confirm or update their submissions.]
- You may use separate_wider_delim() to clean up Visual.acuity so that the variable becomes a factor with the following levels: “Very Poor”, “Poor”, “Average”, “Good”, “Excellent”. (Use levels = c("Very Poor", "Poor", "Average", "Good", "Excellent").)
- You may use separate_wider_delim() twice to clean up Device.OS so that the variable is a character variable with the possible values: “Desktop computer”, “Laptop”, “iPhone”, “iPad”, “Smartphone”, “Tablet”, “Chromebook”. Please use help(separate_wider_delim) to find a solution for the case where “Chromebook” does not have “-” or “(” as a delimiter.
- Please clean up the columns Avg.sleep.time, last.night.sleep.time, Awake.hours, Noise.level, and Avg.hours.exercise so that they become numeric variables. Be cautious about using as.numeric() directly, as it may introduce NAs by coercion. Instead, handle edge cases like “7h4min”, “7:30”, “1~3” first, then use str_remove_all(..., "[[:alpha:][:space:]]") to remove any letters and whitespace from the responses. Finally, convert the cleaned columns to numeric variables.
- Convert Class, Fatigue.level, Stress.level, Temp.level, Game.freq, Sport.freq, and Cautious.level to factors with the following levels:
classlevel = c("Freshman", "Sophomore", "Junior", "Senior", "Graduate")
# Even if there are no freshmen in the data.
fatiguelevel = c("Not fatigued at all", "Slightly Fatigued",
                 "Moderately fatigued", "Very fatigued", "Extremely fatigued")
stresslevel = c("Very Low", "Low", "Moderate", "High", "Very High")
templevel = c("Very Cold", "Cold", "Neutral", "Warm", "Very Warm")
gamefreqlevel = c("Daily", "Several times a week",
                  "Once a week", "Several times a month", "Rarely", "Never")
sportfreqlevel = c("Daily", "Several times a week", "Once a week",
                   "Several times a month", "Rarely", "Never")
cautiouslevel = c("Not cautious at all", "Slightly cautious",
                  "Moderately cautious", "Very cautious", "Extremely cautious")
- Perform any additional data cleaning you find necessary.
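The numeric-cleaning hint can be sketched on a toy vector as follows. This is only one possible approach: the raw values, the helper names hm_to_hours(), range_to_mid(), and clean_hours(), and the choice to replace a range such as “1~3” with its midpoint are all assumptions made for illustration, not part of the lab's required solution.

```r
library(dplyr)
library(stringr)

# Hypothetical raw responses; the real survey may contain other formats.
raw <- c("7", "7.5 hours", "7h4min", "7:30", "1~3", " 8 hrs ")

# Hypothetical helper: convert "<h>h<m>min" or "<h>:<m>" to decimal hours.
hm_to_hours <- function(x) {
  parts <- str_match(x, "^\\s*(\\d+)\\s*(?:h|:)\\s*(\\d+)")
  as.numeric(parts[, 2]) + as.numeric(parts[, 3]) / 60
}

# Hypothetical helper: replace a range "a~b" with its midpoint
# (an assumed convention; the lab does not prescribe one).
range_to_mid <- function(x) {
  parts <- str_match(x, "^\\s*(\\d+(?:\\.\\d+)?)\\s*~\\s*(\\d+(?:\\.\\d+)?)")
  (as.numeric(parts[, 2]) + as.numeric(parts[, 3])) / 2
}

clean_hours <- function(x) {
  case_when(
    # Edge cases first, so "7h4min" is not collapsed to 74.
    str_detect(x, "\\d\\s*(h|:)\\s*\\d") ~ hm_to_hours(x),
    str_detect(x, "~") ~ range_to_mid(x),
    # Everything else: strip letters and whitespace, then convert.
    # Warnings are suppressed because case_when() evaluates this branch
    # for every element, including those handled by the branches above.
    TRUE ~ suppressWarnings(
      as.numeric(str_remove_all(x, "[[:alpha:][:space:]]"))
    )
  )
}

cleaned <- clean_hours(raw)
```

After a step like this, it is worth checking sum(is.na(cleaned)) against the raw column to see whether any unexpected format slipped through.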
After the data cleaning procedure, glimpse(survey) should show something like
glimpse(survey)
#> Rows: 140
#> Columns: 24
#> $ mockID <chr> "MJJXGR", "JHJZMT", "VQTCNE", "CKJWEX", "MBKMNF"…
#> $ Reaction.time <dbl> 180, 278, 350, 267, 229, 225, 200, 282, 272, 412…
#> $ Class <fct> Sophomore, Sophomore, Senior, Graduate, Sophomor…
#> $ Age <dbl> 20, 19, 22, 24, 20, 20, 20, 20, 19, 21, 20, 21, …
#> $ Avg.sleep.time <dbl> 7.0, 7.0, 8.0, 6.0, 7.0, 7.0, 6.0, 10.0, 8.5, 7.…
#> $ last.night.sleep.time <dbl> 8.0, 7.0, 5.0, 5.0, 6.0, 8.0, 5.0, 8.0, 9.0, 6.0…
#> $ Awake.hours <dbl> 10.00, 6.00, 4.00, 6.00, 5.00, 5.00, 1.00, 4.00,…
#> $ Fatigue.level <fct> Not fatigued at all, Slightly Fatigued, Slightly…
#> $ Stress.level <fct> Very Low, Low, Low, High, Low, Moderate, Moderat…
#> $ Distraction <chr> "No", "No", "No", "No", "Yes", "No", "Yes", "Yes…
#> $ Noise.level <dbl> 5, 8, 3, 2, 9, 3, 6, 8, 7, 3, 5, 7, 3, 7, 5, 7, …
#> $ Temp.level <fct> Neutral, Warm, Neutral, Neutral, Neutral, Neutra…
#> $ Game.freq <fct> Several times a month, Rarely, Daily, Several ti…
#> $ Sport.freq <fct> Several times a month, Several times a month, On…
#> $ Avg.hours.exercise <dbl> 5.0, 6.0, 5.0, 3.0, 1.0, 10.0, 7.0, 7.0, 2.0, 4.…
#> $ Caffein.intake <chr> "No", "Yes", "No", "No", "No", "No", "No", "No",…
#> $ Alcohol.intake <chr> "No", "No", "No", "No", "No", "No", "No", "No", …
#> $ Visual.acuity <fct> Excellent, Excellent, Good, Good, Good, Excellen…
#> $ Primary.hand <chr> "Ambidextrous (both hands equally)", "Right hand…
#> $ Use.primary.hand <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes",…
#> $ Cautious.level <fct> Moderately cautious, Slightly cautious, Slightly…
#> $ Input.device <chr> "Mouse", "Trackpad", "Mouse", "Touch screen", "T…
#> $ Device.OS <chr> "Desktop computer", "Laptop", "Laptop", "iPhone"…
#> $ WiFi.stable <chr> "Stable", "Stable", "Stable", "Stable", "Stable"…
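For the separate_wider_delim() hints in Question I, a minimal sketch on toy data is shown here. The raw formats (for example “Laptop - Windows” or “iPhone (iOS)”), the intermediate column names, and the " - " delimiter for Visual.acuity are assumptions; check the actual responses before reusing any of it. The arguments too_few = "align_start" and too_many = "merge" are documented options of separate_wider_delim() in tidyr 1.3.0 or later.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical raw values; the real responses may be formatted differently.
toy <- tibble(
  Device.OS = c("Laptop - Windows", "iPhone (iOS)", "Chromebook",
                "Desktop computer - macOS"),
  Visual.acuity = c("Good - minor issues", "Excellent - no issues",
                    "Average - some issues", "Very Poor - major issues")
)

cleaned <- toy %>%
  # First pass: split on "-". Rows without "-" keep their whole value in
  # dev1 and get NA in rest1 because of too_few = "align_start".
  separate_wider_delim(Device.OS, delim = "-", names = c("dev1", "rest1"),
                       too_few = "align_start", too_many = "merge") %>%
  # Second pass: split what is left on "(".
  separate_wider_delim(dev1, delim = "(", names = c("dev2", "rest2"),
                       too_few = "align_start", too_many = "merge") %>%
  # Keep only the rating part of Visual.acuity.
  separate_wider_delim(Visual.acuity, delim = " - ",
                       names = c("va", "va_detail"),
                       too_few = "align_start", too_many = "merge") %>%
  transmute(
    Device.OS = str_trim(dev2),
    # Levels are listed explicitly, so unused levels such as "Poor" are kept.
    Visual.acuity = factor(
      va,
      levels = c("Very Poor", "Poor", "Average", "Good", "Excellent")
    )
  )
```

Listing the levels explicitly in factor() is also what keeps a level such as “Freshman” in Class even when no observation has it.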
Question II: Join Two Tables
- [10 pts] Join the two tables and list the students who did not respond to the survey. [Hint: You may use anti_join() for this task. Look up information about anti_join() online if needed.]
- [10 pts] Several students submitted the survey more than once. To handle this, remove the duplicate submissions by keeping only the last submission (assuming survey is in time order). Follow these steps:
  - Arrange the observations in reverse-time order using desc(row_number()).
  - Apply distinct(mockID, .keep_all = TRUE). [You can use ?distinct() to understand why this approach works.]
- [10 pts] Join the two tables and keep all information from survey, using mockID as the key. Save the new tibble as survey_student (or a name of your choice).
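The three steps of Question II can be sketched on two small toy tables. The mockID values, the Section column, and the object names no_response and survey_last are hypothetical; only mockID, Reaction.time, survey, student, and survey_student come from the lab itself.

```r
library(dplyr)

# Hypothetical toy tables; the values and the Section column are made up.
student <- tibble(mockID = c("AAA111", "BBB222", "CCC333"),
                  Section = c("A", "B", "A"))
survey <- tibble(mockID = c("AAA111", "BBB222", "AAA111"),
                 Reaction.time = c(250, 300, 240))

# Students in `student` with no matching row in `survey`.
no_response <- anti_join(student, survey, by = "mockID")

# Keep only the last submission per mockID: reverse the row order, then
# distinct() keeps the first row it encounters for each mockID.
survey_last <- survey %>%
  arrange(desc(row_number())) %>%
  distinct(mockID, .keep_all = TRUE)

# Keep every row of the de-duplicated survey and attach student information.
survey_student <- left_join(survey_last, student, by = "mockID")
```

Here left_join() is used because it keeps all rows of its first argument, which matches the requirement to keep all information from survey.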
Question III: Explore the Data survey_student
[30 pts] Create data visualizations or tables to uncover interesting insights from the data. Please list at least three notable findings.
Hint: You may find it helpful to combine some categories into fewer, broader categories. Below is one example. For more information about the case_when function, you can run ?case_when in the R console.
survey_student %>%
  mutate(
    Fatigue.3 = case_when(
      Fatigue.level %in% c("Extremely fatigued", "Very fatigued") ~ "H.Fatigue",
      Fatigue.level %in% c("Moderately fatigued") ~ "M.Fatigue",
      Fatigue.level %in% c("Not fatigued at all", "Slightly Fatigued") ~ "L.Fatigue"
    ),
    Fatigue.3 = factor(Fatigue.3, levels = c("H.Fatigue", "M.Fatigue", "L.Fatigue"))
  )
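Once categories are collapsed, a grouped summary table is one simple way to look for patterns. The sketch below applies the same case_when() recoding to a toy tibble and summarises reaction time by fatigue group. The toy values and the names toy, fatigue_summary, and median_rt are made up for illustration, so the numbers say nothing about the real data.

```r
library(dplyr)

# Hypothetical toy data standing in for survey_student.
toy <- tibble(
  Reaction.time = c(250, 310, 280, 400, 265, 330),
  Fatigue.level = c("Not fatigued at all", "Very fatigued",
                    "Slightly Fatigued", "Extremely fatigued",
                    "Moderately fatigued", "Moderately fatigued")
)

fatigue_summary <- toy %>%
  mutate(
    Fatigue.3 = case_when(
      Fatigue.level %in% c("Extremely fatigued", "Very fatigued") ~ "H.Fatigue",
      Fatigue.level %in% c("Moderately fatigued") ~ "M.Fatigue",
      Fatigue.level %in% c("Not fatigued at all", "Slightly Fatigued") ~ "L.Fatigue"
    ),
    Fatigue.3 = factor(Fatigue.3,
                       levels = c("H.Fatigue", "M.Fatigue", "L.Fatigue"))
  ) %>%
  # One row per fatigue group, ordered by the factor levels.
  group_by(Fatigue.3) %>%
  summarise(n = n(), median_rt = median(Reaction.time), .groups = "drop")
```

The median is used here rather than the mean because reaction times can include a few very slow responses; either statistic could be justified.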