Factors
Reading Assignments
Make sure to read the following sections in the textbook: R Coding Basics, https://www.gastonsanchez.com/R-coding-basics/
Creating Factors
factor() can convert a vector (numerical, character, or logical) to a factor.
Examples:
num_vector <- c(1, 2, 3, 1, 2, 3, 2)
factor(num_vector)
#> [1] 1 2 3 1 2 3 2
#> Levels: 1 2 3
str_vector <- c('a', 'b', 'c', 'b', 'c', 'a', 'c', 'b')
factor(str_vector)
#> [1] a b c b c a c b
#> Levels: a b c
log_vector <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
factor(log_vector)
#> [1] TRUE  FALSE TRUE  TRUE  FALSE
#> Levels: FALSE TRUE
Why factors?
Factors are used to handle categorical data
Factors vs Character
Factors have levels; character vectors do not.
- Use factors when your data represents categories or groups with a limited set of values (e.g., “red,” “green,” “blue” for colors).
- Use character vectors if your data consists of plain strings without specific meaning (e.g., names, labels, or arbitrary text).
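A minimal sketch of the difference, assuming a hypothetical vector named color_names (not part of the original notes): the character vector carries no levels, while the factor built from it does.

```r
# 'color_names' is a hypothetical example vector
color_names <- c("red", "green", "blue", "green", "red")
color_factor <- factor(color_names)

levels(color_names)      # NULL: a character vector has no levels
levels(color_factor)     # "blue" "green" "red" (sorted alphabetically by default)
nlevels(color_factor)    # 3

is.factor(color_names)   # FALSE
is.factor(color_factor)  # TRUE
```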
As factors have predefined levels, factors are particularly useful in the following cases.
Case 1: You get a warning (and an NA is generated) when you assign a value that is not among the predefined levels, which helps catch typos.
first_factor <- factor(num_vector)
first_factor
#> [1] 1 2 3 1 2 3 2
#> Levels: 1 2 3
first_factor[1] <- 6
#> Warning in `[<-.factor`(`*tmp*`, 1, value = 6): invalid factor level, NA
#> generated
first_factor
#> [1] <NA> 2    3    1    2    3    2
#> Levels: 1 2 3
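The same mechanism can help catch typos in character data. The sketch below assumes hypothetical data with a misspelled entry ("gren") and an explicitly supplied set of levels. It also shows one way to accept a genuinely new value: extend the levels before assigning.

```r
# Hypothetical data containing a typo ("gren")
raw_colors <- c("red", "green", "gren", "blue")
allowed <- c("red", "green", "blue")

# Values outside 'allowed' silently become NA
checked <- factor(raw_colors, levels = allowed)
checked
which(is.na(checked))  # 3: position of the misspelled entry

# To accept a new value, add it to the levels first
levels(checked) <- c(levels(checked), "yellow")
checked[3] <- "yellow"  # no warning, because "yellow" is now a level
checked
```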
Case 2: when you tabulate a factor you’ll get counts of all categories, even unobserved ones.
sex_char <- c("m", "m", "m")
sex_factor <- factor(sex_char, levels = c("m", "f"))
table(sex_char)
#> sex_char
#> m
#> 3
table(sex_factor)
#> sex_factor
#> m f
#> 3 0
Case 3: Compared to character vectors, factors can be stored more compactly in memory.
Why? R stores a factor as a vector of integer codes, plus a single copy of the level labels.
second_factor <- factor(str_vector)
second_factor
#> [1] a b c b c a c b
#> Levels: a b c
typeof(second_factor)
#> [1] "integer"
unclass(second_factor)
#> [1] 1 2 3 2 3 1 3 2
#> attr(,"levels")
#> [1] "a" "b" "c"
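unclass() exposes the underlying integer codes and keeps the levels attribute. as.integer() and as.numeric() return the same codes with the attributes dropped.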
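A rough way to see the storage difference is object.size(). The sketch below assumes a 64-bit build of R and a hypothetical long vector with only a few distinct values. Exact byte counts vary by platform and R version, so none are shown.

```r
# Hypothetical long vector with only three distinct strings
long_chr <- rep(c("a", "b", "c"), times = 100000)
long_fct <- factor(long_chr)

object.size(long_chr)  # character vector: one pointer per element
object.size(long_fct)  # factor: one integer code per element, plus the levels

# On a 64-bit build the factor is typically the smaller object
object.size(long_fct) < object.size(long_chr)
```

The saving comes from the per-element cost (an integer code versus a pointer), so it matters mostly for long vectors with few distinct values.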
How R treats factors
Factors are built on top of an integer vector with two attributes:
- class, “factor”, which makes it behave differently from regular integer vectors
- levels, which defines the set of allowed values.
# Recall second_factor defined above
second_factor
#> [1] a b c b c a c b
#> Levels: a b c
attributes(second_factor)
#> $levels
#> [1] "a" "b" "c"
#>
#> $class
#> [1] "factor"
typeof(second_factor)
#> [1] "integer"
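One practical consequence of the integer representation: converting a factor whose labels look like numbers returns the integer codes, not the labels. A minimal sketch, assuming a hypothetical factor named num_factor:

```r
# Hypothetical factor whose labels are numbers
num_factor <- factor(c(10, 20, 10, 30))

as.integer(num_factor)                      # 1 2 1 3: the codes, not the labels
as.numeric(as.character(num_factor))        # 10 20 10 30: the original values
as.numeric(levels(num_factor))[num_factor]  # same result, converting each level once
```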
Ordinal Factor
- Ordinal factors are a minor variation of factors.
- Ordinal factors behave like regular factors, but the order of the levels is meaningful (e.g., low < medium < high). Some modelling and visualization functions make use of this ordering automatically.
Convert to Ordinal Factor
There are two equivalent ways to convert vectors (numerical, character, logical) or unordered factors to ordinal factors:
# Two equivalent ways to get Ordinal factors
factor(num_vector, ordered = TRUE)
#> [1] 1 2 3 1 2 3 2
#> Levels: 1 < 2 < 3
ordered(num_vector)
#> [1] 1 2 3 1 2 3 2
#> Levels: 1 < 2 < 3
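When the values are strings, the level order usually needs to be stated explicitly, because levels are sorted alphabetically by default. A short sketch with a hypothetical vector named sizes:

```r
# Hypothetical character data with a natural ordering
sizes <- c("medium", "low", "high", "low")

# Without 'levels', the order is alphabetical: high < low < medium
ordered(sizes)

# Supplying 'levels' gives the intended order: low < medium < high
sizes_o <- factor(sizes, levels = c("low", "medium", "high"), ordered = TRUE)
sizes_o

# Comparisons and summaries respect the level order
sizes_o < "high"  # TRUE TRUE FALSE TRUE
max(sizes_o)      # high
sort(sizes_o)     # low low medium high
```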
Nominal Factor vs Ordinal Factor
# Nominal factor or Unordered factor
first_factor <- factor(num_vector)
first_factor
#> [1] 1 2 3 1 2 3 2
#> Levels: 1 2 3
# Ordinal factor
first_factor_o <- factor(num_vector, ordered = TRUE)
first_factor_o
#> [1] 1 2 3 1 2 3 2
#> Levels: 1 < 2 < 3
Ordinal Factor or not? – is.ordered()
is.ordered(num_vector) # Recall num_vector is a numerical vector
#> [1] FALSE
is.ordered(first_factor) # first_factor is an unordered factor
#> [1] FALSE
is.ordered(first_factor_o) # first_factor_o is an ordinal factor
#> [1] TRUE