Factors

Reading Assignments

Creating Factors

factor() converts a vector (numeric, character, or logical) to a factor.

Examples:

num_vector <- c(1, 2, 3, 1, 2, 3, 2)
factor(num_vector)
#> [1] 1 2 3 1 2 3 2
#> Levels: 1 2 3

str_vector <- c('a', 'b', 'c', 'b', 'c', 'a', 'c', 'b')
factor(str_vector)
#> [1] a b c b c a c b
#> Levels: a b c

log_vector <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
factor(log_vector)
#> [1] TRUE  FALSE TRUE  TRUE  FALSE
#> Levels: FALSE TRUE
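In the examples above the levels are simply the sorted unique values of the input. The sketch below shows that default and how the `levels` argument overrides it. The vector `month_chr` and the other names are made up for illustration.

```r
# Illustrative data (not from the notes above)
month_chr <- c("Mar", "Jan", "Feb", "Jan")

# Default: levels are the sorted unique values
default_f <- factor(month_chr)
levels(default_f)
#> [1] "Feb" "Jan" "Mar"

# Explicit levels: you choose the set of values and their order
custom_f <- factor(month_chr, levels = c("Jan", "Feb", "Mar"))
levels(custom_f)
#> [1] "Jan" "Feb" "Mar"

# Number of levels
nlevels(custom_f)
#> [1] 3
```

The order of the levels matters later on, for example in the column order of table() output.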

Why factors?

Factors are used to handle categorical data

Factors vs Character

  • Factors have levels; character vectors do not.

    • Use factors when your data represents categories or groups with a limited set of values (e.g., “red,” “green,” “blue” for colors).
    • Use character vectors if your data consists of plain strings without specific meaning (e.g., names, labels, or arbitrary text).
  • Because factors have predefined levels, they are particularly useful in the following cases.

    Case 1: You get a warning (and an NA is generated) when you assign a value that is not among the predefined levels (this helps catch typos).

first_factor <- factor(num_vector)
first_factor
#> [1] 1 2 3 1 2 3 2
#> Levels: 1 2 3
first_factor[1] <- 6
#> Warning in `[<-.factor`(`*tmp*`, 1, value = 6): invalid factor level, NA
#> generated
first_factor
#> [1] <NA> 2    3    1    2    3    2   
#> Levels: 1 2 3
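If the new value is legitimate rather than a typo, one option is to extend the levels first and then assign. This is a minimal sketch of that approach; the variable name `f` is illustrative.

```r
num_vector <- c(1, 2, 3, 1, 2, 3, 2)
f <- factor(num_vector)

# Add "6" to the set of allowed values, then assign it
levels(f) <- c(levels(f), "6")
f[1] <- "6"

f
#> [1] 6 2 3 1 2 3 2
#> Levels: 1 2 3 6
```

No warning is raised this time, because "6" is now one of the levels.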

Case 2: When you tabulate a factor, you get counts for all levels, even unobserved ones.

sex_char <- c("m", "m", "m")
sex_factor <- factor(sex_char, levels = c("m", "f"))

table(sex_char)
#> sex_char
#> m 
#> 3
table(sex_factor)
#> sex_factor
#> m f 
#> 3 0
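When the zero counts are not wanted, droplevels() removes the levels that do not occur in the data. The sketch below rebuilds the `sex_factor` example so that it runs on its own.

```r
sex_factor <- factor(c("m", "m", "m"), levels = c("m", "f"))

# Unused level "f" is still part of the factor
levels(sex_factor)
#> [1] "m" "f"

# droplevels() keeps only the levels that actually appear
sex_observed <- droplevels(sex_factor)
levels(sex_observed)
#> [1] "m"

table(sex_observed)
```

Whether to keep or drop unused levels depends on the report: keeping them makes missing categories visible, dropping them keeps tables compact.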

Case 3: Compared to character vectors, factors can allow more memory-efficient storage.

Why? R stores a factor as a vector of integer codes, together with the character levels that those codes point to.
second_factor <- factor(str_vector)
second_factor
#> [1] a b c b c a c b
#> Levels: a b c
typeof(second_factor)
#> [1] "integer"
unclass(second_factor)
#> [1] 1 2 3 2 3 1 3 2
#> attr(,"levels")
#> [1] "a" "b" "c"
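The storage claim can be checked roughly with object.size(). This is a sketch under assumptions: it assumes a 64-bit build of R, the vector names are illustrative, and the exact byte counts vary by R version, so no output is shown.

```r
# A long vector with only three distinct values (illustrative data)
big_chr <- rep(c("a", "b", "c"), length.out = 1e5)
big_fct <- factor(big_chr)

# Approximate memory used by each representation
size_chr <- object.size(big_chr)
size_fct <- object.size(big_fct)

print(size_chr)
print(size_fct)

# On a 64-bit build the factor is expected to be smaller
size_fct < size_chr
```

The saving comes from storing one integer code per element instead of one string reference per element. For short vectors, or vectors with many distinct values, the difference can be negligible.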

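unclass() removes the class attribute but keeps the levels attribute. as.integer() returns the same integer codes with no attributes, and as.numeric() returns those codes as doubles.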
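Because a factor holds integer codes rather than the original values, converting a factor of numbers back to numbers needs care. The sketch below compares the conversions; the factor `f` is illustrative.

```r
f <- factor(c(10, 20, 10, 30))

# as.integer() returns the internal codes, not the original values
as.integer(f)
#> [1] 1 2 1 3

# To recover the original numbers, go through the level labels
as.numeric(as.character(f))
#> [1] 10 20 10 30

# unclass() keeps the levels attribute; as.integer() drops all attributes
names(attributes(unclass(f)))
#> [1] "levels"
attributes(as.integer(f))
#> NULL
```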

How R treats factors

Factors are built on top of an integer vector with two attributes:

  • class, “factor”, which makes it behave differently from regular integer vectors
  • levels, which defines the set of allowed values.
# Recall second_factor defined above
second_factor
#> [1] a b c b c a c b
#> Levels: a b c
attributes(second_factor)
#> $levels
#> [1] "a" "b" "c"
#> 
#> $class
#> [1] "factor"
typeof(second_factor)
#> [1] "integer"
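To make the "integer vector plus two attributes" description concrete, a factor can be assembled by hand with structure(). This is only an illustration of the internal layout, not a recommended way to create factors; the name `hand_made` is illustrative.

```r
# Integer codes plus the two attributes described above
hand_made <- structure(
  c(1L, 2L, 3L, 2L),
  levels = c("a", "b", "c"),
  class = "factor"
)

hand_made
#> [1] a b c b
#> Levels: a b c

is.factor(hand_made)
#> [1] TRUE

# Same values as a factor built the usual way
as.character(hand_made) == as.character(factor(c("a", "b", "c", "b")))
#> [1] TRUE TRUE TRUE TRUE
```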

Ordinal Factor

  • Ordinal factors are a minor variation of factors.
  • Ordinal factors behave like regular factors, but the order of the levels is meaningful (e.g., low < medium < high), a property that some modelling and visualization functions use automatically.

Convert to Ordinal Factor

There are two equivalent ways to convert vectors (numeric, character, or logical) or unordered factors to ordinal factors:

# Two equivalent ways to get Ordinal factors
factor(num_vector, ordered = TRUE)
#> [1] 1 2 3 1 2 3 2
#> Levels: 1 < 2 < 3
ordered(num_vector)
#> [1] 1 2 3 1 2 3 2
#> Levels: 1 < 2 < 3
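For numeric input the default sorted order is usually the intended one. For character input the default order is alphabetical, which is rarely the intended ranking, so the levels are typically given explicitly. A minimal sketch with made-up data:

```r
size_chr <- c("low", "high", "medium", "low")

# Default: alphabetical levels, which gives a misleading ranking
alphabetical <- factor(size_chr, ordered = TRUE)
levels(alphabetical)
#> [1] "high"   "low"    "medium"

# Explicit levels: the ranking is stated from lowest to highest
ranked <- factor(size_chr,
                 levels = c("low", "medium", "high"),
                 ordered = TRUE)
levels(ranked)
#> [1] "low"    "medium" "high"
```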

Nominal Factor vs Ordinal Factor

# Nominal factor or Unordered factor
first_factor <- factor(num_vector)
first_factor
#> [1] 1 2 3 1 2 3 2
#> Levels: 1 2 3

# Ordinal factor
first_factor_o <- factor(num_vector, ordered = TRUE)
first_factor_o
#> [1] 1 2 3 1 2 3 2
#> Levels: 1 < 2 < 3
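A practical difference between the two kinds of factor is that ordinal factors support comparisons such as < and functions such as max() and sort() that respect the level order. The sketch below uses illustrative data and names.

```r
size_chr <- c("low", "high", "medium", "low")
size_lvls <- c("low", "medium", "high")

size_nominal <- factor(size_chr, levels = size_lvls)
size_ordinal <- factor(size_chr, levels = size_lvls, ordered = TRUE)

# Comparisons follow the level order for an ordinal factor
size_ordinal < "high"
#> [1]  TRUE FALSE  TRUE  TRUE

# max() returns the highest level present
as.character(max(size_ordinal))
#> [1] "high"

# sort() orders by level, not alphabetically
as.character(sort(size_ordinal))
#> [1] "low"    "low"    "medium" "high"

# For a nominal factor, < is not meaningful: R warns and returns NA
nominal_cmp <- suppressWarnings(size_nominal < "high")
nominal_cmp
#> [1] NA NA NA NA
```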

Ordinal Factor or not? – is.ordered()

is.ordered(num_vector) # Recall num_vector is a numerical vector
#> [1] FALSE
is.ordered(first_factor) # first_factor is an unordered factor
#> [1] FALSE
is.ordered(first_factor_o) # first_factor_o is an ordinal factor
#> [1] TRUE