Data in R can be stored in several different ways, and choosing the right data structure is an important part of writing efficient and organized code. Whether you are working with a simple collection of numbers or a complete dataset, R provides specialized data structures designed for different tasks. In this post, we’ll explore the most common data structures in R and learn when to use each one.
1. Vectors
A vector is the simplest data structure in R. It is a sequence of elements that are all of the same type (numeric, character, or logical). You can create a vector using the c() function:
# Creating a numeric vector:
numeric_vector <- c(10, 20, 30, 40)
# Creating a character vector:
character_vector <- c("apple", "banana", "cherry")
# Creating a logical vector:
logical_vector <- c(TRUE, FALSE, TRUE)
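Because a vector can hold only one type, c() silently converts mixed inputs to the most flexible type present. The following is a small sketch of that coercion; the variable names mixed_text and mixed_num are made up for illustration:

```r
# Mixing numbers, text, and logicals: everything becomes character
mixed_text <- c(1, "apple", TRUE)
class(mixed_text)   # "character"
mixed_text          # "1" "apple" "TRUE"

# Mixing numbers and logicals: TRUE/FALSE become 1/0
mixed_num <- c(10, TRUE, FALSE)
class(mixed_num)    # "numeric"
mixed_num           # 10 1 0
```

If a vector unexpectedly prints its numbers in quotes, a stray character value in the input is a likely cause.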
Vectors in R support element-wise arithmetic, which allows you to add, subtract, or multiply the values in a single step:
# Defining a numeric vector:
numeric_vector <- c(10, 20, 30, 40)
# Adding 5 to each element in the vector:
numeric_vector + 5
# Output: 15 25 35 45
# Subtracting 2 from each element in the vector:
numeric_vector - 2
# Output: 8 18 28 38
# Multiplying each element of the vector by 2:
numeric_vector * 2
# Output: 20 40 60 80
You can also perform element-wise arithmetic between two vectors of the same length:
# Defining two numeric vectors:
vector_a <- c(5, 10, 15, 20)
vector_b <- c(2, 4, 6, 8)
# Adding the vectors:
vector_a + vector_b
# Output: 7 14 21 28
# Subtracting the second vector from the first:
vector_a - vector_b
# Output: 3 6 9 12
# Multiplying the vectors element-wise:
vector_a * vector_b
# Output: 10 40 90 160
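When the two vectors differ in length, R does not stop with an error. It recycles the shorter vector instead, which can hide mistakes. A minimal sketch of that behavior, using illustrative names (long_vector, short_vector) that are not part of the examples above:

```r
long_vector  <- c(1, 2, 3, 4)
short_vector <- c(10, 20)

# short_vector is reused as 10, 20, 10, 20
recycled_sum <- long_vector + short_vector
recycled_sum   # 11 22 13 24

# A defensive check before combining two vectors
same_length <- length(long_vector) == length(short_vector)
same_length    # FALSE
```

If the longer length is not a multiple of the shorter one, R still returns a result but also issues a warning, so checking lengths first is a reasonable habit.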
2. Matrices
A matrix is a two-dimensional collection of elements arranged in rows and columns, where all elements must be of the same type (numeric, character, or logical). You can create a matrix using the matrix() function by specifying the number of rows and columns:
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
my_matrix
Output:
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
By default, R fills matrices column-wise. You can change this behavior using the argument byrow = TRUE:
matrix(1:9, nrow = 3, byrow = TRUE)
Output:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
You can access specific elements, rows, or columns using square brackets:
my_matrix[2, 3]   # Element in 2nd row, 3rd column
my_matrix[ , 2]   # Entire 2nd column
my_matrix[1, ]    # Entire 1st row
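To make the indexing concrete, here is a short sketch that rebuilds the column-wise 3 x 3 matrix and shows what each expression returns. The drop = FALSE argument at the end is an addition not covered in the text above:

```r
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)

dim(my_matrix)     # 3 3 (rows, columns)
my_matrix[2, 3]    # 8
my_matrix[, 2]     # 4 5 6
my_matrix[1, ]     # 1 4 7

# A single row or column comes back as a plain vector;
# drop = FALSE keeps it as a one-row matrix instead
first_row <- my_matrix[1, , drop = FALSE]
dim(first_row)     # 1 3
```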
Matrices support arithmetic and linear algebra operations, such as addition, multiplication, and transposition:
# Defining two numerical matrices:
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2)
# The matrix A is:
A
# Output:
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4
# The matrix B is:
B
# Output:
#      [,1] [,2]
# [1,]    5    7
# [2,]    6    8
# The sum of the matrices is:
A + B
# Output:
#      [,1] [,2]
# [1,]    6   10
# [2,]    8   12
# The product of the matrices is:
A %*% B
# Output:
#      [,1] [,2]
# [1,]   23   31
# [2,]   34   46
# The transpose of the matrix A is:
t(A)
# Output:
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
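A common source of confusion is that * and %*% do different things to matrices. The sketch below contrasts the two on the same A and B; the names elementwise and matrix_product are illustrative:

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2)

# Element-wise product: each entry of A times the matching entry of B
elementwise <- A * B
elementwise
#      [,1] [,2]
# [1,]    5   21
# [2,]   12   32

# Matrix product: rows of A combined with columns of B
matrix_product <- A %*% B
matrix_product
#      [,1] [,2]
# [1,]   23   31
# [2,]   34   46
```

For example, the top-left entry of the matrix product is 1 * 5 + 3 * 6 = 23, while the top-left entry of the element-wise product is simply 1 * 5 = 5.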
3. Factors
Factors are used to represent categorical data, such as groups or categories (for example, gender, education level, or region). Unlike ordinary character vectors, factors have underlying numeric codes that correspond to each category label. This means R stores both the labels (like “Male” and “Female”) and their internal codes (for example, 1 and 2), making it easier to handle categorical variables efficiently in statistical models.
gender <- factor(c("Male", "Female", "Female", "Male"))
gender
Output:
[1] Male   Female Female Male
Levels: Female Male
Internally, R represents the factor with numeric codes linked to the labels:
as.numeric(gender)
# Output: 2 1 1 2
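The codes follow the order of the levels, which is alphabetical unless you say otherwise. A brief sketch of how to inspect the levels and how an explicit levels argument changes the codes (gender_ordered is an illustrative name):

```r
gender <- factor(c("Male", "Female", "Female", "Male"))

levels(gender)       # "Female" "Male" (alphabetical by default)
as.integer(gender)   # 2 1 1 2
table(gender)        # counts per level: Female 2, Male 2

# Setting the level order explicitly changes the codes
gender_ordered <- factor(c("Male", "Female", "Female", "Male"),
                         levels = c("Male", "Female"))
as.integer(gender_ordered)   # 1 2 2 1
```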
Now let’s look at how this works in a real modeling context. Suppose you have data on people’s income and education level:
education <- c("High School", "College", "College", "High School", "Graduate")
income <- c(40000, 55000, 60000, 42000, 75000)
If you pass education to lm() as a character vector, R converts it to a factor automatically, with the levels in alphabetical order, so “College” would become the baseline category.
Converting it to a factor yourself lets you choose the level order, and with it the baseline:
education <- factor(education, levels = c("High School", "College", "Graduate"))
model <- lm(income ~ education)
summary(model)
Output (coefficient table, values rounded):
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)          41000       1904   21.53   0.0022 **
educationCollege     16500       2693    6.13   0.0256 *
educationGraduate    34000       3298   10.31   0.0093 **
Interpretation: “High School” is the baseline category because it is the first level. The intercept (41,000) is the average income of the “High School” group, and the other two coefficients show how much higher the average income is in the “College” (16,500) and “Graduate” (34,000) groups compared to the baseline.
This example demonstrates why factors are essential: they provide the internal structure that allows R to create dummy variables automatically, enabling correct analysis of categorical data in statistical models.
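To see the dummy variables themselves, you can ask R for the design matrix with model.matrix(). The sketch below assumes “High School” is set as the first level so that it acts as the baseline; the names dummies, coefficients, and group_means are illustrative:

```r
education <- factor(
  c("High School", "College", "College", "High School", "Graduate"),
  levels = c("High School", "College", "Graduate")
)
income <- c(40000, 55000, 60000, 42000, 75000)

# One 0/1 column per non-baseline level, plus the intercept column
dummies <- model.matrix(~ education)
colnames(dummies)
# "(Intercept)" "educationCollege" "educationGraduate"

# The coefficients are the baseline mean and the differences from it
coefficients <- coef(lm(income ~ education))
group_means  <- tapply(income, education, mean)
coefficients   # 41000, 16500, 34000
group_means    # High School 41000, College 57500, Graduate 75000
```

Each row of dummies has a 1 in the column of its own group (and in the intercept), which is how a single categorical variable turns into several numeric predictors.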
4. Lists
A list is a flexible data structure that can hold objects of different types and sizes. This makes lists especially useful for storing model results, nested data, or mixed data.
my_list <- list(
  name = "Alice",
  age = 25,
  scores = c(88, 92, 95)
)
my_list
Output:
$name
[1] "Alice"

$age
[1] 25

$scores
[1] 88 92 95
You can access elements of a list using the $ operator or double brackets:
my_list$name          # Output: "Alice"
my_list[["scores"]]   # Output: 88 92 95
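Single brackets also work on lists, but they return a smaller list rather than the element itself. A short sketch of the difference, plus adding an element by name (the element name passed is made up for illustration):

```r
my_list <- list(name = "Alice", age = 25, scores = c(88, 92, 95))

class(my_list["scores"])     # "list": single brackets keep the list wrapper
class(my_list[["scores"]])   # "numeric": double brackets return the element

# Extracted elements can be used directly
average_score <- mean(my_list$scores)
average_score                # 91.66667

# Adding a new element by name
my_list$passed <- TRUE
names(my_list)               # "name" "age" "scores" "passed"
```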
5. Data Frames
A data frame is a table-like structure where each column is a vector of equal length, and each row represents an observation. Data frames are the most common way to store datasets in R.
students <- data.frame(
name = c("Alice", "Bob", "Charlie"),
age = c(25, 22, 23),
score = c(90, 85, 88)
)
students
Output:
     name age score
1   Alice  25    90
2     Bob  22    85
3 Charlie  23    88
Each column of a data frame can be a different type (e.g., name is character, age is numeric), but all columns must have the same number of rows.
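The sketch below checks the column types and shows a few everyday operations on the same students data frame. stringsAsFactors = FALSE is stated explicitly as an assumption about the R version (it is the default from R 4.0 onward), and the names column_types, high_scorers, and passed are illustrative:

```r
students <- data.frame(
  name  = c("Alice", "Bob", "Charlie"),
  age   = c(25, 22, 23),
  score = c(90, 85, 88),
  stringsAsFactors = FALSE  # keep name as character on older R versions too
)

column_types <- sapply(students, class)
column_types      # name "character", age "numeric", score "numeric"
nrow(students)    # 3
students$score    # 90 85 88

# Rows where score is above 86 (Alice and Charlie)
high_scorers <- students[students$score > 86, ]

# Add a column; its length must match the number of rows
students$passed <- students$score >= 88
```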
6. Comparing the Data Structures
- Vectors: The basic building block; all elements must be the same type.
- Matrices: Two-dimensional, with rows and columns; all elements must be the same type.
- Factors: Used for categorical data with fixed levels.
- Lists: Can hold objects of different types and lengths.
- Data Frames: Tables that combine multiple vectors of equal length, ideal for datasets.
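If you are unsure which structure an object is, R's is.*() functions can tell you. A quick sketch with throwaway objects (v, m, f, l, and df are illustrative names):

```r
v  <- c(1, 2, 3)
m  <- matrix(1:4, nrow = 2)
f  <- factor(c("a", "b", "a"))
l  <- list(1, "a", TRUE)
df <- data.frame(x = 1:2, y = c("a", "b"))

is.atomic(v)        # TRUE: one type throughout
is.matrix(m)        # TRUE
is.factor(f)        # TRUE
is.list(l)          # TRUE
is.data.frame(df)   # TRUE
is.list(df)         # TRUE: a data frame is a list of equal-length columns
```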
