Chapter 4: apply Functions (2024)

A Language, not a Letter: Learning Statistics in R


Erin Sovansky Winter

Apply functions are a family of functions in base R that let you repeatedly perform an action on multiple chunks of data. An apply function is essentially a loop, but it often requires less code than an explicit loop and can run faster.

The apply functions that this chapter will address are apply, lapply, sapply, vapply, tapply, and mapply. There are so many different apply functions because they are meant to operate on different types of data.

First, let’s go over the basic apply function. You can use the help section to get a description of this function.

?apply

The apply function looks like this: apply(X, MARGIN, FUN).

  • X is an array or matrix (this is the data that you will be performing the function on)
  • MARGIN specifies whether you want to apply the function across rows (1) or columns (2)
  • FUN is the function you want to use
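As a minimal sketch of how MARGIN changes the result, the code below applies sum to a small matrix both ways. The matrix m.small is a hypothetical example made up for this illustration and is not part of the chapter's data.

```r
# Hypothetical 2 x 3 matrix, used only for this illustration
m.small <- matrix(1:6, nrow = 2, ncol = 3)

# MARGIN = 1: the function receives one row at a time
row.totals <- apply(m.small, 1, sum)

# MARGIN = 2: the function receives one column at a time
col.totals <- apply(m.small, 2, sum)

row.totals  # one value per row (2 values)
col.totals  # one value per column (3 values)
```

The length of the result matches the number of rows or columns, which is a quick way to check that MARGIN was set as intended.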

2.1 apply examples

my.matrx is a matrix with 1-10 in column 1, 11-20 in column 2, and 21-30 in column 3. my.matrx will be used to show some of the basic uses for the apply function.

my.matrx <- matrix(c(1:10, 11:20, 21:30), nrow = 10, ncol = 3)
my.matrx
##       [,1] [,2] [,3]
##  [1,]    1   11   21
##  [2,]    2   12   22
##  [3,]    3   13   23
##  [4,]    4   14   24
##  [5,]    5   15   25
##  [6,]    6   16   26
##  [7,]    7   17   27
##  [8,]    8   18   28
##  [9,]    9   19   29
## [10,]   10   20   30

2.1.1 Example 1: Using apply to find row sums

What if I wanted to summarize the data in matrix my.matrx by finding the sum of each row? The arguments are X = my.matrx, MARGIN = 1 (for row), and FUN = sum.

apply(my.matrx, 1, sum)
## [1] 33 36 39 42 45 48 51 54 57 60

The apply function returned a vector containing the sums for each row.
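To make the connection to loops concrete, here is a sketch that computes the same row sums three ways: with an explicit for loop, with apply, and with the built-in rowSums function. The names loop.sums, apply.sums, and builtin.sums are chosen for this illustration only.

```r
my.matrx <- matrix(c(1:10, 11:20, 21:30), nrow = 10, ncol = 3)

# Explicit loop: pre-allocate the result, then fill it row by row
loop.sums <- numeric(nrow(my.matrx))
for (i in seq_len(nrow(my.matrx))) {
  loop.sums[i] <- sum(my.matrx[i, ])
}

# The same computation with apply
apply.sums <- apply(my.matrx, 1, sum)

# Base R also has a dedicated function for this common case
builtin.sums <- rowSums(my.matrx)

all(loop.sums == apply.sums)     # TRUE
all(apply.sums == builtin.sums)  # TRUE
```

For simple sums and means, rowSums, colSums, rowMeans, and colMeans are convenient shortcuts; apply is the more general tool when the function is something else.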

2.1.2 Example 2: Creating a function in the arguments

What if I wanted to be able to find how many datapoints (n) are in each column of my.matrx? I can use the length function to do this. Because we are using columns, MARGIN = 2.

apply(my.matrx, 2, length)
## [1] 10 10 10

What if instead, I wanted to find n-1 for each column? There isn’t a function in R to do this automatically, so I can create my own function. If the function is simple, you can create it right inside the arguments for apply. In the arguments I created a function that returns length - 1.

apply(my.matrx, 2, function (x) length(x)-1)
## [1] 9 9 9

As you can see, the function correctly returned a vector of n-1 for each column.

2.1.3 Example 3: Using a function defined outside of apply

If you don’t want to write a function inside of the arguments, you can define the function outside of apply, and then use that function in apply later. This may be useful if you want to have the function available to use later. In this example, a function to find standard error was created, then passed into an apply function.

st.err <- function(x){
  sd(x)/sqrt(length(x))
}
apply(my.matrx, 2, st.err)
## [1] 0.9574271 0.9574271 0.9574271

2.1.4 Example 4: Transforming data

Now for something a little different. In the previous examples, apply was used to summarize over a row or column. It can also be used to repeat a function on cells within a matrix. In this example, the apply function is used to transform the values in each cell. Pay attention to the MARGIN argument. If you set the MARGIN to 1:2 it will have the function operate on each cell.

my.matrx2 <- apply(my.matrx, 1:2, function(x) x+3)
my.matrx2
##       [,1] [,2] [,3]
##  [1,]    4   14   24
##  [2,]    5   15   25
##  [3,]    6   16   26
##  [4,]    7   17   27
##  [5,]    8   18   28
##  [6,]    9   19   29
##  [7,]   10   20   30
##  [8,]   11   21   31
##  [9,]   12   22   32
## [10,]   13   23   33
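For simple arithmetic like this, R's operators already work on every cell of a matrix at once. The following sketch checks that the cell-by-cell apply call and the plain vectorized expression agree; the names cell.by.cell and vectorized are used only for this illustration.

```r
my.matrx <- matrix(c(1:10, 11:20, 21:30), nrow = 10, ncol = 3)

# apply with MARGIN = 1:2 calls the function once per cell
cell.by.cell <- apply(my.matrx, 1:2, function(x) x + 3)

# Arithmetic operators are vectorized, so this adds 3 to every cell too
vectorized <- my.matrx + 3

all(cell.by.cell == vectorized)  # TRUE
dim(cell.by.cell)                # 10 3, same shape as the input
```

The MARGIN = 1:2 form becomes useful when the per-cell function is not already vectorized.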

2.1.5 Example 5: Vectors?

The previous examples showed several ways to use the apply function on a matrix. But what if I wanted to loop through a vector instead? Will the apply function work?

vec <- c(1:10)
vec
## [1] 1 2 3 4 5 6 7 8 9 10
apply(vec, 1, sum)

If you run this function it will return the error: Error in apply(vec, 1, sum) : dim(X) must have a positive length. As you can see, this didn’t work because apply expects data that has dimensions, such as a matrix or array, and a plain vector has none. If your data is a vector you need to use lapply, sapply, or vapply instead.
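If you would like to see the failure without stopping your script, one option is to wrap the call in tryCatch. This is only a sketch; the variable name msg is an assumption made for this illustration.

```r
vec <- c(1:10)

# Capture the error message instead of halting execution
msg <- tryCatch(
  apply(vec, 1, sum),
  error = function(e) conditionMessage(e)
)
msg

# A plain vector has no dim attribute, which is what apply checks for
is.null(dim(vec))           # TRUE

# Reshaping the vector into a one-column matrix gives it dimensions
dim(matrix(vec, ncol = 1))  # 10 1
```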

lapply, sapply, and vapply are all functions that will loop a function through data in a list or vector. First, try looking up lapply in the help section to see a description of all three functions.

?lapply

Here are the arguments for the three functions:

  • lapply(X, FUN, …)
  • sapply(X, FUN, …, simplify = TRUE, USE.NAMES = TRUE)
  • vapply(X, FUN, FUN.VALUE, …, USE.NAMES = TRUE)

In this case, X is a vector or list, and FUN is the function you want to use. sapply and vapply have extra arguments, but most of them have default values, so you don’t need to worry about them. However, vapply requires another argument called FUN.VALUE, which we will look at later.

3.0.1 Example 1: Getting started with lapply

Earlier, we created the vector vec. Let’s use that vector to test out the lapply function.

lapply(vec, sum)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
##
## [[4]]
## [1] 4
##
## [[5]]
## [1] 5
##
## [[6]]
## [1] 6
##
## [[7]]
## [1] 7
##
## [[8]]
## [1] 8
##
## [[9]]
## [1] 9
##
## [[10]]
## [1] 10

This function didn’t add up the values like we may have expected it to. This is because lapply treats the vector like a list, and applies the function to each point in the vector separately.

Let’s try using a list instead.

A <- c(1:9)
B <- c(1:12)
C <- c(1:15)
my.lst <- list(A, B, C)
lapply(my.lst, sum)
## [[1]]
## [1] 45
##
## [[2]]
## [1] 78
##
## [[3]]
## [1] 120

This time, the lapply function seemed to work better. The function summed each vector in the list and returned a list of the 3 sums.

3.0.2 Example 2: sapply

sapply works just like lapply, but will simplify the output if possible. This means that instead of returning a list like lapply, it will return a vector instead if the data is simplifiable.

sapply(vec, sum)
## [1] 1 2 3 4 5 6 7 8 9 10
sapply(my.lst, sum)
## [1] 45 78 120

See how these two examples gave the same answers as lapply, but returned a vector instead of a list?
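What "simplify if possible" means depends on the shape of each result. The sketch below, which rebuilds the same three-element list, shows the three common cases; the names rng and kept.as.list are assumptions made for this illustration.

```r
my.lst <- list(1:9, 1:12, 1:15)

# One value per element: simplified to a vector
sapply(my.lst, sum)

# Two values per element (min and max): simplified to a matrix
# with one column per list element
rng <- sapply(my.lst, range)
dim(rng)  # 2 3

# simplify = FALSE turns simplification off, so a list comes back
kept.as.list <- sapply(my.lst, sum, simplify = FALSE)
is.list(kept.as.list)  # TRUE
```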

3.0.3 Example 3: vapply

vapply is similar to sapply, but it requires you to specify what type of data you are expecting. The arguments for vapply are vapply(X, FUN, FUN.VALUE). FUN.VALUE is where you specify the type of data you are expecting. I am expecting each item in the list to return a single numeric value, so FUN.VALUE = numeric(1).

vapply(vec, sum, numeric(1))
## [1] 1 2 3 4 5 6 7 8 9 10
vapply(my.lst, sum, numeric(1))
## [1] 45 78 120

If your function were to return more than one numeric value, FUN.VALUE = numeric(1) will cause the function to return an error. This could be useful if you are expecting only one result per subject.

#vapply(my.lst, function(x) x+2, numeric(1))
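If you want to see the failure without stopping your script, a sketch along these lines wraps the call in tryCatch. The placeholder string and the name result are assumptions made for this illustration.

```r
my.lst <- list(1:9, 1:12, 1:15)

# function(x) x + 2 returns a whole vector for each list element,
# not a single number, so vapply signals an error
result <- tryCatch(
  vapply(my.lst, function(x) x + 2, numeric(1)),
  error = function(e) "vapply raised an error"
)
result

# When the function really does return one number per element,
# the same FUN.VALUE works
means <- vapply(my.lst, function(x) mean(x), numeric(1))
means
```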

3.0.4 Example 4: Transforming data with sapply

Like apply, these functions can also be used for transforming data inside the list. Because the three results here have different lengths, sapply cannot simplify them and returns a list.

my.lst2 <- sapply(my.lst, function(x) x*2)
my.lst2
## [[1]]
## [1]  2  4  6  8 10 12 14 16 18
##
## [[2]]
##  [1]  2  4  6  8 10 12 14 16 18 20 22 24
##
## [[3]]
##  [1]  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30

3.0.5 Which function should I use, lapply, sapply, or vapply?

If you are trying to decide which of these three functions to use, I would suggest sapply when possible, because it is the simplest. If you do not want your results simplified to a vector, use lapply. If you want to specify the type of result you are expecting, use vapply.
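One practical difference shows up with empty input, which can happen when a filter removes all of your data. The sketch below uses an empty list named empty (an assumption for this illustration) to compare the two return types.

```r
empty <- list()

# sapply has nothing to simplify, so it falls back to an empty list
s.out <- sapply(empty, length)

# vapply still returns the type promised by FUN.VALUE
v.out <- vapply(empty, length, integer(1))

class(s.out)  # "list"
class(v.out)  # "integer"
```

This predictability is one reason vapply is sometimes preferred inside functions and scripts, while sapply stays convenient for interactive work.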

Sometimes you may want to perform the apply function on some data, but have it separated by factor. In that case, you should use tapply. Let’s take a look at the information for tapply.

?tapply

The arguments for tapply are tapply(X, INDEX, FUN). The only new argument is INDEX, which is the factor you want to use to separate the data.

4.0.1 Example 1: Means split by condition

First, let’s create data with a factor for indexing. Dataset tdata will be created by adding a grouping column to matrix my.matrx and converting it to a data frame. tapply will treat the grouping column as a factor.

tdata <- as.data.frame(cbind(c(1,1,1,1,1,2,2,2,2,2), my.matrx))
colnames(tdata)
## [1] "V1" "V2" "V3" "V4"

Now let’s use column 1 as the index and find the mean of column 2.

tapply(tdata$V2, tdata$V1, mean)
## 1 2
## 3 8
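One way to picture what tapply is doing is as a two-step process: split the values by group, then summarize each piece. The sketch below rebuilds the same data and compares the two routes; the names groups, by.hand, and by.tapply are assumptions made for this illustration.

```r
my.matrx <- matrix(c(1:10, 11:20, 21:30), nrow = 10, ncol = 3)
tdata <- as.data.frame(cbind(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), my.matrx))

# Step 1: split() cuts V2 into one vector per level of V1
groups <- split(tdata$V2, tdata$V1)

# Step 2: sapply() summarizes each piece
by.hand <- sapply(groups, mean)

# tapply does both steps in one call
by.tapply <- tapply(tdata$V2, tdata$V1, mean)

by.hand
by.tapply
```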

4.0.2 Example 2: Combining functions

You can use tapply to do some quick summary statistics on a variable split by condition. In this example, I created a function that returns a vector of both the mean and standard deviation. You can create a function like this for any apply function, not just tapply.

summary <- tapply(tdata$V2, tdata$V1, function(x) c(mean(x), sd(x)))
summary
## $`1`
## [1] 3.000000 1.581139
##
## $`2`
## [1] 8.000000 1.581139
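When a function returns several statistics, naming them inside c() makes the output easier to read later. As a sketch of one alternative layout, the code below uses split and sapply so that the result is a small labeled matrix rather than a list. The name stats.by.group is an assumption made for this illustration.

```r
my.matrx <- matrix(c(1:10, 11:20, 21:30), nrow = 10, ncol = 3)
tdata <- as.data.frame(cbind(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), my.matrx))

# Naming the elements of the returned vector labels the rows of the result
stats.by.group <- sapply(
  split(tdata$V2, tdata$V1),
  function(x) c(mean = mean(x), sd = sd(x))
)
stats.by.group

# Rows are statistics, columns are groups
stats.by.group["mean", "1"]
```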

The last apply function I will cover is mapply.

?mapply

The arguments for mapply are mapply(FUN, …, MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE). First you list the function, followed by the vectors you are using. The rest of the arguments have default values, so they don’t need to be changed for now. When you have a function that takes 2 arguments, the first vector goes into the first argument and the second vector goes into the second argument.

5.0.1 Example 1: Understanding mapply

In this example, 1:9 is specifying the value to repeat, and 9:1 is specifying how many times to repeat. This order is based on the order of arguments in the rep function itself.

mapply(rep, 1:9, 9:1)
## [[1]]
## [1] 1 1 1 1 1 1 1 1 1
##
## [[2]]
## [1] 2 2 2 2 2 2 2 2
##
## [[3]]
## [1] 3 3 3 3 3 3 3
##
## [[4]]
## [1] 4 4 4 4 4 4
##
## [[5]]
## [1] 5 5 5 5 5
##
## [[6]]
## [1] 6 6 6 6
##
## [[7]]
## [1] 7 7 7
##
## [[8]]
## [1] 8 8
##
## [[9]]
## [1] 9
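If relying on argument order feels fragile, the vectors can be passed by name instead. The sketch below also shows MoreArgs, which holds values that stay the same on every call. The three-argument function and the names by.name and shifted are assumptions made for this illustration.

```r
# Naming the vectors makes the pairing explicit: x is the value to
# repeat and times is how often to repeat it
by.name <- mapply(rep, x = 1:3, times = 3:1)
sapply(by.name, length)  # 3 2 1

# MoreArgs supplies arguments that are NOT looped over; here z = 10
# is reused for every pair of x and y
shifted <- mapply(function(x, y, z) x + y + z,
                  1:3, 4:6,
                  MoreArgs = list(z = 10))
shifted  # 15 17 19
```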

5.0.2 Example 2: Creating a new variable

Another use for mapply would be to create a new variable. For example, using dataset tdata, I could divide one column by another column to create a new value. This would be useful for creating a ratio of two variables as shown in the example below.

tdata$V5 <- mapply(function(x, y) x/y, tdata$V2, tdata$V4)
tdata$V5
##  [1] 0.04761905 0.09090909 0.13043478 0.16666667 0.20000000 0.23076923
##  [7] 0.25925926 0.28571429 0.31034483 0.33333333
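Because division in R is already vectorized, the same ratio can be computed without mapply. The sketch below rebuilds the data and checks that both routes agree; the names ratio.mapply and ratio.vectorized are assumptions made for this illustration.

```r
my.matrx <- matrix(c(1:10, 11:20, 21:30), nrow = 10, ncol = 3)
tdata <- as.data.frame(cbind(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), my.matrx))

# Element-by-element division with mapply
ratio.mapply <- mapply(function(x, y) x / y, tdata$V2, tdata$V4)

# The / operator pairs up elements on its own
ratio.vectorized <- tdata$V2 / tdata$V4

all.equal(ratio.mapply, ratio.vectorized)  # TRUE
```

mapply earns its keep when the function being called is not vectorized, as with rep in the earlier example.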

5.0.3 Example 3: Saving data into a premade vector

When using an apply family function to create a new variable, one option is to create a new vector ahead of time with the size of the vector pre-allocated. I created a numeric vector of length 10 using the vector function. The arguments for the vector function are vector(mode, length). Inside mapply I created a function to multiply two variables together. The results of the mapply function are then saved into the vector. Note that assigning the result of mapply replaces new.vec entirely, so the pre-allocation is not strictly required here.

new.vec <- vector(mode = "numeric", length = 10)
new.vec <- mapply(function(x, y) x*y, tdata$V3, tdata$V4)
new.vec
##  [1] 231 264 299 336 375 416 459 504 551 600
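Pre-allocation makes the most difference when a result is filled in one element at a time, as in an explicit loop. The sketch below shows that pattern with stand-in vectors v3 and v4, which hold the same values as columns V3 and V4; these names and the name filled are assumptions made for this illustration.

```r
# Stand-ins for tdata$V3 and tdata$V4
v3 <- 11:20
v4 <- 21:30

# Reserve space for all 10 results up front
filled <- vector(mode = "numeric", length = 10)

# Fill the reserved slots one at a time
for (i in seq_along(filled)) {
  filled[i] <- v3[i] * v4[i]
}
filled

# mapply builds its own result, so no pre-allocation is needed
from.mapply <- mapply(function(x, y) x * y, v3, v4)
all(filled == from.mapply)  # TRUE
```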

This last section shows a few examples of using apply functions on real data. It loads the MASS package, which provides functions and datasets that accompany Venables and Ripley’s Modern Applied Statistics with S. The state datasets used below ship with base R’s datasets package, so they are available even without MASS. If you do not have MASS installed, you can uncomment the install line in the code below.

#install.packages("MASS")
library(MASS)

Load the state dataset. It contains information about all 50 states.

data(state)

Let’s look at the data we will be using. We will be using the state.x77 dataset.

head(state.x77)
##            Population Income Illiteracy Life Exp Murder HS Grad Frost
## Alabama          3615   3624        2.1    69.05   15.1    41.3    20
## Alaska            365   6315        1.5    69.31   11.3    66.7   152
## Arizona          2212   4530        1.8    70.55    7.8    58.1    15
## Arkansas         2110   3378        1.9    70.66   10.1    39.9    65
## California      21198   5114        1.1    71.71   10.3    62.6    20
## Colorado         2541   4884        0.7    72.06    6.8    63.9   166
##              Area
## Alabama     50708
## Alaska     566432
## Arizona    113417
## Arkansas    51945
## California 156361
## Colorado   103766
str(state.x77)
##  num [1:50, 1:8] 3615 365 2212 2110 21198 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
##   ..$ : chr [1:8] "Population" "Income" "Illiteracy" "Life Exp" ...

All the data in the dataset happens to be numeric, which is necessary when the function inside the apply function requires numeric data.

6.0.1 Example 1: Using apply to get summary data

You can use apply to find measures of central tendency and dispersion.

apply(state.x77, 2, mean)
## Population     Income Illiteracy   Life Exp     Murder    HS Grad
##  4246.4200  4435.8000     1.1700    70.8786     7.3780    53.1080
##      Frost       Area
##   104.4600 70735.8800
apply(state.x77, 2, median)
## Population     Income Illiteracy   Life Exp     Murder    HS Grad
##   2838.500   4519.000      0.950     70.675      6.850     53.250
##      Frost       Area
##    114.500  54277.000
apply(state.x77, 2, sd)
##   Population       Income   Illiteracy     Life Exp       Murder
## 4.464491e+03 6.144699e+02 6.095331e-01 1.342394e+00 3.691540e+00
##      HS Grad        Frost         Area
## 8.076998e+00 5.198085e+01 8.532730e+04

6.0.2 Example 2: Saving the results of apply

In this example, I created one function that gives the mean and SD, and another that gives the min, median, and max. Then I saved the results as objects that could be used later.

state.summary <- apply(state.x77, 2, function(x) c(mean(x), sd(x)))
state.summary
##      Population    Income Illiteracy  Life Exp  Murder   HS Grad     Frost
## [1,]   4246.420 4435.8000  1.1700000 70.878600 7.37800 53.108000 104.46000
## [2,]   4464.491  614.4699  0.6095331  1.342394 3.69154  8.076998  51.98085
##          Area
## [1,] 70735.88
## [2,] 85327.30
state.range <- apply(state.x77, 2, function(x) c(min(x), median(x), max(x)))
state.range
##      Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
## [1,]      365.0   3098       0.50   67.960   1.40   37.80   0.0   1049
## [2,]     2838.5   4519       0.95   70.675   6.85   53.25 114.5  54277
## [3,]    21198.0   6315       2.80   73.600  15.10   67.30 188.0 566432
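The rows of these saved results are labeled only [1,], [2,], and so on, which can be hard to interpret later. A small variation, sketched below, names the values inside the function so that apply carries the labels into the result. The object name state.summary.named is an assumption made for this illustration.

```r
# state.x77 ships with base R's datasets package
state.summary.named <- apply(
  state.x77, 2,
  function(x) c(mean = mean(x), sd = sd(x))
)

rownames(state.summary.named)  # "mean" "sd"

# Labeled rows and columns allow lookups by name
state.summary.named["mean", "Income"]
```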

6.0.3 Example 3: Using mapply to compute a new variable

In this example, I want to find the population density for each state. In order to do this, I want to divide population by area. state.area and state.x77 are not from the same dataset, but that is fine as long as the vectors are the same length and the data is in the same order. Both vectors are ordered alphabetically by state, so mapply can be used.

# A matrix is stored column by column, so elements 1:50 are the first column (Population)
population <- state.x77[1:50]
area <- state.area
pop.dens <- mapply(function(x, y) x/y, population, area)
pop.dens
##  [1] 0.070045922 0.000618899 0.019419010 0.039733353 0.133578671
##  [6] 0.024374802 0.618886005 0.281477880 0.141342213 0.083752293
## [11] 0.134573643 0.009729885 0.198528369 0.146399934 0.050826079
## [16] 0.027715647 0.083847011 0.078437030 0.031853078 0.389713529
## [21] 0.704129829 0.156503367 0.046640815 0.049061112 0.068406854
## [26] 0.005070070 0.019993008 0.005337434 0.087274291 0.935809086
## [31] 0.009402791 0.364611909 0.103468604 0.009014364 0.260419194
## [36] 0.038830647 0.023551005 0.261619571 0.766886326 0.090677830
## [41] 0.008838761 0.098783259 0.045773344 0.014166941 0.049120616
## [46] 0.122038466 0.052190873 0.074397254 0.081721694 0.003840105
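The printed densities are hard to match to states because the values carry no labels. As a sketch of one alternative, selecting the column by name keeps the state names attached, and plain division works because both vectors are in the same order. The name pop.dens.named is an assumption made for this illustration.

```r
# Selecting the column by name keeps the row names (the states)
population.named <- state.x77[, "Population"]

# Division is vectorized, so each state's population is paired
# with the matching entry of state.area
pop.dens.named <- population.named / state.area

# The three most densely populated states, with labels
head(sort(pop.dens.named, decreasing = TRUE), 3)
```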

6.0.4 Example 4: Using tapply to explore population by region

In this example, I want to find out some information about the population of states split by region. state.region is a factor with four levels: Northeast, South, North Central, and West. For each region, I want the minimum, median, and maximum populations.

region.info <- tapply(population, state.region, function(x) c(min(x), median(x), max(x)))
region.info
## $Northeast
## [1]   472  3100 18076
##
## $South
## [1]   579.0  3710.5 12237.0
##
## $`North Central`
## [1]   637  4255 11197
##
## $West
## [1]   365  1144 21198

Here are some sources I used to help me create this chapter:

Datacamp tutorial on apply functions: https://www.datacamp.com/community/tutorials/r-tutorial-apply-family

r-bloggers: Using apply, sapply, and lapply in R: https://www.r-bloggers.com/using-apply-sapply-lapply-in-r/

stackoverflow: Why is vapply safer than sapply?: http://stackoverflow.com/questions/12339650/why-is-vapply-safer-than-sapply
