Functional Programming with R-Base - Pure and Apply Functions

This tutorial introduces functional programming (FP) in R using only base R, focusing on pure functions and the apply family of functions (lapply, sapply, apply, etc.). Functional programming emphasizes pure functions, immutability, and declarative coding, avoiding side effects and mutable state. By the end, you’ll understand how to write FP-style code in base R with practical examples.

What is Functional Programming?

Functional programming is a paradigm that:

Treats computation as evaluating mathematical functions.
Uses pure functions (same input always gives same output, no side effects).
Avoids mutable state (changing variables).
Leverages higher-order functions (functions that take or return functions).

In R, base functions like lapply and sapply support FP by enabling declarative iteration over data.
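As a minimal sketch of the higher-order idea, the block below defines one function that takes a function and one that returns a function. The names apply_twice, make_multiplier, and triple are made up for this illustration.

```r
# A higher-order function that TAKES a function as an argument.
apply_twice <- function(f, x) {
  f(f(x))
}

# A higher-order function that RETURNS a function (a closure over k).
make_multiplier <- function(k) {
  function(x) x * k
}

triple <- make_multiplier(3)

apply_twice(triple, 2)          # 18
lapply(list(1, 2, 3), triple)   # list(3, 6, 9)
```

lapply() is itself a higher-order function: it receives triple as an argument and calls it once per element.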

Pure Functions in R

A pure function:

Always produces the same output for the same input.
Has no side effects (e.g., no modifying global variables, no I/O operations).

Code

# Pure function: Squares a number
square <- function(x) {
  x * x
}
square(4) # Returns 16, always for input 4

[1] 16

Impure Function (Avoid in FP)

Code

# Impure: Modifies global variable
counter <- 0
impure_add <- function(x) {
  counter <<- counter + 1
  x + counter
}
impure_add(5) # Output depends on counter, not just input

[1] 6

Tip: Ensure functions rely only on their arguments and return results without altering external state.
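One possible way to make the counter example pure is to pass the counter in as an argument and return the updated value alongside the result. The function name pure_add and the shape of its return value are assumptions made for this sketch.

```r
# Pure alternative: the counter is an explicit input, and the updated
# counter is returned with the result instead of being written to a
# global variable.
pure_add <- function(x, counter) {
  new_counter <- counter + 1
  list(value = x + new_counter, counter = new_counter)
}

step1 <- pure_add(5, counter = 0)
step1$value     # 6
step1$counter   # 1

# Calling again with the same inputs gives the same result.
identical(pure_add(5, counter = 0), step1)   # TRUE
```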

The Apply Family for Functional Programming

Base R’s apply functions are key to FP, allowing you to apply a function to elements of a data structure declaratively, avoiding loops and mutable state.

Key Apply Functions

The apply family consists of functions that loop over a data structure for you, which reduces the need to write explicit loops. They ship with base R, so no packages need to be installed.

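apply() for matrices, arrays, and data frames (by row or column)
lapply() for lists and vectors; returns a list
sapply() for lists and vectors; returns a simplified vector or matrix where possible
tapply() for a vector split into groups by a factor
mapply() for several vectors or lists processed in parallel (multivariate)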

apply

apply() returns a vector, array, or list of values obtained by applying a function to the margins of an array or matrix. A data frame passed to apply() is first converted to a matrix. apply() is generally not faster than an explicit loop, but it is compact and often fits on one line.

apply(X, MARGIN, FUN, ...)

Where:

X is the matrix, data frame, or array
MARGIN is a vector giving the subscripts which the function will be applied over. E.g., for a matrix 1 indicates rows, 2 indicates columns, c(1, 2) indicates rows and columns.
FUN is the function to be applied
... is for any other arguments to be passed to the function

Code

# Create a matrix (cbind() on vectors returns a matrix, not a data frame)
df <- cbind(x1 = 1:8, x2 = 2:9, x3 = 3:10)
# add row names
dimnames(df)[[1]] <- letters[1:8]

Let’s calculate the 20% trimmed mean of each column:

Code

apply(df, 2, mean, trim = 0.2)

 x1  x2  x3 
4.5 5.5 6.5

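Trimmed mean of each row (with only three values per row, trim = .2 drops nothing, so this equals the ordinary mean):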

Code

apply(df, 1, mean, trim = .2)

a b c d e f g h 
2 3 4 5 6 7 8 9

Get the quantiles of each column:

Code

apply(df, 2, quantile, probs = c(0.10, 0.25, 0.50, 0.75, 0.90))

      x1   x2   x3
10% 1.70 2.70 3.70
25% 2.75 3.75 4.75
50% 4.50 5.50 6.50
75% 6.25 7.25 8.25
90% 7.30 8.30 9.30
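FUN does not have to be a named function. The sketch below passes an anonymous function to apply() and also shows what happens when FUN returns more than one value. The matrix is rebuilt under the name m so the block runs on its own; col_spread and col_range are illustrative names.

```r
# Same kind of matrix as in the examples above.
m <- cbind(x1 = 1:8, x2 = 2:9, x3 = 3:10)
rownames(m) <- letters[1:8]

# Spread (max - min) of each column, via an anonymous function.
col_spread <- apply(m, 2, function(col) max(col) - min(col))
col_spread

# When FUN returns a vector of length > 1, apply() returns a matrix
# with one column per row or column processed.
col_range <- apply(m, 2, range)
dim(col_range)   # 2 3
```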

lapply

lapply() returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. X can be a list or a vector; the result is always a list (the l stands for list).

lapply(X, FUN, ...)

Where:

X is the list or vector
FUN is the function to be applied
... is for any other arguments to be passed to the function

Code

# Create a list
mylist <- list(A = matrix(1:9, nrow = 3), B = 1:5, C = c(8, 5), logic = c(TRUE, FALSE, FALSE, TRUE, TRUE))
mylist

$A
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

$B
[1] 1 2 3 4 5

$C
[1] 8 5

$logic
[1]  TRUE FALSE FALSE  TRUE  TRUE

Code

lapply(mylist, mean)

$A
[1] 5

$B
[1] 3

$C
[1] 6.5

$logic
[1] 0.6

The results are returned as a list. Use unlist() to flatten them into a named vector:

Code

unlist(lapply(mylist,mean))

    A     B     C logic 
  5.0   3.0   6.5   0.6
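Extra arguments after FUN are forwarded to every call. A short sketch, using a hypothetical list named readings that contains missing values:

```r
# A small list with missing values (made-up data for illustration).
readings <- list(a = c(1, 2, NA), b = c(4, NA, 6), c = c(7, 8, 9))

# na.rm = TRUE is forwarded to mean() through lapply()'s `...`.
means <- lapply(readings, mean, na.rm = TRUE)
str(means)

# Without it, any element containing NA yields NA.
lapply(readings, mean)$a   # NA
```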

sapply

sapply() is a wrapper around lapply() that simplifies the result to a vector or matrix when possible; otherwise it returns a list.

Code

sapply(mylist, mean)

    A     B     C logic 
  5.0   3.0   6.5   0.6
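What sapply() returns depends on what FUN returns, which the sketch below tries to make visible. It also shows vapply(), another base R function, as a stricter alternative where the result type is declared up front. The list vals is made-up data.

```r
vals <- list(a = 1:5, b = c(2, 4, 6), c = c(10, 20))

# Each call returns two numbers, so sapply() simplifies to a 2 x 3 matrix.
ranges <- sapply(vals, range)
ranges

# If results have different lengths, sapply() cannot simplify and
# returns a list instead.
is.list(sapply(vals, function(v) v[v > 2]))   # TRUE

# vapply() requires the result shape to be declared and raises an
# error if FUN returns anything else.
means <- vapply(vals, mean, FUN.VALUE = numeric(1))
means   # a = 3, b = 4, c = 15
```

Because the output type of vapply() never changes with the input, it is often the safer choice inside functions.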

tapply

tapply() applies a function to subsets of a vector, where the subsets are defined by one or more grouping variables (categorical variables, also known as factors).

Code

my.df <- data.frame(
  Landcover = rep(c("Forest", "Grassland", "Wetland"), each = 4),
  Site = rep(1:4, times = 3),
  pH = c(5.5, 6.0, 5.8, 6.2, 7.0, 6.8, 7.2, 7.1, 6.5, 6.7, 6.9, 7.0),
  SOC = c(3.2, 3.5, 3.1, 3.3, 2.5, 2.7, 2.6, 2.8, 4.0, 4.1, 4.2, 4.3)
)

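We can use tapply() to calculate the mean pH and SOC for each land cover class. Here apply() loops over the two numeric columns and tapply() groups each column by Landcover: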

Code

apply(my.df[3:4], 2, function(x) tapply(x, my.df$Landcover, mean))

             pH   SOC
Forest    5.875 3.275
Grassland 7.025 2.650
Wetland   6.775 4.150
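For a single variable, tapply() can be called directly without the surrounding apply(). A minimal sketch, with the data rebuilt (pH column only) so the block is self-contained:

```r
my.df <- data.frame(
  Landcover = rep(c("Forest", "Grassland", "Wetland"), each = 4),
  pH = c(5.5, 6.0, 5.8, 6.2, 7.0, 6.8, 7.2, 7.1, 6.5, 6.7, 6.9, 7.0)
)

# Mean pH for each land cover class: tapply(values, groups, FUN).
ph_means <- tapply(my.df$pH, my.df$Landcover, mean)
ph_means
```

The values match the pH column of the table above.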

mapply

mapply() is a multivariate version of sapply(). mapply() applies FUN to the first elements of each ... argument, the second elements, the third elements, and so on.

Code

list( rep(2, 4), rep(3, 3), rep(4, 2))

[[1]]
[1] 2 2 2 2

[[2]]
[1] 3 3 3

[[3]]
[1] 4 4

You can see that the same function (rep) is called repeatedly: the first argument (the value to repeat) varies from 2 to 4, and the second argument (times) varies from 4 to 2. Instead, you can use mapply():

Code

mapply(rep, 2:4, 4:2)

[[1]]
[1] 2 2 2 2

[[2]]
[1] 3 3 3

[[3]]
[1] 4 4
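mapply() also simplifies its result when it can, and it accepts constant arguments through MoreArgs. The following sketch illustrates both; the anonymous functions and the names sums and padded are made up for the example.

```r
# Element-wise sum of two vectors: every result has length 1, so
# mapply() simplifies them to a plain vector.
sums <- mapply(function(x, y) x + y, 1:3, 4:6)
sums   # 5 7 9

# MoreArgs holds arguments that stay the same for every call.
# SIMPLIFY = FALSE always returns a list.
padded <- mapply(
  function(x, n, fill) c(x, rep(fill, n)),
  x = 1:2, n = 2:1,
  MoreArgs = list(fill = 0),
  SIMPLIFY = FALSE
)
padded   # list(c(1, 0, 0), c(2, 0))
```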

Writing Pure Functions with Apply

To align with FP, ensure the functions you pass to apply functions are pure. Here’s a practical example:

Process a Vector of Names:

Convert names to uppercase and sort them.

Code

users <- c("alice", "bob", "charlie")
process_users <- function(users) {
  sort(sapply(users, toupper))
}
process_users(users) # Returns a named character vector: "ALICE", "BOB", "CHARLIE"

    alice       bob   charlie 
  "ALICE"     "BOB" "CHARLIE"

sapply(users, toupper) applies the pure toupper function to each name and uses the original strings as element names, which is why the output is labelled.
sort() is pure, producing a new sorted vector without modifying the input.
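Since toupper() already accepts a whole character vector, the same pure transformation can be sketched without sapply(). The function name process_users_vec is an assumption for this variant, and the input order is shuffled so that the effect of sort() is visible.

```r
users <- c("charlie", "alice", "bob")

# toupper() works on the entire vector at once, so no sapply() is
# needed and the result carries no names.
process_users_vec <- function(users) {
  sort(toupper(users))
}

process_users_vec(users)   # "ALICE" "BOB" "CHARLIE"
users                      # unchanged: "charlie" "alice" "bob"
```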

Avoiding Side Effects

Avoid functions that print, modify global state, or perform I/O inside apply calls. Example:

Bad: Side Effect in Function

Code

bad_function <- function(x) {
  print(paste("Processing:", x)) # Side effect
  toupper(x)
}
sapply(users, bad_function) # Prints to console, not FP-friendly

[1] "Processing: alice"
[1] "Processing: bob"
[1] "Processing: charlie"

    alice       bob   charlie 
  "ALICE"     "BOB" "CHARLIE"

Good: Pure Function

Code

good_function <- function(x) {
  toupper(x)
}
result <- sapply(users, good_function)
print(result) # Side effect handled outside

    alice       bob   charlie 
  "ALICE"     "BOB" "CHARLIE"

Immutability in Base R

R doesn’t enforce immutability, but you can practice it by avoiding in-place modifications. Instead of changing a vector, create a new one:

Code

numbers <- c(1, 2, 3)
doubled <- sapply(numbers, function(x) x * 2) # New vector: c(2, 4, 6)
# Original 'numbers' unchanged
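R's copy-on-modify behaviour helps here: modifying an argument inside a function changes only the function's local copy. A small sketch, with set_first_to_zero as a made-up helper:

```r
numbers <- c(1, 2, 3)

# Assigning into `v` inside the function changes only the local copy;
# the caller's vector is untouched.
set_first_to_zero <- function(v) {
  v[1] <- 0
  v
}

changed <- set_first_to_zero(numbers)
changed   # 0 2 3
numbers   # 1 2 3
```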

Function Composition

Combine pure functions to build complex logic. Base R doesn’t have a composition operator, but you can nest functions:

Code

add_one <- function(x) x + 1
# Note: naming this function `double` masks base R's double() in this session
double <- function(x) x * 2
composed <- function(x) add_one(double(x))
composed(5) # Returns 11 (double(5) = 10, add_one(10) = 11)

[1] 11
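If nesting becomes hard to read, a small composition helper can be built from Reduce(). The compose() function below is a hypothetical helper written for this sketch, not part of base R, and times_two is used in place of double to avoid masking a base function.

```r
# compose() returns a new function that applies its arguments
# left to right: compose(f, g)(x) is g(f(x)).
compose <- function(...) {
  fs <- list(...)
  function(x) Reduce(function(acc, f) f(acc), fs, x)
}

add_one <- function(x) x + 1
times_two <- function(x) x * 2

double_then_add <- compose(times_two, add_one)
double_then_add(5)   # 11
```

In R 4.1 and later, the native pipe |> gives the same left-to-right reading without a helper.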

Real-World Example: Data Frame Processing

Process a data frame to filter rows and transform values using FP principles in base R.

Code

# Create a data frame
data <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35)
)

# Function to filter ages > 25 and uppercase names
process_data <- function(df) {
  # Filter rows (like dplyr::filter)
  filtered <- df[df$age > 25, ]
  # Uppercase names
  filtered$name <- sapply(filtered$name, toupper)
  filtered
}
process_data(data) # Returns data frame with Bob and Charlie, names in uppercase

     name age
2     BOB  30
3 CHARLIE  35

This uses subsetting to filter and sapply to transform. The assignment to filtered$name changes only the local copy inside the function, so the original data frame is left untouched.
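The same processing can also be split into two smaller pure steps that are then composed, which may be easier to test in isolation. The function names keep_older_than and uppercase_names, and the min_age parameter, are assumptions introduced for this sketch.

```r
data <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35)
)

# Step 1: keep rows above an age threshold (passed in as a parameter).
keep_older_than <- function(df, min_age) {
  df[df$age > min_age, , drop = FALSE]
}

# Step 2: return a copy with upper-case names.
uppercase_names <- function(df) {
  df$name <- toupper(df$name)
  df
}

result <- uppercase_names(keep_older_than(data, 25))
result$name   # "BOB" "CHARLIE"
data$name     # still "Alice" "Bob" "Charlie"
```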

Recursion in Base R

FP favors recursion over loops. Example: Calculate factorial recursively:

Code

# Note: this definition masks base R's factorial() for the session
factorial <- function(n) {
  if (n <= 1) return(1)
  n * factorial(n - 1)
}
factorial(5) # Returns 120

[1] 120

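Note: R does not automatically optimize tail calls, so every recursive call adds a stack frame and deep recursion can fail with a nesting or stack error. For large inputs, prefer lapply/sapply or another non-recursive approach.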
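As one non-recursive alternative, the same product can be folded with base R's Reduce(). The name fact_reduce is made up for this sketch.

```r
# Multiplies 1 * 1 * 2 * ... * n with Reduce(), starting from the
# initial value 1, so no call stack builds up as n grows.
fact_reduce <- function(n) {
  Reduce(`*`, seq_len(n), 1)
}

fact_reduce(5)   # 120
fact_reduce(0)   # 1 (seq_len(0) is empty, so the initial value is returned)

# accumulate = TRUE keeps the intermediate products.
Reduce(`*`, 1:5, accumulate = TRUE)   # 1 2 6 24 120
```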

Tips for FP in Base R

Keep Functions Pure: Ensure functions passed to lapply/sapply depend only on inputs.
Use Anonymous Functions: Pass function(x) ... to apply functions for one-off transformations.
```
sapply(numbers, function(x) x + 1) # Returns c(2, 3, 4)
```
Simplify with sapply: Use sapply when you want a vector output instead of a list from lapply.
Combine Functions: Nest apply calls for complex transformations, e.g., sapply(lapply(...), ...).
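The nesting pattern from the last tip might look like this in practice. The scores list and the pass mark of 60 are invented for the example.

```r
scores <- list(
  math  = c(55, 72, 90),
  art   = c(40, 45),
  music = c(88, 95, 61, 79)
)

# Inner lapply(): keep the passing scores (>= 60) in each subject.
# Outer sapply(): count how many remain, simplified to a named vector.
n_passing <- sapply(lapply(scores, function(s) s[s >= 60]), length)
n_passing   # math = 2, art = 0, music = 4
```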

Limitations

Performance: Creating new objects for immutability can be memory-intensive.
Verbosity: Base R’s FP tools are less concise than purrr’s.
Side Effects: Real-world tasks (e.g., plotting) require side effects; isolate them outside pure functions.
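One way to isolate side effects, sketched below, is to keep the computation in a pure function and confine console output to a single thin wrapper. Both function names, summarise_values and report, are assumptions for this example.

```r
# Pure: computes a summary and returns it; no printing, no globals.
summarise_values <- function(x) {
  c(n = length(x), mean = mean(x), max = max(x))
}

# Impure by design: the only place that writes to the console.
report <- function(stats) {
  cat(sprintf("n = %d, mean = %.2f, max = %.2f\n",
              as.integer(stats[["n"]]), stats[["mean"]], stats[["max"]]))
  invisible(stats)
}

stats <- summarise_values(c(2, 4, 9))
report(stats)   # prints: n = 3, mean = 5.00, max = 9.00
```

The pure part can be tested by comparing return values, while the impure part stays small enough to check by eye.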

Summary and Conclusion

Functional programming in base R leverages pure functions and the apply family (lapply, sapply, apply) to write declarative, predictable code. By avoiding side effects, practicing immutability, and using higher-order functions, you can create modular and testable programs. Start with small transformations (e.g., mapping over vectors) and scale to data frame processing for robust FP workflows.

Together, these functions offer a compact way to apply operations across vectors, lists, matrices, and data frames, which simplifies complex operations and improves code readability.

Resources

Advanced R by Hadley Wickham: https://adv-r.hadley.nz/functional-programming.html
R Documentation: Search ?lapply or ?sapply in R for official details.