rbind_listdf: Combine Large Lists of Data Frames.

Description

Quickly combine large lists of dataframes into a single data frame by combining them first into smaller data frames. This is only time- and memory-efficient when dealing with large (>100) lists of data frames.

Usage

rbind_listdf(lists = NULL, seq.by = 100)

Arguments

lists

List in which each component is a data frame. Each data frame must have the same number and types of columns, although the data frames can have different numbers of rows.

seq.by

Length of small data frames to work with at a time. Default is 100, which timing tests confirm is generally most efficient.

Value

Single data frame with number of rows equal to the sum of all data frames in the list, and columns the same as those of individual list data frames.

Details

Rather than combine all list data frames into a single data frame, this function builds smaller subsets of data frames, and then combines them together at the end. This process is more time- and memory-efficient. Timing tests confirm that seq.by = 100 is the optimal break-point size. See examples for confirmation. Function can break large lists into up to 456,976 data frame subparts, giving a warning if requires more subparts. Only time- and memory-efficient when dealing with large (>100) lists of data frames.

Examples

Run this code

# NOT RUN {
nl <- 500     # List length
lists <- vector("list", length = nl)
for(i in 1:nl) lists[[i]] <- list(x = rnorm(100), y = rnorm(100))
str(lists)
object.size(lists)
all <- rbind_listdf(lists)
str(all)
object.size(all)       # Also smaller object size

# Build blank ecospace framework to use in simulations
ecospace <- create_ecospace(nchar = 15, char.state = rep(3, 15),
                            char.type = rep("numeric", 15))

# Build 5 samples for neutral model:
nreps <- 1:5
Smax <- 10
n.samples <- lapply(X = nreps, FUN = neutral, Sseed = 3, Smax = Smax, ecospace)

# Calculate functional diversity metrics for simulated samples
n.metrics <- lapply(X = nreps, FUN = calc_metrics, samples = n.samples,
                    Model = "neutral", Param = "NA")
alarm()
str(n.metrics)

# rbind lists together into a single dataframe
all <- rbind_listdf(n.metrics)

# Calculate mean dynamics
means <- n.metrics[[1]]
for(n in 1:Smax) means[n,4:11] <- apply(all[which(all$S == means$S[n]),4:11],
                                        2, mean, na.rm = TRUE)
means

# Plot statistics as function of species richness, overlaying mean dynamics
op <- par()
par(mfrow = c(2,4), mar = c(4, 4, 1, .3))
attach(all)

plot(S, H, type = "p", cex = .75, col = "gray")
lines(means$S, means$H, type = "l", lwd = 2)
plot(S, D, type = "p", cex = .75, col = "gray")
lines(means$S, means$D, type = "l", lwd = 2)
plot(S, M, type = "p", cex = .75, col = "gray")
lines(means$S, means$M, type = "l", lwd = 2)
plot(S, V, type = "p", cex = .75, col = "gray")
lines(means$S, means$V, type = "l", lwd = 2)
plot(S, FRic, type = "p", cex = .75, col = "gray")
lines(means$S, means$FRic, type = "l", lwd = 2)
plot(S, FEve, type = "p", cex = .75, col = "gray")
lines(means$S, means$FEve, type = "l", lwd = 2)
plot(S, FDiv, type = "p", cex = .75, col = "gray")
lines(means$S, means$FDiv, type = "l", lwd = 2)
plot(S, FDis, type = "p", cex = .75, col = "gray")
lines(means$S, means$FDis, type = "l", lwd = 2)

par(op)

# }
# NOT RUN {
# Note each of following can take a few seconds to run
# Compare timings:
t0 <- Sys.time()
all <- rbind_listdf(lists)
(Sys.time() - t0)

t0 <- Sys.time()
all <- rbind_listdf(lists, seq.by = 20)
(Sys.time() - t0)

t0 <- Sys.time()
all <- rbind_listdf(lists, seq.by = 500)
(Sys.time() - t0)

# Compare to non-function version
all2 <- data.frame()
t0 <- Sys.time()
for(i in 1:nl) all2 <- rbind(all2, lists[[i]])
(Sys.time() - t0)
# }
# NOT RUN {
# Compare to data.table's 'rbindlist' version
library(data.table)
t0 <- Sys.time()
all <- rbindlist(lists)
(Sys.time() - t0)

# }

Run the code above in your browser using DataLab