unlist2d: Recursive Row-Binding / Unlisting in 2D - to Data Frame

Description

unlist2d efficiently unlists lists of regular R objects (objects built up from atomic elements) and creates a data.frame representation of the list. It is a faithful 2-dimensional generalization of base::unlist, and can also be understood as a recursive generalization of do.call(rbind, l), for lists of vectors, data.frames, arrays or heterogeneous objects.
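For a flat list of data frames with identical columns, one level of this behaviour can be approximated in base R by do.call(rbind, l) plus an id column recording where each row came from. The helper rbind_with_id() below is hypothetical and written only for illustration. It is not collapse code, and it omits the recursion, the conversion of non-data.frame objects and the column matching that unlist2d performs.

```r
# Hypothetical base-R helper (not part of collapse): row-binds a flat list of
# data frames like do.call(rbind, l), but also stores the origin of each row
# in a leading ".id" column (the list names, or integer positions if unnamed).
rbind_with_id <- function(l) {
  ids <- if (is.null(names(l))) seq_along(l) else names(l)
  parts <- Map(function(df, id) data.frame(.id = rep(id, nrow(df)), df), l, ids)
  out <- do.call(rbind, unname(parts))
  rownames(out) <- NULL
  out
}

l <- list(a = data.frame(x = 1:2, y = c(2.5, 3.5)),
          b = data.frame(x = 3L, y = 4.5))
do.call(rbind, l)   # plain row-binding: the origin survives only in the row names
rbind_with_id(l)    # same rows, plus an explicit ".id" column
```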

Usage

unlist2d(l, idcols = ".id", row.names = FALSE, recursive = TRUE,
         id.factor = FALSE, DT = FALSE)

Arguments

l

an unlistable list, see is.unlistable.
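What exactly counts as "unlistable" is decided by collapse's is.unlistable. As a rough illustration only, the hypothetical is_unlistable_sketch() below applies the criterion from the Description (every element is built from atomic elements, possibly inside nested lists) in base R. The rules of the real function may differ in details, for example in how data frames are treated.

```r
# Hypothetical sketch (not collapse's is.unlistable): a list is treated as
# unlistable here if every element is atomic (vector, matrix, array), a data
# frame, or a list that is itself unlistable. Anything else, e.g. a function,
# makes the whole list non-unlistable.
is_unlistable_sketch <- function(l) {
  all(vapply(l, function(x) {
    if (is.data.frame(x) || is.atomic(x)) TRUE
    else if (is.list(x)) is_unlistable_sketch(x)
    else FALSE
  }, logical(1)))
}

is_unlistable_sketch(list(1:3, list(a = data.frame(x = 1), b = matrix(1:4, 2))))
is_unlistable_sketch(list(1:3, list(f = mean)))
```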

idcols

a character stub or a vector of names for id-columns automatically added, one for each level of nesting in l. By default the stub is ".id", so columns will be of the form ".id.1", ".id.2", etc. If idcols = TRUE, the stub is also set to ".id". If idcols = FALSE, id-columns are omitted. The content of the id columns is the list names or, if names are missing, the integer positions of the list elements. Missing elements in asymmetric nested structures are filled with NA. See examples.
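The naming and content rules for id columns can be illustrated with two small hypothetical helpers, id_colnames() and id_values(), which are not part of collapse. They assume that a single level of nesting uses the bare stub (".id") and that a list is either fully named or fully unnamed; partially named lists are not handled here.

```r
# Hypothetical helpers illustrating the idcols conventions (not collapse code).
# id_colnames(): expand a stub into one name per nesting level, or keep a
# user-supplied vector of names as it is.
id_colnames <- function(idcols, depth) {
  if (isFALSE(idcols)) return(character(0))   # idcols = FALSE: no id columns
  if (isTRUE(idcols)) idcols <- ".id"         # idcols = TRUE: default stub
  if (length(idcols) == 1L && depth > 1L) paste0(idcols, ".", seq_len(depth))
  else idcols                                 # bare stub, or user-supplied names
}

# id_values(): the list names, or integer positions if the list is unnamed.
id_values <- function(l) if (is.null(names(l))) seq_along(l) else names(l)

id_colnames(".id", 2)                     # ".id.1" ".id.2"
id_colnames(c("Sector", "Trans"), 2)      # "Sector" "Trans"
id_values(list(1, 2, 3))                  # 1 2 3
```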

row.names

TRUE extracts row names from all the objects in l (where available) and adds them to the output in a column named "row.names". Alternatively, a custom column name can be supplied, e.g. row.names = "file".
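For a single object, moving row names into a named column can be sketched in base R as follows. The helper rownames_to_col() is hypothetical and only mirrors the behaviour described above. In particular, filling with NA when an object has no row names is an assumption made for this sketch.

```r
# Hypothetical helper (not collapse code): move the row names of a matrix or
# data frame into a leading character column, named "row.names" by default or
# given a custom name such as "Variable".
rownames_to_col <- function(x, col = "row.names") {
  df <- as.data.frame(x)
  rn <- rownames(x)
  if (is.null(rn)) rn <- rep(NA_character_, nrow(df))  # no row names available
  out <- data.frame(rn, df, check.names = FALSE)
  names(out)[1L] <- col
  rownames(out) <- NULL
  out
}

m <- matrix(1:4, 2, dimnames = list(c("mean", "sd"), c("A", "B")))
rownames_to_col(m, "Variable")   # columns: Variable, A, B
```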

recursive

if FALSE, only process the lowest (deepest) level of l.

id.factor

if TRUE and idcols != FALSE, create id columns as ordered factors instead of character or integer vectors. This is useful if the ids are used in further analysis, e.g. as inputs to ggplot2.
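The difference between a character id column and an ordered factor can be seen in base R. The sketch below assumes that the factor levels follow the order in which ids first appear in the list; that ordering is an assumption, not something stated above. Since ggplot2 draws discrete groups in level order, such a factor keeps groups in list order rather than alphabetical order.

```r
# Hypothetical illustration (not collapse code): an id column turned into an
# ordered factor whose levels keep the order of first appearance.
ids <- c("VA", "VA", "EMP", "EMP")
id_fac <- factor(ids, levels = unique(ids), ordered = TRUE)
id_fac        # ordered levels: VA < EMP
factor(ids)   # by contrast, factor() sorts levels alphabetically: EMP, VA
```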

DT

if TRUE, return a data.table, not a data.frame.

Value

A data.frame or (if DT = TRUE) data.table.

Details

The data.frame representation created by unlist2d is built as follows:

Recurse down to the lowest level of the list-tree; data.frames are exempted and treated as final elements.
Check the objects: if they are vectors, matrices or arrays, convert them to data.frames (in the case of atomic vectors, each element becomes a column).
Row-bind these data.frames using data.table's rbindlist function. Columns are matched by name; if the number of columns differs, empty spaces are filled with NAs. Create an id-column on the left, filled with the object names or indices (if unnamed). If row.names = TRUE, store the row.names of the objects (if available) in a separate column.
Move up to the next higher level of the list-tree and repeat: convert atomic objects to data.frames and row-bind, matching all columns and filling unmatched ones with NAs. Create another id-column for each level of nesting passed through. If the list-tree is asymmetric, fill empty spaces in lower-level id columns with NAs.
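The procedure above can be sketched in plain base R. The function unlist2d_sketch() is hypothetical and is not how collapse implements unlist2d: it replaces data.table's rbindlist with a hand-written by-name match plus do.call(rbind, ...), uses temporary column names "..id1", "..id2", ... internally, and leaves out the row.names, id.factor and DT options.

```r
# Hypothetical base-R sketch of the unlist2d procedure (not collapse's code).
unlist2d_sketch <- function(l, stub = ".id") {
  # Leaves: data frames pass through unchanged; matrices and arrays go through
  # as.data.frame(); an atomic vector becomes one row, one column per element.
  as_df <- function(x) {
    if (is.data.frame(x)) return(x)
    if (is.null(dim(x))) return(as.data.frame(as.list(x)))
    as.data.frame(x)
  }
  # Row-bind data frames, matching columns by name and filling gaps with NA
  # (a stand-in for data.table::rbindlist(..., fill = TRUE)).
  rbind_fill <- function(dfs) {
    cols <- unique(unlist(lapply(dfs, names)))
    dfs <- lapply(dfs, function(d) {
      for (nm in setdiff(cols, names(d))) d[[nm]] <- NA
      d[cols]
    })
    out <- do.call(rbind, unname(dfs))
    rownames(out) <- NULL
    out
  }
  is_id <- function(nm) grepl("^\\.\\.id[0-9]+$", nm)   # temporary id names
  id_level <- function(nm) as.integer(sub("..id", "", nm, fixed = TRUE))
  # Recurse: each node returns a data frame whose temporary id columns
  # "..id1", "..id2", ... count nesting levels from that node downwards.
  rec <- function(x) {
    if (!is.list(x) || is.data.frame(x)) return(as_df(x))
    ids <- if (is.null(names(x))) seq_along(x) else names(x)
    parts <- Map(function(el, id) {
      d <- rec(el)
      old <- is_id(names(d))
      # Shift the child's id columns one level down ...
      if (any(old)) names(d)[old] <- paste0("..id", id_level(names(d)[old]) + 1L)
      # ... and prepend an id column for the current level.
      d <- data.frame(rep(id, nrow(d)), d, check.names = FALSE)
      names(d)[1L] <- "..id1"
      d
    }, x, ids)
    rbind_fill(parts)   # asymmetric trees leave NA in lower-level id columns
  }
  out <- rec(l)
  idc <- names(out)[is_id(names(out))]
  idc <- idc[order(id_level(idc))]
  out <- out[c(idc, setdiff(names(out), idc))]   # id columns first, top level first
  names(out)[seq_along(idc)] <-
    if (length(idc) == 1L) stub else paste0(stub, ".", seq_along(idc))
  out
}

l <- list(a = data.frame(x = 1:2),
          b = list(c = data.frame(x = 3L, y = 9), d = c(x = 4, z = 5)))
unlist2d_sketch(l)   # columns .id.1, .id.2, x, y, z; .id.2 is NA for the rows from "a"
```

Applied to list(mtcars, list(mtcars, mtcars)), the first basic example further down, this sketch returns 96 rows and two integer id columns, with .id.2 set to NA for the 32 rows that come from the top-level mtcars.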

The result of this iterative procedure is a single data.frame containing, on the left side, id-columns for each level of nesting (from higher to lower level), followed by a column containing all the row.names of the objects if row.names = TRUE, followed by the object columns, matched at each level of recursion. The best results are obtained with symmetric lists of arrays, matrices or data.frames, which unlist2d converts to a tidy data.frame ready for plotting or further analysis.

Examples


# NOT RUN {
library(collapse)  # provides unlist2d, rapply2d, fmean, qM, qsu, pwcor, psmat, namlab and GGDC10S

## basic examples:
l <- list(mtcars, list(mtcars, mtcars))
unlist2d(l)
unlist2d(rapply2d(l, fmean))
l <- list(a = qM(mtcars[1:8]),
          b = list(c = mtcars[4:11], d = list(e = mtcars[2:10], f = mtcars)))
unlist2d(l, row.names = TRUE)
unlist2d(rapply2d(l, fmean))
unlist2d(rapply2d(l, fmean), recursive = FALSE)

## Groningen Growth and Development Center 10-Sector Database
head(GGDC10S) # See ?GGDC10S
namlab(GGDC10S, class = TRUE)

# Panel-summarize this data by Variable (Employment and Value Added)
l <- qsu(GGDC10S, by = ~ Variable,             # Output as list (instead of 4D array)
         pid = ~ Variable + Country,
         cols = 6:16, array = FALSE)
str(l)                                         # A list of 2-levels with matrices of statistics
head(unlist2d(l))                              # Default output, missing the variables (row-names)
head(unlist2d(l, row.names = TRUE))            # Here we go, but this is still not very nice
head(unlist2d(l, idcols = c("Sector","Trans"), # Now this is looking pretty good
              row.names = "Variable"))

dat <- unlist2d(l, c("Sector","Trans"),        # Id-columns can also be generated as ordered factors
                "Variable", id.factor = TRUE)
str(dat)

# Split this sectoral data, first by Variable (Employment and Value Added), then by Country
sdat <- rapply2d(split(GGDC10S[c(1,6:16)], GGDC10S$Variable), function(x) split(x[-1],x[[1]]))

# Compute pairwise correlations between sectors and recombine:
dat <- unlist2d(rapply2d(sdat, pwcor),
                idcols = c("Variable","Country"),
                row.names = "Sector")
head(dat)
plot(hclust(as.dist(1-pwcor(dat[-(1:3)]))))    # Using corrs. as distance metric to cluster sectors

# Together with other functions like psmat, unlist2d can also effectively help reshape data:
head(unlist2d(psmat(subset(GGDC10S, Variable == "VA"), ~Country, ~Year, cols = 6:16, array = FALSE),
              idcols = "Sector", row.names = "Country"))


# }
