loop: Apply functions to each matrix of a matrixset

Description

The apply_matrix function applies functions to each matrix of a matrixset. The apply_row/apply_column functions do the same but separately for each row/column. The functions can be applied to all matrices or only a subset.

The dfl/dfw versions differ in their output format: whenever possible, they return tibbles rather than nested lists.

Empty matrices are simply left unevaluated. How that impacts the returned result depends on which flavor of apply_* has been used. See ‘Value’ for more details.

If .matrix_wise is FALSE, the function (or expression) is multivariate in the sense that all matrices are accessible at once, as opposed to each of them in turn.

See section "Multivariate".

Usage

apply_row(.ms, ..., .matrix = NULL, .matrix_wise = TRUE, .input_list = FALSE)
apply_row_dfl(
  .ms,
  ...,
  .matrix = NULL,
  .matrix_wise = TRUE,
  .input_list = FALSE,
  .force_name = FALSE
)
apply_row_dfw(
  .ms,
  ...,
  .matrix = NULL,
  .matrix_wise = TRUE,
  .input_list = FALSE,
  .force_name = FALSE
)
apply_column(
  .ms,
  ...,
  .matrix = NULL,
  .matrix_wise = TRUE,
  .input_list = FALSE
)
apply_column_dfl(
  .ms,
  ...,
  .matrix = NULL,
  .matrix_wise = TRUE,
  .input_list = FALSE,
  .force_name = FALSE
)
apply_column_dfw(
  .ms,
  ...,
  .matrix = NULL,
  .matrix_wise = TRUE,
  .input_list = FALSE,
  .force_name = FALSE
)
apply_matrix(
  .ms,
  ...,
  .matrix = NULL,
  .matrix_wise = TRUE,
  .input_list = FALSE
)
apply_matrix_dfl(
  .ms,
  ...,
  .matrix = NULL,
  .matrix_wise = TRUE,
  .input_list = FALSE,
  .force_name = FALSE
)
apply_matrix_dfw(
  .ms,
  ...,
  .matrix = NULL,
  .matrix_wise = TRUE,
  .input_list = FALSE,
  .force_name = FALSE
)

Value

A list with one element for every matrix in the matrixset object. Each element is itself a list. For apply_matrix, it is a list of the function values, or NULL if the matrix was empty. For apply_row/apply_column, it is a list with one element for each row/column (these elements will be NULL if the corresponding matrix was empty), and each of these sub-lists is in turn a list holding the results of each function.

If each function returns a vector of the same length, you can use either the _dfl or the _dfw version. Both return a list of tibbles. The dfl version stacks the function results in a long format, while the dfw version puts them side by side, in a wide format. An empty matrix will be returned for empty input matrices.

If the functions return vectors of more than one element, there will be a column to store the values and one for the function ID (dfl), or one column per combination of function/result (dfw).

See the "Grouped matrixsets" section to learn about the result format when the object is grouped.

Arguments

.ms

matrixset object

...

expressions, separated by commas. They can be specified in one of the following ways:

a function name, e.g., mean.
a function call, where you can use .m to represent the current matrix (for apply_matrix), .i to represent the current row (for apply_row) and .j for the current column (apply_column). Bare names of object traits can be used as well. For instance, lm(.i ~ program).

The pronouns are also available for the multivariate version, under certain circumstances, but they have a different meaning. See the "Multivariate" section for more details.
a formula expression. The pronouns .m, .i and .j can be used as well. See the examples for how this can be useful.

The expressions can be named; these names will be used to provide names to the results.

.matrix

matrix indices of which matrix to apply functions to. The default, NULL, means all the matrices are used.

If not NULL, the index is a numeric or character vector.

Numeric values are coerced to integer as by as.integer() (and hence truncated towards zero).

Character vectors will be matched to the matrix names of the object.

Can also be logical vectors, indicating which matrices to use. Such vectors are NOT recycled, which is an important difference with usual matrix indexing. It means that the logical vector must have as many elements as there are matrices.

Can also be negative integers, indicating the matrices to leave out.

.matrix_wise

logical. By default (TRUE), matrices are provided one by one, in turn, to the functions/expressions. But if .matrix_wise is FALSE, the functions/expressions have access to all matrices. See "Multivariate" for details, including how to reference the matrices.

.input_list

logical. Used only in the multivariate setting (.matrix_wise == FALSE). If TRUE, the matrices are provided as a single list, where each element is a matrix (or matrix row or column). The list elements are named after the matrices.

.force_name

logical. Used only for the simplified output versions (dfl/dfw). By default (FALSE), function IDs will be provided only if the function outcome is a vector of length 2 or more. If .force_name is TRUE then function IDs are provided in all situations.

This can be useful when the object is grouped. As the functions are evaluated independently within each group, function outcomes could be of length 1 for some groups and of length 2 or more for other groups.

See examples.

Pronouns

The rlang pronouns .data and .env are available. Two scenarios for which they can be useful are:

The annotation names are stored in a character variable. You can make use of the variable by using .data[[var]]. See the example for an illustration of this.
You want to make use of a global variable that has the same name as an annotation. You can use .env[[var]] or .env$var to make sure to use the proper variable.

The matrixset package defines its own pronouns: .m, .i and .j, which are discussed in the function specification argument (...).

It is not necessary to import any of the pronouns (or load rlang in the case of .data and .env) in an interactive session.

It is useful however when writing a package to avoid the R CMD check notes. As needed, you can import .data and .env (from rlang) or any of .m, .i or .j from matrixset.

Multivariate

The default behavior is to apply a function or expression to a single matrix: each matrix of the matrixset object is provided in turn to the function/expression.

If .matrix_wise is FALSE, all matrices are provided at once to the functions/expressions. They can be provided in two fashions:

Separately (default behavior). Each matrix can be referred to as .m1, ..., .mn, where n is the number of matrices. Note that this is the number of matrices as determined by .matrix.

For apply_row (and its dfl/dfw variants), use .i1, .i2 and so on instead. In this case, the functions/expressions first have access to the first row of the first matrix, the first row of the second matrix and so on. Then, continuing the loop, the second row of each matrix becomes accessible, and so on.

Similarly, use .j1 and so on for the apply_column family.

Functions given by name only are understood as functions of multiple arguments. In the example apply_row(ms, mean, .matrix_wise = FALSE), if there are 3 matrices in the ms object, mean is understood as mean(.i1, .i2, .i3). Note that this particular call would fail, because mean() treats its second and third arguments as trim and na.rm, not as data.
In a list (.input_list = TRUE). The list has one element per matrix. The list is referred to with the same pronouns (.m, .i, .j), and each matrix within it by its name or position.

For the multivariate setting, empty matrices are given as is, so it is important that provided functions can deal with such a scenario. An alternative is to skip the empty matrices with the .matrix argument.

Grouped matrixsets

If groups have been defined, functions will be evaluated within them. When both row and column groupings have been registered, functions are evaluated at each cross-combination of row/column groups.

The output format is different when the .ms matrixset object is grouped. A list for every matrix is still returned, but each of these lists now holds a tibble.

Each tibble has a column called .vals, where the function results are stored. This column is a list, one element per group. The group labels are given by the other columns of the tibble. For a given group, things are like the ungrouped version: further sub-lists for rows/columns - if applicable - and function values.

The dfl/dfw versions are more similar in their output format to their ungrouped version. The format is almost identical, except that additional columns are reported to identify the group labels.

See the examples.

Examples

# The first example takes the whole matrix average, while the second takes
# every row average
(mn_mat <- apply_matrix(student_results, mean))
(mn_row <- apply_row(student_results, mean))
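
# The functions can be restricted to a subset of matrices with .matrix.
# A sketch, assuming student_results holds at least two matrices: use only the
# first matrix, then every matrix but the first
(mn_first <- apply_matrix(student_results, mean, .matrix = 1))
(mn_rest <- apply_matrix(student_results, mean, .matrix = -1))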

# More than one function can be provided. It's a good idea in this case to
# name them
(mn_col <- apply_column(student_results, avr=mean, med=median))

# the dfl/dfw versions return nice tibbles - if the functions return values
# of the same length.
(mn_l <- apply_column_dfl(student_results, avr=mean, med=median))
(mn_w <- apply_column_dfw(student_results, avr=mean, med=median))

# There is no difference between the two versions for length-1 vector results.
# These will differ, however
(rg_l <- apply_column_dfl(student_results, rg=range))
(rg_w <- apply_column_dfw(student_results, rg=range))

# More complex examples can be used, by using pronouns and data annotation
(vals <- apply_column(student_results, avr=mean, avr_trim=mean(.j, trim=.05),
                                      reg=lm(.j ~ teacher)))

# You can wrap complex function results, such as for lm, into a list, to use
# the dfl/dfw versions
(vals_tidy <- apply_column_dfw(student_results, avr=mean, avr_trim=mean(.j, trim=.05),
                                               reg=list(lm(.j ~ teacher))))

# You can provide complex expressions by using formulas
(r <- apply_column(student_results,
                                  res= ~ {
                                    log_score <- log(.j)
                                    p <- predict(lm(log_score ~ teacher + class))
                                    .j - exp(p)
                                  }))

# the .data pronoun can be useful to use names stored in variables
fn <- function(nm) {
  if (!is.character(nm) || length(nm) != 1) stop("this example won't work")
  apply_column(student_results, lm(.j ~ .data[[nm]]))
}
fn("teacher")

# You can use variables that are outside the scope of the matrixset object.
# You don't need to do anything special if that variable is not named as an
# annotation
pass_grade <- 0.5
(passed <- apply_row_dfw(student_results, pass = ~ .i >= pass_grade))

# use .env if the variable shares its name with an annotation
previous_year_score <- 0.5
(passed <- apply_row_dfw(student_results, pass = ~ .i >= .env$previous_year_score))

# Grouping structure makes looping easy. Look at the output format
cl_prof_gr <- row_group_by(student_results, class, teacher)
(gr_summ <- apply_column(cl_prof_gr, avr=mean, med=median))
(gr_summ_tidy <- apply_column_dfw(cl_prof_gr, avr=mean, med=median))
# to showcase how we can play with format
(gr_summ_tidy_long <- apply_column_dfl(cl_prof_gr, summ = ~ c(avr=mean(.j), med=median(.j))))

# It is even possible to combine groupings
cl_prof_program_gr <- column_group_by(cl_prof_gr, program)
(mat_summ <- apply_matrix(cl_prof_program_gr, avr = mean, med = median, rg = range))
# it doesn't make much sense, but this is to showcase format
(summ_gr <- apply_matrix(cl_prof_program_gr, avr = mean, med = median, rg = range))
(summ_gr_long <- apply_column_dfl(cl_prof_program_gr,
                                 ct = ~ c(avr = mean(.j), med = median(.j)),
                                 rg = range))
(summ_gr_wide <- apply_column_dfw(cl_prof_program_gr,
                                 ct = c(avr = mean(.j), med = median(.j)),
                                 rg = range))


# This is an example where you may want to use the .force_name argument
(apply_matrix_dfl(column_group_by(student_results, program), FC = colMeans(.m)))
(apply_matrix_dfl(column_group_by(student_results, program), FC = colMeans(.m),
                  .force_name = TRUE))
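
# A sketch of the multivariate setting (.matrix_wise = FALSE). It assumes that
# student_results holds at least two matrices of the same dimensions; .i1 and
# .i2 are then the same row taken from the first and the second matrix. The
# result name 'change' is only illustrative.
(change_long <- apply_row_dfl(student_results, change = ~ .i2 - .i1,
                              .matrix_wise = FALSE))
# The same computation with the rows supplied as a single list
# (.input_list = TRUE), indexed here by position
(change_list <- apply_row_dfl(student_results, change = ~ .i[[2]] - .i[[1]],
                              .matrix_wise = FALSE, .input_list = TRUE))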
