fmean: Fast (Grouped, Weighted) Mean for Matrix-Like Objects

Description

fmean is a generic function that computes the (column-wise) mean of x, (optionally) grouped by g and/or weighted by w. The TRA argument can further be used to transform x using its (grouped, weighted) mean.

Usage

fmean(x, ...)
# S3 method for default
fmean(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
      use.g.names = TRUE, ...)
# S3 method for matrix
fmean(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
      use.g.names = TRUE, drop = TRUE, ...)
# S3 method for data.frame
fmean(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
      use.g.names = TRUE, drop = TRUE, ...)
# S3 method for grouped_df
fmean(x, w = NULL, TRA = NULL, na.rm = TRUE,
      use.g.names = FALSE, keep.group_vars = TRUE, keep.w = TRUE, ...)

Arguments

x

a numeric vector, matrix, data.frame or grouped tibble (dplyr::grouped_df).

g

a factor, GRP object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a GRP object) used to group x.

w

a numeric vector of (non-negative) weights; may contain missing values.

TRA

an integer or quoted operator indicating the transformation to perform: 1 - "replace_fill" | 2 - "replace" | 3 - "-" | 4 - "-+" | 5 - "/" | 6 - "%" | 7 - "+" | 8 - "*" | 9 - "%%" | 10 - "-%%". See TRA.

na.rm

logical. Skip missing values in x. Defaults to TRUE and is implemented at very little computational cost. If na.rm = FALSE, NA is returned as soon as a missing value is encountered.

use.g.names

logical. Make group names and add them to the result as names (vector method) or row names (matrix and data.frame methods). No row names are generated for data.tables and (by default) grouped tibbles.

drop

matrix and data.frame methods: logical. Drop dimensions and return an atomic vector if g = NULL and TRA = NULL.

keep.group_vars

grouped_df method: Logical. FALSE removes grouping variables after computation.

keep.w

grouped_df method: Logical. Retain summed weighting variable after computation (if contained in grouped_df).

...

arguments to be passed to or from other methods.

Value

The (w weighted) mean of x, grouped by g, or (if TRA is used) x transformed by its mean, grouped by g.
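To make "x transformed by its mean, grouped by g" concrete, the following base R sketch reproduces what the examples below call demeaning by group (TRA = "-"). It is only an illustration of the arithmetic using stats::ave, not the package's implementation; the variable names are chosen here for illustration.

```r
# Base R illustration of group-wise demeaning (assumed equivalent in spirit
# to TRA = "-" with a grouping vector; not the package's own code).
x <- mtcars$mpg
g <- mtcars$cyl

group_means <- ave(x, g)      # each observation's group mean (FUN defaults to mean)
demeaned    <- x - group_means

# After demeaning, every group averages to (numerically) zero.
tapply(demeaned, g, mean)
```

The result has the same length as x, which is the key difference from the aggregated (one value per group) output returned when TRA = NULL.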

Details

Missing values are removed (na.rm = TRUE) by simply skipping them during the computation, which is very cheap; setting na.rm = FALSE on data with no missing values therefore gives no extra speed. Large performance gains are nevertheless possible with na.rm = FALSE when missing values are present, because the computation then terminates as soon as an NA is encountered and NA is returned (unlike base::mean, which runs through all the data without any checks).
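The two strategies can be sketched in plain R as follows. This is a deliberately slow, loop-based illustration of the control flow only; mean_sketch is a hypothetical helper introduced here and is not part of the package, whose compiled implementation is not shown in this document.

```r
# Hypothetical pure-R illustration of skipping vs. early termination.
mean_sketch <- function(x, na.rm = TRUE) {
  total <- 0
  n <- 0L
  for (xi in x) {
    if (is.na(xi)) {
      if (na.rm) next        # na.rm = TRUE: skip the missing value, keep going
      return(NA_real_)       # na.rm = FALSE: stop at the first missing value
    }
    total <- total + xi
    n <- n + 1L
  }
  total / n
}

mean_sketch(c(1, NA, 3))                 # mean of the observed values 1 and 3
mean_sketch(c(1, NA, 3), na.rm = FALSE)  # NA, returned without visiting element 3
```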

The weighted mean is computed as sum(x * w) / sum(w). If na.rm = TRUE, missing values are removed from both x and w, i.e. only x[complete.cases(x,w)] and w[complete.cases(x,w)] are used.
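The formula and the pairwise removal of missing values can be written out directly in base R. The helper weighted_mean_sketch below is a hypothetical name used for illustration only.

```r
# Base R sketch of sum(x * w) / sum(w) with pairwise missing-value removal.
weighted_mean_sketch <- function(x, w, na.rm = TRUE) {
  if (na.rm) {
    ok <- complete.cases(x, w)  # TRUE where both x and w are observed
    x <- x[ok]
    w <- w[ok]
  }
  sum(x * w) / sum(w)
}

x <- c(1, 2, NA, 4)
w <- c(1, 1, 1, NA)
weighted_mean_sketch(x, w)                 # only the first two pairs are used
weighted_mean_sketch(x, w, na.rm = FALSE)  # NA propagates
```

Note that an observation is dropped when either its value or its weight is missing, so the fourth element (x = 4, w = NA) does not contribute.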

All of this generalizes seamlessly to grouped computations, which are performed in a single pass (without splitting the data) and are therefore extremely fast.
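A single-pass grouped mean can be sketched as one loop that accumulates a sum and a count per group, indexed by an integer group id. The R loop below only illustrates the idea and will be far slower than compiled code; grouped_mean_sketch is a hypothetical helper, and the package's actual internals are not shown in this document.

```r
# Hypothetical illustration: one pass over x, no splitting into sub-vectors.
grouped_mean_sketch <- function(x, g) {
  f  <- as.factor(g)
  gi <- as.integer(f)                  # integer group id for each observation
  sums   <- numeric(nlevels(f))        # running sum per group
  counts <- integer(nlevels(f))        # running count per group
  for (i in seq_along(x)) {
    if (is.na(x[i]) || is.na(gi[i])) next   # skip missing values
    sums[gi[i]]   <- sums[gi[i]] + x[i]
    counts[gi[i]] <- counts[gi[i]] + 1L
  }
  setNames(sums / counts, levels(f))
}

grouped_mean_sketch(mtcars$mpg, mtcars$cyl)
tapply(mtcars$mpg, mtcars$cyl, mean)   # split-apply equivalent, for comparison
```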

When applied to data frames with groups or with drop = FALSE, fmean preserves all column attributes (such as variable labels) but does not distinguish between classed and unclassed objects (thus applying fmean to a factor column will give a 'malformed factor' error). The attributes of the data frame itself are also preserved.

Examples

# NOT RUN {
## default vector method
mpg <- mtcars$mpg
fmean(mpg)                         # Simple mean
fmean(mpg, w = mtcars$hp)          # Weighted mean: Weighted by hp
fmean(mpg, TRA = "-")              # Simple transformation: demeaning (See also ?W)
fmean(mpg, mtcars$cyl)             # Grouped mean
fmean(mpg, mtcars[8:9])            # another grouped mean.
g <- GRP(mtcars[c(2,8:9)])
fmean(mpg, g)                      # Pre-computing groups speeds up the computation
fmean(mpg, g, mtcars$hp)           # Grouped weighted mean
fmean(mpg, g, TRA = "-")           # Demeaning by group
fmean(mpg, g, mtcars$hp, "-")      # Group-demeaning using weighted group means

## data.frame method
fmean(mtcars)
fmean(mtcars, g)
fmean(fgroup_by(mtcars, cyl, vs, am))  # another way of doing it...
fmean(mtcars, g, TRA = "-") # etc...


## matrix method
m <- qM(mtcars)
fmean(m)
fmean(m, g)
fmean(m, g, TRA = "-") # etc...

## method for grouped tibbles - for use with dplyr
library(dplyr)
mtcars %>% group_by(cyl,vs,am) %>% fmean           # Ordinary
mtcars %>% group_by(cyl,vs,am) %>% fmean(hp)       # Weighted
mtcars %>% group_by(cyl,vs,am) %>% fmean(hp,"-")   # Weighted Transform
mtcars %>% group_by(cyl,vs,am) %>%
           select(mpg,hp) %>% fmean(hp,"-")        # Only mpg

mtcars %>% fgroup_by(cyl,vs,am) %>%              # Equivalent but faster !!
           fselect(mpg,hp) %>% fmean(hp,"-")
# }
