fmode: Fast (Grouped, Weighted) Statistical Mode for Matrix-Like Objects

Description

fmode is a generic function that returns the (column-wise) statistical mode, i.e. the most frequent value of x, optionally grouped by g and/or weighted by w. The TRA argument can further be used to transform x using its (grouped, weighted) mode.

Usage

fmode(x, ...)
# S3 method for default
fmode(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
      use.g.names = TRUE, ...)
# S3 method for matrix
fmode(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
      use.g.names = TRUE, drop = TRUE, ...)
# S3 method for data.frame
fmode(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
      use.g.names = TRUE, drop = TRUE, ...)
# S3 method for grouped_df
fmode(x, w = NULL, TRA = NULL, na.rm = TRUE,
      use.g.names = FALSE, keep.group_vars = TRUE, keep.w = TRUE, ...)

Arguments

x

a vector, matrix, data.frame or grouped tibble (dplyr::grouped_df).

g

a factor, GRP object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a GRP object) used to group x.

w

a numeric vector of (non-negative) weights, may contain missing values.

TRA

an integer or quoted operator indicating the transformation to perform: 1 - "replace_fill" | 2 - "replace" | 3 - "-" | 4 - "-+" | 5 - "/" | 6 - "%" | 7 - "+" | 8 - "*" | 9 - "%%" | 10 - "-%%". See TRA.

na.rm

logical. Skip missing values in x. Defaults to TRUE and implemented at very little computational cost. If na.rm = FALSE, NA is treated as any other value.

use.g.names

logical. Make group names and add them to the result as names (vector method) or row names (matrix and data.frame methods). No row names are generated for data.tables and grouped tibbles.

drop

matrix and data.frame method: Logical. Drop dimensions and return an atomic vector if g = NULL and TRA = NULL.

keep.group_vars

grouped_df method: Logical. FALSE removes grouping variables after computation.

keep.w

grouped_df method: Logical. Retain sum of weighting variable after computation (if contained in grouped_df).

...

arguments to be passed to or from other methods.

Value

The statistical mode of x, grouped by g, or (if TRA is used) x transformed by its mode, grouped by g. See also Details.

Details

fmode implements a fast algorithm to find the statistical mode, using index-hashing as implemented in the Rcpp::sugar::IndexHash class.

If all values are distinct, the first value is returned. If multiple distinct values share the top frequency, the value returned is the first one established as having the top frequency when passing through the data from element 1 to element n. If na.rm = FALSE, NA is not removed but treated as any other value (i.e. its frequency is counted). If all values are NA, NA is always returned.
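The tie-breaking rule above can be illustrated with a small base R sketch. This is not the package's C++ code; mode_first_top is a hypothetical helper name, and the loop simply follows the rule as worded: the mode changes only when a count strictly exceeds the running top frequency.

```r
# Base R sketch of the single-pass rule (hypothetical helper, not fmode itself).
mode_first_top <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  if (length(x) == 0L) return(x[NA_integer_])  # nothing left: NA of x's type
  keys <- match(x, unique(x))   # integer id per distinct value (NA gets its own id)
  counts <- integer(max(keys))
  best <- 0L                    # running top frequency
  mode_pos <- 1L                # position of the current mode in x
  for (i in seq_along(x)) {
    k <- keys[i]
    counts[k] <- counts[k] + 1L
    if (counts[k] > best) {     # strict comparison: later ties do not replace the mode
      best <- counts[k]
      mode_pos <- i
    }
  }
  x[mode_pos]
}

mode_first_top(c(3, 1, 2))                     # all distinct: first value, 3
mode_first_top(c(1, 2, 2, 1))                  # 2 reaches a count of two first
mode_first_top(c(NA, NA, 1))                   # NA skipped by default: 1
mode_first_top(c(NA, NA, 1), na.rm = FALSE)    # NA counted like any value: NA
```

Note that under this reading c(1, 2, 2, 1) yields 2 rather than 1, because 2 is the first value to reach the top count.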

The weighted mode is computed by summing up the weights for each distinct value and choosing the value with the largest sum. If na.rm = TRUE, missing values are removed from both x and w, i.e. only x[complete.cases(x,w)] and w[complete.cases(x,w)] are used.
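As a rough illustration of the weighted mode, the following base R sketch sums the weights per distinct value and picks the largest sum. weighted_mode is a hypothetical helper, not part of the package, and ties between equal sums are resolved here by which.max (the first distinct value in order of appearance), which is an assumption rather than documented behaviour.

```r
# Base R sketch of a weighted mode (hypothetical helper, not fmode itself).
weighted_mode <- function(x, w, na.rm = TRUE) {
  if (na.rm) {
    keep <- !is.na(x) & !is.na(w)   # same cases as complete.cases(x, w) for vectors
    x <- x[keep]
    w <- w[keep]
  }
  if (length(x) == 0L) return(x[NA_integer_])
  ux <- unique(x)                   # distinct values in order of appearance
  ids <- match(x, ux)               # integer id of each element's distinct value
  sums <- drop(rowsum(w, ids))      # total weight per distinct value, ordered by id
  ux[which.max(sums)]
}

x <- c("a", "b", "a", "c")
weighted_mode(x, c(1, 5, 2, 1))     # "b": weight 5 beats a's total of 3
weighted_mode(x, rep(1, 4))         # "a": equal weights reduce to the plain mode
```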

This all seamlessly generalizes to grouped computations, which are currently performed by mapping the data to a sparse array directed by g and then going group by group.
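The group-by-group idea can be mimicked in base R with split, shown below as a minimal sketch rather than the sparse-array approach the package uses. simple_mode is a hypothetical helper that breaks ties by first appearance (a simplification), and the last two lines are loose analogues of what a replace-style or "-" transformation via TRA would produce.

```r
# Base R sketch of a grouped mode (illustrative only; not how fmode is implemented).
simple_mode <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0L) return(v[NA_integer_])
  uv <- unique(v)
  uv[which.max(tabulate(match(v, uv)))]   # most frequent distinct value
}

x <- c(1, 1, 2, 5, 5, 5, 7)
g <- c("a", "a", "a", "b", "b", "b", "b")

modes <- vapply(split(x, g), simple_mode, numeric(1))
modes                           # one mode per group, named by group: a = 1, b = 5

expanded <- unname(modes[g])    # every element replaced by its group's mode
centered <- x - expanded        # every element minus its group's mode
```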

fmode preserves all the attributes of the objects it is applied to (apart from names or row-names which are adjusted as necessary). If a data frame is passed to fmode and drop = TRUE, base::unlist will be called on the result, which might or might not be sensible depending on the data at hand.

Examples

# NOT RUN {
## World Development Data
attach(wlddev)
## default vector method
fmode(PCGDP)                    # Numeric mode
fmode(PCGDP, iso3c)             # Grouped numeric mode
fmode(PCGDP, iso3c, LIFEEX)     # Grouped and weighted numeric mode
fmode(region)                   # Factor mode
fmode(date)                     # Date mode (defaults to first value since panel is balanced)
fmode(country)                  # Character mode (also defaults to first value)
fmode(OECD)                     # Logical mode
                                # ...all the above can also be performed grouped and weighted
## matrix method
m <- qM(airquality)
fmode(m)
fmode(m, na.rm = FALSE)         # NA frequency is also counted
fmode(m, airquality$Month)      # Groupwise
fmode(m, w = airquality$Day)    # Weighted: Later days in the month are given more weight
fmode(m>50, airquality$Month)   # Groupwise logical mode
                                # etc ...
## data.frame method
fmode(wlddev)                   # Gives one row
fmode(wlddev, drop = TRUE)      # calling unlist -> coerce to character vector
fmode(wlddev, iso3c)            # Grouped mode
fmode(wlddev, iso3c, LIFEEX)    # Grouped and weighted mode

detach(wlddev)
# }
