
TRA is an S3 generic that efficiently transforms data by either (column-wise) replacing data values with supplied statistics or sweeping the statistics out of the data. TRA supports grouped sweeping and replacing operations, and is thus a generalization of sweep.
TRA(x, STATS, FUN = "-", ...)

# S3 method for default
TRA(x, STATS, FUN = "-", g = NULL, ...)

# S3 method for matrix
TRA(x, STATS, FUN = "-", g = NULL, ...)

# S3 method for data.frame
TRA(x, STATS, FUN = "-", g = NULL, ...)

# S3 method for grouped_df
TRA(x, STATS, FUN = "-", keep.group_vars = TRUE, ...)
x: an atomic vector, matrix, data frame or grouped data frame (class 'grouped_df').
STATS: a matching set of summary statistics. See Details and Examples.
FUN: an integer or character string indicating the operation to perform. There are 10 supported operations:
Int. | String | Description
1 | "replace_fill" | replace and overwrite missing values in x
2 | "replace" | replace but preserve missing values in x
3 | "-" | subtract (i.e. center)
4 | "-+" | subtract group-statistics but add group-frequency weighted average of group statistics (i.e. center on overall average statistic)
5 | "/" | divide (i.e. scale. For mean-preserving scaling see also fscale)
6 | "%" | compute percentages (i.e. divide and multiply by 100)
7 | "+" | add
8 | "*" | multiply
9 | "%%" | modulus (i.e. remainder from division by STATS)
keep.group_vars: grouped_df method: Logical. FALSE removes grouping variables after computation. See Details and Examples.
...: arguments to be passed to or from other methods.
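The operations selectable through FUN can be emulated in base R for a single atomic vector. The following is only a minimal sketch, not collapse's implementation: tra_sketch is a hypothetical helper, and weighting the "-+" operation by the raw group sizes in g is an assumption made for illustration.

```r
# Hypothetical helper (not part of collapse): base-R emulation of the
# string operations for an atomic vector x.
# stats holds one statistic per group; g is an integer group index.
tra_sketch <- function(x, stats, FUN = "-", g = rep(1L, length(x))) {
  s <- stats[g]                          # expand statistics to length(x)
  switch(FUN,
    "replace_fill" = s,                  # overwrite everything, incl. NAs
    "replace"      = replace(s, is.na(x), NA),  # keep NAs of x
    "-"            = x - s,
    "-+"           = x - s +             # assumption: weights = group sizes
                     weighted.mean(stats, tabulate(g, length(stats))),
    "/"            = x / s,
    "%"            = x / s * 100,
    "+"            = x + s,
    "*"            = x * s,
    "%%"           = x %% s,
    stop("unsupported operation: ", FUN))
}

x     <- c(2, NA, 6, 8)
g     <- c(1L, 1L, 2L, 2L)
stats <- c(2, 7)                         # group means, ignoring the NA

tra_sketch(x, stats, "replace_fill", g)  # 2  2  7  7
tra_sketch(x, stats, "replace", g)       # 2 NA  7  7
tra_sketch(x, stats, "-", g)             # 0 NA -1  1
tra_sketch(x, stats, "-+", g)            # 4.5 NA 3.5 5.5
```

The difference between "replace_fill" and "replace" only shows up where x has missing values, which is why the example vector contains an NA.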
x with columns replaced or swept out using STATS, (optionally) grouped by g.
Without groups (g = NULL), TRA is nothing more than a column-based version of sweep, albeit 4 times more efficient on matrices and many times more efficient on data frames. In this case all methods support an atomic vector of statistics of length NCOL(x) passed to STATS. The matrix and data frame methods also support a 1-row matrix or 1-row data frame / list, respectively. TRA always preserves all attributes of x.
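To make the ungrouped case concrete, the base-R sketch below shows the column-wise sweep that TRA generalizes: one statistic per column, subtracted from every row. It uses only base R (sweep and colMeans); the variable names are illustrative.

```r
m         <- as.matrix(iris[, 1:4])       # numeric matrix, 150 x 4
col_means <- colMeans(m)                  # one statistic per column

# Column-wise sweep: MARGIN = 2 matches statistics to columns
centered <- sweep(m, 2, col_means, "-")

# The same operation written out by hand: repeat each column's
# statistic down its column, then subtract
by_hand <- m - rep(col_means, each = nrow(m))

all.equal(centered, by_hand)              # TRUE
round(colMeans(centered), 10)             # all zero after centering
```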
With groups passed to g, STATS needs to be of the same type as x and of appropriate dimensions [such that NCOL(x) == NCOL(STATS) and NROW(STATS) equals the number of groups (i.e. the number of levels if g is a factor)]. If this condition is satisfied, TRA will assume that the first row of STATS is the set of statistics computed on the first group/level of g, the second row on the second group/level etc. and do groupwise replacing or sweeping out accordingly.
For example, let x = c(1.2, 4.6, 2.5, 9.1, 8.7, 3.3), let g = c(1,3,3,2,1,2) be an integer vector defining 3 groups, and let STATS = fmean(x,g) = c(4.95, 6.20, 3.55). Then out = TRA(x,STATS,"-",g) = c(-3.75, 1.05, -1.05, 2.90, 3.75, -2.90) [same as fmean(x, g, TRA = "-")] does the equivalent of the following for-loop: for(i in 1:6) out[i] = x[i] - STATS[g[i]].
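The numbers in this example can be checked in base R. The sketch below substitutes tapply for fmean (an assumption made so that the code runs without collapse) and compares the for-loop with its vectorized equivalent.

```r
x <- c(1.2, 4.6, 2.5, 9.1, 8.7, 3.3)
g <- c(1, 3, 3, 2, 1, 2)

# Group means in sorted group order (stand-in for fmean(x, g))
STATS <- as.numeric(tapply(x, g, mean))   # 4.95 6.20 3.55

# The for-loop from the text
out <- numeric(length(x))
for (i in 1:6) out[i] <- x[i] - STATS[g[i]]

# Vectorized equivalent: index the statistics by group, then subtract
out_vec <- x - STATS[g]

out                                       # -3.75 1.05 -1.05 2.90 3.75 -2.90
all.equal(out, out_vec)                   # TRUE
```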
Correct computation requires that g as used in fmean and g passed to TRA are exactly the same vector. Using g = c(1,3,3,2,1,2) for fmean and g = c(3,1,1,2,3,2) for TRA will not give the right result. The safest way of programming with TRA is thus to repeatedly employ the same factor or GRP object for all grouped computations. Atomic vectors passed to g will be converted to factors (see qF) and lists will be converted to GRP objects. This is also done by all Fast Statistical Functions and by default by BY, thus together with these functions, TRA can also safely be used with atomic- or list-groups. Problems may arise if functions from other packages internally group atomic vectors or lists in a non-sorted way. [Note: as.factor conversions are ok as this also involves sorting.]
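The ordering problem can be reproduced in base R. In this sketch the statistics are computed in sorted group order (via tapply, standing in for fmean), while a second, hypothetical grouping routine numbers groups in order of first appearance; the two codings describe the same partition but index the statistics differently.

```r
x <- c(1.2, 4.6, 2.5, 9.1, 8.7, 3.3)
g <- c(1, 3, 3, 2, 1, 2)

# Statistics in sorted group order: rows correspond to groups 1, 2, 3
STATS <- as.numeric(tapply(x, g, mean))   # 4.95 6.20 3.55

codes_sorted <- as.integer(factor(g))     # 1 3 3 2 1 2 (sorted levels)
codes_first  <- match(g, unique(g))       # 1 2 2 3 1 3 (first appearance)

right <- x - STATS[codes_sorted]          # statistics line up with groups
wrong <- x - STATS[codes_first]           # groups 2 and 3 are swapped

tapply(right, g, sum)                     # zero within every group
isTRUE(all.equal(right, wrong))           # FALSE
```

Reusing one factor (or one grouping object) for both the statistics and the sweep avoids the mismatch by construction.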
If x is a grouped data frame ('grouped_df'), TRA matches the columns of x and STATS and also checks for grouping columns in x and STATS. TRA.grouped_df will then only transform those columns in x for which matching counterparts were found in STATS (exempting grouping columns) and return x again (with columns in the same order). If keep.group_vars = FALSE, the grouping columns are dropped after computation; however, the "groups" attribute is not dropped (it can be removed using fungroup() or dplyr::ungroup()).
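The column-matching behavior described here can be imitated in base R. This is a rough sketch of the idea rather than the grouped_df method itself: statistics exist for only two columns, columns are matched by name, the grouping column is exempted, and all other columns pass through unchanged. The names stats, gi and to_transform are illustrative.

```r
# Group means for two of the four numeric columns only
stats <- aggregate(cbind(Sepal.Width, Petal.Width) ~ Species,
                   data = iris, FUN = mean)

out <- iris
gi  <- match(iris$Species, stats$Species)  # row of stats for each row of x

# Columns present in both, excluding the grouping column
to_transform <- setdiff(intersect(names(out), names(stats)), "Species")

for (nm in to_transform) {
  out[[nm]] <- out[[nm]] - stats[[nm]][gi] # groupwise centering
}

names(out)                                 # same columns, same order
head(out)
```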
sweep, Fast Statistical Functions, Data Transformations, Collapse Overview
# NOT RUN {
library(collapse)
v <- iris$Sepal.Length # A numeric vector
f <- iris$Species # A factor
dat <- num_vars(iris) # Numeric columns
m <- qM(dat) # Matrix of numeric data
head(TRA(v, fmean(v))) # Simple centering [same as fmean(v, TRA = "-") or W(v)]
head(TRA(m, fmean(m))) # [same as sweep(m, 2, fmean(m)), fmean(m, TRA = "-") or W(m)]
head(TRA(dat, fmean(dat))) # [same as fmean(dat, TRA = "-") or W(dat)]
head(TRA(v, fmean(v), "replace")) # Simple replacing [same as fmean(v, TRA = "replace") or B(v)]
head(TRA(m, fmean(m), "replace")) # [same as fmean(m, TRA = "replace") or B(m)]
head(TRA(dat, fmean(dat), "replace")) # [same as fmean(dat, TRA = "replace") or B(dat)]
head(TRA(m, fsd(m), "/")) # Simple scaling... [same as fsd(m, TRA = "/")]...
# Note: All grouped examples also apply for v and dat...
head(TRA(m, fmean(m, f), "-", f)) # Centering [same as fmean(m, f, TRA = "-") or W(m, f)]
head(TRA(m, fmean(m, f), "replace", f)) # Replacing [same as fmean(m, f, TRA = "replace") or B(m, f)]
head(TRA(m, fsd(m, f), "/", f)) # Scaling [same as fsd(m, f, TRA = "/")]
head(TRA(m, fmean(m, f), "-+", f)) # Centering on the overall mean ...
# [same as fmean(m, f, TRA = "-+") or
# W(m, f, mean = "overall.mean")]
head(TRA(TRA(m, fmean(m, f), "-", f), # Also the same thing done manually !!
fmean(m), "+"))
# }
# NOT RUN {
# grouped tibble method
library(dplyr)
iris %>% group_by(Species) %>% TRA(fmean(.))
iris %>% group_by(Species) %>% fmean(TRA = "-") # Same thing
iris %>% group_by(Species) %>% TRA(fmean(.)[c(2,4)]) # Only transforming 2 columns
iris %>% group_by(Species) %>% TRA(fmean(.)[c(2,4)], # Dropping species column
keep.group_vars = FALSE)
iris %>% fgroup_by(Species) %>% TRA(fmean(.)) # Faster collapse grouping...
# }