
gscale() standardizes variables by dividing them by 2 standard deviations and mean-centering them by default. It contains options for handling binary variables separately. gscale() is a fork of rescale from the arm package. The key difference is that gscale() will perform the same functions for variables in svydesign objects. gscale() is also more user-friendly in that it is more flexible in how it accepts input.
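As a rough illustration of the default behavior described above, the following base R sketch centers a vector and divides it by two standard deviations. The helper two_sd_scale() is hypothetical and is not part of jtools or arm; it assumes the ordinary unweighted sample standard deviation.

```r
# Hypothetical helper, not part of jtools: center a numeric vector and
# divide it by n_sd standard deviations (n_sd = 2 mirrors the documented
# gscale() default; n_sd = 1 gives the usual z-score).
two_sd_scale <- function(x, n_sd = 2) {
  (x - mean(x, na.rm = TRUE)) / (n_sd * sd(x, na.rm = TRUE))
}

set.seed(1)
x <- rnorm(100, mean = 2, sd = 1)
z <- two_sd_scale(x)

mean(z)  # approximately 0
sd(z)    # 0.5, because the values were divided by 2 standard deviations
```

Dividing by two standard deviations rather than one puts continuous inputs on roughly the same scale as an untransformed binary variable, which is the motivation given in Gelman (2008).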
gscale(
  data = NULL,
  vars = NULL,
  binary.inputs = "center",
  binary.factors = FALSE,
  n.sd = 2,
  center.only = FALSE,
  scale.only = FALSE,
  weights = NULL,
  apply.weighted.contrasts = getOption("jtools-weighted.contrasts", FALSE),
  x = NULL,
  messages = FALSE
)
data: A data frame or survey design. Only needed if you would like to rescale multiple variables at once. If vars = NULL, all numeric columns will be rescaled. Otherwise, vars should be a vector of variable names. A numeric vector can also be passed here to rescale it directly.

vars: If data is a data.frame or similar, you can scale only select columns by providing a vector of column names to this argument.

binary.inputs: Options for binary variables. Default is "center". "0/1" keeps the original scale; "-0.5/0.5" rescales 0 as -0.5 and 1 as 0.5; "center" subtracts the mean; and "full" subtracts the mean and divides by 2 sd.

binary.factors: Coerce two-level factors to numeric and apply scaling functions to them? Default is FALSE.

n.sd: By how many standard deviations should the variables be divided? Default for gscale is 2, like arm's rescale. 1 is the more typical standardization scheme.

center.only: A logical value indicating whether you would like to mean-center the values, but not scale them.

scale.only: A logical value indicating whether you would like to scale the values, but not mean-center them.

weights: A vector of weights equal in length to the vector being scaled. If iterating over a data frame, the weights will need to be equal in length to all the columns to avoid errors. You may need to remove missing values before using the weights.

apply.weighted.contrasts: Factor variables cannot be scaled, but you can set the contrasts such that the intercept in a regression model will reflect the true mean (assuming all other variables are centered). If set to TRUE, the argument will apply weighted effects coding to all factors. This is similar to the R default effects coding, but weights according to how many observations are at each level. An adapted version of contr.wec() from the wec package is used to do this. See that package's documentation and/or Grotenhuis et al. (2017) for more info.

x: Deprecated. Pass numeric vectors to data. Pass vectors of column names to vars.

messages: Print messages when variables are not processed due to being non-numeric or all missing? Default is FALSE.
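The four binary.inputs options listed above can be written out by hand for a 0/1 vector. This is only a sketch of the documented behavior, not the jtools implementation: rescale_binary() is a hypothetical helper, and it assumes the input is already coded as 0 and 1.

```r
# Hypothetical helper, not the jtools implementation: mimic the four
# documented binary.inputs options. Assumes x is coded 0/1.
rescale_binary <- function(x, option = c("center", "0/1", "-0.5/0.5", "full")) {
  option <- match.arg(option)
  switch(option,
    "0/1"      = x,                             # leave untouched
    "-0.5/0.5" = x - 0.5,                       # 0 -> -0.5, 1 -> 0.5
    "center"   = x - mean(x),                   # subtract the mean only
    "full"     = (x - mean(x)) / (2 * sd(x))    # treat like a continuous variable
  )
}

b <- c(0, 1, 1, 0, 1)
rescale_binary(b, "0/1")       # 0 1 1 0 1
rescale_binary(b, "-0.5/0.5")  # -0.5 0.5 0.5 -0.5 0.5
rescale_binary(b, "center")    # -0.6 0.4 0.4 -0.6 0.4
rescale_binary(b, "full")      # centered, with standard deviation 0.5
```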
Jacob Long jacob.long@sc.edu
This function is adapted from the rescale function of the arm package. It is named gscale() after the popularizer of this scaling method, Andrew Gelman. By default, it works just like rescale. But it contains many additional options and can also accept multiple types of input without breaking a sweat.
Only numeric variables are altered when in a data.frame or survey design. Character variables, factors, etc. are skipped.
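Although factors are skipped, the weighted effects coding offered through apply.weighted.contrasts can be illustrated in base R. The sketch below builds the contrast matrix by hand. It is not contr.wec() from the wec package, nor the adapted version used by jtools, and the choice of the last level as the omitted one is an assumption made for the example.

```r
set.seed(42)
# Unbalanced three-level factor and a made-up outcome
f <- factor(rep(c("a", "b", "c"), times = c(10, 30, 60)))
y <- rnorm(length(f), mean = as.integer(f))

n <- as.numeric(table(f))  # observations per level
k <- nlevels(f)

# Start from R's unweighted effects (sum-to-zero) coding ...
wec <- contr.sum(k)
# ... then replace the -1 entries in the omitted (last) level's row with
# minus the ratio of each level's count to the omitted level's count.
wec[k, ] <- -n[-k] / n[k]
contrasts(f) <- wec

fit <- lm(y ~ f)
coef(fit)[["(Intercept)"]]  # matches mean(y) up to floating point
mean(y)
```

With unweighted effects coding the intercept would instead be the unweighted average of the three level means, which differs from the sample mean whenever the levels are unbalanced.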
For those dealing with survey data, if you provide a survey.design object you can rest assured that the mean-centering and scaling are performed with help from the svymean() and svyvar() functions, respectively. It was among the primary motivations for creating this function. gscale() will not center or scale the weights variables defined in the survey design unless the user specifically requests them in the vars = argument.
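For plain numeric vectors, the weights argument plays a similar role to the survey weights. The base R sketch below shows weighted mean-centering after dropping positions where either the value or its weight is missing, in line with the note under weights about removing missing values first. It uses stats::weighted.mean() and is not the survey-based computation; the data are made up for illustration.

```r
x <- c(2.1, NA, 3.4, 1.8, 2.9)
w <- c(1.0, 0.5, 2.0, NA, 1.5)

# Keep only positions where both the value and its weight are observed,
# so that the values and weights stay equal in length.
ok <- !is.na(x) & !is.na(w)
x_ok <- x[ok]
w_ok <- w[ok]

# Weighted mean-centering
x_centered <- x_ok - weighted.mean(x_ok, w_ok)

# The weighted mean of the centered values is zero (up to rounding)
weighted.mean(x_centered, w_ok)
```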
Gelman, A. (2008). Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine, 27, 2865–2873. http://www.stat.columbia.edu/~gelman/research/published/standardizing7.pdf
Grotenhuis, M. te, Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2017). When size matters: Advantages of weighted effect coding in observational studies. International Journal of Public Health, 62, 163–167. https://doi.org/10.1007/s00038-016-0901-1 (open access)
j_summ is a replacement for the summary function for regression models. On request, it will center and/or standardize variables before printing its output.
standardization, scaling, and centering tools: center(), center_mod(), scale_mod(), standardize()
x <- rnorm(10, 2, 1)
x2 <- rbinom(10, 1, .5)
# Basic use
gscale(x)
# Normal standardization
gscale(x, n.sd = 1)
# Scale only
gscale(x, scale.only = TRUE)
# Center only
gscale(x, center.only = TRUE)
# Binary inputs
gscale(x2, binary.inputs = "0/1")
gscale(x2, binary.inputs = "full") # treats it like a continuous var
gscale(x2, binary.inputs = "-0.5/0.5") # keep scale, center at zero
gscale(x2, binary.inputs = "center") # mean center it
# Data frame as input
# loops through each numeric column
gscale(data = mtcars, binary.inputs = "-0.5/0.5")
# Specified vars in data frame
gscale(mtcars, vars = c("hp", "wt", "vs"), binary.inputs = "center")
# Weighted inputs
wts <- runif(10, 0, 1)
gscale(x, weights = wts)
# If using a weights column of data frame, give its name
mtcars$weights <- runif(32, 0, 1)
gscale(mtcars, weights = weights) # will skip over mtcars$weights
# If using a weights column of data frame, can still select variables
gscale(mtcars, vars = c("hp", "wt", "vs"), weights = weights)
# Survey designs
if (requireNamespace("survey")) {
  library(survey)
  data(api)
  ## Create survey design object
  dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                      data = apistrat, fpc = ~fpc)
  # Creating test binary variable
  dstrat$variables$binary <- rbinom(200, 1, 0.5)
  gscale(data = dstrat, binary.inputs = "-0.5/0.5")
  gscale(data = dstrat, vars = c("api00", "meals", "binary"),
         binary.inputs = "-0.5/0.5")
}