gscale: Scale and/or center data, including survey designs

Description

gscale standardizes variables by dividing them by 2 standard deviations and mean-centering them by default. It contains options for handling binary variables separately. gscale() is a fork of rescale from the arm package---the key feature difference is that gscale() will perform the same functions for variables in svydesign objects. gscale() is also more user-friendly in that it is more flexible in how it accepts input.

Usage

gscale(
  data = NULL,
  vars = NULL,
  binary.inputs = "center",
  binary.factors = FALSE,
  n.sd = 2,
  center.only = FALSE,
  scale.only = FALSE,
  weights = NULL,
  apply.weighted.contrasts = getOption("jtools-weighted.contrasts", FALSE),
  x = NULL,
  messages = FALSE
)

Arguments

data: A data frame or survey design. Only needed if you would like to rescale multiple variables at once. If x = NULL, all columns will be rescaled. Otherwise, x should be a vector of variable names. If x is a numeric vector, this argument is ignored.
vars: If data is a data.frame or similar, you can scale only select columns by providing a vector column names to this argument.
binary.inputs: Options for binary variables. Default is center; 0/1 keeps original scale; -0.5/0.5 rescales 0 as -0.5 and 1 as 0.5; center subtracts the mean; and full subtracts the mean and divides by 2 sd.
binary.factors: Coerce two-level factors to numeric and apply scaling functions to them? Default is FALSE.
n.sd: By how many standard deviations should the variables be divided by? Default for gscale is 2, like arm's rescale. 1 is the more typical standardization scheme.
center.only: A logical value indicating whether you would like to mean -center the values, but not scale them.
scale.only: A logical value indicating whether you would like to scale the values, but not mean-center them.
weights: A vector of weights equal in length to x. If iterating over a data frame, the weights will need to be equal in length to all the columns to avoid errors. You may need to remove missing values before using the weights.
apply.weighted.contrasts: Factor variables cannot be scaled, but you can set the contrasts such that the intercept in a regression model will reflect the true mean (assuming all other variables are centered). If set to TRUE, the argument will apply weighted effects coding to all factors. This is similar to the R default effects coding, but weights according to how many observations are at each level. An adapted version of contr.wec() from the wec package is used to do this. See that package's documentation and/or Grotenhuis et al. (2016) for more info.
x: Deprecated. Pass numeric vectors to data. Pass vectors of column names to vars.
messages: Print messages when variables are not processed due to being non-numeric or all missing? Default is FALSE.

Author

Jacob Long jacob.long@sc.edu

Details

This function is adapted from the rescale function of the arm package. It is named gscale() after the popularizer of this scaling method, Andrew Gelman. By default, it works just like rescale. But it contains many additional options and can also accept multiple types of input without breaking a sweat.

Only numeric variables are altered when in a data.frame or survey design. Character variables, factors, etc. are skipped.

For those dealing with survey data, if you provide a survey.design object you can rest assured that the mean-centering and scaling is performed with help from the svymean() and svyvar() functions, respectively. It was among the primary motivations for creating this function. gscale() will not center or scale the weights variables defined in the survey design unless the user specifically requests them in the x = argument.

References

Gelman, A. (2008). Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine, 27, 2865–2873. http://www.stat.columbia.edu/~gelman/research/published/standardizing7.pdf

Grotenhuis, M. te, Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2017). When size matters: Advantages of weighted effect coding in observational studies. International Journal of Public Health, 62, 163–167. https://doi.org/10.1007/s00038-016-0901-1 ( open access)

Examples

Run this code


x <- rnorm(10, 2, 1)
x2 <- rbinom(10, 1, .5)

# Basic use
gscale(x)
# Normal standardization
gscale(x, n.sd = 1)
# Scale only
gscale(x, scale.only = TRUE)
# Center only
gscale(x, center.only = TRUE)
# Binary inputs
gscale(x2, binary.inputs = "0/1")
gscale(x2, binary.inputs = "full") # treats it like a continous var
gscale(x2, binary.inputs = "-0.5/0.5") # keep scale, center at zero
gscale(x2, binary.inputs = "center") # mean center it

# Data frame as input
# loops through each numeric column
gscale(data = mtcars, binary.inputs = "-0.5/0.5")

# Specified vars in data frame
gscale(mtcars, vars = c("hp", "wt", "vs"), binary.inputs = "center")

# Weighted inputs

wts <- runif(10, 0, 1)
gscale(x, weights = wts)
# If using a weights column of data frame, give its name
mtcars$weights <- runif(32, 0, 1)
gscale(mtcars, weights = weights) # will skip over mtcars$weights
# If using a weights column of data frame, can still select variables
gscale(mtcars, vars = c("hp", "wt", "vs"), weights = weights)

# Survey designs
if (requireNamespace("survey")) {
  library(survey)
  data(api)
  ## Create survey design object
  dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                       data = apistrat, fpc=~fpc)
  # Creating test binary variable
  dstrat$variables$binary <- rbinom(200, 1, 0.5)

  gscale(data = dstrat, binary.inputs = "-0.5/0.5")
  gscale(data = dstrat, vars = c("api00","meals","binary"),
         binary.inputs = "-0.5/0.5")
}

Run the code above in your browser using DataLab