Finds the numeric variables, and ignores the others. (See
summarizeFactors
for a function that handles non-numeric
variables). It will provide quantiles (specified probs
as
well as other summary statistics, specified stats
. Results
are returned in a data frame. The main benefits from this compared
to R's default summary are 1) more summary information is returned
for each variable (dispersion), 2) the results are returned in a
form that is easy to use in further analysis, 3) the variables in
the output may be alphabetized.
summarizeNumerics(
dat,
alphaSort = FALSE,
probs = c(0, 0.5, 1),
stats = c("mean", "sd", "skewness", "kurtosis", "nobs", "nmiss"),
na.rm = TRUE,
unbiased = TRUE,
digits = 2
)
a data.frame with one column per summary element (rows are the variables).
a data frame or a matrix
If TRUE, the columns are re-organized in alphabetical order. If FALSE, they are presented in the original order.
Controls calculation of quantiles (see the R
quantile
function's probs
argument). If FALSE or
NULL, no quantile estimates are provided. Default is
c("min" = 0, "med" = 0.5, "max" = 1.0)
, which will
appear in output as c("min", "med", "max")
. Other
values between 0 and 1 are allowed. For example, c(0.3, 0.7)
will appear in output as pctile_30%
and
pctile_70%
.
A vector including any of these: c("min",
"med", "max", "mean", "sd", "var", "skewness", "kurtosis",
"nobs", "nmiss")
. Default includes all except var
.
"nobs" means number of observations with non-missing, finite
scores (not NA, NaN, -Inf, or Inf). "nmiss" is the number of
cases with values of NA. If FALSE or NULL, provide none of
these.
default TRUE. Should missing data be removed to calculate summaries?
If TRUE (default), skewness and kurtosis are calculated with biased corrected (N-1) divisor in the standard deviation.
Number of digits reported after decimal point. Default is 2
Paul E. Johnson pauljohn@ku.edu
summarize and summarizeFactors