compareGroups: Descriptives by groups

Description

This function computes descriptive statistics by groups for several variables. Depending on the nature of each variable, different descriptive statistics are calculated (mean, median, frequencies) and different tests are computed as appropriate (t-test, ANOVA, Kruskal-Wallis, Fisher, ...).

Usage

compareGroups(X, ...)
## S3 method for class 'default':
compareGroups(X, y, selec = NA, method = 1, alpha = 0.05, min.dis = 5,
          max.ylev = 5, max.xlev = 10, include.label = TRUE, ...)
## S3 method for class 'formula':
compareGroups(X, data, subset, na.action= NULL, include.label=TRUE, ...)
## S3 method for class 'compareGroups':
plot(x, z=1.5 ,n.breaks="Sturges", ...)

Arguments

X

either a data.frame or a matrix (then method 'compareGroups.default' is called), or a formula (then method 'compareGroups.formula' is called). When 'X' is a formula, it must be an object of class "formula" (or one that can be coerced to that class). Right

y

a vector that distinguishes the groups. It must be numeric, character or a factor.

selec

a character vector with as many components as row-variables. If it has length 1 it is recycled for all row-variables. Every component of 'selec' is an expression character that will be evaluated to select the individuals to be analysed for every row-varia

method

an integer vector with as many components as row-variables. If it has length 1 it is recycled for all row-variables. It only applies for continuous row-variables (for factor row-variables it is ignored). Possible values are: 1 - forces to be analysed as n

alpha

double between 0 and 1. Significance threshold for the shapiro.test normality test for continuous row-variables. Default value is 0.05.

min.dis

an integer. If a non-factor row-variable contains fewer than 'min.dis' different values and the 'method' argument is set to NA, then it will be converted to a factor. Default value is 5.
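
The interplay of 'alpha' and 'min.dis' described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: 'classify_var' is a hypothetical helper, and the rule "Shapiro-Wilk p-value below 'alpha' means non-normal" is inferred from the argument descriptions.

```r
# Hypothetical helper (not part of compareGroups): guesses how a row-variable
# would be treated, based only on the 'alpha' and 'min.dis' descriptions.
classify_var <- function(v, alpha = 0.05, min.dis = 5) {
  v <- v[!is.na(v)]
  # Factors and character vectors are treated as categorical.
  if (is.factor(v) || is.character(v)) return("categorical")
  # Assumption: few distinct values means the variable is converted to a factor.
  if (length(unique(v)) < min.dis) return("categorical")
  # Assumption: normality is rejected when the Shapiro-Wilk p-value is below alpha.
  if (shapiro.test(v)$p.value < alpha) "non-normal" else "normal"
}

binary  <- c(0, 1, 0, 1, 1, 0, 0, 1)   # only 2 distinct values
bell    <- qnorm(ppoints(100))         # exact normal quantiles
skewed  <- qexp(ppoints(200))          # exact exponential quantiles

classify_var(binary)
classify_var(bell)
classify_var(skewed)
```

Raising 'alpha' makes the sketch label more variables as non-normal; raising 'min.dis' makes it treat more numeric variables as categorical.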

max.ylev

an integer indicating the maximum number of levels of grouping variable ('y'). If 'y' contains more than 'max.ylev' levels, then the function 'compareGroups' produces an error. Default value is 5.

max.xlev

an integer indicating the maximum number of levels when the row-variable is a factor. If the row-variable is a factor (or converted to a factor if it is a character, for example) and contains more than 'max.xlev' levels, then it is removed from the analys

data

an optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If they are not found in 'data', the variables are taken from 'environment(formula)'.

subset

an optional vector specifying a subset of individuals to be used in the computation process. It is applied to all row-variables. 'subset' and 'selec' are added in the sense of '&' to be applied in every row-variable.
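
As a rough illustration of how 'subset' and 'selec' combine with '&', the base R sketch below evaluates a selection expression given as a character string and intersects it with a global subset. The data frame 'df' and its columns are made up for the example, and the use of eval(parse()) is an assumption about how an "expression character" might be evaluated, not a description of the package internals.

```r
# Made-up data for illustration only.
df <- data.frame(
  age = c(30, 45, 60, 52),
  sex = c("F", "M", "F", "M"),
  sbp = c(120, 135, 150, 140)
)

# 'subset'-like condition: applies to all row-variables.
subset_rows <- df$age > 40

# 'selec'-like condition: a character expression for one row-variable.
selec_expr <- "sex == 'M'"
selec_rows <- eval(parse(text = selec_expr), envir = df)

# Both conditions must hold for an individual to be analysed.
keep <- subset_rows & selec_rows
df$sbp[keep]
```

Only individuals satisfying both conditions remain, so a per-variable 'selec' can narrow the global 'subset' but never widen it.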

na.action

a function which indicates what should happen when the data contains NAs. The default is NULL, and that is equivalent to na.pass, which means no action.

include.label

logical, indicating whether or not variable labels are shown in the results. Default value is TRUE.

x

an object of class 'compareGroups'

z

double. Threshold limits drawn in the deviation-from-normality plot. Too many points beyond this threshold indicate that the variable is far from normally distributed. Default value is 1.5.

n.breaks

same as argument 'breaks' of hist

...

further arguments passed to 'compareGroups.default' or other methods

Value

An object of class 'compareGroups'.
'print' returns a table with sample size, overall p-values, type of variable ('categorical', 'normal' or 'non-normal') and the subset of individuals selected.
'summary' returns a much more detailed list. Every component of the list is the result for one row-variable, showing frequencies, means, standard deviations or quartiles as appropriate. It also shows overall p-values as well as p-values for trend and pairwise p-values among the groups.
'plot' displays multiple devices with normality plots and the Shapiro-Wilk test for each of the continuous row-variables. If a row-variable has fewer than 5 different values, nothing is plotted.
An 'update' method for 'compareGroups' objects has been implemented and works as usual to change any of the arguments of a previous analysis.
See the examples for further illustration of all of the above.

Details

Depending on whether the row-variable is considered continuous normal-distributed (1), continuous non-normal distributed (2) or categorical (3), the following descriptives and tests are performed:
1- mean, standard deviation and t-test or ANOVA
2- median, 1st and 3rd quartiles, and Kruskal-Wallis test
3- absolute and relative frequencies, and chi-squared test, or Fisher's exact test when the expected frequency is less than 5 in some cell

When there are more than 2 groups, it also performs pairwise comparisons adjusting for multiple testing (Tukey when the row-variable is normal-distributed and the Benjamini & Hochberg method otherwise), and computes a p-value for trend. The p-value for trend is computed from the Pearson test when the row-variable is normal and from the Spearman test when it is continuous non-normal. If the row-variable is categorical, the p-value for trend is computed as 1-pchisq(cor(as.integer(x),as.integer(y))^2*(length(x)-1),1), where 'x' is the row-variable and 'y' is the grouping variable.
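
The tests named above can be reproduced by hand with base R on simulated data, which may help when checking results. The sketch below is an approximation under assumptions: the data are simulated, all variable names are invented, and using cor.test against the integer-coded group for the Pearson and Spearman trend tests is a guess at what "Pearson test" and "Spearman test" mean here. Only the categorical trend formula is taken literally from the text.

```r
set.seed(42)
n <- 90
y <- factor(rep(c("A", "B", "C"), each = n / 3))      # grouping variable
x_norm <- rnorm(n, mean = as.integer(y))               # roughly normal row-variable
x_skew <- rexp(n, rate = 1 / as.integer(y))            # skewed row-variable
x_cat  <- factor(sample(c("no", "yes"), n, replace = TRUE))  # categorical row-variable

# (1) normal: ANOVA (a t-test would be used with only 2 groups)
p_anova <- summary(aov(x_norm ~ y))[[1]][["Pr(>F)"]][1]

# (2) non-normal: Kruskal-Wallis
p_kw <- kruskal.test(x_skew ~ y)$p.value

# (3) categorical: chi-squared, or Fisher's exact test if any expected count < 5
tab <- table(x_cat, y)
expected <- suppressWarnings(chisq.test(tab))$expected
p_cat <- if (any(expected < 5)) fisher.test(tab)$p.value else chisq.test(tab)$p.value

# p-values for trend (the first two are assumptions, see the text above)
p_trend_norm <- cor.test(x_norm, as.integer(y), method = "pearson")$p.value
p_trend_skew <- cor.test(x_skew, as.integer(y), method = "spearman", exact = FALSE)$p.value
p_trend_cat  <- 1 - pchisq(cor(as.integer(x_cat), as.integer(y))^2 * (length(x_cat) - 1), 1)

round(c(p_anova, p_kw, p_cat, p_trend_norm, p_trend_skew, p_trend_cat), 4)
```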

See the vignette for more detailed examples illustrating the use of this function and its methods.

Examples

data(myData)

# by formula
ans<-compareGroups(y~.,data=myData)
ans
summary(ans)
update(ans,y~.-a)

# by data.frame
X<-myData[,c("a","b","c")]
y<-myData[,"y"]
ans<-compareGroups(X,y)
ans
summary(ans)