Abbreviation: tt, tt.brief
Provides enhanced output from the standard t.test function applied to the analysis of the mean of a single variable or the independent-groups analysis of the mean difference, from either data or summary statistics. The data can be in the form of a data frame or two separate vectors of data, one for each group. The output includes the basic descriptive statistics, the analysis of assumptions, the hypothesis test and the confidence interval. For two groups the output also includes the analysis both with and without the assumption of homogeneous variances, the pooled or within-group standard deviation, and the standardized mean difference, or Cohen's d, with its confidence interval. The output from data for two groups introduces the ODDSMD plot, which displays the Overlapping Density Distributions of the two groups as well as the means, the mean difference and the Standardized Mean Difference. The plot also includes the results of the descriptive and inferential analyses.
Can also be called from the more general model function.
ttest(x=NULL, y=NULL, data=mydata, paired=FALSE, n=NULL, m=NULL, s=NULL, mu0=NULL,
n1=NULL, n2=NULL, m1=NULL, m2=NULL, s1=NULL, s2=NULL,
Ynm="Y", Xnm="X", X1nm="Group1", X2nm="Group2",
brief=getOption("brief"), digits.d=NULL, conf.level=0.95,
alternative=c("two.sided", "less", "greater"),
mmd=NULL, msmd=NULL,
show.title=TRUE, bw1="nrd", bw2="nrd", graph=TRUE,
pdf.file=NULL, pdf.width=5, pdf.height=5, ...)
tt.brief(..., brief=TRUE)
tt(...)
x: A formula of the form Y ~ X, where Y is the numeric response variable compared across the two groups, and X is a grouping variable with two levels that define the corresponding groups, or, when the data are given as vectors, the responses for the first group.
y: If x is not a formula, the responses for the second group, otherwise NULL.
data: The data frame that contains the variables, by default mydata.
paired: Set to TRUE for a dependent-samples t-test with two data vectors or variables from a data frame.
brief: If TRUE, reduced text output. The system default can be changed with the set function.
alternative: Default is "two.sided". Other values are "less" and "greater".
graph: If TRUE, then display the graph of the overlapping density distributions.

If n or n1 is set to a numeric value, then the analysis proceeds from the summary statistics: the sample size, mean and standard deviation of each group. Otherwise the analysis proceeds from data, which can be in a data frame, by default named mydata, with a grouping variable and a response variable, or in two data vectors, one for each group. Missing data are counted and then removed, and the analysis continues with the non-missing data values.

Following the format and syntax of the standard t.test function, to specify the two-group test with a formula, the data must include a variable that has exactly two values, a grouping variable or factor generically referred to as X, and a numerical response variable, generically referred to as Y. The formula is of the form Y ~ X, with the names Y and X replaced by the actual variable names specific to a particular analysis. The formula method automatically retrieves the names of the variables and data values for display on the resulting output.
The values of the response variable Y can be organized into two vectors, the values of Y for each group in its corresponding vector. The vectors must be defined in the user workspace as they are generally of unequal length and so generally not conformable to a data frame. When submitting data in this form, the output is enhanced if the actual names of the variables referred to generically as X and Y, as well as the names of the levels of the factor X, are explicitly provided.
When the output is computed from data, the two groups are automatically arranged so that the group with the larger mean is listed first. As a result, the mean difference and the standardized mean difference are always non-negative.
The inferential analysis in the full version provides both the test that assumes homogeneity of variance and the Welch test, which does not assume homogeneity of variance. Only a two-sided test is provided. The null hypothesis is a population mean difference of 0.
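The two analyses described above can be reproduced with the standard t.test function. The following is a minimal sketch on simulated data. The seed, group labels and sample sizes are arbitrary choices made for illustration, and the code uses only base R rather than ttest itself.

```r
# Simulated data: seed, labels and sizes are illustrative assumptions
set.seed(1)
d <- data.frame(
  X = rep(c("Group1", "Group2"), each = 12),
  Y = c(rnorm(12, mean = 50, sd = 10), rnorm(12, mean = 55, sd = 10))
)

# Classic test: assumes homogeneous variances, df = n1 + n2 - 2
pooled <- t.test(Y ~ X, data = d, var.equal = TRUE)

# Welch test: no homogeneity assumption, df adjusted downward
welch <- t.test(Y ~ X, data = d, var.equal = FALSE)

pooled$parameter   # degrees of freedom of the pooled test
welch$parameter    # degrees of freedom of the Welch test
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ. With unequal group sizes both the statistic and the degrees of freedom can differ.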
If computed from the data, the bandwidth parameter controls the smoothness of the estimated density curve. To obtain a smoother curve, increase the bandwidth from the default value.
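The effect of the bandwidth can be seen directly with the base R density function. This sketch assumes that the default "nrd" in the ttest signature corresponds to the bw="nrd" rule of density. The simulated data and the doubling factor are arbitrary choices for illustration.

```r
# Simulated responses: seed and size are illustrative assumptions
set.seed(2)
y <- rnorm(40, mean = 50, sd = 10)

# Density estimate with the "nrd" bandwidth rule
d_default <- density(y, bw = "nrd")

# A smoother estimate: double the bandwidth chosen by the same rule
d_smooth <- density(y, bw = 2 * bw.nrd(y))

d_default$bw   # bandwidth selected by the rule
d_smooth$bw    # twice as large, giving a smoother curve

# To compare the curves visually:
# plot(d_default); lines(d_smooth, lty = 2)
```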
The confidence interval of the standardized mean difference is computed by the ci.smd function, written by Ken Kelley, from the MBESS package.
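As a rough illustration of the point estimate that this interval surrounds, the sketch below computes the pooled standard deviation and the standardized mean difference by hand from the two-group summary statistics used in the examples on this page. It applies the textbook pooled-variance formula and is an assumption about the computation, not the package's own code. The confidence interval itself is not reproduced here.

```r
# Summary statistics for two groups (same values as the tt.brief example)
n1 <- 19; m1 <- 9.57; s1 <- 1.45
n2 <- 15; m2 <- 8.09; s2 <- 1.59

# Pooled (within-group) standard deviation, weighting each variance by its df
sw <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Standardized mean difference (Cohen's d): raw difference in units of sw
smd <- (m1 - m2) / sw

sw
smd
```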
DATA
If the input data frame is named something other than mydata, then specify the name with the data option. Regardless of its name, the data frame need not be attached: variables are referenced directly by name, without the mydata$name notation.
PRACTICAL IMPORTANCE
The practical importance of the size of the mean difference is addressed when one of two parameter values is supplied: the minimum mean difference of practical importance, mmd, or the corresponding standardized version, msmd. The remaining value is calculated, and both values are added to the graph and the console output.
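The conversion between the two values can be sketched as follows, assuming that the standardized version is the raw difference divided by the within-group standard deviation. Both numbers below are hypothetical and chosen only for illustration.

```r
# Hypothetical values for illustration only
sw   <- 1.5   # within-group (pooled) standard deviation, assumed known here
msmd <- 0.5   # minimum standardized mean difference of practical importance

# Assumed relationship: standardized difference = raw difference / sw
mmd <- msmd * sw
mmd

# Going the other way recovers the standardized value
mmd / sw
```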
DECIMAL DIGITS
By default, the number of decimal digits displayed is one more than the largest number of decimal digits among the entered descriptive statistics, with a minimum of two decimal digits. Override the default with the digits.d parameter.
VARIABLE LABELS
If variable labels exist, then the corresponding variable label is by default listed as the label for the horizontal axis and on the text output. For more information, see Read.
PDF OUTPUT
Because of the customized graphic windowing system that maintains a unique graphic window for the Help function, the standard graphic output functions such as pdf do not work with the lessR graphics functions. Instead, to obtain pdf output, use the pdf.file option, perhaps with the optional pdf.width and pdf.height options. These files are written to the default working directory, which can be explicitly specified with the R setwd function.
See also: t.test, density, plot.density, ttestPower, formula.
# ----------------------------------------------------------
# tt for two groups, from a formula
# ----------------------------------------------------------
# create simulated data, no population mean difference
# X has two values only, Y is numeric
# put into a data frame, required for formula version
n <- 12
X <- sample(c("Group1","Group2"), size=n, replace=TRUE)
Y <- round(rnorm(n=n, mean=50, sd=10),2)
mydata <- data.frame(X,Y)
rm(X); rm(Y)
# analyze data with formula version
# variable names and levels of X are automatically obtained from data
# although data frame not attached, reference variable names directly
ttest(Y ~ X)
# short form
tt(Y ~ X)
# brief version of results
tt.brief(Y ~ X)
# separate the data values for the two groups and analyze separately
t.out <- ttest(Y ~ X)
Histogram(group1, data=t.out)
Histogram(group2, data=t.out)
# Compare to standard R function t.test
t.test(mydata$Y ~ mydata$X, var.equal=TRUE)
# consider the practical importance of the difference
ttest(Y ~ X, msmd=.5)
# variable of interest is in a data frame which is not the default mydata
# access the dataLearn data frame loaded below
# although data not attached, access the variables directly by their name
data(dataLearn)
ttest(Score ~ StudyType, data=dataLearn)
# ----------------------------------------------------------
# tt for a single group, from data
# ----------------------------------------------------------
# confidence interval only, from data
ttest(Y)
# confidence interval and hypothesis test, from data
ttest(Y, mu0=52)
# -------------------------------------------------------
# tt for two groups from data stored in two vectors
# -------------------------------------------------------
# create two separate vectors of response variable Y
# the vectors exist in the workspace, not in a data frame
# their lengths need not be equal
Y1 <- round(rnorm(n=10, mean=50, sd=10),2)
Y2 <- round(rnorm(n=10, mean=60, sd=10),2)
# analyze the two vectors directly
# usually explicitly specify variable names and levels of X
# to enhance the readability of the output
ttest(Y1, Y2, Ynm="MyY", Xnm="MyX", X1nm="Group1", X2nm="Group2")
# dependent t-test from vectors in global environment
ttest(Y1, Y2, paired=TRUE)
# dependent t-test from variables in data frame mydata
mydata <- data.frame(Y1,Y2)
rm(Y1); rm(Y2)
ttest(Y1, Y2, paired=TRUE)
# -------------------------------------------------------
# tt from summary statistics
# -------------------------------------------------------
# one group: sample size, mean and sd
# optional variable name added
tt(n=34, m=8.92, s=1.67, Ynm="Time")
# confidence interval and hypothesis test, from descriptive stats
tt(n=34, m=8.92, s=1.67, mu0=9, conf.level=0.90)
# two groups: sample size, mean and sd for each group
# specify the briefer form of the output
tt.brief(n1=19, m1=9.57, s1=1.45, n2=15, m2=8.09, s2=1.59)