escalc: Calculate Effect Size and Outcome Measures

Description

The function can be used to calculate various effect size or outcome measures (and the corresponding sampling variances) that are commonly used in meta-analyses.

Usage

escalc(measure, formula, ...)

## S3 method for class 'default':
escalc(measure, formula, ai, bi, ci, di, n1i, n2i, 
       x1i, x2i, t1i, t2i, m1i, m2i, sd1i, sd2i, 
       xi, mi, ri, ni, ti, data, 
       add=1/2, to="only0", vtype="LS", append=FALSE, ...)

## S3 method for class 'formula':
escalc(measure, formula, weights, data, 
       add=1/2, to="only0", vtype="LS", ...)

Arguments

measure

a character string indicating which effect size or outcome measure should be calculated. See Details for possible options and how the data should then be specified.

formula

when using the formula interface of the function (see Details below), a model formula specifying the data structure should be specified via this argument. When not using the formula interface, this argument can be ignored and the data required for the calculations are specified via the arguments described below.

weights

vector of weights to specify the group sizes or cell frequencies (only needed when using the formula interface). See Details.

ai

vector to specify the 2x2 table frequencies (upper left cell). See Details.

bi

vector to specify the 2x2 table frequencies (upper right cell). See Details.

ci

vector to specify the 2x2 table frequencies (lower left cell). See Details.

di

vector to specify the 2x2 table frequencies (lower right cell). See Details.

n1i

vector to specify the group sizes or row totals (first group/row). See Details.

n2i

vector to specify the group sizes or row totals (second group/row). See Details.

x1i

vector to specify the number of cases (first group). See Details.

x2i

vector to specify the number of cases (second group). See Details.

t1i

vector to specify the total person-times (first group). See Details.

t2i

vector to specify the total person-times (second group). See Details.

m1i

vector to specify the means (first group). See Details.

m2i

vector to specify the means (second group). See Details.

sd1i

vector to specify the standard deviations (first group). See Details.

sd2i

vector to specify the standard deviations (second group). See Details.

xi

vector to specify the frequencies of the event of interest. See Details.

mi

vector to specify the frequencies of the complement of the event. See Details.

ri

vector to specify the raw correlation coefficients. See Details.

ni

vector to specify the sample sizes. See Details.

ti

vector to specify the total person-times. See Details.

data

an optional data frame containing the variables given to the arguments above.

add

a non-negative number indicating the amount to add to zero cells, counts, or frequencies. See Details.

to

a string indicating when the values under add should be added (either "all", "only0", "if0all", or "none"). See Details.

vtype

a string indicating the type of sampling variances to calculate (either "LS" or "UB"). See Details.

append

logical indicating whether the data frame specified via the data argument (if one has been specified) should be returned together with the effect sizes and sampling variances (default is FALSE).

...

other arguments.

Value

A data frame with the following elements:

yi

value of the effect size or outcome measure.

vi

corresponding (estimated) sampling variance.

If append=TRUE and a data frame was specified via the data argument, then yi and vi are appended to this data frame.

Details

There are two interfaces to the escalc function: the default interface and a formula interface. Both are described below.

Default Interface

The default interface works as follows. The argument measure is a character string specifying which outcome measure should be calculated (see below for the various options). The arguments ai through ti are then used to supply the information needed to calculate the chosen measure (different outcome measures require different arguments), and data can be used to specify a data frame containing the variables given to these arguments. The add and to arguments may be needed when dealing with data that contain zero cells or zero counts (see below). Finally, the vtype argument specifies how the sampling variances are estimated (see below).

Effect Size and Outcome Measures for 2x2 Table Data

Meta-analyses in the health/medical sciences are often based on studies providing data in terms of 2x2 tables. In particular, assume that we have k tables of the form:

            outcome 1   outcome 2   total
  group 1   ai          bi          n1i
  group 2   ci          di          n2i

where ai, bi, ci, and di denote the cell frequencies and n1i and n2i the row totals. For example, in a set of randomized clinical trials (RCTs) or cohort studies, group 1 and group 2 may refer to the treatment (exposed) and placebo/control (not exposed) group, with outcome 1 denoting some event of interest (e.g., death) and outcome 2 its complement. In a set of case-control studies, group 1 and group 2 may refer to the group of cases and the group of controls, with outcome 1 denoting, for example, exposure to some risk factor and outcome 2 non-exposure. The 2x2 table may also be the result of cross-sectional (i.e., multinomial) sampling, so that none of the table margins (except the total sample size n1i+n2i) is fixed by the study design.
Depending on the type of design (sampling method), a meta-analysis of 2x2 table data can be based on one of several outcome measures, including the odds ratio, the relative risk (also called the risk ratio), the risk difference, and the arcsine transformed risk difference. For example, for case-control studies the odds ratio is the measure of choice, while for RCTs and cohort studies all of these measures may be applicable. The phi coefficient, Yule's Q, and Yule's Y are additional measures of association for 2x2 table data (although they are not frequently used in meta-analyses). For these outcome measures, one needs to specify either ai, bi, ci, and di or, alternatively, ai, ci, n1i, and n2i. The options for the measure argument are then:

"RR": The log relative risk is equal to the log of (ai/n1i)/(ci/n2i).
"OR": The log odds ratio is equal to the log of (ai*di)/(bi*ci).
"RD": The risk difference is equal to (ai/n1i)-(ci/n2i).
"AS": The arcsine transformation is a variance stabilizing transformation for proportions. The arcsine transformed risk difference is equal to asin(sqrt(ai/n1i)) - asin(sqrt(ci/n2i)). See Ruecker et al. (2009) for a discussion of this and other outcome measures for 2x2 table data.
"PETO": The log odds ratio estimated with Peto's method (see Yusuf et al., 1985) is equal to (ai-si*n1i/ni)/((si*ti*n1i*n2i)/(ni^2*(ni-1))), where si=ai+ci, ti=bi+di, and ni=n1i+n2i.
"PHI": The phi coefficient is equal to (ai*di-bi*ci)/sqrt(n1i*n2i*si*ti), where si=ai+ci and ti=bi+di.
"YUQ": Yule's Q is equal to (oi-1)/(oi+1), where oi is the odds ratio.
"YUY": Yule's Y is equal to (sqrt(oi)-1)/(sqrt(oi)+1), where oi is the odds ratio.

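The formulas listed above can be checked by hand. The following is a minimal base-R sketch, not the function's internal code. The helper name table_measures is hypothetical. It is vectorized over k tables, applies no zero-cell adjustment, and returns only the point estimates (yi), not the sampling variances.

```r
# Hypothetical helper: point estimates (yi only) for k 2x2 tables;
# no adjustment for zero cells is made
table_measures <- function(ai, bi, ci, di) {
  n1i <- ai + bi                 # row total, group 1
  n2i <- ci + di                 # row total, group 2
  ni  <- n1i + n2i               # total sample size
  si  <- ai + ci                 # column total, outcome 1
  ti  <- bi + di                 # column total, outcome 2
  oi  <- (ai * di) / (bi * ci)   # odds ratio (used by YUQ and YUY)
  data.frame(
    RR   = log((ai / n1i) / (ci / n2i)),
    OR   = log(oi),
    RD   = (ai / n1i) - (ci / n2i),
    AS   = asin(sqrt(ai / n1i)) - asin(sqrt(ci / n2i)),
    PETO = (ai - si * n1i / ni) / ((si * ti * n1i * n2i) / (ni^2 * (ni - 1))),
    PHI  = (ai * di - bi * ci) / sqrt(n1i * n2i * si * ti),
    YUQ  = (oi - 1) / (oi + 1),
    YUY  = (sqrt(oi) - 1) / (sqrt(oi) + 1)
  )
}

# Illustrative counts for two hypothetical tables
est <- table_measures(ai = c(10, 4), bi = c(90, 46), ci = c(20, 8), di = c(80, 42))
round(est, 4)
```

Any table that contains a zero cell would produce Inf or NaN here. This is the situation that the add and to arguments are meant to handle.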
Note that the log is taken of the relative risk and the odds ratio, which makes these outcome measures symmetric around 0 and helps to make their distributions closer to normal. Cells with a zero entry can be problematic, especially for the relative risk and the odds ratio. Adding a small constant to the cells of the 2x2 tables is a common solution to this problem. When to="all", the value of add is added to each cell of all 2x2 tables. When to="only0", the value of add is added to each cell of only those 2x2 tables with at least one cell equal to 0. When to="if0all", the value of add is added to each cell of all 2x2 tables, but only when there is at least one 2x2 table with a zero cell. Setting to="none" or add=0 has the same effect: no adjustment to the observed table frequencies is made. Depending on the outcome measure and the data, this may lead to division by zero inside of the function (when this occurs, the resulting Inf value is recoded to NA). An example dataset corresponding to data of this type is provided in dat.bcg.

Incidence Rate Ratios and Differences

Epidemiological studies often compare the incidence rates (i.e., the rate of occurrence of a particular outcome, such as a certain disease, over a particular period of time) of two different groups (e.g., exposed and not exposed). In particular, assume that we have k tables of the form:

            cases   person-time
  group 1   x1i     t1i
  group 2   x2i     t2i

where x1i and x2i denote the number of cases in the first and the second group, respectively, and t1i and t2i the corresponding total person-times at risk. Commonly used effect size or outcome measures in this context are the ratio and the difference of the two incidence rates. The options for the measure argument are then:

"IRR": The log incidence rate ratio is equal to the log of (x1i/t1i)/(x2i/t2i).
"IRD": The incidence rate difference is equal to (x1i/t1i)-(x2i/t2i).
"IRSD": The square-root transformation is a variance stabilizing transformation for incidence rates. The square-root transformed incidence rate difference is equal to sqrt(x1i/t1i)-sqrt(x2i/t2i).

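As a rough check of the three measures above, here is a base-R sketch. The helper name rate_comparisons is hypothetical, and so are the counts. The sketch also adds the standard Poisson-based large-sample variance of the log rate ratio, 1/x1i + 1/x2i, which is assumed here rather than taken from the text above.

```r
# Hypothetical helper: incidence rate ratio/difference estimates for k studies
rate_comparisons <- function(x1i, t1i, x2i, t2i) {
  data.frame(
    IRR   = log((x1i / t1i) / (x2i / t2i)),
    IRD   = (x1i / t1i) - (x2i / t2i),
    IRSD  = sqrt(x1i / t1i) - sqrt(x2i / t2i),
    # Assumed Poisson-based large-sample variance of the log rate ratio
    v_IRR = 1 / x1i + 1 / x2i
  )
}

# Illustrative data for two hypothetical studies
res <- rate_comparisons(x1i = c(10, 3), t1i = c(100, 50),
                        x2i = c(5, 6),  t2i = c(100, 40))
res
```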
Note that the log is taken of the incidence rate ratio, which makes this outcome measure symmetric around 0 and helps to make its distribution closer to normal. Studies with zero cases in one or both groups can be problematic, especially for the incidence rate ratio. Adding a small constant to the number of cases is a common solution to this problem. When to="all", the value of add is added to x1i and x2i in all k studies. When to="only0", the value of add is added to x1i and x2i only in the studies that have zero cases in one or both groups. When to="if0all", the value of add is added to x1i and x2i in all k studies, but only when there is at least one study with zero cases in one or both groups. Setting to="none" or add=0 has the same effect: no adjustment to the observed number of cases is made. Depending on the outcome measure and the data, this may lead to division by zero inside of the function (when this occurs, the resulting Inf value is recoded to NA). An example dataset corresponding to data of this type is provided in dat.warfarin.

Raw and Standardized Mean Differences

The raw mean difference and the standardized mean difference are useful effect size measures when meta-analyzing a set of studies comparing two experimental groups (e.g., treatment and control groups) or two naturally occurring groups (e.g., men and women) with respect to some quantitative (and ideally normally distributed) dependent variable. For these outcome measures, m1i and m2i are used to specify the means of the two groups, sd1i and sd2i the standard deviations of the scores in the two groups, and n1i and n2i the sample sizes of the two groups. The options for the measure argument are then:

"MD": The raw mean difference is equal to m1i-m2i.
"SMD": The standardized mean difference is equal to (m1i-m2i)/spi, where spi is the pooled standard deviation of the two groups (which is calculated inside of the function based on sd1i and sd2i). The standardized mean difference is automatically corrected for its slight positive bias within the function (see Hedges & Olkin, 1985). When vtype="LS", the sampling variances are calculated based on the large sample approximation. Alternatively, the unbiased estimates of the sampling variances can be obtained with vtype="UB".

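The following base-R sketch illustrates both measures. The helper name mean_differences is hypothetical. It computes the raw mean difference with its usual sampling variance sd1i^2/n1i + sd2i^2/n2i, and a bias-corrected standardized mean difference. For the bias correction it uses the common approximation 1 - 3/(4*mi - 1), with mi = n1i + n2i - 2 degrees of freedom. The function itself may use the exact (gamma-function) form of the Hedges & Olkin (1985) correction factor, so small-sample results could differ slightly.

```r
# Hypothetical helper: raw and bias-corrected standardized mean differences
mean_differences <- function(m1i, m2i, sd1i, sd2i, n1i, n2i) {
  # Pooled standard deviation of the two groups
  spi <- sqrt(((n1i - 1) * sd1i^2 + (n2i - 1) * sd2i^2) / (n1i + n2i - 2))
  di  <- (m1i - m2i) / spi         # uncorrected standardized difference
  mi  <- n1i + n2i - 2             # degrees of freedom
  cmi <- 1 - 3 / (4 * mi - 1)      # approximate small-sample correction factor
  data.frame(
    MD   = m1i - m2i,
    v_MD = sd1i^2 / n1i + sd2i^2 / n2i,
    SMD  = cmi * di
  )
}

# Illustrative values for one hypothetical study
res <- mean_differences(m1i = 10, m2i = 8, sd1i = 2, sd2i = 2, n1i = 11, n2i = 11)
res
```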
An example dataset corresponding to data of this type is provided in dat.los.

Raw and Transformed Correlation Coefficients

Another frequently used outcome measure in meta-analyses is the correlation coefficient, which measures the strength of the (linear) relationship between two quantitative variables. Here, one needs to specify ri, the vector with the raw correlation coefficients, and ni, the corresponding sample sizes. The options for the measure argument are then:

"COR": The raw correlation coefficient is simply equal to ri as supplied to the function. When vtype="LS", the sampling variances are calculated based on the large sample approximation. Alternatively, an approximation to the unbiased estimates of the sampling variances can be obtained with vtype="UB" (see Hedges, 1989).
"UCOR": The unbiased estimate of the correlation coefficient is obtained by correcting the raw correlation coefficient for its slight negative bias (based on equation 2.7 in Olkin & Pratt, 1958). Again, vtype="LS" and vtype="UB" can be used to choose between the large sample approximation or approximately unbiased estimates of the sampling variances.
"ZCOR": Fisher's r-to-z transformation is a variance stabilizing transformation for correlation coefficients with the added benefit of also being a rather effective normalizing transformation (Fisher, 1921). Fisher's r-to-z transformed correlation coefficient is equal to 1/2*log((1+ri)/(1-ri)).

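A short base-R sketch makes the COR and ZCOR options concrete. The helper name correlation_measures is hypothetical. It assumes the usual large-sample variances: (1 - ri^2)^2/(ni - 1) for the raw correlation and 1/(ni - 3) for Fisher's z. UCOR is left out because its exact correction is not spelled out above.

```r
# Hypothetical helper: raw and Fisher-z transformed correlations
correlation_measures <- function(ri, ni) {
  data.frame(
    COR    = ri,
    v_COR  = (1 - ri^2)^2 / (ni - 1),        # assumed large-sample variance of r
    ZCOR   = 1/2 * log((1 + ri) / (1 - ri)), # identical to atanh(ri)
    v_ZCOR = 1 / (ni - 3)                    # assumed variance of Fisher's z
  )
}

# Illustrative correlations and sample sizes
res <- correlation_measures(ri = c(0.5, -0.2), ni = c(53, 28))
res
```

The variance of ZCOR depends only on ni. This is why the r-to-z transformation is described as variance stabilizing.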
An example dataset corresponding to data of this type is provided in dat.empint.

Proportions and Transformations Thereof

When the studies provide data for single groups with respect to a dichotomous dependent variable, the raw proportion, the logit transformed proportion, the arcsine transformed proportion, and the Freeman-Tukey (double arcsine) transformed proportion are useful outcome measures (the log transformed proportion is also a possibility, but it is not frequently used in meta-analyses). Here, one needs to specify xi and ni, denoting the number of individuals experiencing the event of interest and the total number of individuals, respectively. Instead of specifying ni, one can use mi to specify the number of individuals that do not experience the event of interest. The options for the measure argument are then:

"PR": The raw proportion is equal to xi/ni.
"PLN": The log transformed proportion is equal to the log of xi/ni.
"PLO": The logit transformed proportion is equal to the log of xi/(ni-xi) (i.e., the log of the odds).
"PAS": The arcsine transformation is a variance stabilizing transformation for proportions. The arcsine transformed proportion is equal to asin(sqrt(xi/ni)).
"PFT": Another variance stabilizing transformation for proportions was suggested by Freeman & Tukey (1950). The Freeman-Tukey double arcsine transformed proportion is equal to 1/2*(asin(sqrt(xi/(ni+1))) + asin(sqrt((xi+1)/(ni+1)))).

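The five transformations can be written directly in base R. The sketch below uses a hypothetical helper name, proportion_measures, and illustrative counts. It returns point estimates only.

```r
# Hypothetical helper: raw and transformed proportions (point estimates only)
proportion_measures <- function(xi, ni) {
  p <- xi / ni                     # observed proportion
  data.frame(
    PR  = p,
    PLN = log(p),
    PLO = log(xi / (ni - xi)),     # log odds
    PAS = asin(sqrt(p)),
    PFT = 1/2 * (asin(sqrt(xi / (ni + 1))) + asin(sqrt((xi + 1) / (ni + 1))))
  )
}

# Illustrative counts for two hypothetical studies
res <- proportion_measures(xi = c(5, 12), ni = c(20, 40))
res
```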
Zero cell entries can be problematic for certain outcome measures. When to="all", the value of add is added to xi and mi in all k studies. When to="only0", the value of add is added only for studies where xi or mi is equal to 0. When to="if0all", the value of add is added in all k studies, but only when there is at least one study with a zero value for xi or mi. Setting to="none" or add=0 again means that no adjustment to the observed values is made.

Incidence Rates and Transformations Thereof

Instead of proportions, we may also be interested in aggregating individual incidence rates. Here, one needs to specify xi and ti, denoting the number of individuals experiencing the event of interest and the total person-time at risk, respectively. The options for the measure argument are then:

"IR": The raw incidence rate is equal to xi/ti.
"IRLN": The log transformed incidence rate is equal to the log of xi/ti.
"IRS": The square-root transformation is a variance stabilizing transformation for incidence rates. The square-root transformed incidence rate is equal to sqrt(xi/ti).
"IRFT": Another variance stabilizing transformation for incidence rates can be based on Freeman & Tukey (1950). The Freeman-Tukey transformed incidence rate is equal to sqrt(xi/ti) + sqrt(xi/ti+1/ti).

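Here is a matching base-R sketch for single-group incidence rates. The helper name rate_measures and the data are hypothetical. The IRFT line transcribes the formula exactly as given in the list above.

```r
# Hypothetical helper: raw and transformed incidence rates (point estimates only)
rate_measures <- function(xi, ti) {
  data.frame(
    IR   = xi / ti,
    IRLN = log(xi / ti),
    IRS  = sqrt(xi / ti),
    IRFT = sqrt(xi / ti) + sqrt(xi / ti + 1 / ti)  # as written in the text above
  )
}

# Illustrative counts and person-times for two hypothetical studies
res <- rate_measures(xi = c(4, 9), ti = c(16, 100))
res
```

With xi = 0, IRLN evaluates to -Inf. This is the case the zero-case adjustment described next is meant to avoid.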
Studies with zero cases can be problematic, especially for the log transformed incidence rate. Adding a small constant to the number of cases is a common solution to this problem. When to="all", the value of add is added to xi in all k studies. When to="only0", the value of add is added to xi only in the studies that have zero cases. When to="if0all", the value of add is added to xi in all k studies, but only when there is at least one study with zero cases. Setting to="none" or add=0 has the same effect: no adjustment to the observed number of cases is made. Depending on the outcome measure and the data, this may lead to division by zero inside of the function (when this occurs, the resulting Inf value is recoded to NA).

Formula Interface

The formula interface works as follows. As above, the argument measure is a character string specifying which outcome measure should be calculated. The formula argument is then used to specify the data structure as a multipart formula. The data argument can be used to specify a data frame containing the variables in the formula. The add, to, and vtype arguments work as described above.

Effect Size and Outcome Measures for 2x2 Table Data

For 2x2 table data, the formula argument takes the form outcome ~ group | study, where group is a two-level factor specifying the rows of the tables, outcome is a two-level factor specifying the columns of the tables (the two possible outcomes), and study is a factor identifying the studies. The weights argument is used to specify the frequencies in the various cells.

Incidence Rate Ratios and Differences

For these outcome measures, the formula argument takes the form cases/times ~ group | study, where group is a two-level factor specifying the two groups and study is a factor identifying the studies. The left-hand side of the formula is composed of two parts, with the first variable for the number of cases and the second variable for the person-time at risk.

Raw and Standardized Mean Differences

For these outcome measures, the formula argument takes the form means/sds ~ group | study, where group is a two-level factor specifying the two groups and study is a factor identifying the studies. The left-hand side of the formula is composed of two parts, with the first variable for the means and the second variable for the standard deviations. The weights argument is used to specify the sample sizes in the groups.

Raw and Transformed Correlation Coefficients

For these outcome measures, the formula argument takes the form outcome ~ 1 | study, where outcome is used to specify the observed correlations and study is a factor identifying the studies. The weights argument is used to specify the sample sizes.

Proportions and Transformations Thereof

For these outcome measures, the formula argument takes the form outcome ~ 1 | study, where outcome is a two-level factor specifying the columns of the tables (the two possible outcomes) and study is a factor identifying the studies. The weights argument is used to specify the frequencies in the various cells.

Incidence Rates and Transformations Thereof

For these outcome measures, the formula argument takes the form cases/times ~ 1 | study, where study is a factor identifying the studies. The left-hand side of the formula is composed of two parts, with the first variable for the number of cases and the second variable for the person-time at risk.
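The following base-R sketch shows how the long format used by the formula interface for 2x2 data (one row per cell, with the cell frequency as the weight) maps onto the arguments ai, bi, ci, and di of the default interface. It uses stats::xtabs to cross-tabulate the weights back into the cells. The layout mirrors the data frame built in the Examples section. The counts are illustrative only, and this is not how the function processes the formula internally.

```r
# Long format: one row per cell, as used by outcome ~ group | study
long <- data.frame(
  study = factor(rep(1:2, each = 4)),
  grp   = factor(rep(c("T", "T", "C", "C"), 2), levels = c("T", "C")),
  out   = factor(rep(c("+", "-", "+", "-"), 2), levels = c("+", "-")),
  freq  = c(4, 119, 11, 128, 6, 300, 29, 274)   # illustrative cell counts
)

# Cross-tabulate the weights into a group x outcome x study array
tab <- xtabs(freq ~ grp + out + study, data = long)

# Recover the 2x2 cells expected by the default interface
ai <- as.numeric(tab["T", "+", ])
bi <- as.numeric(tab["T", "-", ])
ci <- as.numeric(tab["C", "+", ])
di <- as.numeric(tab["C", "-", ])

# Log relative risks computed from the recovered cells
yi <- log((ai / (ai + bi)) / (ci / (ci + di)))
cbind(ai, bi, ci, di, yi)
```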

References

Cooper, H. C., Hedges, L. V., & Valentine, J. C. (Eds.) (2009). The handbook of research synthesis and meta-analysis (2nd ed.). New York: Russell Sage Foundation.

Fisher, R. A. (1921). On the probable error of a coefficient of correlation deduced from a small sample. Metron, 1, 1-32.

Freeman, M. F., & Tukey, J. W. (1950). Transformations related to the angular and the square root. Annals of Mathematical Statistics, 21, 607-611.

Hedges, L. V. (1989). An unbiased correction for sampling error in validity generalization studies. Journal of Applied Psychology, 74, 469-477.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. San Diego, CA: Academic Press.

Olkin, I., & Pratt, J. W. (1958). Unbiased estimation of certain correlation coefficients. Annals of Mathematical Statistics, 29, 201-211.

Ruecker, G., Schwarzer, G., Carpenter, J., & Olkin, I. (2009). Why add anything to nothing? The arcsine difference as a measure of treatment effect in meta-analysis with zero cells. Statistics in Medicine, 28, 721-738.

Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1-48. http://www.jstatsoft.org/v36/i03/

Yusuf, S., Peto, R., Lewis, J., Collins, R., & Sleight, P. (1985). Beta blockade during and after myocardial infarction: An overview of the randomized trials. Progress in Cardiovascular Diseases, 27, 335-371.

Examples


### load metafor package and BCG vaccine data
library(metafor)
data(dat.bcg)

### calculate log relative risks and corresponding sampling variances
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, 
              data=dat.bcg, append=TRUE)
dat

### using formula interface (first rearrange data into required format)
k <- length(dat.bcg$trial)
dat.fm      <- data.frame(study=factor(rep(1:k, each=4)))
dat.fm$grp  <- factor(rep(c("T","T","C","C"), k), levels=c("T","C"))
dat.fm$out  <- factor(rep(c("+","-","+","-"), k), levels=c("+","-"))
dat.fm$freq <- with(dat.bcg, c(rbind(tpos, tneg, cpos, cneg)))
dat.fm
escalc(out ~ grp | study, weights=freq, data=dat.fm, measure="RR")
