In many meta-analyses, multiple effect sizes or outcomes can be extracted from the same study. Ideally, such structures should be analyzed using an appropriate multilevel/multivariate model as can be fitted with the rma.mv function. However, there may occasionally be reasons for aggregating multiple effect sizes or outcomes belonging to the same study (or to the same level of some other clustering variable) into a single combined effect size or outcome. The present function can be used for this purpose.
The input must be an object of class "escalc". The error ‘Error in match.fun(FUN): argument "FUN" is missing, with no default’ indicates that a regular data frame was passed to the function, which does not work. One can turn a regular data frame (containing the effect sizes or outcomes and the corresponding sampling variances) into an "escalc" object with the escalc function. See the ‘Examples’ below for an illustration of this.
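As a minimal sketch of this conversion, the following turns a small made-up data frame into an "escalc" object and then aggregates it. The data values and the variable names (df, study) are hypothetical, measure="GEN" is used here on the assumption that the values are a generic outcome measure, and the metafor calls are guarded so that the snippet also runs when the package is not installed.

```r
# Hypothetical data: two studies contributing two estimates each.
df <- data.frame(
  study = c(1, 1, 2, 2),
  yi    = c(0.30, 0.50, 0.10, 0.20),   # observed effect sizes
  vi    = c(0.04, 0.08, 0.05, 0.05)    # corresponding sampling variances
)

if (requireNamespace("metafor", quietly = TRUE)) {
  # Convert the regular data frame into an object of class "escalc".
  dat <- metafor::escalc(measure = "GEN", yi = yi, vi = vi, data = df)

  # Aggregate to one row per study, assuming independent sampling errors.
  agg <- aggregate(dat, cluster = study, struct = "ID")
  print(agg)
}
```

Passing df directly to aggregate() instead of dat would dispatch to the method for regular data frames, which is what produces the error quoted above.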
The cluster variable is used to specify which estimates/outcomes belong to the same study/cluster.
In the simplest case, the estimates/outcomes within clusters (or, to be precise, their sampling errors) are assumed to be independent. This is usually a safe assumption as long as each study participant (or whatever the study units are) only contributes data to a single estimate/outcome. For example, if a study provides effect size estimates for male and female subjects separately, then the sampling errors can usually be assumed to be independent. In this case, one can set struct="ID" and multiple estimates/outcomes within the same cluster are combined using standard inverse-variance weighting (i.e., using weighted least squares) under the assumption of independence.
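The inverse-variance weighting described above can be illustrated by hand for a single cluster. This is only a sketch in base R with made-up numbers, not the function's internal code.

```r
# Hypothetical estimates and sampling variances for one cluster
# (e.g., separate effect sizes for male and female subjects).
yi <- c(0.30, 0.50)
vi <- c(0.04, 0.08)

wi     <- 1 / vi                   # inverse-variance weights
yi_agg <- sum(wi * yi) / sum(wi)   # weighted average of the estimates
vi_agg <- 1 / sum(wi)              # sampling variance of the weighted average

c(yi = yi_agg, vi = vi_agg)
```

The more precise estimate (the one with the smaller sampling variance) receives the larger weight, and the combined variance is smaller than either of the individual variances.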
In other cases, the estimates/outcomes within clusters cannot be assumed to be independent. For example, if multiple effect size estimates are computed for the same group of subjects (e.g., based on different scales to measure some construct of interest), then the estimates are likely to be correlated. If the actual correlation between the estimates is unknown, one can often still make an educated guess and set argument rho to this value, which is then assumed to be the same for all pairs of estimates within clusters when struct="CS" (for a compound symmetric structure). Multiple estimates/outcomes within the same cluster are then combined using inverse-variance weighting taking their correlation into consideration (i.e., using generalized least squares). One can also specify a different value of rho for each cluster by passing a vector (of the same length as the number of clusters) to this argument.
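To make the generalized least squares step concrete, the sketch below builds a compound symmetric variance-covariance matrix for one cluster and combines the estimates by hand. The helper gls_combine is hypothetical (it is not part of metafor), and the data and the value of rho are made up.

```r
# Hypothetical helper: GLS combination of estimates with var-cov matrix V.
gls_combine <- function(yi, V) {
  W      <- solve(V)                 # inverse of the var-cov matrix
  vi_agg <- 1 / sum(W)               # 1 / (1' V^-1 1)
  yi_agg <- sum(W %*% yi) * vi_agg   # (1' V^-1 y) / (1' V^-1 1)
  c(yi = yi_agg, vi = vi_agg)
}

# Three estimates from the same subjects, with an assumed correlation.
yi  <- c(0.30, 0.50, 0.40)
vi  <- c(0.04, 0.08, 0.05)
rho <- 0.6

k <- length(yi)
R <- matrix(rho, nrow = k, ncol = k)   # compound symmetric correlations
diag(R) <- 1
S <- diag(sqrt(vi))                    # standard errors on the diagonal
V <- S %*% R %*% S                     # implied var-cov matrix

res_cs <- gls_combine(yi, V)
res_cs
```

With a diagonal V (that is, rho equal to 0), the same helper reduces to ordinary inverse-variance weighting.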
If multiple effect size estimates are computed for the same group of subjects at different time points, then it may be more sensible to assume that the correlation between estimates decreases as a function of the distance between the time points. If so, one can specify struct="CAR" (for a continuous-time autoregressive structure), set phi to the autocorrelation (for two estimates one time-unit apart), and use argument time to specify the actual time points corresponding to the estimates. The correlation between two estimates, \(y_{it}\) and \(y_{it'}\), in the \(i\text{th}\) cluster, with time points \(\text{time}_{it}\) and \(\text{time}_{it'}\), is then given by \(\phi^{|\text{time}_{it} - \text{time}_{it'}|}\). One can also specify a different value of phi for each cluster by passing a vector (of the same length as the number of clusters) to this argument.
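The continuous-time autoregressive correlations implied by this formula can be computed directly. The time points and the value of phi below are made up for illustration.

```r
# Hypothetical time points (e.g., months since baseline) for one cluster.
time <- c(1, 2, 4)
phi  <- 0.8   # assumed autocorrelation for estimates one time-unit apart

# Correlation matrix: phi raised to the absolute time difference.
R <- phi^abs(outer(time, time, "-"))
round(R, 3)
```

Estimates one time-unit apart correlate at phi, while the estimates at time points 1 and 4 correlate at phi^3, so the correlation decays with the distance between the time points.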
One can also combine the compound symmetric and autoregressive structures if there are multiple time points and multiple observed effect sizes or outcomes at these time points. One option is struct="CS+CAR". In this case, one must specify the time argument and both rho and phi. The correlation between two estimates, \(y_{it}\) and \(y_{it'}\), in the \(i\text{th}\) cluster, with time points \(\text{time}_{it}\) and \(\text{time}_{it'}\), is then given by \(\rho + (1 - \rho) \phi^{|\text{time}_{it} - \text{time}_{it'}|}\).
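A short sketch of the correlation matrix implied by the additive combination, using made-up time points and parameter values:

```r
# Hypothetical time points and assumed parameter values for one cluster.
time <- c(1, 2, 4)
rho  <- 0.5
phi  <- 0.8

# CS+CAR: a constant part rho plus a part that decays with time distance.
R <- rho + (1 - rho) * phi^abs(outer(time, time, "-"))
round(R, 3)
```

The diagonal equals 1 (since rho + (1 - rho) = 1), and as the time difference grows the correlation approaches rho rather than 0.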
Alternatively, one can specify struct="CS*CAR". In this case, one must specify both the time and obs arguments and both rho and phi. The correlation between two estimates, \(y_{ijt}\) and \(y_{ijt'}\), with the same value for obs but different values for time, is then given by \(\phi^{|\text{time}_{ijt} - \text{time}_{ijt'}|}\), the correlation between two estimates, \(y_{ijt}\) and \(y_{ij't}\), with different values for obs but the same value for time, is then given by \(\rho\), and the correlation between two estimates, \(y_{ijt}\) and \(y_{ij't'}\), with different values for obs and different values for time, is then given by \(\rho \times \phi^{|\text{time}_{ijt} - \text{time}_{ijt'}|}\).
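The three cases of the multiplicative structure can be generated in one step as the elementwise product of a compound symmetric part (based on obs) and an autoregressive part (based on time). The values below are made up for illustration.

```r
# Hypothetical cluster: two outcomes (obs) measured at two time points.
obs  <- c(1, 2, 1, 2)
time <- c(1, 1, 3, 3)
rho  <- 0.6
phi  <- 0.8

# CS part: 1 for the same outcome, rho for different outcomes.
R_obs <- ifelse(outer(obs, obs, "=="), 1, rho)

# CAR part: phi raised to the absolute time difference.
R_time <- phi^abs(outer(time, time, "-"))

# CS*CAR: elementwise product of the two parts.
R <- R_obs * R_time
round(R, 3)
```

Here the pair (1, 2) differs only in obs (correlation rho), the pair (1, 3) differs only in time (correlation phi^2), and the pair (1, 4) differs in both (correlation rho * phi^2).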
Finally, if one actually knows the correlation (and hence the covariance) between each pair of estimates (or has an approximation thereof), one can also specify the entire variance-covariance matrix of the estimates (or more precisely, their sampling errors) via the V argument (in this case, arguments struct, time, obs, rho, and phi are ignored). Note that the vcalc function can be used to construct such a V matrix and provides even more flexibility for specifying various types of dependencies. See the ‘Examples’ below for an illustration of this.
Instead of using inverse-variance weighting (i.e., weighted/generalized least squares) to combine the estimates within clusters, one can set weighted=FALSE, in which case the estimates are averaged within clusters without any weighting (although the correlations between estimates as specified are still taken into consideration).
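The sketch below shows how the specified correlation can still enter an unweighted average: the estimate is the simple mean, while its sampling variance depends on the full variance-covariance matrix. This is the standard variance of a simple average and is assumed here, not confirmed from the source, to correspond to what the function reports. The data and rho are made up.

```r
# Hypothetical estimates from the same subjects in one cluster.
yi  <- c(0.30, 0.50)
vi  <- c(0.04, 0.08)
rho <- 0.6

k <- length(yi)
R <- matrix(rho, nrow = k, ncol = k)
diag(R) <- 1
V <- diag(sqrt(vi)) %*% R %*% diag(sqrt(vi))   # var-cov matrix of the estimates

yi_agg <- mean(yi)        # unweighted average
vi_agg <- sum(V) / k^2    # Var(mean) = 1' V 1 / k^2, includes the covariances

c(yi = yi_agg, vi = vi_agg)
```

Ignoring the positive correlation (using only sum(vi) / k^2) would understate the sampling variance of the average.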
Other variables (besides the estimates) will also be aggregated to the cluster level. By default, numeric/integer type variables are averaged, logicals are also averaged (yielding the proportion of TRUE values), and for all other types of variables (e.g., character variables or factors) the most frequent category/level is returned. One can also specify a list of three functions via the fun argument for aggregating variables belonging to these three types.
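The default behavior for the three variable types can be mimicked in base R as follows. The helpers most_frequent and agg_one and the example data are hypothetical and only illustrate the rules described above; they are not the package's internal functions.

```r
# Hypothetical moderator values for three estimates from one study.
study_rows <- data.frame(
  year    = c(2001, 2001, 2003),
  blinded = c(TRUE, FALSE, TRUE),
  scale   = c("BDI", "HAM-D", "BDI"),
  stringsAsFactors = FALSE
)

# Most frequent category/level of a character variable or factor.
most_frequent <- function(x) names(which.max(table(x)))

# Pick the aggregation rule based on the type of the variable.
agg_one <- function(x) {
  if (is.logical(x)) {
    mean(x)            # proportion of TRUE values
  } else if (is.numeric(x)) {
    mean(x)            # average of numeric/integer values
  } else {
    most_frequent(x)   # modal category for everything else
  }
}

agg <- lapply(study_rows, agg_one)
str(agg)
```

A list of three such functions (one per type) is the kind of object the fun argument expects.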
Argument na.rm controls how missing values should be handled. By default, any missing estimates are first removed before aggregating the non-missing values within each cluster. The same applies when aggregating the other variables. One can also specify a vector with two logicals for the na.rm argument to control how missing values should be handled when aggregating the estimates and when aggregating all other variables.