direct_standardization: Direct Adjusting in popEpi Using Weights

Description

Several functions in popEpi have support for direct standardization of estimates. This document explains the usage of weighting with those functions.

Arguments

Basic usage - one adjusting variable

In the simple case where we are adjusting by only one variable (e.g. by age group), one can simply supply a vector of weights:

    FUN(weights = c(0.1, 0.25, 0.25, 0.2, 0.2))

which may be stored in advance:

    w <- c(0.1, 0.25, 0.25, 0.2, 0.2)
    FUN(weights = w)

The order of the weights matters. popEpi functions with direct adjusting enabled match the supplied weights to the adjusting variables as follows: If the adjusting variable is a factor, the order of the levels is used. Otherwise, the alphabetic order of the unique values is used (try sort to see how it works). For clarity and certainty we recommend using factor or numeric variables when possible. character variables should be avoided: to see why, try sort(15:9) and sort(as.character(15:9)).

It is also possible to supply a character string corresponding to one of the age group standardization schemes integrated into popEpi:

'europe_1976_18of5' - European std. population (1976), 18 age groups
'nordic_2000_18of5' - Nordic std. population (2000), 18 age groups
'world_1966_18of5' - world standard (1966), 18 age groups
'world_2000_18of5' - world standard (2000), 18 age groups
'world_2000_20of5' - world standard (2000), 20 age groups
'world_2000_101of1' - world standard (2000), 101 age groups

E.g. FUN(weights = "world_2000_18of5") will use the world standard population from 2000 as weights for the 18 age groups that your adjusting variable is assumed to contain. In this case the adjusting variable must be coded as a numeric variable containing 1:18 or as a factor with 18 levels (ordered from the youngest to the oldest age group).

You may also supply weights = "internal" to use internally computed weights, which are usually simply the counts of subjects or the person-time experienced in each stratum.
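The ordering rule and the 1:18 coding can be checked in base R before any weights are passed. The following is only a sketch: the ages and the break points are made-up illustration data, and it assumes the 18 groups are 0-4, 5-9, ..., 80-84 and 85+.

```r
# Numeric values sort numerically, but the same values stored as
# character strings sort alphabetically, so "9" ends up last.
sort(15:9)
#> [1]  9 10 11 12 13 14 15
sort(as.character(15:9))
#> [1] "10" "11" "12" "13" "14" "15" "9"

# Hypothetical ages, cut into 18 five-year groups (assumed layout:
# 0-4, 5-9, ..., 80-84, 85+). The result is a factor whose levels run
# from the youngest to the oldest group.
age <- c(0, 4, 5, 37, 84, 85, 99)
breaks <- c(seq(0, 85, by = 5), Inf)   # 19 break points, 18 intervals
agegroup <- cut(age, breaks = breaks, right = FALSE)

nlevels(agegroup)
#> [1] 18
as.integer(agegroup)
#> [1]  1  1  2  8 17 18 18
```

Either the factor agegroup or its integer codes would then satisfy the coding requirement described above.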

More than one adjusting variable

In the case that you employ more than one adjusting variable, separate weights should be passed to match to the levels of the different adjusting variables. When supplied correctly, "grand" weights are formed based on the variable-specific weights by multiplying over the variable-specific weights (e.g. if men have w = 0.5 and the age group 0-4 has w = 0.1, the "grand" weight for men aged 0-4 is 0.5*0.1). The "grand" weights are then used for adjusting after ensuring they sum to one.

When using multiple adjusting variables, you are allowed to pass either a named list of weights or a data.frame of weights. E.g.

    WL <- list(agegroup = age_w, sex = sex_w)
    FUN(weights = WL)

where age_w and sex_w are numeric vectors. Given the conditions explained in the previous section are satisfied, you may also do e.g.

    WL <- list(agegroup = "world_2000_18of5", sex = sex_w)
    FUN(weights = WL)

and the world standard population is used as weights for the age groups as outlined in the previous section.

Sometimes using a data.frame can be clearer (and it is fool-proof as well). To do this, form a data.frame that repeats the levels of your adjusting variables by each level of every other adjusting variable, and assign the weights as a column named "weights". E.g.

    wdf <- data.frame(sex = rep(0:1, each = 18), agegroup = rep(1:18, 2))
    wdf$weights <- rbinom(36, size = 100, prob = 0.25)
    FUN(weights = wdf)

If you want to use the counts of subjects in strata as the weights, one way to do this is by e.g.

    wdf <- as.data.frame(table(x$V1, x$V2, x$V3))
    names(wdf) <- c("V1", "V2", "V3", "weights")
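To make the "grand" weights concrete, the sketch below forms them by hand in base R. It illustrates the multiplication rule described above and is not popEpi's internal code; the 0/1 coding of sex, the five age groups and the small data set x are assumptions made for the example.

```r
# Variable-specific weights (illustrative values).
age_w <- c(0.1, 0.25, 0.25, 0.2, 0.2)   # five age groups
sex_w <- c(0.5, 0.5)                    # sex coded 0/1 (assumed coding)

# One row per combination of levels; agegroup varies fastest.
grand <- expand.grid(agegroup = seq_along(age_w), sex = 0:1)

# Multiply the variable-specific weights, then rescale to sum to one.
grand$weights <- age_w[grand$agegroup] * sex_w[grand$sex + 1]
grand$weights <- grand$weights / sum(grand$weights)

grand$weights[1]   # sex 0, age group 1: 0.5 * 0.1 = 0.05

# Stratum counts as weights, using a made-up data set with two variables.
x <- data.frame(V1 = c("a", "a", "b", "b", "b"), V2 = c(1, 2, 1, 1, 2))
wdf <- as.data.frame(table(x$V1, x$V2))
names(wdf) <- c("V1", "V2", "weights")
wdf$weights        # counts for (a,1), (b,1), (a,2), (b,2): 1 2 1 1
```

Note that as.data.frame() applied to a table returns the stratifying columns as factors, so V2 in wdf is a factor even though x$V2 is numeric.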

Details

Direct standardization is performed by computing estimates of E separately within each stratum defined by the set of adjusting variables A, to which a set of weights W is applicable. The weighted average of the stratum-specific estimates over A is then the direct-adjusted estimate of E (E*).
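As a worked illustration of this definition, the sketch below computes a direct-adjusted rate by hand in base R. All numbers are invented, and the column names (events, pyrs, w) are hypothetical rather than anything popEpi requires.

```r
# Made-up stratum-level data: three age groups with event counts,
# person-years and standard weights.
strata <- data.frame(
  agegroup = 1:3,
  events   = c(5, 20, 60),
  pyrs     = c(1000, 2000, 1500),
  w        = c(0.5, 0.3, 0.2)
)

# Stratum-specific estimates of E (here: rates).
strata$rate <- strata$events / strata$pyrs      # 0.005, 0.010, 0.040

# Crude rate: ignores the adjusting variable altogether.
crude <- sum(strata$events) / sum(strata$pyrs)  # 85 / 4500

# Direct-adjusted rate E*: weighted average of the stratum-specific rates.
adjusted <- sum(strata$w * strata$rate) / sum(strata$w)
adjusted
#> [1] 0.0135
```

The crude rate (about 0.0189) is pulled up by the large amount of person-time in the older strata, whereas the adjusted rate reflects the age distribution implied by the weights.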

To enable both quick and easy as well as more rigorous usage of direct standardization with weights, the weights arguments in popEpi can be supplied in several ways. Which of these ways are available depends on the number of adjusting variables.

The weights are always rescaled internally to sum to 1, so they do not need to be scaled before they are supplied. E.g. counts of subjects in strata may be passed as they are.
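The reason unscaled weights are acceptable can be seen with base R's weighted.mean(), which also normalizes its weights. This is a sketch with invented numbers, not a call into popEpi.

```r
est    <- c(2, 4, 10)       # hypothetical stratum-specific estimates
counts <- c(10, 30, 60)     # raw counts of subjects per stratum
props  <- counts / sum(counts)

# Raw counts and proportions give the same weighted average.
weighted.mean(est, counts)
#> [1] 7.4
weighted.mean(est, props)
#> [1] 7.4
```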

References

Source of the Nordic standard population in 5-year age groups (also contains European & 1966 world standards): http://www-dep.iarc.fr/NORDCAN/english/glossary.htm

Source of the 1976 European standard population:

Waterhouse, J., Muir, C.S., Correa, P., Powell, J., eds (1976). Cancer Incidence in Five Continents, Vol. III. IARC Scientific Publications, No. 15. Lyon: IARC.

A comparison of the 1966 vs. 2000 world standard populations in 5-year age groups: http://www3.ha.org.hk/cancereg/e_asr.asp