sir(coh.data, coh.obs, coh.pyrs, ref.data = NULL, ref.obs = NULL,
ref.pyrs = NULL, ref.rate = NULL, subset = NULL, print = NULL,
adjust = NULL, mstate = NULL, test.type = "homogeneity", alpha = 0.95,
p.adj = NULL, EAR = FALSE, round.by = 2, round.by.pvalue = 4)
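coh.data: aggregated cohort data; see e.g. lexpand.
ref.data: reference data; can be left NULL if coh.data is stratified in print.
ref.rate: reference rate variable; overrides ref.pyrs and ref.obs.
subset: condition used to select data from coh.data before any computations.
mstate: relevant only when coh.obs length is two or more. See details.
p.adj: see help(p.adjust) for options. Default NULL doesn't add adjusted p-values.
The result contains data.table objects, a vector of strata variables and a global p-value.
sir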
is a comprehensive tool for modelling SIRs/SMRs with flexible
options to adjust and print SIRs, test homogeneity and utilize
multistate data. The cohort data and the variable names for observation
counts and person-years are required.
The reference data is optional, since the cohort data
can be stratified (with print) and compared to the total.
Adjust and print
A SIR can be adjusted by the covariates found in both coh.data and ref.data.
Variables to adjust by are supplied to adjust as character strings of the variable names.
Variable names need to match in both coh.data and ref.data.
Typical variables to adjust by are gender, age group and calendar period.
print is used to stratify the SIR output. In other words, the variables
assigned to print are the covariates of the Poisson model.
Variable levels are treated as categorical.
Variables can be assigned in both print and adjust.
This means the output is adjusted and printed by these variables.
print can also be a list of expressions. This allows changing variable
names or transforming variables with functions such as cut and round.
For example, the existing variables agegroup and year could be
transformed to new levels using cut:
print = list( age.category = cut(agegroup, breaks = c(seq(0,85,5), 120)), year.cat = cut(year, seq(1950,2015,10)))
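To see what such a print list does to the variables, here is a minimal base R sketch of the two cut calls from the example above. The vectors agegroup and year are made-up values for illustration, not package data.

```r
# Hypothetical values standing in for columns of the cohort data
agegroup <- c(0, 42, 67, 90)
year <- c(1953, 1987, 2004, 2012)

# Same breaks as in the print list above
age.category <- cut(agegroup, breaks = c(seq(0, 85, 5), 120))
year.cat <- cut(year, seq(1950, 2015, 10))

# Values outside the breaks show up as NA
table(age.category, useNA = "ifany")
table(year.cat, useNA = "ifany")
```

With the default right = TRUE the intervals are right-closed, so a value of 0 and any year after 2010 fall outside these breaks and become NA. Depending on how the data are coded, right = FALSE or include.lowest = TRUE may be the better choice.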
ref.rate or ref.obs & ref.pyrs
The population rate variable can be given to the ref.rate parameter.
That is, when using e.g. popmort or a comparable data file, one may
supply ref.rate instead of ref.obs and ref.pyrs, which
will be ignored if ref.rate is supplied.
Note that if not all the stratifying variables in ref.data are listed in adjust,
or when the categories are otherwise combined,
the (unweighted) mean of rates is used for computing expected cases.
This might incur a small bias compared to when exact numbers of observations
and person-years are available.
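The size of this bias can be illustrated with a small base R sketch. All numbers and the data frame ref below are made up for illustration. The sketch only compares an unweighted mean of stratum rates with the pooled rate from counts and person-years, and does not call sir itself.

```r
# Hypothetical reference strata: two sexes within one age group
ref <- data.frame(
  sex  = c("male", "female"),
  obs  = c(60, 50),
  pyrs = c(20000, 25000)
)
ref$haz <- ref$obs / ref$pyrs            # stratum-specific rates

# Collapsing over sex with only rates available: unweighted mean
rate_unweighted <- mean(ref$haz)

# Collapsing with counts and person-years available: pooled rate
rate_pooled <- sum(ref$obs) / sum(ref$pyrs)

coh_pyrs <- 2000                         # hypothetical cohort person-years
expected_unweighted <- rate_unweighted * coh_pyrs
expected_pooled <- rate_pooled * coh_pyrs

c(unweighted = expected_unweighted, pooled = expected_pooled)
```

The two expected counts differ because the strata contribute unequal person-years; the more unequal the strata, the larger the gap.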
mstate
With lexpand, for example, it's possible to compute counts for several outcomes
(such as different kinds of cancer) so that the population at risk is the same
for each outcome.
The transition counts are in wide data format,
and the relevant columns can be supplied to sir in a vector via the coh.obs argument.
The name of the corresponding new column in ref.data is given in mstate.
It's recommended to include the mstate variable in adjust,
so the corresponding information should also be available in ref.data.
This approach is analogous to calculating the SIRs separately in their
own function calls.
Other parameters
The univariate multiple-comparison-adjusted p-value uses p.adjust.
Univariate confidence intervals are calculated using exact
Poisson intervals (poisson.ci). The multivariate result
is based on a Poisson regression model with profile-likelihood confidence intervals
when possible. Otherwise Wald's normal approximation is used.
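As a rough illustration of the univariate quantities, the sketch below computes exact Poisson intervals and adjusted p-values in base R. It uses stats::poisson.test as a stand-in for the package's poisson.ci, and the counts, the per-stratum test against SIR = 1 and the "holm" method are assumptions chosen for the example, so the numbers need not match what sir reports.

```r
# Hypothetical observed and expected counts for three strata
observed <- c(12, 30, 7)
expected <- c(10.0, 20.0, 9.5)
sir_est <- observed / expected

# Exact Poisson confidence interval for each SIR (95 % by default)
ci <- t(sapply(seq_along(observed), function(i) {
  as.numeric(poisson.test(observed[i], T = expected[i])$conf.int)
}))
colnames(ci) <- c("lower", "upper")

# Exact test of SIR = 1 per stratum, then a multiple-comparison adjustment
p_raw <- sapply(seq_along(observed), function(i) {
  poisson.test(observed[i], T = expected[i], r = 1)$p.value
})
p_adj <- p.adjust(p_raw, method = "holm")

cbind(sir = sir_est, ci, p_raw, p_adj)
```

Any method listed in help(p.adjust) can be substituted for "holm".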
The p-value is a test for the levels of print. The test can be either
"homogeneity", a likelihood ratio test where the model with the variable(s) in
print (as categorical factors) is compared to the constant model.
Option "trend" is the same likelihood ratio test except that the
variable(s) in print are treated as continuous.
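The two test types can be pictured as likelihood ratio tests between Poisson regression models with log(expected) as an offset. The sketch below uses base R glm on made-up counts. The data frame d and the helper lr_test are hypothetical, and this illustrates the idea rather than reproducing how sir computes its p-value internally.

```r
# Hypothetical observed and expected counts by follow-up time (fot)
d <- data.frame(
  fot      = c(0, 10, 20),
  observed = c(30, 45, 70),
  expected = c(25, 30, 35)
)

# Constant model: one common SIR for all strata
m_const <- glm(observed ~ 1 + offset(log(expected)), family = poisson, data = d)
# Homogeneity: fot as a categorical factor
m_homog <- glm(observed ~ factor(fot) + offset(log(expected)), family = poisson, data = d)
# Trend: fot as a continuous variable
m_trend <- glm(observed ~ fot + offset(log(expected)), family = poisson, data = d)

# Likelihood ratio test of a nested pair of models
lr_test <- function(small, large) {
  stat <- deviance(small) - deviance(large)
  df <- df.residual(small) - df.residual(large)
  pchisq(stat, df = df, lower.tail = FALSE)
}

p_homogeneity <- lr_test(m_const, m_homog)
p_trend <- lr_test(m_const, m_trend)
overall_sir <- unname(exp(coef(m_const)))   # equals sum(observed) / sum(expected)
```

The homogeneity test spends one degree of freedom per extra level, while the trend test spends a single degree of freedom on the slope.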
EAR: Excess Absolute Risk
A simple way to quantify the absolute difference between cohort risk and
population risk.
Make sure that the person-years are calculated accordingly before using EAR.
Formula for EAR:
$$\mathrm{EAR} = \frac{\text{observed} - \text{expected}}{\text{person-years}} \times 1000.$$
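A worked instance of the formula in base R, using made-up counts (observed, expected and pyrs are hypothetical values):

```r
# Hypothetical cohort totals
observed <- 120      # observed cases
expected <- 100      # expected cases from the reference rates
pyrs <- 50000        # person-years at risk in the cohort

# Excess Absolute Risk per 1000 person-years
ear <- (observed - expected) / pyrs * 1000
ear
```

Here the cohort has 0.4 excess cases per 1000 person-years. The scale depends on the unit of pyrs, which is why the person-years need to be computed consistently beforehand.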
Data format
The data should be given in aggregated format, i.e. the number of observations
and person-years are represented for each stratum.
The extra variables and levels are reduced automatically before estimating SIRs.
Example of data format:
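As an illustration only, an aggregated data set of this kind could look like the one built below in base R. The column names and all values are hypothetical; each row is one stratum with its observation count and person-years.

```r
# One row per combination of the stratifying variables
agg <- expand.grid(sex = c(0, 1), agegroup = c(50, 55), year = c(2000, 2005))

# Hypothetical observation counts and person-years for each stratum
agg$from0to1 <- c(3, 1, 5, 2, 4, 2, 6, 3)
agg$pyrs <- c(1200.5, 1350.0, 980.2, 1100.7, 1250.3, 1400.1, 1010.9, 1150.4)

agg
```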
See Also
plot.sir, lexpand
library(popEpi)
data(popmort)
data(sire)
## aggregate the cohort by follow-up time, age group, calendar year and sex
c <- lexpand(sire, status = status, birth = bi_date, exit = ex_date, entry = dg_date,
             breaks = list(per = 1950:2013, age = 1:100, fot = c(0, 10, 20, Inf)),
             aggre = list(fot, agegroup = age, year = per, sex))
## SMR due to other causes: status = 2
se <- sir(coh.data = c, coh.obs = 'from0to2', coh.pyrs = 'pyrs',
          ref.data = popmort, ref.rate = 'haz',
          adjust = c('agegroup', 'year', 'sex'), print = 'fot')
se
## for examples see: vignette('sir')