per), age, and follow-up
time (fot, from 0 to the end of follow-up)
into subject-time-interval rows according to
given breaks and additionally processed if requested.

Usage

```r
lexpand(data, birth = bi_date, entry = dg_date, exit = ex_date,
        event = NULL, status = status != 0, entry.status = NULL,
        breaks = list(fot = c(0, Inf)), id = NULL, overlapping = TRUE,
        aggre = NULL, aggre.type = c("unique", "cross-product"), drop = TRUE,
        pophaz = NULL, pp = TRUE, subset = NULL, merge = TRUE,
        verbose = FALSE, ...)
```

Arguments

event: time of the event, if it differs from exit; typically only used in certain SIR/SMR calculations (see Details); keep NULL if exit is the time of the event; quoted or unquoted
status: the status at exit or event; e.g. status = status != 0; an expression or quoted variable name
entry.status: the status at entry; see Details
breaks: a named list of breaks, e.g. breaks = list(fot = 0:5, age = c(0, 45, 65, Inf)); see Details
id: e.g. id = my_id; quoted or unquoted
overlapping: if FALSE AND if data contains multiple rows per subject AND event is defined, ensures that the timelines of lex.id-specific rows do not overlap; this ensures e.g. that person-years are ...
aggre: e.g. aggre = list(sex, fot); a list of unquoted variables and/or expressions thereof, which are interpreted as factors; events and person-years will be aggregated by the unique combinations of these; see Details
aggre.type: "unique" or "cross-product"; can be abbreviated; state transitions and person-years will be calculated either for all existing levels of expressions in aggre, or for the cross-product of all levels, including combinations not present in the data; see Details
drop: if TRUE, drops all rows resulting from splitting that lie outside the time window defined by the given breaks (on all time scales)
pp: if TRUE, computes Pohar-Perme weights using pophaz and adds them as a variable with the reserved name pp; see Details for the computing method
merge: if TRUE, retains all original variables from data
verbose: if TRUE, the function is chatty and returns some messages along the way
...: e.g. fot = 0:5; instead of specifying a breaks list, correctly named breaks vectors can be given for fot, age, and per; these override any breaks in the breaks list

Value

If aggre = NULL, returns a data.table or data.frame
(depending on options("popEpi.datatable"); see ?popEpi)
expanded to accommodate split observations, with time scales as
fractional years and pophaz merged in if given. Population
hazard levels are stored in the new variable pop.haz, and Pohar-Perme
weights in the new variable pp if requested.
If aggre is defined, returns a long-format
data.table/data.frame with the variable pyrs (person-years)
and variables for the counts of transitions in state, or of the state at the end of
follow-up, formatted fromXtoY, where X and Y are
the states transitioned from and to, respectively.

Details

lexpand splits a given data set (with e.g. cancer diagnoses
as rows) to subintervals of time over
calendar time, age, and follow-up time with given time breaks
using splitMulti.
The dataset must contain appropriate
Date / IDate / date format or
other numeric variables that can be used
as the time variables.
You may take a look at a simulated cohort
sire as an example of the
minimum required information for processing data with lexpand.
Breaks
You should define all breaks as left inclusive and right exclusive
time points (e.g. [a, b))
for 1-3 time dimensions so that the last member of a breaks vector
is a meaningful "final upper limit",
e.g. per = c(2002,2007,2012)
to create a last subinterval of the form [2007,2012).
All breaks are explicit, i.e. if drop = TRUE,
any data beyond the outermost break points are dropped.
If one wants to have unspecified upper / lower limits on one time scale,
use Inf: e.g. breaks = list(fot = 0:5, age = c(0,45,Inf)).
Breaks for per can also be given in
Date/IDate/date format, whereupon
they are converted to fractional years before being used in splitting.
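The [a, b) convention and the effect of drop = TRUE can be illustrated with a minimal base-R sketch that splits one subject's follow-up on a single time scale. The helper split_fot() is hypothetical and not part of popEpi; it only mimics the idea (clipping the span to each break interval and discarding time outside the outermost breaks), not the actual splitMulti implementation.

```r
# Hypothetical helper (not a popEpi function): split one follow-up span
# [start, stop) on a single time scale at the given breaks.
split_fot <- function(start, stop, breaks) {
  lo <- head(breaks, -1)  # lower limits of the [lo, hi) intervals
  hi <- tail(breaks, -1)  # upper limits
  a <- pmax(start, lo)    # clip the span to each interval
  b <- pmin(stop, hi)
  keep <- b > a           # intervals the span actually overlaps; time
                          # outside the outermost breaks is dropped
  data.frame(fot = a[keep], lex.dur = b[keep] - a[keep])
}

# 2.5 years of follow-up with yearly breaks up to 5 years:
# rows start at fot = 0, 1, 2 with durations 1, 1, 0.5
x <- split_fot(0, 2.5, breaks = 0:5)

# An open-ended last interval via Inf keeps all 7.2 years
y <- split_fot(0, 7.2, breaks = c(0, 1, 5, Inf))
```

With breaks = 0:5, any follow-up beyond fot = 5 simply produces no rows, which corresponds to the explicit-breaks behaviour described above.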
Time variables
If any of the given time variables
(birth, entry, exit, event)
is in any kind of date format, it is first coerced to
fractional years before splitting
using get.yrs (with year.length = "actual").
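A rough base-R sketch of what "fractional years with the actual year length" means: the days elapsed since 1 January are divided by the length of that calendar year (365 or 366 days). The function frac_year() is hypothetical; get.yrs is the function lexpand actually uses, and its exact internal formula may differ from this one.

```r
# Hypothetical helper (not popEpi::get.yrs): Date -> fractional year,
# dividing elapsed days by the actual length of that calendar year.
frac_year <- function(d) {
  d <- as.Date(d)
  y <- as.integer(format(d, "%Y"))
  start <- as.Date(paste0(y, "-01-01"))
  next_start <- as.Date(paste0(y + 1L, "-01-01"))
  y + as.numeric(d - start) / as.numeric(next_start - start)
}

# 2000 is a leap year: 2000-07-02 is day offset 183 of 366, i.e. 2000.5
frac_year(c("2000-01-01", "2000-07-02", "2001-07-02"))
```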
Sometimes in e.g. SIR/SMR calculation one may want the event time to differ
from the time of exit from follow-up, if the subject is still considered
to be at risk of the event. If event is specified, the transition to
status is moved to event from exit
using cutLexis. See Examples.
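The effect of event can be pictured with a simplified base-R sketch for a single subject: follow-up from entry to exit is cut at the event time, the first row ends in the transition to status, and the subject stays in that state until exit. The helper cut_at_event() is hypothetical and deliberately simpler than cutLexis; in particular, its handling of a missing or out-of-range event time is this sketch's own assumption.

```r
# Hypothetical helper (not part of popEpi or Epi): place the transition
# to 'status' at 'event' instead of at 'exit' for one subject.
cut_at_event <- function(entry, event, exit, status, entry.status = 0) {
  if (!is.na(event) && event > entry && event < exit) {
    # before the event: entry state, transition at the event time;
    # after the event: remains in the new state until exit
    data.frame(start = c(entry, event), stop = c(event, exit),
               lex.Cst = c(entry.status, status),
               lex.Xst = c(status, status))
  } else {
    # assumption of this sketch: one uncut row, transition at exit
    data.frame(start = entry, stop = exit,
               lex.Cst = entry.status, lex.Xst = status)
  }
}

# Followed from birth (1950) to exit (2010), event (diagnosis) at 1995.5
cut_at_event(entry = 1950, event = 1995.5, exit = 2010, status = 1)
```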
The status variable
The statuses in the expanded output (lex.Cst and lex.Xst)
are determined by using either only status or both status
and entry.status. If entry.status = NULL, the status at entry
is guessed according to the type of variable supplied via status:
For numeric variables it will be zero, for factors the first level
(levels(status)[1]) and otherwise the first unique value in alphabetical
order (sort(unique(status))[1]).
Using numeric or factor status
variables is strongly recommended. Logical expressions are also allowed
(e.g. status = my_status != 0L) and are converted to integer internally.
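The rule for guessing the entry status when entry.status = NULL can be written out as a small base-R function. guess_entry_status() is a hypothetical name used only to restate the documented rule; it is not lexpand's internal code.

```r
# Hypothetical helper restating the documented rule (not popEpi code)
guess_entry_status <- function(status) {
  if (is.logical(status)) status <- as.integer(status)  # logical -> integer
  if (is.factor(status)) return(levels(status)[1])      # first factor level
  if (is.numeric(status)) return(0)                     # numeric: zero
  sort(unique(status))[1]  # otherwise: first unique value in sorted order
}

guess_entry_status(factor(c("dead", "alive"), levels = c("alive", "dead")))
guess_entry_status(c("b", "a", "c"))
```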
Merging population hazard information
To enable computing relative/net survivals with survtab
and relpois, lexpand merges appropriate
population hazard data (pophaz) into the expanded data
before dropping rows outside the specified
time window (if drop = TRUE). pophaz must, for this reason,
contain at a minimum the variables named
agegroup, year, and haz. pophaz may contain additional variables to specify
different population hazard levels in different strata; e.g. popmort includes sex.
All the strata-defining variables must be present in the supplied data. lexpand will
automatically detect variables with common names in the two data sets and merge using them.
Currently year must be an integer variable specifying the appropriate year. agegroup
must currently also specify one-year age groups, e.g. popmort specifies 101 age groups
of length 1 year. In both
year and agegroup variables the values are interpreted as the lower bounds of intervals
(and passed on to a cut call). The mandatory variable haz
must specify the appropriate average rate at the person-year level;
e.g. haz = -log(survProb) where survProb is a one-year conditional
survival probability will be the correct hazard specification. **I understand, but I don't know how to fix it!**
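A short base-R check of why haz = -log(survProb) is an appropriate specification: assuming a constant hazard within each one-year group, one-year survival is exp(-haz), so the transformation recovers the per-person-year rate. The survival probabilities below are made-up numbers, not popmort values.

```r
# Made-up one-year conditional survival probabilities (not from popmort)
survProb <- c(0.999, 0.99, 0.9)
haz <- -log(survProb)  # average rate per person-year

# Under a constant hazard, survival over t years is exp(-haz * t),
# so t = 1 gives back survProb and t = 0.5 gives its square root
one_year_surv <- exp(-haz)
half_year_surv <- exp(-haz * 0.5)
```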
The corresponding pophaz population hazard value is merged by using the mid points
of the records after splitting as reference values. E.g. if age=89.9 at the start
of a 1-year interval, then the reference age value is 90.4 for merging.
This way we get a "typical" population hazard level for each record.
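The mid-point rule can be sketched in base R: for each split row, take age and calendar year at the middle of the row, map them to the lower bound of their one-year group, and look the hazard up in a pophaz-like table. The table and its values are made up, and base merge() stands in for whatever join lexpand performs internally; floor() matches cut() with integer breaks only for the one-year groups assumed here.

```r
# Made-up pophaz-like table: one-year groups identified by lower bounds
pophaz <- expand.grid(agegroup = 88:91, year = 2010:2011)
pophaz$haz <- (10:17) / 100

# Two split rows of one subject: age and calendar time at row start
rows <- data.frame(age = c(89.9, 90.9), per = c(2010.2, 2011.2),
                   lex.dur = c(1, 1))

# Reference values at the row mid-points, e.g. 89.9 + 0.5 = 90.4
mid_age <- rows$age + rows$lex.dur / 2
mid_per <- rows$per + rows$lex.dur / 2

# Lower bounds of the one-year intervals containing the mid-points
rows$agegroup <- as.integer(floor(mid_age))
rows$year <- as.integer(floor(mid_per))

merged <- merge(rows, pophaz, by = c("agegroup", "year"), all.x = TRUE)
```

Here the first row (age 89.9 at the start) is matched to age group 90, as in the example above, rather than to 89.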
Computing Pohar-Perme weights
If pp = TRUE, Pohar-Perme weights
(the inverse of cumulative population survival) are computed. This will
create the new pp variable in the expanded data. pp is a
reserved name and lexpand throws an exception if a variable with that name
exists in data.
When a survival interval contains one or several rows per subject
(e.g. due to splitting by the per scale),
pp is cumulated from the beginning of the first record in a survival
interval for each subject to the mid-point of the remaining time within that
survival interval, and that value is given for every other record
that a given person has within the same survival interval.
E.g. with 5 rows of duration 1/5 within a survival interval
[0,1), pp is determined for all records by a cumulative
population survival from 0 to 0.5. The existing (row-level) accuracy is used,
so that the weight is cumulated first up to the end of the second row
and then over the remaining distance to the mid-point (first to 0.4, then to
0.5). This ensures that more accurately merged population hazards are fully
used.
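The worked case above (five rows of length 1/5 in [0, 1), cumulated to the mid-point 0.5) can be reproduced with a base-R sketch. The per-row hazards are made up, and the sketch assumes this is the first survival interval, so no cumulative hazard is carried over from earlier intervals.

```r
# Made-up per-row population hazards for five rows of length 1/5 in [0, 1)
haz <- c(0.10, 0.12, 0.15, 0.15, 0.20)
dur <- rep(1 / 5, 5)
row_end <- cumsum(dur)      # 0.2, 0.4, 0.6, 0.8, 1.0
row_start <- row_end - dur
mid <- 0.5                  # mid-point of the survival interval [0, 1)

# Time of each row lying before the mid-point: 0.2, 0.2, 0.1, 0, 0
# (first to 0.4 with rows 1-2, then to 0.5 with row 3)
exposed <- pmax(0, pmin(row_end, mid) - row_start)

cum_haz <- sum(haz * exposed)       # cumulative population hazard 0 -> 0.5
pp_weight <- 1 / exp(-cum_haz)      # inverse of cumulative population survival
pp <- rep(pp_weight, length(haz))   # same weight for every row in the interval
```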
Aggregating
Certain analyses such as SIR/SMR calculations require tables of events and
person-years by the unique combinations (interactions) of several variables.
For this, aggre can be specified as a list of such variables
(preferably factor variables, but this is not mandatory)
and any arbitrary functions of the
variables at one's disposal. E.g.
aggre = list(sex, agegr = cut(dg_age, 0:100))
would tabulate events and person-years by sex and an ad-hoc age group
variable. Every ad-hoc-created variable should be named.
fot, per, and age are special reserved variables which,
when present in the aggre list, are outputted as categories of the
corresponding time scale variables by using
e.g.
cut(fot, breaks$fot, right=FALSE).
This only works if
the corresponding breaks are defined in breaks or via ....
E.g.
aggre = list(sex, fot.int = fot) with
breaks = list(fot=0:5).
The outputted variable fot.int in the above example will have
the lower limits of the appropriate intervals as values.
aggre as a named list will output numbers of events and person-years
with the given new names as categorizing variable names, e.g.
aggre = list(follow_up = fot, gender = sex, agegroup = age).
The outputted table has person-years (pyrs) and event (mutation) counts
(e.g. from0to1) as columns. Event counts are the numbers of mutations
(lex.Cst != lex.Xst) or the lex.Xst value at a subject's
last record (subject possibly defined by id).
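A base-R sketch of this kind of aggregated table: split rows are categorised by sex and by the lower limit of their fot interval (like fot.int above), person-years are summed into pyrs, and a row adds to a fromXtoY column when lex.Cst != lex.Xst or when it is the subject's last row. The toy data and the use of aggregate() are illustrative assumptions, not how lexpand computes its tables; because aggregate() returns only combinations present in the data, the result resembles aggre.type = "unique".

```r
# Toy split data (made up): three subjects, rows split by fot at 0:5
split_rows <- data.frame(
  lex.id  = c(1, 1, 2, 3, 3, 3),
  sex     = c("m", "m", "f", "f", "f", "f"),
  fot     = c(0, 1, 0, 0, 1, 2),        # fot at the start of each row
  lex.dur = c(1, 0.4, 0.7, 1, 1, 0.5),  # person-years in each row
  lex.Cst = 0,
  lex.Xst = c(0, 1, 0, 0, 0, 1)
)
breaks <- 0:5
# Lower limit of the [a, b) fot interval of each row
split_rows$fot.int <- breaks[findInterval(split_rows$fot, breaks)]

# Rows counted in fromXtoY: a change of state, or a subject's last row
last_row <- !duplicated(split_rows$lex.id, fromLast = TRUE)
counted <- split_rows$lex.Cst != split_rows$lex.Xst | last_row
trans <- paste0("from", split_rows$lex.Cst, "to", split_rows$lex.Xst)
trans_types <- sort(unique(trans[counted]))
for (tr in trans_types) {
  split_rows[[tr]] <- as.integer(counted & trans == tr)
}

# Sum person-years and counts over existing sex x fot.int combinations
agg <- aggregate(split_rows[c("lex.dur", trans_types)],
                 by = split_rows[c("sex", "fot.int")], FUN = sum)
names(agg)[names(agg) == "lex.dur"] <- "pyrs"
```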
If aggre.type = "unique", the above results are computed for existing
combinations of expressions given in aggre, but also for non-existing
combinations if aggre.type = "cross-product". E.g. if a
factor variable has levels "a", "b", "c" but the data is limited
to only have levels "a", "b" present
(more than zero rows have these level values), the former setting only
computes results for "a", "b", and the latter also for "c"
and any combination with other variables or expressions given in aggre.

See Also

splitMulti, Lexis, survtab, relpois, popmort, sir

Examples

```r
library(popEpi)  # provides lexpand and the example data sire and popmort

## prepare data for e.g. 5-year cohort survival calculation
x <- lexpand(sire, breaks = list(fot = seq(0, 5, by = 1/12)),
             status = status != 0, pophaz = popmort)

## prepare data for e.g. 5-year "period analysis" for 2008-2012
BL <- list(fot = seq(0, 5, by = 1/12), per = c("2008-01-01", "2013-01-01"))
x <- lexpand(sire, breaks = BL, pophaz = popmort, status = status != 0)

## aggregating
BL <- list(fot = 0:5, per = c("2003-01-01", "2008-01-01", "2013-01-01"))
ag <- lexpand(sire, breaks = BL, status = status != 0,
              aggre = list(sex, period = per, surv.int = fot))

## using "..."
x <- lexpand(sire, fot = 0:5, pophaz = popmort, status = status != 0)
x <- lexpand(sire, fot = 0:5, status = status != 0,
             aggre = list(sex, surv.int = fot))

## using the "event" argument: it just places the transition to given "status"
## at the "event" time instead of at the end, if possible using cutLexis
x <- lexpand(sire, status = status, event = dg_date,
             birth = bi_date, entry = bi_date, exit = ex_date)

## aggregating with custom "event" time
x <- lexpand(sire, status = status, event = dg_date,
             birth = bi_date, entry = bi_date, exit = ex_date,
             per = 1970:2014, age = c(0:100, Inf),
             aggre = list(sex, year = per, agegroup = age))
```