per), age, and follow-up time (fot, from 0 to the end of follow-up) into subject-time-interval rows according to given breaks and additionally processed if requested.
Usage
lexpand(data, birth = bi_date, entry = dg_date, exit = ex_date,
  event = NULL, status = status != 0, entry.status = NULL,
  breaks = list(fot = c(0, Inf)), id = NULL, overlapping = TRUE,
  aggre = NULL, aggre.type = c("unique", "cross-product"), drop = TRUE,
  pophaz = NULL, pp = TRUE, subset = NULL, merge = TRUE,
  verbose = FALSE, ...)
Arguments
event: time of a possible event differing from exit; typically only used in certain SIR/SMR calculations - see Details; keep NULL if exit is the time of the event; quoted or unquoted
status: variable indicating the type of event at exit or event; e.g. status = status != 0; expression or quoted variable name
entry.status: specified in the same way as status; status at entry; see Details
breaks: e.g. breaks = list(fot=0:5, age=c(0,45,65,Inf)); see Details
id: e.g. id = my_id; quoted or unquoted
overlapping: if FALSE AND if data contains multiple rows per subject AND event is defined, ensures that the timelines of lex.id-specific rows do not overlap; this ensures e.g. that person-years are ...
aggre: e.g. aggre = list(sex, fot); a list of unquoted variables and/or expressions thereof, which are interpreted as factors; events and person-years in the data will be aggregated by the unique combinations of these; see Details
aggre.type: "unique" or "cross-product"; can be abbreviated; state transitions and person-years will be calculated either for all existing levels of expressions in aggre, or for the cross-product of all possible existing ...
drop: if TRUE, drops all resulting rows after splitting that reside outside the time window as defined by the given breaks (all time scales)
pp: if TRUE, computes Pohar-Perme weights using pophaz; adds variable with reserved name pp; see Details for computing method
merge: if TRUE, retains all original variables from the data
verbose: if TRUE, the function is chatty and returns some messages along the way
...: e.g. fot = 0:5; instead of specifying a breaks list, correctly named breaks vectors can be given for fot, age, and per; these override any breaks in the breaks
list
Value
If aggre = NULL, returns a data.table or data.frame (depending on options("popEpi.datatable"); see ?popEpi) object expanded to accommodate split observations with time scales as fractional years and pophaz merged in if given. Population hazard levels are stored in the new variable pop.haz, and Pohar-Perme weights in the new variable pp if requested.
If aggre is defined, returns a long-format data.table/data.frame with the variable pyrs (person-years), and variables for the counts of transitions in state or state at end of follow-up formatted fromXtoY, where X and Y are the states transitioned from and to, respectively.
Details
lexpand
splits a given data set (with e.g. cancer diagnoses as rows) to subintervals of time over calendar time, age, and follow-up time with given time breaks using splitMulti.
The dataset must contain appropriate Date / IDate / date format or other numeric variables that can be used as the time variables.
You may take a look at a simulated cohort sire as an example of the minimum required information for processing data with lexpand.
Breaks
You should define all breaks as left inclusive and right exclusive time points (e.g. [a,b)) for 1-3 time dimensions so that the last member of a breaks vector is a meaningful "final upper limit", e.g. per = c(2002,2007,2012) to create a last subinterval of the form [2007,2012).
All breaks are explicit, i.e. if drop = TRUE, any data beyond the outermost break points are dropped. If one wants to have unspecified upper / lower limits on one time scale, use Inf: e.g. breaks = list(fot = 0:5, age = c(0,45,Inf)).
Breaks for per can also be given in Date/IDate/date format, whereupon they are converted to fractional years before being used in splitting.
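The left-inclusive, right-exclusive convention can be illustrated with base R's cut(). The following is only a minimal sketch on a made-up vector (per_values is a hypothetical name); it does not call lexpand and is not a description of its internals.

```r
# Hypothetical calendar-time values in fractional years (illustration only)
per_values <- c(2002.0, 2006.99, 2007.0, 2011.5, 2012.0)
per_breaks <- c(2002, 2007, 2012)

# right = FALSE gives intervals of the form [a,b);
# dig.lab = 4 keeps the four-digit years readable in the labels
per_interval <- cut(per_values, breaks = per_breaks, right = FALSE, dig.lab = 4)

# 2007.0 belongs to [2007,2012); 2012.0 equals the final upper limit
# and therefore falls outside the last interval (NA)
data.frame(per_values, per_interval)
```

A value exactly at the last break is not covered by any interval, which is why the last member of a breaks vector acts as the "final upper limit".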
Time variables
If any of the given time variables (birth, entry, exit, event) is in any kind of date format, it is first coerced to fractional years before splitting using get.yrs (with year.length = "actual").
Sometimes in e.g. SIR/SMR calculation one may want the event time to differ from the time of exit from follow-up, if the subject is still considered to be at risk of the event. If event is specified, the transition to status is moved to event from exit using cutLexis. See Examples.
The status variable
The statuses in the expanded output (lex.Cst and lex.Xst) are determined by using either only status or both status and entry.status. If entry.status = NULL, the status at entry is guessed according to the type of variable supplied via status: for numeric variables it will be zero, for factors the first level (levels(status)[1]), and otherwise the first unique value in alphabetical order (sort(unique(status))[1]).
Using numeric or factor status variables is strongly recommended. Logical expressions are also allowed (e.g. status = my_status != 0L) and are converted to integer internally.
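The guessing rule for the status at entry can be written out in a few lines of base R. The helper guess_entry_status below is hypothetical and only mirrors the rule as documented here; it is not popEpi's internal code.

```r
# Hypothetical helper mirroring the documented rule (illustration only)
guess_entry_status <- function(status) {
  if (is.numeric(status)) {
    0                          # numeric: zero
  } else if (is.factor(status)) {
    levels(status)[1]          # factor: first level
  } else {
    sort(unique(status))[1]    # otherwise: first unique value in sorted order
  }
}

guess_entry_status(c(0, 1, 2, 1))
guess_entry_status(factor(c("dead", "alive"), levels = c("alive", "dead")))
guess_entry_status(c("dead", "alive", "dead"))
```

With a factor, the entry status depends on the order of the levels and not on the order of the data, which is one reason to set the levels explicitly.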
Merging population hazard information
To enable computing relative/net survivals with survtab and relpois, lexpand merges appropriate population hazard data (pophaz) to the expanded data before dropping rows outside the specified time window (if drop = TRUE). pophaz must, for this reason, contain at a minimum the variables named agegroup, year, and haz. pophaz may contain additional variables to specify different population hazard levels in different strata; e.g. popmort includes sex.
All the strata-defining variables must be present in the supplied data. lexpand will automatically detect variables with common names in the two datasets and merge using them.
Currently year must be an integer variable specifying the appropriate year. agegroup must currently also specify one-year age groups, e.g. popmort specifies 101 age groups of length 1 year. In both the year and agegroup variables the values are interpreted as the lower bounds of intervals (and passed on to a cut call). The mandatory variable haz must specify the appropriate average rate at the person-year level; e.g. haz = -log(survProb), where survProb is a one-year conditional survival probability, is the correct hazard specification. **I understand this, but do not know how to fix it!**
The corresponding pophaz population hazard value is merged by using the mid points of the records after splitting as reference values. E.g. if age=89.9 at the start of a 1-year interval, then the reference age value is 90.4 for merging. This way we get a "typical" population hazard level for each record.
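The hazard specification and the mid-point reference value can be checked with a short calculation. This is a sketch with made-up survival probabilities; using floor() to find the one-year age group is an assumption made here for illustration (the text above only says that the values are lower bounds of intervals passed on to a cut call).

```r
# Hypothetical one-year conditional survival probabilities (illustration only)
survProb <- c(0.990, 0.850, 0.800)
haz <- -log(survProb)                 # average hazard per person-year

# A record starting at age 89.9 and lasting one year
age_start <- 89.9
duration <- 1
age_mid <- age_start + duration / 2   # reference value used for merging
agegroup <- floor(age_mid)            # assumed lookup: lower bound of the 1-year group

# Converting the hazard back recovers the survival probabilities
exp(-haz)
```

Here the record is matched to the hazard of age group 90 and not 89, because most of the record's person-time is spent at age 90.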
Computing Pohar-Perme weights
If pp = TRUE, Pohar-Perme weights (the inverse of cumulative population survival) are computed. This will create the new pp variable in the expanded data. pp is a reserved name and lexpand throws an error if a variable with that name exists in data.
When a survival interval contains one or several rows per subject (e.g. due to splitting by the per scale), pp is cumulated from the beginning of the first record in a survival interval for each subject to the mid-point of the remaining time within that survival interval, and that value is given for every other record that a given person has within the same survival interval.
E.g. with 5 rows of duration 1/5 within a survival interval [0,1), pp is determined for all records by a cumulative population survival from 0 to 0.5. The existing accuracy is used, so that the weight is cumulated first up to the end of the second row and then over the remaining distance to the mid-point (first to 0.4, then to 0.5). This ensures that more accurately merged population hazards are fully used.
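The five-row example can be worked through numerically. The sketch below assumes that the population hazard is constant within each row, so that the cumulative hazard is a sum of hazard times time at risk; the hazard values are made up, and the code illustrates the description above, not popEpi's implementation.

```r
# Five rows of duration 1/5 within the survival interval [0,1)
row_start <- seq(0, 0.8, by = 0.2)
row_stop  <- row_start + 0.2

# Hypothetical population hazards merged to each row (per person-year)
pop_haz <- c(0.010, 0.012, 0.014, 0.016, 0.018)

mid_point <- 0.5
# Time each row contributes up to the mid-point: 0.2, 0.2, 0.1, 0, 0
time_used <- pmax(0, pmin(row_stop, mid_point) - row_start)

cum_haz   <- sum(pop_haz * time_used)  # cumulative population hazard from 0 to 0.5
pop_surv  <- exp(-cum_haz)             # cumulative population survival at 0.5
pp_weight <- 1 / pop_surv              # the same weight is given to all five rows
```

The first two rows contribute their full length (up to 0.4) and the third row only the remaining 0.1, so each row's own merged hazard is used for exactly the time it covers.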
Aggregating
Certain analyses such as SIR/SMR calculations require tables of events and person-years by the unique combinations (interactions) of several variables. For this, aggre can be specified as a list of such variables (preferably factor variables, but this is not mandatory) and any arbitrary functions of the variables at one's disposal. E.g.
aggre = list(sex, agegr = cut(dg_age, 0:100))
would tabulate events and person-years by sex and an ad-hoc age group variable. Every ad-hoc-created variable should be named.
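The kind of table this produces can be imitated with base R on a toy data set. In the sketch below, split_rows and its columns pyrs and event are hypothetical stand-ins for split data; the actual aggregation in lexpand is done internally and may differ in detail.

```r
# Hypothetical split data: one row per subject-time-interval (illustration only)
split_rows <- data.frame(
  sex    = c(0, 0, 1, 1, 1),
  dg_age = c(45.2, 45.2, 67.8, 67.8, 30.1),
  pyrs   = c(1.0, 0.4, 1.0, 1.0, 0.7),
  event  = c(0, 1, 0, 0, 1)
)

# Ad-hoc age group variable, as in aggre = list(sex, agegr = cut(dg_age, 0:100))
split_rows$agegr <- cut(split_rows$dg_age, 0:100)

# Sum person-years and events over the combinations present in the data
tab <- aggregate(cbind(pyrs, event) ~ sex + agegr, data = split_rows, FUN = sum)
tab
```

Only combinations of sex and agegr that occur in the data appear in tab, which corresponds to tabulating by unique combinations.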
fot, per, and age are special reserved variables which, when present in the aggre list, are output as categories of the corresponding time scale variables by using e.g. cut(fot, breaks$fot, right=FALSE).
This only works if the corresponding breaks are defined in breaks or via "...". E.g. aggre = list(sex, fot.int = fot) with breaks = list(fot=0:5). The output variable fot.int in the above example will have the lower limits of the appropriate intervals as values.
aggre as a named list will output numbers of events and person-years with the given new names as categorizing variable names, e.g. aggre = list(follow_up = fot, gender = sex, agegroup = age).
The output table has person-years (pyrs) and event (mutation) counts (e.g. from0to1) as columns. Event counts are the numbers of mutations (lex.Cst != lex.Xst) or the lex.Xst value at a subject's last record (subject possibly defined by id).
If aggre.type = "unique", the above results are computed for existing combinations of expressions given in aggre, but also for non-existing combinations if aggre.type = "cross-product". E.g. if a factor variable has levels "a", "b", "c" but the data is limited to only have levels "a", "b" present (more than zero rows have these level values), the former setting only computes results for "a", "b", and the latter also for "c" and any combination with other variables or expressions given in aggre
.
See Also
splitMulti, Lexis, survtab, relpois, popmort, sir
Examples
## prepare data for e.g. 5-year cohort survival calculation
x <- lexpand(sire, breaks = list(fot = seq(0, 5, by = 1/12)),
             status = status != 0, pophaz = popmort)

## prepare data for e.g. 5-year "period analysis" for 2008-2012
BL <- list(fot = seq(0, 5, by = 1/12), per = c("2008-01-01", "2013-01-01"))
x <- lexpand(sire, breaks = BL, pophaz = popmort, status = status != 0)

## aggregating
BL <- list(fot = 0:5, per = c("2003-01-01", "2008-01-01", "2013-01-01"))
ag <- lexpand(sire, breaks = BL, status = status != 0,
              aggre = list(sex, period = per, surv.int = fot))

## using "..."
x <- lexpand(sire, fot = 0:5, pophaz = popmort, status = status != 0)
x <- lexpand(sire, fot = 0:5, status = status != 0,
             aggre = list(sex, surv.int = fot))

## using the "event" argument: it just places the transition to given "status"
## at the "event" time instead of at the end, if possible using cutLexis
x <- lexpand(sire, status = status, event = dg_date,
             birth = bi_date, entry = bi_date, exit = ex_date)

## aggregating with custom "event" time
x <- lexpand(sire, status = status, event = dg_date,
             birth = bi_date, entry = bi_date, exit = ex_date,
             per = 1970:2014, age = c(0:100, Inf),
             aggre = list(sex, year = per, agegroup = age))