The function as.epidata is used to generate objects
  of class "epidata".  Objects of this class are
  specific data frames containing the event history of an epidemic together
  with some additional attributes.  These objects are the basis for fitting
  spatio-temporal epidemic intensity models with the function
  twinSIR.  Their implementation is illustrated in Meyer
  et al. (2017, Section 4), see vignette("twinSIR").
  Note that the spatial information itself, i.e.
  the positions of the individuals, is assumed to be constant over time.  
  Besides epidemics following the SIR compartmental model, data from SI,
  SIRS and SIS epidemics may also be supplied.
as.epidata(data, ...)
# S3 method for data.frame
as.epidata(data, t0,
           tE.col, tI.col, tR.col, id.col, coords.cols,
           f = list(), w = list(), D = dist,
           max.time = NULL, keep.cols = TRUE, ...)
# S3 method for default
as.epidata(data, id.col, start.col, stop.col,
           atRiskY.col, event.col, Revent.col, coords.cols,
           f = list(), w = list(), D = dist, .latent = FALSE, ...)
# S3 method for epidata
print(x, ...)
# S3 method for epidata
[(x, i, j, drop)
# S3 method for epidata
update(object, f = list(), w = list(), D = dist, ...)
a data.frame with the columns "BLOCK", "id",
"start", "stop", "atRiskY", "event",
"Revent" and the coordinate columns (with the original names from
data), which are all obligatory.  These columns are followed by any 
  remaining columns of the input data.  Last but not least, the newly
  generated columns with epidemic variables corresponding to the functions
  in the list f are appended, if length(f) > 0.
The data.frame is given the following additional attributes:
"eventTimes": numeric vector of infection time points (sorted chronologically).
"timeRange": numeric vector of length 2: c(min(start), max(stop)).
"coords.cols": numeric vector containing the column indices of the coordinate columns in the resulting data frame.
"f": this equals the argument f.
"w": this equals the argument w.
For the data.frame-method, a data frame with as many rows as
    there are individuals in the population and time columns indicating
    when each individual became exposed (optional), infectious
    (mandatory, but can be NA for non-affected individuals) and
    removed (optional). Note that this data format does not allow for
    re-infection (SIRS) or time-varying covariates.
    The data.frame-method converts the individual-indexed data
    frame to the long event history start/stop format and then feeds it
    into the default method. If calling the generic function
    as.epidata on a data.frame and the t0 argument
    is missing, the default method is called directly.
    For the default method, data can be a matrix or
    a data.frame.
    It must contain the observed event history in a form similar to 
    Surv(, type="counting") in package survival,
    with additional information (variables) along 
    the process.  Rows will be sorted automatically during conversion.
    The observation period is split up into consecutive
    intervals of constant state - thus constant infection intensities.
    The data frame consists of a block of \(N\) (number of individuals) 
    rows for each of those time intervals (all rows in a block have the same start 
    and stop values, hence the name “block”), where there is one 
    row per individual in the block.  Each row describes the (fixed) state of 
    the individual during the interval given by the start and stop columns 
    start.col and stop.col.
    Note that there must not be more than one event (infection or removal) in a
    single block.  Thus, in a single block, at most one entry across 
    event.col and Revent.col may be 1, all others are 0.  This
    rule follows the point process characteristic that there are no
    concurrent events (infections or removals).
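As a rough illustration of this block layout, the following base R sketch builds a toy event history by hand and checks the rules stated above. The three individuals, their event times and the object names (evHist, rowsPerBlock, eventsPerBlock) are made up for illustration; the sketch does not call as.epidata itself.

```r
# Toy event history for N = 3 hypothetical individuals "a", "b", "c".
# "a" is initially infectious (atRiskY == 0 in the first block).
# Assumed events: "b" infected at t = 1, "a" removed at t = 2,
# "c" infected at t = 3.
ids    <- c("a", "b", "c")
starts <- c(0, 1, 2)
stops  <- c(1, 2, 3)

evHist <- data.frame(
    id      = rep(ids, times = length(stops)),
    start   = rep(starts, each = length(ids)),
    stop    = rep(stops,  each = length(ids)),
    atRiskY = c(0, 1, 1,   # block 1: "b" and "c" are susceptible
                0, 0, 1,   # block 2: "b" is now infectious
                0, 0, 1),  # block 3: "a" is removed, "c" still susceptible
    event   = c(0, 1, 0,   # "b" becomes infected at stop = 1
                0, 0, 0,
                0, 0, 1),  # "c" becomes infected at stop = 3
    Revent  = c(0, 0, 0,
                1, 0, 0,   # "a" is removed at stop = 2
                0, 0, 0)
)

# one row per individual in every block
rowsPerBlock <- table(evHist$start)
# at most one event (infection or removal) per block
eventsPerBlock <- tapply(evHist$event + evHist$Revent, evHist$start, sum)
```

A data frame of this shape is the kind of input the default method expects, with the column names passed via id.col, start.col, stop.col, atRiskY.col, event.col and Revent.col.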
observation period (arguments t0 and max.time). In the resulting "epidata", the time
    scale will be relative to the start time t0.
    Individuals that have already been removed prior to t0, i.e.,
    rows with tR <= t0, will be dropped.
    The end of the observation period (max.time) will by default
    (NULL, or if NA) coincide with the last observed event.
single numeric or character indexes of the time columns in
    data, which specify when the individuals became exposed,
    infectious and removed, respectively.
    tE.col and tR.col can be missing, corresponding to
    SIR, SEI, or SI data. NA entries mean that the respective
    event has not (yet) occurred. Note that is.na(tE) implies
    is.na(tI) and is.na(tR), and is.na(tI) implies
    is.na(tR) (and this is checked for the provided data).
    Caution: support for latent periods (tE.col) is experimental,
          and twinSIR cannot handle them anyway.
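The NA rules above can be checked on individual-indexed data before conversion. The following is a minimal base R sketch; the helper isConsistent and the toy data frame indiv are hypothetical and are not part of the package, whose own check may be implemented differently.

```r
# Hypothetical individual-indexed data: one row per individual,
# NA means that the respective event has not (yet) occurred
indiv <- data.frame(
    id = 1:4,
    tE = c(0.5, 1.0, 2.0, NA),
    tI = c(1.0, 2.5, NA,  NA),
    tR = c(3.0, NA,  NA,  NA)
)

# is.na(tE) must imply is.na(tI) and is.na(tR);
# is.na(tI) must imply is.na(tR)
isConsistent <- function(tE, tI, tR) {
    (!is.na(tE) | (is.na(tI) & is.na(tR))) & (!is.na(tI) | is.na(tR))
}

okRows <- isConsistent(indiv$tE, indiv$tI, indiv$tR)
# a removal time without an infection time violates the rules
badRow <- isConsistent(tE = 1, tI = NA, tR = 4)
```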
single numeric or character index of the id column in data.
    The id column identifies the individuals in the data frame.
    It is converted to a factor by calling factor, i.e.,
    unused levels are dropped if it already was a factor.
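The effect of calling factor on an existing factor can be seen in a short base R sketch (the ids are made up):

```r
# A factor id column in which level "C" is no longer used after subsetting
ids <- factor(c("A", "B", "C"))[1:2]
levelsBefore <- levels(ids)          # still contains the unused level "C"
levelsAfter  <- levels(factor(ids))  # unused level dropped
```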
single index of the start column in data.  Can be numeric
    (by column number) or character (by column name).
    The start column contains the (numeric) time points of the beginnings
    of the consecutive time intervals of the event history.  The minimum value
    in this column, i.e., the start of the observation period, should be 0.
single index of the stop column in data.  Can be numeric
    (by column number) or character (by column name).
    The stop column contains the (numeric) time points of the ends
    of the consecutive time intervals of the event history.  The stop value must
    always be greater than the start value of a row.
single index of the atRiskY column in data.  Can be numeric
    (by column number) or character (by column name).
    The atRiskY column indicates if the individual was “at-risk”
    of becoming infected during the time interval (start; stop].  This variable 
    must be logical or in 0/1-coding.
    Individuals with atRiskY == 0 in the first time interval (normally 
    the rows with start == 0) are taken as initially infectious.
single index of the event column in data.  Can be numeric
    (by column number) or character (by column name).
    The event column indicates if the individual became infected
    at the stop time of the interval.  This variable must be logical or
    in 0/1-coding.
single index of the Revent column in data.  Can be numeric
    (by column number) or character (by column name).
    The Revent column indicates if the individual recovered (was removed) 
    at the stop time of the interval.  This variable must be logical or
    in 0/1-coding.
indexes of the coords columns in data. Can be
    numeric (by column number), character (by column name), or NULL
    (no coordinates, e.g., if D is a pre-specified distance matrix).
    These columns contain the individuals' coordinates, which determine
    the distance matrix for the distance-based components of the force
    of infection (see argument f). By default, Euclidean distance
    is used (see argument D).
    Note that the functions related to twinSIR currently assume
    fixed positions of the individuals during the whole epidemic.  Thus, an
    individual has the same coordinates in every block.  For simplicity, the
    coordinates are derived from the first time block only (normally the rows 
    with start == 0).
    The animate-method requires coordinates.
a named list of vectorized functions for a
    distance-based force of infection.
    The functions must operate elementwise on a (distance) matrix D so that
    f[[m]](D) results in a matrix.  A simple example is
    function(u) {u <= 1}, which indicates if the Euclidean distance
    between the individuals is smaller than or equal to 1.
    The names of the functions determine the names of the epidemic variables
    in the resulting data frame.  So, the names should not coincide with
    names of other covariates.
    The distance-based weights are computed as follows:
    Let \(I(t)\) denote the set of infectious
    individuals just before time \(t\).
    Then, for individual \(i\) at time \(t\), the
    \(m\)'th covariate has the value
    \(\sum_{j \in I(t)} f_m(d_{ij})\),
    where \(d_{ij}\) denotes entries of the distance matrix
    (by default this is the Euclidean distance \(||s_i - s_j||\)
    between the individuals' coordinates, but see argument D).
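The sum above can be reproduced by hand for a fixed time point. The following base R sketch uses made-up coordinates and an assumed infectious set I(t); the object names (coords, infectious, susceptible, close_i) are hypothetical. It only illustrates the formula for the susceptible individuals and is not the package's internal code.

```r
# Hypothetical coordinates of four individuals
coords <- cbind(x = c(0, 1, 0.5, 3), y = c(0, 0, 0.5, 0))
rownames(coords) <- c("a", "b", "c", "d")
D <- as.matrix(dist(coords))            # Euclidean distances d_ij

f <- list(close = function(u) u <= 1)   # the simple example from above

infectious  <- c("a", "b")              # assumed set I(t)
susceptible <- c("c", "d")

# f[[m]](D) is again a matrix; sum its entries over j in I(t) for each i
fD <- f$close(D)
close_i <- rowSums(fD[susceptible, infectious, drop = FALSE])
```

Here "c" lies within distance 1 of both infectious individuals and gets the value 2, whereas "d" is further away from both and gets 0.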
a named list of vectorized functions for extra 
    covariate-based weights \(w_{ij}\) in the epidemic component.
    Each function operates on a single time-constant covariate in
    data, which is determined by the name of the first argument:
    The two function arguments should be named varname.i and
    varname.j, where varname is one of names(data).
    Similar to the components in f, length(w) epidemic
    covariates will be generated in the resulting "epidata" named
    according to names(w).  So, the names should not coincide with
    names of other covariates.  For individual \(i\) at time
    \(t\), the \(m\)'th such covariate has the value
    \(\sum_{j \in I(t)} w_m(z^{(m)}_i, z^{(m)}_j)\),
    where \(z^{(m)}\) denotes the variable in data associated
    with w[[m]].
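The covariate-based sum can be sketched in the same way. In the base R example below, the covariate CL, the weight function sameClass and the infectious set are all assumptions made for illustration; the package may compute these values differently internally.

```r
# Hypothetical time-constant covariate: school class of four individuals
CL <- c(a = "1st class", b = "1st class", c = "2nd class", d = "1st class")

# weight function whose two arguments are named varname.i and varname.j
sameClass <- function(CL.i, CL.j) CL.i == CL.j

infectious  <- c("a", "b")   # assumed set I(t)
susceptible <- c("c", "d")

# w_m(z_i, z_j) for all pairs (i, j), then summed over infectious j
W <- outer(CL, CL, FUN = sameClass)
w_i <- rowSums(W[susceptible, infectious, drop = FALSE])
```

Individual "d" shares its class with both infectious individuals (value 2), while "c" shares it with neither (value 0).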
either a function to calculate the distances between the individuals
    with locations taken from coords.cols (the default is
    Euclidean distance via the function dist) and
    the result converted to a matrix via as.matrix,
    or a pre-computed distance matrix with dimnames containing
    the individual ids (a classed "Matrix" is supported).
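Both ways of supplying D can be prepared with base R. The sketch below uses made-up coordinates and ids; the Manhattan variant D_manhattan is merely an assumed example of a user-defined distance function and is not taken from the package documentation.

```r
coords <- cbind(x.loc = c(0, 3, 0), y.loc = c(0, 4, 1))
ids <- c("id1", "id2", "id3")    # hypothetical individual ids

# (a) a function of the coordinates; the result is converted by as.matrix
D_fun <- dist
D1 <- as.matrix(D_fun(coords))

# (b) a pre-computed distance matrix with dimnames containing the ids
D2 <- as.matrix(dist(coords))
dimnames(D2) <- list(ids, ids)

# an alternative distance function, here Manhattan distance
D_manhattan <- function(x) dist(x, method = "manhattan")
D3 <- as.matrix(D_manhattan(coords))
```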
logical indicating if all columns in data
    should be retained (and not only the obligatory "epidata"
    columns), in particular any additional columns with 
    time-constant individual-specific covariates.
    Alternatively, keep.cols can be a numeric or character vector
    indexing columns of data to keep.
(internal) logical indicating whether to allow for latent periods (EXPERIMENTAL). Otherwise (default), the function verifies that an event (i.e., switching to the I state) only happens when the respective individual is at risk (i.e., in the S state).
an object of class "epidata".
arguments passed to print.data.frame. Currently unused
    in the as.epidata-methods.
arguments passed to [.data.frame.
Sebastian Meyer
The print method for objects of class "epidata" simply prints
  the data frame with a small header containing the time range of the observed
  epidemic and the number of infected individuals.  Usually, the data frames
  are quite long, so the summary method summary.epidata might be
  useful.  Also, indexing/subsetting "epidata" works exactly as for
  data.frames, but a dedicated method ensures consistency of the
  resulting "epidata" or drops this class, if necessary.
  The update-method can be used to add or replace distance-based
  (f) or covariate-based (w) epidemic variables in an
  existing "epidata" object.
SIS epidemics are implemented as SIRS epidemics where the length of the removal period equals 0. This means that an individual who has an R-event will be at risk again immediately afterwards, i.e., in the following time block. Therefore, data of SIS epidemics have to be provided in that form, containing “pseudo-R-events”.
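A toy event history with such a pseudo-R-event might be coded as follows. This is only a sketch of the data layout with two made-up individuals and hypothetical object names (evSIS, b, backAtRisk); it has not been passed through as.epidata.

```r
# "a" is infectious throughout; "b" becomes infected at t = 1 and its
# infectious period ends at t = 2 (pseudo-R-event), after which "b" is
# at risk again in the following block.
evSIS <- data.frame(
    id      = rep(c("a", "b"), times = 3),
    start   = rep(c(0, 1, 2), each = 2),
    stop    = rep(c(1, 2, 3), each = 2),
    atRiskY = c(0, 1,   0, 0,   0, 1),
    event   = c(0, 1,   0, 0,   0, 0),
    Revent  = c(0, 0,   0, 1,   0, 0)
)

# "b" is at risk in the block that starts at the time of its R-event
b <- subset(evSIS, id == "b")
backAtRisk <- b$atRiskY[b$start == b$stop[b$Revent == 1]] == 1
```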
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
The hagelloch data as an example.
The plot and the
summary method for class "epidata".
Furthermore, the function animate.epidata for the animation of
epidemics.
Function twinSIR for fitting spatio-temporal epidemic intensity
models to epidemic data.
Function simEpidata for the simulation of epidemic data.
data("hagelloch")   # see help("hagelloch") for a description
head(hagelloch.df)
## convert the original data frame to an "epidata" event history
myEpi <- as.epidata(hagelloch.df, t0 = 0,
                    tI.col = "tI", tR.col = "tR", id.col = "PN",
                    coords.cols = c("x.loc", "y.loc"),
                    keep.cols = c("SEX", "AGE", "CL"))
if (surveillance.options("allExamples")) {
## test consistency with default method
evHist <- as.data.frame(myEpi)[,-1]
myEpi2 <- as.epidata(
    evHist, id.col = 1, start.col = "start", stop.col = "stop",
    atRiskY.col = "atRiskY", event.col = "event", Revent.col = "Revent",
    coords.cols = c("x.loc", "y.loc")
)
stopifnot(identical(myEpi, myEpi2))
}
str(myEpi)
head(as.data.frame(myEpi))  # "epidata" has event history format
summary(myEpi)              # see 'summary.epidata'
plot(myEpi)                 # see 'plot.epidata' and also 'animate.epidata'
## add distance- and covariate-based weights for the force of infection
## in a twinSIR model, see vignette("twinSIR") for a description
myEpi <- update(myEpi,
    f = list(
        household    = function(u) u == 0,
        nothousehold = function(u) u > 0
    ),
    w = list(
        c1 = function (CL.i, CL.j) CL.i == "1st class" & CL.j == CL.i,
        c2 = function (CL.i, CL.j) CL.i == "2nd class" & CL.j == CL.i
    )
)
## this is now identical to the prepared hagelloch "epidata"
stopifnot(all.equal(myEpi, hagelloch))
if (surveillance.options("allExamples")) {
## test with precomputed distance matrix D
myEpi3 <- suppressWarnings( # from overwriting existing f columns
    update(hagelloch, f = attr(hagelloch, "f"),
           D = as.matrix(dist(hagelloch.df[c("x.loc", "y.loc")])))
)
stopifnot(identical(hagelloch, myEpi3))
}