inverseSurv: Inverse Survivorship Models in the Fossil Record

Description

This function replicates the model-fitting procedure for forward and reverse survivorship curve data, described by Michael Foote in a series of papers (2001, 2003a, 2003b, 2005). These methods are discrete interval taxon ranges, as are used in many other functions in paleotree (see function arguments). This function can implement the continuous time, pulsed interval and mixed models described in Foote (2003a and 2005).

Usage

make_inverseSurv(timeList, groups = NULL, p_cont = TRUE, q_cont = TRUE,
  PA_n = "fixed", PB_1 = 0, Nb = 1, drop.extant = TRUE)

Arguments

timeList

A 2 column matrix with the first and last occurrences of taxa given in relative time intervals (i.e. ordered from first to last). If a list of length two is given for timeData, such as would be expected if the output of binTimeData was directly input,

groups

Either NULL (the default) or matrix with the number of rows equal to the number of taxa and the number of columns equal to the number of 'systems' of categories for taxa. Taxonomic membership in different groups is indicated by numeric values. For exam

p_cont

If TRUE (the default), then origination is assumed to be a continuous time process with an instantaneous rate. If FALSE, the origination is treated as a pulsed discrete-time process with a probability.

q_cont

If TRUE (the default), then extinction is assumed to be a continuous time process with an instantaneous rate. If FALSE, the extinction is treated as a pulsed discrete-time process with a probability.

PA_n

The probability of sampling a taxon after the last interval included in a survivorship study. Usually zero for extinct groups, although more logically has the value of 1 when there are still extant taxa (i.e., if the last interval is the Holocene and t

PB_1

The probability of sampling a taxon before the first interval included in a survivorship study. Should be a value of 0 to 1, or NULL. If NULL, PB_1 is treated as an additional free parameter in the output model.

The number of taxa that enter an interval (b is for 'bottom'). This is an arbitrary constant used to scale other values in these calculations and can be safely set to 1.

drop.extant

Drops all extant taxa from a dataset, w

Value

A function of class "paleotreeFunc", which takes a vector equal to the number of parameters and returns the *negative* log likelihood (for use with optim and similar optimizing functions, which attempt to minimize support values). See the functions listed at modelMethods for manipulating and examining such functions and constrainParPaleo for constraining parameters. The function output will take the largest number of parameters possible with respect to groupings and time-intervals, which means the number of parameters may number in the hundreds. Constraining the function for optimization is recommended except when datasets are very large. Parameters in the output functions are named 'p', 'q' and 'r', which are respectively the origination, extinction and sampling parameters. If the respective arguments 'p_cont' and 'q_cont' are TRUE, then 'p' and 'q' will represent the instantaneous per-capita origination and extinction rates (in units of per lineage time-units). When one of these arguments is given as FALSE, the respective parameter (p or q) will represent per-lineage-interval rates. For p, this will be the per lineage-interval rate of a lineage producing another lineage (which can exceed 1 because diversity can more than double) and for q, this will be the per lineage-interval 'rate' of a lineage going extinct, which cannot be observed to exceed 1 (because the proportion of diversity that goes extinct cannot exceed 1). To obtain the per lineage-interval rates from a set of per lineage-time unit rates, simply multiply the per lineage-time-unit rate by the duration of an interval (or divide, to do the reverse; see Foote, 2003 and 2005). 'r' is always the instantaneous per-capita sampling rate, in units per lineage-time units. If PA_n or PB_1 were given as NULL in the arguments, two additional parameters will be added, named respectively 'PA_n' and 'PB_1', and listed separately for every additional grouping. These are the probability of a taxon occurring before the first interval in the dataset (PB_1) and the probability of a taxon occurring after the last interval in a dataset (PA_n). Theses will be listed as 'PA_n.0' and 'PB_1.0' to indicate that they are not related to any particular time-interval included in the analysis, unlike the p, q, and r parameters (see below). Groupings follow the parameter names, separated by periods; by default, the parameters will be placed in groups corresponding to the discrete intervals in the input timeList, such that make_inverseSurv will create a function with parameters 'p.1', 'q.1' and 'r.1' for interval 1; 'p.2', 'q.2' and 'r.2' for interval 2 and so on. Additional groupings given by the user are listed after this first set (e.g. 'p.1.2.2'). Calculating The Results of an Inverse Survivorship Model{ Because of the complicated grouping and time interval scheme, combined with the probable preference of users to use constrained models rather that the full models, it may be difficult to infer what the rates for particular intervals and groups actually are in the final model, given the parameters that were found in the final optimization. To account for this, the function output by inverseSurv also contains an alternative mode which takes input rates and returns the final values along with a rudimentary plotting function. This allows users to output per-interval and per-group parameter estimates. To select these feature, the argument altMode must be TRUE. This function will invisibly return the rate values for each group and interval as a list of matrices, with each matrix composed of the p, q and r rates for each interval, for a specific grouping. This plotting is extremely primitive, and most users will probably find the invisibly returned rates to be of more interest. The function layout is used to play plots for different groupings in sequence, and this may lead to plots which are either hard to read or even cause errors (because of too many groupings, producing impossible plots). To repress this, the argument plotPar can be set to FALSE. This capability means the function has more arguments that just the usual par argument that accepts the vector of parameters for running an optimization. The first of these additional arguments, altMode enables this alternative mode, instead of trying to estimate the negative log-likelihood from the given parameters. The other arguments augment the calculation and plotting of rates. To summarize, a function output by inverseSurv has the following arguments: [object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object] }

Details

The design of this function to handle mixed continuous and discrete time models means that parameters can mean very different things, dependent on how a model is defined. Users should carefully evaluate their function arguments and the discussion of parameter values as described in the Value section.

References

Foote, M. 2001. Inferring temporal patterns of preservation, origination, and extinction from taxonomic survivorship analysis. Paleobiology 27(4):602-630. Foote, M. 2003a. Origination and Extinction through the Phanerozoic: A New Approach. The Journal of Geology 111(2):125-148. Foote, M. 2003b. Erratum: Origination and Extinction through the Phanerozoic: a New Approach. The Journal of Geology 111(6):752-753. Foote, M. 2005. Pulsed origination and extinction in the marine realm. Paleobiology 31(1):6-20.

Examples

Run this code

# let's simulate some taxon ranges from an imperfectly sampled fossil record
set.seed(444)
taxa <- simFossilTaxa(p=0.1,q=0.1,nruns=1,mintaxa=20,maxtaxa=30,
      maxtime=1000, maxExtant=0)
rangesCont <- sampleRanges(taxa,r=0.5)
#bin the ranges into discrete time intervals
rangesDisc <- binTimeData(rangesCont,int.length=5)

#apply make_inverseSurv
likFun<-make_inverseSurv(rangesDisc)
#use constrainParPaleo to make the model time-homogenous
  	#match.all~match.all will match parameters so only 2 pars: p=q and r
constrFun<-constrainParPaleo(likFun,match.all~match.all)
results <- optim(parInit(constrFun), constrFun,
      lower=parLower(constrFun), upper=parUpper(constrFun),
      method="L-BFGS-B", control=list(maxit=1000000))
results

#plot the results
constrFun(results$par, altMode=TRUE)

#unconstrained function with ALL of 225 parameters!!!
    # this will take forever to converge, so it isn't run
optim(parInit(likFun),likFun,lower=parLower(likFun),upper=parUpper(likFun),
      method="L-BFGS-B",control=list(maxit=1000000))

Run the code above in your browser using DataLab