upgma_model_selection: Model selection function based on the UPGMA grouping algorithm

Description

The upgma_model_selection function conducts a model selection procedure intended to find the optimal partition, i.e. the one that minimizes the AIC. Maximum likelihood estimation of the model parameters (Colonization, Extinction) or (Colonization, Extinction, Detectability, P_0) is performed assuming perfect or imperfect detectability, respectively. In the latter case, the input data frame should contain multiple transects per sampling time. The function can handle missing data, which define a heterogeneous sampling structure across the rows of the input data matrix. As output, it returns an S x 6 matrix that compares all UPGMA-generated partitions, with one row for each of the S different partitions and the following 6 columns: (No. of model parameters, NLL, AIC, AIC_c, AIC_d, AIC_w).
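The two ingredients of this procedure, a family of UPGMA partitions and an information-criterion table, can be illustrated with a minimal base-R sketch. It is not the package's implementation. It assumes that each group is summarized by a numeric feature vector (the grouping criterion actually used by the package is not described on this page), uses stats::hclust with method = "average" (average linkage, i.e. UPGMA) and stats::cutree to obtain the S nested partitions, and computes the six columns with the standard formulas AIC = 2k + 2 NLL and AIC_c = AIC + 2k(k + 1)/(n - k - 1). The helper names upgma_partitions and aic_table, the sample size n_obs, and basing AIC_d and AIC_w on AIC_c rather than on AIC are all assumptions.

```r
# Hypothetical sketch, not the package's implementation.

# UPGMA = average-linkage hierarchical clustering.
# features: numeric matrix with one row per group (assumed input),
# row names = group tags.
upgma_partitions <- function(features) {
  S <- nrow(features)
  tree <- hclust(dist(features), method = "average")
  # Cutting the tree at k = 1, ..., S clusters gives S nested partitions;
  # column k holds each group's cluster membership when there are k clusters.
  cutree(tree, k = seq_len(S))
}

# Six-column comparison table, one row per candidate partition.
# n_par: number of model parameters; nll: minimized negative log-likelihood;
# n_obs: sample size used by the AIC_c correction (assumed > n_par + 1).
aic_table <- function(n_par, nll, n_obs) {
  aic  <- 2 * n_par + 2 * nll
  aicc <- aic + 2 * n_par * (n_par + 1) / (n_obs - n_par - 1)
  # Assumption: differences and Akaike weights are based on AIC_c.
  aicd <- aicc - min(aicc)
  aicw <- exp(-aicd / 2) / sum(exp(-aicd / 2))
  cbind(n_par = n_par, NLL = nll, AIC = aic,
        AIC_c = aicc, AIC_d = aicd, AIC_w = aicw)
}
```

Each partition is fitted separately to obtain its NLL; aic_table then stacks the S results into an S x 6 matrix, and the preferred partition is the row with AIC_d equal to 0.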

Usage

upgma_model_selection(Data, Time, Factor, Tags, Colonization = 1,
  Extinction = 1, Detectability_Value = 0.5, Phi_Time_0_Value = 0.5,
  Tol = 1e-08, MIT = 100, C_MAX = 10, C_min = 0, E_MAX = 10,
  E_min = 0, D_MAX = 0.99, D_min = 0, P_MAX = 0.99, P_min = 0.01,
  I_0 = 0, I_1 = 1, I_2 = 2, I_3 = 3, z = 2, Verbose = 0,
  MV_FLAG = 0.1, PerfectDetectability = TRUE)

Arguments

Data

data frame containing presence data per time (in cols) and sites (in rows)

Time

an array of length n containing sampling times

Factor

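number of the column of Data that holds the factor used to split the data into level-based groups

Tags

array of short character labels, one for each level of the chosen factor; all labels must have the same number of characters (see Details)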
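How Factor splits Data, together with the two documented requirements on Tags (one label per level, all labels of equal length), can be sketched with a hypothetical base-R helper. split_by_factor is not part of the package API, and it assumes that Tags are given in the sorted order of the factor levels, as in the examples below.

```r
# Hypothetical helper (not part of the package API).
# Data: data frame; Factor: column number of the grouping factor;
# Tags: one short label per factor level, assumed in sorted level order.
split_by_factor <- function(Data, Factor, Tags) {
  # One data frame per level of the factor column (unused levels dropped).
  groups <- split(Data, Data[[Factor]], drop = TRUE)
  if (length(Tags) != length(groups)) {
    stop("length(Tags) must equal the number of levels of the factor")
  }
  # All labels must have the same number of characters.
  if (length(unique(nchar(Tags))) != 1) {
    stop("all Tags must have the same number of characters")
  }
  names(groups) <- Tags
  groups
}
```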

Details

The output matrix contains one row for each of the S nested partitions of the set of S groups produced by the UPGMA clustering. Searches are conducted with the Nelder-Mead simplex method in a bounded parameter space: if the negative log-likelihood (NLL) is evaluated outside these bounds, the returned value is artificially set to the largest number the machine can represent.

The input is a data frame containing presence data per time (in columns) and sites (in rows). Different factors (for instance, OTU, location, etc.) can split the initial data frame into their different levels. Each initial group (usually species, OTUs, factor levels, ...) is named by a short character label (ideally 3 or 4 characters). The length of the Tags array should match the number of levels into which the given factor is subdivided. All labels should have the same number of characters to satisfy the memory alignment requirement of the shared object called through the .C(...) function.

I_0, I_1, I_2, and I_3 are model parameter keys. They are used to define a 4D vector (Index). The keys correspond to the colonization (0), extinction (1), detectability (2), and Phi_0 (3) model parameters when detectability is imperfect or, alternatively, only to colonization (0) and extinction (1) when detectability is perfect. For instance, if (I_0, I_1) is (1, 0), searches take place within the parameter space defined by extinction as the first axis and colonization as the second.
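The bounded search and the role of the parameter keys can be illustrated with a minimal sketch built on stats::optim with method = "Nelder-Mead". Returning .Machine$double.xmax outside the box mirrors the behaviour described above; everything else (the helper names bounded_nll and fit_bounded, their arguments, and the default tolerance and iteration limit) is an assumption and not the package's C implementation. Here index plays the role of the Index vector: the j-th search axis is the parameter whose key is index[j].

```r
# Hypothetical sketch of a bounded Nelder-Mead search (not the package's C code).

# Wraps an NLL written in natural parameter order (key 0 = colonization,
# key 1 = extinction, ...) so it can be evaluated on a search vector x
# whose j-th coordinate is the parameter with key index[j].
bounded_nll <- function(nll, lower, upper, index) {
  function(x) {
    theta <- numeric(length(x))
    theta[index + 1] <- x
    # Outside the box, return the largest finite double.
    if (any(theta < lower | theta > upper)) return(.Machine$double.xmax)
    nll(theta)
  }
}

fit_bounded <- function(nll, start, lower, upper,
                        index = seq_along(start) - 1,
                        tol = 1e-8, maxit = 500) {
  f <- bounded_nll(nll, lower, upper, index)
  x0 <- start[index + 1]  # start values reordered to search-axis order
  res <- optim(x0, f, method = "Nelder-Mead",
               control = list(reltol = tol, maxit = maxit))
  theta <- numeric(length(x0))
  theta[index + 1] <- res$par  # map the optimum back to natural order
  list(par = theta, nll = res$value, convergence = res$convergence)
}
```

With index = c(1, 0), the first coordinate of the simplex moves along extinction and the second along colonization, matching the (I_0, I_1) = (1, 0) example; the returned par is always reported in the natural (colonization, extinction) order.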

Examples


# NOT RUN {
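library(island)  # package providing lakshadweepPLUS and upgma_model_selection
Data <- lakshadweepPLUS[[1]]
# Sampling times; repeated years correspond to multiple transects per sampling time
Time <- c(2000, 2000, 2001, 2001, 2001, 2001, 2002, 2002, 2002, 2002,
          2003, 2003, 2003, 2003, 2010, 2010, 2011, 2011, 2011, 2011,
          2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013)
# Partitions of the groups defined by the factor in column 3 (guilds)
Guild_Tag <- c("Alg", "Cor", "Mac", "Mic", "Omn", "Pis", "Zoo")
R <- upgma_model_selection(Data, Time, Factor = 3, Tags = Guild_Tag,
                           PerfectDetectability = FALSE, z = 4)
# Partitions of the groups defined by the factor in column 2 (locations)
Location_Tag <- c("Agt", "Kad", "Kvt")
R <- upgma_model_selection(Data, Time, Factor = 2, Tags = Location_Tag,
                           PerfectDetectability = FALSE, z = 4)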
# }
