prep_data: Prepare Data for Evaluation

Description

Formats and arranges the initial data so that it can be readily used by the other functions in the package. The function first gets the species names and the number of samples for each species from the input data frame. Then, it permutes the sampling efforts and calculates the pseudo-F statistic and the mean squares for each permutation. Finally, it returns a data frame with the permutations, pseudo-F statistic, and mean squares.
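To make the quantities named above concrete, here is a minimal base-R sketch of a single-factor pseudo-F computed from a distance matrix, with a label permutation added for illustration. It is not the package's implementation: `pseudo_f()`, the output labels `MSA` and `MSR`, and the toy data are hypothetical, and Euclidean distance from `dist()` is used only to keep the example free of dependencies.

```r
# Hypothetical helper (not part of ecocbo): pseudo-F for a single factor,
# computed from the squared pairwise distances.
pseudo_f <- function(d, groups) {
  d2 <- as.matrix(d)^2
  groups <- factor(groups)
  N <- nrow(d2)          # total number of samples
  a <- nlevels(groups)   # number of groups

  # Total sum of squares: squared distances over all pairs, divided by N
  ss_total <- sum(d2[upper.tri(d2)]) / N

  # Within-group sum of squares: same idea, one group at a time
  ss_within <- sum(vapply(levels(groups), function(g) {
    idx <- which(groups == g)
    sub <- d2[idx, idx, drop = FALSE]
    sum(sub[upper.tri(sub)]) / length(idx)
  }, numeric(1)))

  ms_among  <- (ss_total - ss_within) / (a - 1)
  ms_within <- ss_within / (N - a)
  c(MSA = ms_among, MSR = ms_within, pseudoF = ms_among / ms_within)
}

set.seed(42)
counts <- matrix(rpois(6 * 4, lambda = 5), nrow = 6)  # 6 samples, 4 species
site   <- rep(c("A", "B"), each = 3)
d      <- dist(counts)                                # Euclidean, illustration only

obs <- pseudo_f(d, site)

# Shuffling the labels gives one pseudo-F value per permutation
perm_f <- replicate(99, pseudo_f(d, sample(site))[["pseudoF"]])
```

With Euclidean distances the mean squares match those of a classical ANOVA summed over species, which is a convenient sanity check for the helper.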

Usage

prep_data(
  data,
  type = "counts",
  Sest.method = "average",
  cases = 5,
  N = 100,
  M = NULL,
  n,
  m = NULL,
  k = 50,
  transformation = "none",
  method = "bray",
  dummy = FALSE,
  useParallel = TRUE,
  model = "single.factor",
  jitter.base = 0.5
)

Value

prep_data() returns an object of class "ecocbo_data".

An object of class "ecocbo_data" is a list containing:

$Results, a data frame that lists the estimates of pseudoF for simH0 and simHa, useful for statistical power analysis. It also includes mean squares for variance component estimation.
$model, a label for keeping track of the model that is being used in the analysis.
$a, an integer for the number of treatments recorded from the original data.

Arguments

data

Data frame where columns represent species names and rows correspond to samples.

For "single.factor" analysis: The first column should indicate the replicate to which the sample belongs.
For "nested.symmetric" analysis: The first column should indicate the treatment, and the second column should indicate the replicate.

type

Character. Nature of the data to be processed. It may be presence / absence ("P/A"), counts of individuals ("counts"), or coverage ("cover").

Sest.method

Character. Method for estimating species richness using vegan::specpool(). Available methods are the incidence-based Chao ("chao"), first-order jackknife ("jack1"), second-order jackknife ("jack2"), and bootstrap ("boot"). By default, the average ("average") of the four estimates is used.

cases

Integer. Number of simulated datasets.

N

Integer. Total number of samples simulated per site.

M

Integer. Total number of replicates simulated per dataset. Not needed for single factor experiments.

n

Integer. Maximum number of samples to consider (must be <= N).

m

Integer. Number of replicates to consider (must be <= M). Not needed for single factor experiments.

k

Integer. Number of resampling iterations. Defaults to 50.
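As a rough illustration of how N, n, and k relate, the sketch below draws k index sets for each sampling effort up to n out of N simulated samples. This is an assumption about the general resampling idea, not the package's code: starting the efforts at 2 and the names `efforts` and `draws` are hypothetical.

```r
set.seed(7)
N <- 100  # samples simulated per site
n <- 5    # maximum sampling effort to evaluate
k <- 50   # resampling iterations per effort

stopifnot(n <= N)  # the documented constraint on n

# Assumed range of sampling efforts: 2, 3, ..., n
efforts <- 2:n

# For each effort, draw k sets of row indices without replacement
draws <- lapply(efforts, function(size) {
  replicate(k, sample.int(N, size), simplify = FALSE)
})
names(draws) <- paste0("n=", efforts)

# Each element holds k index vectors of the corresponding size
lengths(draws)
```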

transformation

Character. Transformation applied to reduce the weight of dominant species: "square root", "fourth root", "Log (X+1)", "P/A", "none".
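The listed options correspond to standard abundance transformations. The base-R sketch below applies each one to a small toy matrix; it assumes the natural logarithm for "Log (X+1)", which the source does not specify, and the object names are hypothetical.

```r
# Toy abundance matrix: 2 samples (rows) by 4 species (columns)
x <- matrix(c(0, 1, 16,  81,
              0, 4,  0, 256), nrow = 2, byrow = TRUE)

transformed <- list(
  "square root" = sqrt(x),
  "fourth root" = x^0.25,
  "Log (X+1)"   = log(x + 1),   # natural log assumed here
  "P/A"         = (x > 0) * 1,  # 1 if present, 0 if absent
  "none"        = x
)

# Largest value per transformation: stronger transformations
# shrink the weight of the most abundant species
vapply(transformed, max, numeric(1))
```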

method

Character. Dissimilarity metric passed to vegan::vegdist(). Common options include Gower ("gower"), Bray-Curtis ("bray"), and Jaccard ("jaccard").
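For readers unfamiliar with the default metric, the following base-R sketch computes the Bray-Curtis dissimilarity between two samples by hand. The helper `bray_curtis()` and the sample vectors are hypothetical; in practice the package delegates this step to vegan::vegdist().

```r
# Hypothetical helper: Bray-Curtis dissimilarity between two abundance vectors
bray_curtis <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

s1 <- c(10, 0, 5)  # abundances of three species in sample 1
s2 <- c(5, 5, 0)   # abundances of the same species in sample 2

# |10-5| + |0-5| + |5-0| = 15; total abundance = 25
bc <- bray_curtis(s1, s2)
bc  # 0.6
```

The value ranges from 0 (identical samples) to 1 (no shared species).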

dummy

Logical. If TRUE, adds a small constant to empty observations.

useParallel

Logical. If TRUE, enables parallel computation. Defaults to TRUE.

model

Character. Select the model to use. Options are "single.factor" and "nested.symmetric".

jitter.base

Numeric. Standard deviation multiplier used to add Gaussian jitter to fs and fw. Defaults to 0.5.

Author

Edlin Guerra-Castro (edlinguerra@gmail.com), Arturo Sanchez-Porras

Details

The input dataset should have:

One or two leading columns for treatment/replicate labels.
Subsequent columns representing species presence/absence, counts, or coverage.
"single.factor" requires a single column for replicates.
"nested.symmetric" requires two columns: treatment and replicate in that order.
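The two layouts described above can be sketched with toy data frames. The column names (`replicate`, `treatment`, `sp1` to `sp3`) and the simulated counts are hypothetical; only the column order follows the requirements listed here.

```r
set.seed(1)

# "single.factor": one leading label column, then species columns
single_df <- data.frame(
  replicate = rep(c("R1", "R2"), each = 3),
  sp1 = rpois(6, 4),
  sp2 = rpois(6, 2),
  sp3 = rpois(6, 1)
)

# "nested.symmetric": treatment first, replicate second, then species
nested_df <- data.frame(
  treatment = rep(c("T1", "T2"), each = 4),
  replicate = rep(c("R1", "R2"), each = 2, times = 2),
  sp1 = rpois(8, 4),
  sp2 = rpois(8, 2),
  sp3 = rpois(8, 1)
)

str(single_df)
str(nested_df)
```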

References

Underwood, A. J. (1997). Experiments in ecology: their logical design and interpretation using analysis of variance. Cambridge University Press.
Underwood, A. J., & Chapman, M. G. (2003). Power, precaution, Type II error and sampling design in assessment of environmental impacts. Journal of Experimental Marine Biology and Ecology, 296(1), 49-70.

Examples


# \donttest{
simResults <- prep_data(data = epiDat, type = "counts", Sest.method = "average",
                        cases = 5, N = 100, M = 10,
                        n = 5, m = 5, k = 30,
                        transformation = "none", method = "bray",
                        dummy = FALSE, useParallel = FALSE,
                        model = "single.factor",
                        jitter.base = 0)
# }
simResults
