contactTest: Determine if Observed Contacts are More or Less Frequent than in a Random Distribution

Description

This function is used to determine if tracked individuals in an empirical dataset had more or fewer contacts with other tracked individuals/specified locations than would be expected at random. The function works by comparing an empirically-based contactDur.all or contactDur.area function output (emp.input) to the contactDur.all or contactDur.area output generated from randomized data (rand.input).

Usage

contactTest(emp.input, rand.input, dist.input, test = "chisq",
  numPermutations = 5000, alternative.hyp = "two.sided",
  importBlocks = FALSE, shuffle.type = 0)

Arguments

emp.input

List or data frame containing contactDur.all or contactDur.area output referring to the empirical data. Note that if emp.input is a list of data frames, contacts used in analyses will be determined by averaging contacts reported in each list entry.

rand.input

List or data frame containing contactDur.all or contactDur.area output referring to the randomized-path data. Note that if rand.input is a list of data frames, contacts used in analyses will be determined by averaging contacts reported in each list entry.

dist.input

List or data frame containing dist.all/distToArea function output referring to the empirical data. Note that if test == "chisq", a dist.input argument is required. If test == "mantel", however, dist.input can be set to NULL. This input is used to determine the number of timesteps in which each pair of individuals (or individuals and fixed locations/polygons if distToArea output is used) was observed at the same time (i.e., the maximum number of durations dyad members could potentially be in contact with one another).

test

Character string. Describes the statistical test used to evaluate differences. Currently only takes the values "chisq" or "mantel". Defaults to "chisq". More tests will be added in later versions.

numPermutations

Integer. Number of times to permute the data when test == "mantel". Defaults to 5000.

alternative.hyp

Character string. Describes the nature of the alternative hypothesis being tested when test == "mantel". Takes the values "two.sided", "less", or "greater". Defaults to "two.sided".

importBlocks

Logical. If TRUE, each block in emp.input will be analyzed separately. Defaults to FALSE. Note that when importBlocks == TRUE, a "block" column must exist in emp.input.

shuffle.type

Integer. Describes which shuffle.type (from the randomizePaths function) was used to randomize the rand.input data set(s). Takes the values 0, 1, or 2. Defaults to 0. For tests other than "chisq", this value is irrelevant.

Value

Output format is dependent on test value.

If test == "chisq", output will be a list of two data frames. The first data frame contains pairwise analyses of node degree and total edge weight (i.e., the sum of all observed contacts involving each individual). The second data frame contains results of pairwise analyses of specific dyadic relationships (e.g., contacts between individuals 1 and 2). Each data frame contains the following columns:

id1

the id of the first individual involved in the contact.

id2

designation of what is being compared (e.g., totalDegree, totalContactDurations, individual 2, etc.). Content will change depending on which data frame is being observed.

method

Statistical test used to determine significance.

statistic

Test statistic associated with the specific method.

p.value

p.values associated with each comparison.

df

Degrees of freedom associated with the statistical test.

block

Denotes the relevant time block for each analysis (if applicable).

warning

Denotes if any specific warning occurred during analysis.

empiricalContactDurations

Describes the number of observed events in emp.input.

randContactDurations.mean

Describes the average number of observed events in rand.input.

empiricalNoContactDurations

Describes the number of events that were not observed given the total number of potential events in emp.input.

randNoContactDurations.mean

Describes the average number of events that were not observed given the total number of potential events in rand.input.

difference

The value given by subtracting randContactDurations.mean from empiricalContactDurations.

If test == "mantel", output will be a single data frame with the following columns:

method

Statistical test used to determine significance.

z.val

z statistic associated with the specific method.

p.value

p.values associated with each comparison.

emp.mean

mean contacts in the emp.input overall or by block (if applicable).

rand.mean

mean contacts in the rand.input overall or by block (if applicable).

alternative.hyp

The nature of the alternative hypothesis being tested.

nperm

Number of permutations used to generate p value.

warning

Denotes if any specific warning occurred during analysis.

Details

Note: The current functionality is limited to comparisons using the X-squared "goodness of fit" test or Mantel test for evaluating correlations between two matrices. Please note that the output of this function changes based on what test is run. The assumptions and intricacies associated with running these tests here are described below in brief.

X-Squared (chisq.test): In this function, chisq.test is used to compare the distribution of observed inter-animal or animal-environment contacts in an empirical dataset, emp.input, to a distribution described in a null model, rand.input (i.e., expected contact counts). This test requires equidistant TSWs (temporal-sampling windows; see the tempAggregate function) in each movement path within dist.input. The dist.input (i.e., output from the dist.all or distToArea functions) is used here to determine how frequently each individual was observed in the empirical dataset/block of interest, allowing us to calculate the number of TSWs in which each individual was present but not involved in contacts. Note that if X-squared expected values are very small, approximations of p may not be correct (and in fact, all estimates will be poor). It may be best to weight these tests differently. To address this, we've added the "warning" column to the output, which notifies users when the chisq.test function reported that results may be inaccurate.
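As a rough illustration of the comparison described above, the following base-R sketch arranges contact and no-contact TSW counts for a single individual into a 2 x 2 table, passes it to chisq.test, and records any warning next to the result. The counts, the table layout, and the helper name run_chisq are assumptions made for illustration; this is not the internal code of contactTest.

```r
# Hypothetical counts for one individual (not taken from the calves data).
emp_contact  <- 12   # TSWs with a contact in the empirical data
rand_contact <- 3    # mean TSWs with a contact in the randomized data
total_tsw    <- 288  # TSWs in which the individual was observed (cf. dist.input)

# Rows: empirical vs. randomized; columns: contact vs. no contact.
tab <- matrix(
  c(emp_contact,  total_tsw - emp_contact,
    rand_contact, total_tsw - rand_contact),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("empirical", "randomized"), c("contact", "noContact"))
)

# Run chisq.test and keep the text of any warning (e.g., small expected counts).
run_chisq <- function(tab) {
  warn <- NA_character_
  res <- withCallingHandlers(
    chisq.test(tab),
    warning = function(w) {
      warn <<- conditionMessage(w)
      invokeRestart("muffleWarning")
    }
  )
  data.frame(
    statistic  = unname(res$statistic),
    df         = unname(res$parameter),
    p.value    = res$p.value,
    warning    = warn,
    difference = tab["empirical", "contact"] - tab["randomized", "contact"]
  )
}

out <- run_chisq(tab)
print(out)
```

Capturing the warning rather than letting it print keeps each row of a results table self-describing, which mirrors the purpose of the "warning" column.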

Mantel test (ape::mantel.test): Tests for similarity between emp.input and rand.input. Please note that ape::mantel.test does not allow for missing values in matrices, so all NAs will be treated as 0. Output is a single data frame describing the test results.
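The NA handling mentioned above can be sketched as follows. The two matrices and the helper na_to_zero are hypothetical, and the call to ape::mantel.test is only made if the ape package happens to be installed; how contactTest builds its matrices internally is not shown here.

```r
# Hypothetical symmetric contact-count matrices (rows/columns are individuals).
ids <- c("a", "b", "c", "d")
emp_mat <- matrix(c(NA, 5, 2, 0,
                    5, NA, 1, 3,
                    2, 1, NA, 4,
                    0, 3, 4, NA),
                  nrow = 4, byrow = TRUE, dimnames = list(ids, ids))
rand_mat <- matrix(c(NA, 2, 2, 1,
                     2, NA, NA, 2,
                     2, NA, NA, 1,
                     1, 2, 1, NA),
                   nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# mantel.test cannot handle missing values, so NAs are set to 0 first.
na_to_zero <- function(m) {
  m[is.na(m)] <- 0
  m
}
emp_mat  <- na_to_zero(emp_mat)
rand_mat <- na_to_zero(rand_mat)

# Run the permutation test only if the ape package is available.
if (requireNamespace("ape", quietly = TRUE)) {
  mt <- ape::mantel.test(emp_mat, rand_mat, nperm = 999,
                         graph = FALSE, alternative = "two.sided")
  print(mt[c("z.stat", "p")])
}
```

Because missing dyads become zeros, a pair that was never observed together is indistinguishable from a pair with no contacts, which is worth keeping in mind when interpreting the result.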

This function was inspired by the methods described by Spiegel et al. (2016), who determined individuals to be expressing social behavior when nodes had greater degree values than would be expected at random, with randomized contact networks derived from movement paths randomized according to their novel methodology (i.e., shuffle.type == 2). Here, however, by specifying a p-value threshold, users can also identify when more or fewer contacts with specific individuals occur than would be expected at random (the direction is given by the sign of values in the "difference" column). Such relationships suggest that social affinities or aversions, respectively, may exist between specific individuals.
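One way to apply such a threshold to the dyadic "chisq" output is sketched below. The data frame res is a mock that contains only the documented id1, id2, p.value, and difference columns with made-up values, and the 0.05 threshold and the relationship labels are illustrative choices.

```r
# Mock of the dyadic "chisq" output with only the columns used below.
res <- data.frame(
  id1        = c("1", "1", "2", "3"),
  id2        = c("2", "3", "3", "4"),
  p.value    = c(0.001, 0.200, 0.030, 0.049),
  difference = c(14, -1, -6, 3),
  stringsAsFactors = FALSE
)

alpha <- 0.05  # user-chosen significance threshold

# Keep significant dyads and label the direction using the sign of "difference".
sig <- res[res$p.value <= alpha, ]
sig$relationship <- ifelse(sig$difference > 0,
                           "more than random",
                           "fewer than random")
print(sig)
```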

Note: The default tested column (i.e., the categorical data column from which data is drawn to be compared to randomized sets herein) is "id". This means that contacts involving each individual (defined by a unique "id") will be compared to randomized sets. Users may not use any data column for analysis other than "id". If users want to use another categorical data column in analyses rather than "id", we recommend re-processing data (starting from the dist.all/distToArea functions) while specifying this new data as the "id". For example, users may annotate an illness status column to the empirical input, wherein they describe whether the tracked individual displayed gastrointestinal illness symptoms ("gastr"), respiratory illness symptoms ("respr"), both ("both"), or was consistently healthy ("hel") over the course of the tracking period. Users could set this information as the "id" and carry it forward as such through the data-processing pipeline. Ultimately, they could determine if each of these disease states affected contact rates, relative to what would be expected at random.

Note: If importBlocks == TRUE, a "block" column MUST exist in emp.input. However, a "block" column need not exist in rand.input. If no "block" column exists in rand.input, empirical values in all emp.input blocks will be compared to the overall average values in rand.input. Block columns will also be appended to function outputs.
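The block-matching rule above can be pictured with a small base-R sketch. The data frames emp and rand, their column names other than "block" and "id", and the use of merge are assumptions for illustration only and do not reproduce the function's internals.

```r
# Hypothetical per-block empirical contact counts and unblocked random averages.
emp <- data.frame(block    = c(1, 1, 2, 2),
                  id       = c("a", "b", "a", "b"),
                  contacts = c(4, 2, 7, 1))
rand <- data.frame(id            = c("a", "b"),
                   contacts.mean = c(3, 2))

# With no "block" column in rand, every empirical block is matched
# against the same overall random average for that id.
cmp <- merge(emp, rand, by = "id")
cmp$difference <- cmp$contacts - cmp$contacts.mean
cmp <- cmp[order(cmp$block, cmp$id), ]
print(cmp)
```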

References

Farine, D.R., 2017. A guide to null models for animal social network analysis. Methods in Ecology and Evolution 8:1309-1320. https://doi.org/10.1111/2041-210X.12772.

Spiegel, O., Leu, S.T., Sih, A., and C.M. Bull. 2016. Socially interacting or indifferent neighbors? Randomization of movement paths to tease apart social preference and spatial constraints. Methods in Ecology and Evolution 7:971-979. https://doi.org/10.1111/2041-210X.12553.

Mantel, N. 1967. The detection of disease clustering and a generalized regression approach. Cancer Research 27:209-220.

Examples


# NOT RUN {
data(calves)

calves.dateTime<-datetime.append(calves, date = calves$date, 
   time = calves$time) 
   
calves.agg<-tempAggregate(calves.dateTime, id = calves.dateTime$calftag, 
   dateTime = calves.dateTime$dateTime, point.x = calves.dateTime$x, 
   point.y = calves.dateTime$y, secondAgg = 300, extrapolate.left = FALSE, 
   extrapolate.right = FALSE, resolutionLevel = "reduced", parallel = FALSE, 
   na.rm = TRUE, smooth.type = 1) 

calves.dist<-dist2All_df(x = calves.agg, parallel = FALSE, 
   dataType = "Point", lonlat = FALSE) 
   
calves.contact.block<-contactDur.all(x = calves.dist, dist.threshold=1, 
   sec.threshold=10, blocking = TRUE, blockUnit = "hours", blockLength = 1, 
   equidistant.time = FALSE, parallel = FALSE, reportParameters = TRUE) 

calves.agg.rand<-randomizePaths(x = calves.agg, id = "id", 
   dateTime = "dateTime", point.x = "x", point.y = "y", poly.xy = NULL, 
   parallel = FALSE, dataType = "Point", numVertices = 1, blocking = TRUE, 
   blockUnit = "mins", blockLength = 10, shuffle.type = 0, shuffleUnit = NA,
   indivPaths = TRUE, numRandomizations = 1) 

calves.dist.rand<-dist2All_df(x = calves.agg.rand, point.x = "x.rand", 
   point.y = "y.rand", parallel = FALSE, dataType = "Point", lonlat = FALSE) 
   
calves.contact.rand<-contactDur.all(x = calves.dist.rand, 
   dist.threshold=1, sec.threshold=10, blocking = TRUE, blockUnit = "hours",
   blockLength = 1, equidistant.time = FALSE, parallel = FALSE, 
   reportParameters = TRUE) 

nullTest<- contactTest(emp.input = calves.contact.block, 
   rand.input = calves.contact.rand, dist.input = calves.dist, 
   importBlocks = FALSE, shuffle.type = 0)
   
# }
