rehh (version 3.2.3)

calc_candidate_regions: Determine candidate regions of selection

Description

Determine candidate regions of selection.

Usage

calc_candidate_regions(
  scan,
  threshold = NA,
  negativeThreshold = NA,
  pval = FALSE,
  ignore_sign = FALSE,
  window_size = 1e+06,
  overlap = 0,
  right = TRUE,
  min_n_mrk = 1,
  min_n_extr_mrk = 1,
  min_perc_extr_mrk = 0,
  join_neighbors = TRUE,
  keepNA = FALSE
)

Value

A data frame of chromosomal regions, i.e. windows that fulfill the conditions required to qualify as candidate regions under selection. For each region, the following are reported: the overall number of markers, the mean and maximum of their scores, the number of markers with extreme values, the percentage of such markers among all markers, and their average score. If only a positive threshold is specified, only positive scores enter the calculation of the mean and maximum; conversely, if only a negative threshold is specified, only negative scores are used. If both thresholds are specified, the absolute scores are used for mean and maximum.
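The rule for which scores enter the mean and maximum can be pictured with a small base R sketch. The helper `scores_for_summary` is hypothetical and not part of rehh. It assumes that "positive scores" means scores greater than zero, and that no filtering happens when neither threshold is given; the package itself may handle these cases differently.

```r
# Hypothetical helper, not part of rehh: select the scores that would enter
# the mean and maximum, following the rules described in the Value section.
scores_for_summary <- function(scores, threshold = NA, negativeThreshold = NA) {
  scores <- scores[!is.na(scores)]
  if (!is.na(threshold) && !is.na(negativeThreshold)) {
    abs(scores)              # both thresholds given: absolute scores
  } else if (!is.na(threshold)) {
    scores[scores > 0]       # only positive threshold: positive scores
  } else if (!is.na(negativeThreshold)) {
    scores[scores < 0]       # only negative threshold: negative scores
  } else {
    scores                   # assumption: no filtering without thresholds
  }
}

ihs <- c(-4, 0.5, 1, 6, NA)
pos_only <- scores_for_summary(ihs, threshold = 2)
both <- scores_for_summary(ihs, threshold = 2, negativeThreshold = -2)

mean(pos_only)  # mean of 0.5, 1 and 6
mean(both)      # mean of 4, 0.5, 1 and 6
max(both)       # largest absolute score
```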

Arguments

scan

a data frame containing scores (output of ihh2ihs, ines2rsb or ies2xpehh).

threshold

a positive numeric value. Scores above this value are considered extreme.

negativeThreshold

a negative numeric value. Scores below this value are considered extreme.

pval

logical. If TRUE, use the (negative log-) p-value instead of the score.

ignore_sign

logical. If TRUE, take absolute values of score.

window_size

size of sliding windows. If set to 1, no windows are constructed and only the individual extremal markers are reported.

overlap

size of window overlap (default 0, i.e. no overlap). Note that if you use this option together with join_neighbors = TRUE, candidate regions might get bigger, since markers with extreme scores re-appear in several windows.

right

logical, indicating if the windows should be closed on the right (and open on the left) or vice versa.

min_n_mrk

minimum number of markers per window.

min_n_extr_mrk

minimum number of markers with extreme value in a window.

min_perc_extr_mrk

minimum percentage of extremal markers among all markers.

join_neighbors

logical. If TRUE (default), merge neighboring windows with extreme values into a larger interval.

keepNA

logical. If TRUE, keep markers with a value of NA, i.e. markers for which no score could be calculated (e.g. because the minor allele frequency is too low). This option affects the number of markers counted in a window.
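To illustrate what window_size and right mean for markers that sit exactly on a window boundary, here is a minimal base R sketch using cut(). It assumes windows aligned to multiples of window_size starting at position 0; this alignment is an assumption for illustration and not necessarily how rehh constructs its windows internally.

```r
# Positions of the toy markers used in the Examples section
pos <- c(2, 3, 6, 7, 8) * 10000
window_size <- 20000

# Assumed window boundaries: 0, 20000, 40000, 60000, 80000, 100000
breaks <- seq(0, 100000, by = window_size)

# Windows closed on the right: (0, 20000], (20000, 40000], ...
right_closed <- as.integer(cut(pos, breaks = breaks, right = TRUE))

# Windows closed on the left: [0, 20000), [20000, 40000), ...
left_closed <- as.integer(cut(pos, breaks = breaks, right = FALSE))

# Window index of each marker under both conventions
data.frame(POSITION = pos, right_closed, left_closed)
```

Markers at 20000, 60000 and 80000 lie on a boundary, so they move to the next window when the windows are closed on the left instead of the right.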

Details

There is no generally agreed-upon method for determining genomic regions that might have been under recent selection. Since selection tends to yield clusters of markers with outlier values, a common approach is to search for regions with an elevated number or fraction of outlier (extremal) markers. This function lets you set three conditions that a window must fulfill in order to qualify as a candidate region:

  • min_n_mrk: a minimum number of (any) markers.

  • min_n_extr_mrk: a minimum number of markers with outlier / extreme value.

  • min_perc_extr_mrk: a minimum percentage of extremal markers among all markers.

"Extreme" markers are defined as those having a score above the specified threshold or, if negativeThreshold is given, a score below that negative threshold.
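The three conditions can be sketched for a single window in base R. The function `window_summary` and its return fields are hypothetical and only mirror the argument names of calc_candidate_regions; the sketch assumes strict comparison against the thresholds, "at least" comparison for the three minimum conditions, and a percentage on a 0 to 100 scale. It is not the package's implementation.

```r
# Hypothetical helper, not part of rehh: evaluate the three conditions
# for the scores of the markers falling into one window.
window_summary <- function(scores, threshold = NA, negativeThreshold = NA,
                           keepNA = FALSE, min_n_mrk = 1,
                           min_n_extr_mrk = 1, min_perc_extr_mrk = 0) {
  # Markers without score only count if keepNA is TRUE
  if (!keepNA) scores <- scores[!is.na(scores)]
  above <- if (is.na(threshold)) rep(FALSE, length(scores)) else scores > threshold
  below <- if (is.na(negativeThreshold)) rep(FALSE, length(scores)) else scores < negativeThreshold
  # A marker with NA score is never extreme
  extreme <- (above | below) %in% TRUE
  n_mrk <- length(scores)
  n_extr <- sum(extreme)
  perc_extr <- if (n_mrk > 0) 100 * n_extr / n_mrk else 0
  list(n_mrk = n_mrk, n_extr = n_extr, perc_extr = perc_extr,
       candidate = n_mrk >= min_n_mrk &&
         n_extr >= min_n_extr_mrk &&
         perc_extr >= min_perc_extr_mrk)
}

# Scores of three markers in one window, one of them without score
w_drop <- window_summary(c(1, 6, NA), threshold = 2)
w_keep <- window_summary(c(1, 6, NA), threshold = 2, keepNA = TRUE)

# A strongly negative score is extreme only if a negative threshold is set
w_pos <- window_summary(-4, threshold = 2)
w_both <- window_summary(-4, threshold = 2, negativeThreshold = -2)
```

Comparing `w_drop` and `w_keep` shows how keepNA changes the marker count and hence the percentage of extremal markers.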

See Also

calc_region_stats

Examples

#toy example of an ihs scan
scan <- data.frame(CHR = "1", POSITION = c(2, 3, 6, 7, 8) * 10000, IHS = c(-4, 0.5, 1, 6, NA))
scan
#candidate regions with default window size
calc_candidate_regions(scan, threshold = 2)
#with smaller window size
calc_candidate_regions(scan, threshold = 2, window_size = 20000)
#add negative threshold
calc_candidate_regions(scan, threshold = 2, negativeThreshold = -2, window_size = 20000)
#ignoring sign yields the same
calc_candidate_regions(scan, threshold = 2, ignore_sign = TRUE, window_size = 20000)
#use overlapping windows
calc_candidate_regions(scan, threshold = 2, ignore_sign = TRUE, window_size = 20000,
overlap = 10000)
#do not join windows with extreme values
calc_candidate_regions(scan, threshold = 2, ignore_sign = TRUE, window_size = 20000,
overlap = 10000, join_neighbors = FALSE)
#include windows without extreme values by 'min_n_extr_mrk = 0'
calc_candidate_regions(scan, threshold = 2, ignore_sign = TRUE, window_size = 20000,
overlap = 10000, join_neighbors = FALSE, min_n_extr_mrk = 0)
#include markers without score by 'keepNA = TRUE'
calc_candidate_regions(scan, threshold = 2, ignore_sign = TRUE, window_size = 20000,
overlap = 10000, join_neighbors = FALSE, min_n_extr_mrk = 0, keepNA = TRUE)
#include windows without markers by 'min_n_mrk = 0'
calc_candidate_regions(scan, threshold = 2, ignore_sign = TRUE, window_size = 20000,
overlap = 10000, join_neighbors = FALSE, min_n_mrk = 0, min_n_extr_mrk = 0, keepNA = TRUE)
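The effect of join_neighbors = TRUE can be pictured as merging windows that touch or overlap. The following base R sketch is only an illustration under assumptions: the helper `merge_neighbors` is hypothetical, the column names START and END are chosen for illustration, and all windows are taken to lie on a single chromosome. The actual merging in rehh may differ in detail.

```r
# Hypothetical helper, not part of rehh: merge windows that touch or overlap.
merge_neighbors <- function(windows) {
  windows <- windows[order(windows$START), ]
  merged <- windows[0, ]
  for (i in seq_len(nrow(windows))) {
    n <- nrow(merged)
    if (n > 0 && windows$START[i] <= merged$END[n]) {
      # Window touches or overlaps the previous region: extend that region
      merged$END[n] <- max(merged$END[n], windows$END[i])
    } else {
      # Otherwise start a new region
      merged <- rbind(merged, windows[i, ])
    }
  }
  rownames(merged) <- NULL
  merged
}

# Three illustrative candidate windows; the last two overlap
candidates <- data.frame(START = c(0, 60000, 70000),
                         END = c(20000, 80000, 90000))
regions <- merge_neighbors(candidates)
regions
```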
