quick_peak: Fast peak finder in GWAS data

Description

A simple, fast function for finding peaks in genome-wide association study (GWAS) data by enforcing a minimum genomic distance between peaks.

Usage

quick_peak(
  data,
  npeaks = NA,
  p_cutoff = 5e-08,
  span = 1e+06,
  min_points = 2,
  chrom = NULL,
  pos = NULL,
  p = NULL
)

Value

Vector of row indices in data identifying the peak SNPs.

Arguments

data: GWAS dataset (data.frame or data.table)
npeaks: Number of peaks to find. If set to NA, the algorithm finds all distinct peaks separated from one another by at least the distance specified by span.
p_cutoff: Significance threshold for p-values. SNPs with p-values above this cut-off are ignored.
span: Minimum genomic distance between peaks (default 1 Mb)
min_points: Minimum number of significant SNPs (as defined by p_cutoff) that must lie within span of a peak for the peak to be retained. This removes peaks supported by only one or a few low p-value SNPs. To disable this filter, set min_points to 1 or less.
chrom: Specifies which column in data contains chromosome information. If NULL, the function attempts to autodetect the column.
pos: Specifies which column in data contains position information. If NULL, the function attempts to autodetect the column.
p: Specifies which column in data contains SNP p-values. If NULL, the function attempts to autodetect the column.
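The min_points filter can be illustrated with a short base R sketch. It assumes columns named chrom, pos and p, that the data have already been restricted to significant SNPs, and that a SNP lies "within span" of a peak when it is on the same chromosome and strictly less than span away. count_support is a hypothetical helper for illustration only, not part of quick_peak.

```r
# Hypothetical helper (not part of quick_peak): count significant SNPs within
# `span` of the SNP in row `peak_row`. Assumes `sig` is already filtered to
# significant SNPs and that "within span" means same chromosome and a distance
# strictly less than span.
count_support <- function(sig, peak_row, span = 1e6,
                          chrom = "chrom", pos = "pos") {
  same_chr <- sig[[chrom]] == sig[[chrom]][peak_row]
  near <- abs(sig[[pos]] - sig[[pos]][peak_row]) < span
  sum(same_chr & near)  # the count includes the candidate SNP itself
}

# Toy set of significant SNPs (made-up values)
sig <- data.frame(
  chrom = c(1, 1, 1, 2),
  pos   = c(1000000, 1200000, 5000000, 300000),
  p     = c(1e-10, 3e-9, 2e-8, 4e-12)
)

min_points <- 2
support <- vapply(seq_len(nrow(sig)),
                  function(i) count_support(sig, i), integer(1))
keep <- support >= min_points  # candidates that pass the min_points filter
```

In this toy data the chromosome 2 SNP has the smallest p-value but no significant neighbours, so it would be discarded as a peak when min_points = 2.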

Details

This function is designed for speed. SNPs are first filtered so that only those with p-values at or below p_cutoff are considered. The SNP with the lowest remaining p-value is taken as a peak, all SNPs on the same chromosome within the distance specified by span of that peak are removed, and the process is repeated on the remaining SNPs until none are left (or until npeaks peaks have been found). Regions such as the HLA, where association signals may be broader than span, can therefore produce multiple peaks.
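The greedy procedure described above can be sketched in base R as follows. This is an illustrative reimplementation under stated assumptions, not the quick_peak source: peak_sketch is a hypothetical name, the columns are assumed to be called chrom, pos and p, "within span" is taken to mean a distance strictly less than span on the same chromosome, and min_points filtering and column autodetection are omitted.

```r
# Hypothetical sketch of greedy, distance-based peak finding (not quick_peak).
# Assumes columns named chrom, pos and p; returns row indices into `data`.
peak_sketch <- function(data, npeaks = NA, p_cutoff = 5e-08, span = 1e6) {
  # Keep only significant SNPs, remembering their original row indices
  # (which() also drops rows whose p-value is NA)
  idx <- which(data$p <= p_cutoff)
  peaks <- integer(0)
  while (length(idx) > 0 && (is.na(npeaks) || length(peaks) < npeaks)) {
    # The remaining SNP with the lowest p-value becomes the next peak
    top <- idx[which.min(data$p[idx])]
    peaks <- c(peaks, top)
    # Remove remaining SNPs on the same chromosome within span of the peak;
    # the peak itself is always removed so the loop terminates even if span = 0
    near <- data$chrom[idx] == data$chrom[top] &
      abs(data$pos[idx] - data$pos[top]) < span
    idx <- idx[!near & idx != top]
  }
  peaks
}

# Toy GWAS data (made-up values)
gwas <- data.frame(
  chrom = c(1, 1, 1, 1, 2, 2),
  pos   = c(1000000, 1300000, 1600000, 4000000, 500000, 900000),
  p     = c(1e-12, 1e-9, 0.2, 3e-8, 1e-20, 1e-3)
)
peak_sketch(gwas)
```

Here the chromosome 2 SNP is picked first, then the 1 Mb chromosome 1 SNP (which absorbs its neighbour at 1.3 Mb), then the isolated SNP at 4 Mb. Because the result is a vector of row indices, the peak SNPs can be extracted with gwas[peak_sketch(gwas), ].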