QQ_plot generates a simple QQ plot of the expected versus the
reported p-value distribution. It includes the option to
filter the data with the high-quality filter. QQ_series
generates a series of such QQ plots for multiple filter
settings.
QQ_plot(dataset, save_name = "dataset", save_dir = getwd(),
filter_FRQ = NULL, filter_cal = NULL,
filter_HWE = NULL, filter_imp = NULL,
filter_NA = TRUE,
filter_NA_FRQ = filter_NA, filter_NA_cal = filter_NA,
filter_NA_HWE = filter_NA, filter_NA_imp = filter_NA,
p_cutoff = 0.05, plot_QQ_bands = FALSE,
header_translations,
check_impstatus = FALSE, ignore_impstatus = FALSE,
T_strings = c("1", "TRUE", "yes", "YES", "y", "Y"),
F_strings = c("0", "FALSE", "no", "NO", "n", "N"),
NA_strings = c(NA, "NA", ".", "-"), ...)
QQ_series(dataset, save_name = "dataset", save_dir = getwd(),
filter_FRQ = NULL, filter_cal = NULL,
filter_HWE = NULL, filter_imp = NULL,
filter_NA = TRUE,
filter_NA_FRQ = filter_NA, filter_NA_cal = filter_NA,
filter_NA_HWE = filter_NA, filter_NA_imp = filter_NA,
p_cutoff = 0.05, plot_QQ_bands = FALSE,
header_translations,
check_impstatus = FALSE, ignore_impstatus = FALSE,
T_strings = c("1", "TRUE", "yes", "YES", "y", "Y"),
F_strings = c("0", "FALSE", "no", "NO", "n", "N"),
NA_strings = c(NA, "NA", ".", "-"), ...)
dataset
a data frame containing the p-value column and (depending on the settings) columns for chromosome number, position, the quality parameters, sample size and imputation status.
save_name
for QQ_plot, a character string; for QQ_series, a vector of
character strings; specifying the filename(s) of the graph,
without extension.
save_dir
character string; the directory where the output files are saved. Note that R paths use forward slashes (/) where Windows uses backslashes (\); a backslash inside an R string must be escaped as \\.
filter_FRQ, filter_cal, filter_HWE, filter_imp
numeric; filter threshold values for allele frequency,
callrate, HWE p-value and imputation quality, respectively.
Passed to HQ_filter. QQ_plot takes only single values, but
QQ_series accepts vectors as well (see 'Details').
filter_NA
logical; if TRUE, SNPs with a missing value for a filter
variable are excluded; if FALSE, missing values are ignored
and the SNPs are retained. QQ_plot takes only single values,
but QQ_series accepts vectors as well (see 'Details').
filter_NA is the default setting for all variables;
variable-specific settings can be specified with the
following arguments.
filter_NA_FRQ, filter_NA_cal, filter_NA_HWE, filter_NA_imp
logical; variable-specific settings for filter_NA.
These arguments are passed to HQ_filter.
p_cutoff
numeric; the threshold of p-values to be shown in the QQ
plot(s). Higher (less significant) p-values are excluded from
the plot. The default setting is 0.05, which excludes roughly
95% of data points when the p-values are approximately
uniformly distributed. Increasing the value above 0.05 is not
recommended, as this may dramatically increase running time
and memory usage.
plot_QQ_bands
logical; should probability bands be added to the QQ plot?
header_translations
translation table for column names. See translate_header for
more information. If the argument is left empty, dataset is
assumed to use the standard column names of QC_GWAS.
check_impstatus
logical; should the imputation-status column be passed to
convert_impstatus?
ignore_impstatus
logical; if FALSE, HWE p-value and callrate filters are
applied only to genotyped SNPs, and imputation quality
filters only to imputed SNPs. If TRUE, the filters are
applied to all SNPs regardless of the imputation status.
T_strings, F_strings, NA_strings
arguments passed to convert_impstatus.
...
arguments passed to plot.
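To illustrate what p_cutoff and plot_QQ_bands control, here is a minimal base-R sketch of how QQ coordinates for p-values are commonly computed. It is not the QCGWAS implementation: the helper name qq_coords, the use of ppoints for the expected quantiles, and the Beta-distribution pointwise bands are all assumptions chosen for illustration. The key point is that the expected quantiles are based on the full number of p-values, and the cutoff is applied only afterwards, so truncating the plot does not shift the remaining points.

```r
# Hypothetical helper (not part of QCGWAS): QQ coordinates for p-values,
# truncated at p_cutoff, with optional pointwise probability bands.
qq_coords <- function(p, p_cutoff = 0.05, conf_level = 0.95) {
  p <- sort(p[!is.na(p)])          # ascending: most significant first
  n <- length(p)
  expected <- ppoints(n)           # expected quantiles under a uniform null
  rank_k <- seq_len(n)

  keep <- p <= p_cutoff            # apply the cutoff after ranking all n values
  k <- rank_k[keep]
  alpha <- 1 - conf_level

  # The k-th smallest of n uniform p-values follows Beta(k, n - k + 1).
  data.frame(
    expected = -log10(expected[keep]),
    observed = -log10(p[keep]),
    lower    = -log10(qbeta(1 - alpha / 2, k, n - k + 1)),
    upper    = -log10(qbeta(alpha / 2, k, n - k + 1))
  )
}

set.seed(1)
p_sim <- runif(10000)              # simulated null p-values, for illustration
qq <- qq_coords(p_sim, p_cutoff = 0.05)

if (interactive()) {
  plot(qq$expected, qq$observed, pch = 20,
       xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
  abline(0, 1, col = "red")        # line of identity
  lines(qq$expected, qq$lower, lty = 2)
  lines(qq$expected, qq$upper, lty = 2)
}
```

With a cutoff of 0.05 only about one in twenty points is drawn, which is why raising p_cutoff increases plotting time and memory use.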
Both functions return an invisible value NULL.
QQ_series accepts multiple filter values, and passes these
one by one to QQ_plot to generate a series of plots. For
example, specifying:
filter_FRQ = c(0.05, 0.10), filter_cal = c(0.90, 0.95)
will generate two plots. The first excludes SNPs with
allele frequency < 0.05 or callrate < 0.90; the second excludes
SNPs with allele frequency < 0.10 or callrate < 0.95. The same
principle applies to the filter_NA settings. If the vectors
submitted to the filter arguments are of unequal length, the
shorter vector will be recycled until it equals the length of
the longer (if possible). To filter missing values only, set
the filter to NA and the corresponding NA-filter argument to
TRUE. Setting the filter argument to NULL will disable the
filter entirely, regardless of the NA-filter setting.
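The pairing and recycling of filter vectors described above can be sketched in base R as follows. The helper expand_filters is hypothetical and only mimics the described behaviour for three of the arguments; in particular, rep_len silently truncates when the longer length is not a multiple of the shorter, whereas the "(if possible)" above suggests the real function may behave differently in that case.

```r
# Hypothetical helper (not part of QCGWAS): expand filter settings to one
# row per plot, recycling shorter vectors to the length of the longest.
expand_filters <- function(filter_FRQ = NULL, filter_cal = NULL,
                           filter_NA = TRUE) {
  args <- list(filter_FRQ = filter_FRQ, filter_cal = filter_cal,
               filter_NA = filter_NA)
  # A NULL filter is disabled entirely, so it is dropped here.
  args <- args[!vapply(args, is.null, logical(1))]
  n_plots <- max(lengths(args))
  as.data.frame(lapply(args, rep_len, length.out = n_plots))
}

# Two plots: (0.05, 0.90) and (0.10, 0.95); filter_NA is recycled to both.
settings <- expand_filters(filter_FRQ = c(0.05, 0.10),
                           filter_cal = c(0.90, 0.95))
print(settings)

# A filter value of NA with filter_NA = TRUE means "remove missing values
# only"; each row of the result corresponds to one plot in the series.
na_only <- expand_filters(filter_FRQ = c(NA, 0.01, 0.01),
                          filter_cal = c(NA, 0.95, 0.95),
                          filter_NA  = c(FALSE, FALSE, TRUE))
print(na_only)
```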
QC_plots for generating more complex QQ plots as well as Manhattan plots.
QC_histogram for creating histograms.
check_P for comparing the reported p-values to the p-values expected from the effect size and standard error.
# NOT RUN {
data("gwa_sample")
QQ_plot(dataset = gwa_sample,
save_name = "sample_QQ",
filter_FRQ = 0.01, filter_cal = 0.95,
filter_NA = FALSE)
QQ_series(dataset = gwa_sample,
save_name = "sample_QQ",
filter_FRQ = c(NA, 0.01, 0.01),
filter_cal = c(NA, 0.95, 0.95),
filter_NA = c(FALSE, FALSE, TRUE))
# }
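As a supplement to the examples, the following sketch illustrates the ignore_impstatus argument and the T_strings, F_strings and NA_strings defaults. Both helpers are hypothetical and are not the package's convert_impstatus or HQ_filter; the 1/0 coding of imputation status, the vector arguments, and the choice to let missing quality values pass are assumptions made for illustration.

```r
# Hypothetical helper: translate imputation-status strings to 1 (imputed),
# 0 (genotyped) or NA, using the default string sets from the usage above.
to_impstatus <- function(x,
                         T_strings = c("1", "TRUE", "yes", "YES", "y", "Y"),
                         F_strings = c("0", "FALSE", "no", "NO", "n", "N"),
                         NA_strings = c(NA, "NA", ".", "-")) {
  x <- as.character(x)
  unknown <- !(x %in% c(T_strings, F_strings, NA_strings))
  if (any(unknown)) warning("unrecognized imputation-status values set to NA")
  out <- rep(NA_integer_, length(x))
  out[x %in% T_strings] <- 1L
  out[x %in% F_strings] <- 0L
  out
}

# Hypothetical helper: TRUE for SNPs that pass the callrate and
# imputation-quality thresholds. Missing quality values pass here, which
# corresponds to a filter_NA = FALSE style of handling.
keep_snp <- function(imputed, callrate, imp_quality,
                     filter_cal, filter_imp, ignore_impstatus = FALSE) {
  pass_cal <- is.na(callrate) | callrate >= filter_cal
  pass_imp <- is.na(imp_quality) | imp_quality >= filter_imp
  if (ignore_impstatus) {
    return(pass_cal & pass_imp)    # every filter applies to every SNP
  }
  # Callrate is checked for genotyped SNPs, imputation quality for imputed
  # SNPs; an NA status yields NA in this sketch.
  ifelse(imputed == 1L, pass_imp, pass_cal)
}

status   <- to_impstatus(c("1", "0", "yes", "N"))
callrate <- c(0.50, 0.99, NA, 0.80)
imp_q    <- c(0.90, NA, 0.20, NA)

by_status <- keep_snp(status, callrate, imp_q, 0.95, 0.3)
all_snps  <- keep_snp(status, callrate, imp_q, 0.95, 0.3,
                      ignore_impstatus = TRUE)
print(by_status)
print(all_snps)
```

The first SNP is imputed with a low callrate, so it is kept when filters follow the imputation status but removed when ignore_impstatus = TRUE.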