This function creates the most important graphs of the QC: the QQ plots and the Manhattan plot. It also calculates lambda, and determines the effect of the filters.
QC_plots(dataset,
         plot_QQ = TRUE, plot_Man = TRUE,
         FRQfilter_values = NULL, FRQfilter_NA = filter_NA,
         HWEfilter_values = NULL, HWEfilter_NA = filter_NA,
         calfilter_values = NULL, calfilter_NA = filter_NA,
         impfilter_values = NULL, impfilter_NA = filter_NA,
         impfilter_min = min(dataset$IMP_QUALITY, na.rm = TRUE),
         manfilter_FRQ = NULL, manfilter_HWE = NULL,
         manfilter_cal = NULL, manfilter_imp = NULL,
         filter_NA = TRUE,
         plot_cutoff_p = 0.05, plot_names = FALSE,
         QQ_colors = c("red", "blue", "orange", "green3", "yellow"),
         plot_QQ_bands = FALSE,
         save_name = "dataset", save_dir = getwd(),
         header_translations, use_log = FALSE,
         check_impstatus = FALSE, ignore_impstatus = FALSE,
         T_strings = c("1", "TRUE", "yes", "YES", "y", "Y"),
         F_strings = c("0", "FALSE", "no", "NO", "n", "N"),
         NA_strings = c(NA, "NA", ".", "-"))
vector of p-values or a data frame containing the p-value column and (depending on the settings) columns for chromosome number, position, the quality parameters, sample size and imputation status.
logical; should QQ and Manhattan plots be saved?
numeric vectors; the threshold values for the QQ plot filters. The filters are for allele-frequency, HWE p-values, callrate and imputation-quality parameters, respectively. A maximum of five values can be specified per parameter.
Set to NULL to disable the QQ filter for that parameter. To filter missing values only, set the filter value to NA and the corresponding filter_NA argument to TRUE.
The allele-frequency filter is two-sided: for a filter-value of x, it will exclude entries with freq < x or freq > 1 - x. Values >= 1 will be divided by the SNP's sample size. This allows sample-size dependent filtering of allele frequencies. Note that this uses the sample size reported in the sample-size column for that specific SNP. SNPs without sample size will be excluded if the corresponding filter_NA argument is TRUE and ignored if it is FALSE.
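The allele-frequency rules above can be sketched in base R. This is an illustration only, not the package's internal code: frq_excluded is a hypothetical helper, and the frequencies and sample sizes are made-up toy values.

```r
# Hypothetical helper (not part of QCGWAS): flags the SNPs that a two-sided
# allele-frequency filter with threshold x would exclude.
frq_excluded <- function(freq, n, x, filter_NA = TRUE) {
  # x = NA means "filter missing values only"
  if (is.na(x)) return(is.na(freq) & filter_NA)
  # Thresholds >= 1 are divided by the per-SNP sample size
  threshold <- if (x >= 1) x / n else rep(x, length(freq))
  out <- freq < threshold | freq > 1 - threshold
  # Missing frequency (or missing sample size when x >= 1):
  # exclude if filter_NA is TRUE, ignore if it is FALSE
  out[is.na(out)] <- filter_NA
  out
}

freq <- c(0.004, 0.02, 0.50, 0.999, NA)   # toy allele frequencies
n    <- rep(1000, 5)                      # toy per-SNP sample sizes

fixed_res  <- frq_excluded(freq, n, x = 0.01)  # TRUE FALSE FALSE TRUE TRUE
scaled_res <- frq_excluded(freq, n, x = 3)     # threshold 3/1000 = 0.003
keep_na    <- frq_excluded(freq, n, x = 0.01, filter_NA = FALSE)
```

With x = 3 and a sample size of 1000, the effective threshold is 0.003, so the SNP with frequency 0.004 is no longer excluded.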
logical; should the filters exclude (TRUE) or ignore (FALSE) missing values? filter_NA is the default setting; the others allow variable-specific settings.
numeric; the lowest possible value for imputation-quality. This argument is currently redundant, as it is calculated automatically.
single, numeric values; the filter-settings for allele-frequency, HWE p-values, callrate and imputation quality respectively, for the Manhattan plot. The arguments are passed to HQ_filter. To filter missing values only, set to NA and the corresponding filter_NA argument to TRUE. To disable filtering entirely, set to NULL.
numeric; the threshold of p-values to be shown in the QQ and Manhattan plots. Higher (less significant) p-values are excluded from the plot. The default setting is 0.05, which excludes approximately 95% of data points. Increasing the value above 0.05 is not recommended, as this may dramatically increase running time and memory usage.
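To see why the default removes most points, here is a minimal sketch using simulated p-values. It assumes the p-values are uniformly distributed (no true associations), which real GWAS data only approximate, and it assumes the cutoff keeps p <= plot_cutoff_p; how QC_plots treats values exactly at the cutoff is not stated here.

```r
set.seed(1)
p <- runif(1e5)               # simulated null p-values (assumption)
plot_cutoff_p <- 0.05

kept <- p <= plot_cutoff_p    # points that would remain in the plot
mean(kept)                    # close to 0.05, i.e. about 95% are dropped
```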
argument currently redundant.
vector of R color-values; the color of the QQ filter-plots. The unfiltered data is black by default. This argument sets the colors of the least (first value) to most (last value) stringent filters. (For this setting, filter values >= 1 (i.e. sample-size based filtering) are considered less stringent than values < 1.)
logical; should probability bands be added to the QQ plot?
character string; the filename, without extension, for the graphs.
character string; the directory where the graphs are saved. Note that R uses forward slash (/) where Windows uses the backslash (\).
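As a small illustration of the path note, a Windows-style path can be converted to forward slashes before it is passed as save_dir. The path below is hypothetical.

```r
win_path <- "C:\\Users\\me\\plots"     # hypothetical path; "\\" is one backslash
save_dir <- gsub("\\", "/", win_path, fixed = TRUE)
save_dir                               # "C:/Users/me/plots"

# file.path() joins components with forward slashes on every platform
out_base <- file.path(save_dir, "sample_plots")
out_base                               # "C:/Users/me/plots/sample_plots"
```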
translation table for column names. See translate_header for more information. If the argument is left empty, dataset is assumed to use the standard column-names of QC_GWAS.
argument used by QC_GWAS; redundant when QC_plots is used separately.
logical; should the imputation-status column be passed to convert_impstatus?
logical; if FALSE, HWE p-value and callrate filters are applied only to genotyped SNPs, and imputation quality filters only to imputed SNPs. If TRUE, the filters are applied to all SNPs regardless of the imputation status.
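The effect of ignore_impstatus can be sketched with a toy data frame. This is not the package's code: the helper excluded is hypothetical, the column names CALLRATE and IMPUTED and the coding 0 = genotyped, 1 = imputed are assumptions (only IMP_QUALITY appears in the usage above), and missing values are simply ignored to keep the example short.

```r
# Toy data; column names and the 0/1 coding of IMPUTED are assumptions
snps <- data.frame(
  CALLRATE    = c(0.99, 0.90, 0.50, NA),
  IMP_QUALITY = c(NA,   NA,   0.95, 0.20),
  IMPUTED     = c(0,    0,    1,    1)
)

# Hypothetical helper: which SNPs fail the callrate or imputation-quality filter?
excluded <- function(d, cal_min, imp_min, ignore_impstatus = FALSE) {
  cal_fail <- !is.na(d$CALLRATE) & d$CALLRATE < cal_min
  imp_fail <- !is.na(d$IMP_QUALITY) & d$IMP_QUALITY < imp_min
  if (!ignore_impstatus) {
    cal_fail <- cal_fail & d$IMPUTED == 0   # callrate: genotyped SNPs only
    imp_fail <- imp_fail & d$IMPUTED == 1   # imputation quality: imputed SNPs only
  }
  cal_fail | imp_fail
}

by_status  <- excluded(snps, cal_min = 0.95, imp_min = 0.3)
all_snps   <- excluded(snps, cal_min = 0.95, imp_min = 0.3, ignore_impstatus = TRUE)
```

The third SNP is imputed but carries a low callrate: it is kept when imputation status is respected and excluded when it is ignored.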
arguments passed to convert_impstatus.
An object of class 'list' with the following components:
vector of the lambda values of all SNPs, genotyped SNPs and imputed SNPs, respectively.
logical value indicating whether imputation status was used when applying the filters.
character vectors naming the specified QQ filters.
numeric vectors; the number of SNPs removed by the specified filters. Note that the filters are sorted before being applied, so the order may not match that of the input. Check the filter_names output to see the order that was used inside QC_plots.
numeric; the number of SNPs removed by the Manhattan filter. This does not include those SNPs removed because they lacked p or chromosome/position-values, or failed the p-cutoff threshold.
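For orientation, the genomic inflation factor lambda is commonly computed as the median of the observed 1-df chi-square statistics divided by the expected median under the null. The sketch below uses that common definition; whether QC_plots uses exactly this formula is an assumption, and calc_lambda is a hypothetical helper.

```r
# Hypothetical helper using the common definition of lambda
calc_lambda <- function(p) {
  p <- p[!is.na(p)]                                # drop missing p-values
  chisq <- qchisq(p, df = 1, lower.tail = FALSE)   # p-value -> chi-square statistic
  median(chisq) / qchisq(0.5, df = 1)              # observed / expected median
}

set.seed(42)
lambda_null <- calc_lambda(runif(1e5))   # simulated null p-values
lambda_null                              # close to 1 when there is no inflation
```

Values well above 1 indicate that the test statistics are inflated relative to the null expectation.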
The function QC_plots grew out of phase 4 of QC_GWAS. It carries out three functions, hence the vague name: it calculates lambda, it applies the QQ filters, and it creates the QQ and Manhattan plots (a separate function is available for creating regional-association plots: see below). The function schematic is as follows:
Preparing the dataset: this step involves translating the dataset header to the standard column-names (by identify_column) and converting imputation status (by convert_impstatus). Both steps are optional, and are disabled by default. If the function cannot identify the imputation status column, it will generate a warning message and disable the imputation-status dependent filters.
Calculating the QC stats: here it generates the filters and calculates how many SNPs are removed. Lambda is also calculated at this point.
Creating a QQ graph of every variable for which filters have been specified. Every graph contains an unfiltered plot, plus plots for every effective filter. ("Effective" means "excludes more SNPs than the previous, less-stringent filter".)
Creating the Manhattan plot. The default Manhattan plot covers chromosomes 1 to 23 (X). Fields for XY, Y and M are added when such SNPs are present.
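The notion of an "effective" filter from the QQ step can be sketched as follows. The counts are made up for illustration, and comparing the first filter against the unfiltered data (0 SNPs removed) is an assumption.

```r
# Made-up counts of SNPs removed, ordered from least to most stringent filter
n_removed <- c("NA only" = 12, "0.01" = 340, "0.05" = 340, "0.10" = 910)

# A filter is effective if it removes more SNPs than the previous,
# less-stringent one; the first is compared against 0 (assumption)
effective <- diff(c(0, n_removed)) > 0

plotted <- names(n_removed)[effective]
plotted    # "NA only" "0.01" "0.10"
```

Here the 0.05 filter removes no more SNPs than the 0.01 filter, so it would not get its own plot.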
plot_regional for creating a regional association plot.
check_P for comparing the reported p-values to the p expected from the effect size and standard error.
QQ_plot for generating simpler QQ plots.
# NOT RUN {
data("gwa_sample")
QC_plots(dataset = gwa_sample,
         plot_QQ = TRUE, plot_QQ_bands = TRUE, plot_Man = TRUE,
         FRQfilter_values = c(NA, 0.01, 0.05, 3),
         calfilter_values = c(NA, 0.95, 0.99),
         manfilter_FRQ = 0.05, manfilter_cal = 0.95,
         filter_NA = TRUE, save_name = "sample_plots")
# }