QC_histogram

QC_histogram creates two histograms of a numeric variable: one showing the observed data distribution and one showing the expected distribution. The data can optionally be filtered with the high-quality filter (HQ_filter). histogram_series generates a series of such histograms for multiple filter settings.

Usage

QC_histogram(dataset, data_col = 1,
save_name = "dataset", save_dir = getwd(),
export_outliers = FALSE,
filter_FRQ = NULL, filter_cal = NULL,
filter_HWE = NULL, filter_imp = NULL,
filter_NA = TRUE,
filter_NA_FRQ = filter_NA, filter_NA_cal = filter_NA,
filter_NA_HWE = filter_NA, filter_NA_imp = filter_NA,
breaks = "Sturges",
graph_name = colnames(dataset)[data_col],
header_translations, check_impstatus = FALSE,
ignore_impstatus = FALSE,
T_strings = c("1", "TRUE", "yes", "YES", "y", "Y"),
F_strings = c("0", "FALSE", "no", "NO", "n", "N"),
NA_strings = c(NA, "NA", ".", "-"), ...)
histogram_series(dataset, data_col = 1,
save_name = paste0("dataset_F", 1:nrow(plot_table)),
save_dir = getwd(), export_outliers = FALSE,
filter_FRQ = NULL, filter_cal = NULL,
filter_HWE = NULL, filter_imp = NULL,
filter_NA = TRUE,
filter_NA_FRQ = filter_NA, filter_NA_cal = filter_NA,
filter_NA_HWE = filter_NA, filter_NA_imp = filter_NA,
breaks = "Sturges",
header_translations, ignore_impstatus = FALSE,
check_impstatus = FALSE,
T_strings = c("1", "TRUE", "yes", "YES", "y", "Y"),
F_strings = c("0", "FALSE", "no", "NO", "n", "N"),
NA_strings = c(NA, "NA", ".", "-"),
...)

Arguments

dataset: vector or table containing the variable of interest.
data_col: name or number of the column of dataset containing the variable of interest.
save_name: for QC_histogram, a character string; for histogram_series, a vector of character strings. Specifies the filename(s) of the graph(s), without extension.
save_dir: character string; the directory where the output files are saved. Note that R uses forward slashes (/) where Windows uses backslashes (\).
export_outliers: logical or numeric; should outlying entries (which are excluded from the plot) be exported to an output file? If numeric, the value specifies the maximum number of entries to export.
filter_FRQ, filter_cal, filter_HWE, filter_imp: filter threshold values for allele frequency, callrate, HWE p-value and imputation quality, respectively. Passed to HQ_filter. QC_histogram takes only single values, but histogram_series also accepts vectors (see 'Details').
filter_NA: logical; if TRUE, entries with missing values in the filter variables are excluded; if FALSE, missing values are ignored, i.e. they are not used to exclude entries. QC_histogram takes only single values, but histogram_series also accepts vectors (see 'Details'). filter_NA is the default setting for all filter variables; variable-specific settings can be specified with the following arguments.
filter_NA_FRQ, filter_NA_cal, filter_NA_HWE, filter_NA_imp: logical; variable-specific settings for filter_NA. These arguments are passed to HQ_filter.
breaks: argument passed to hist; determines the cell borders of the histogram.
graph_name: character string; used in the title of the plot.
header_translations: translation table for column names. See translate_header for more information. If the argument is left empty, dataset is assumed to use the standard column names used by QC_GWAS.
check_impstatus: logical; should convert_impstatus be called to convert the imputation-status column into standard values?
ignore_impstatus: logical; if FALSE, the HWE p-value and callrate filters are applied only to genotyped SNPs, and the imputation-quality filter only to imputed SNPs. If TRUE, the filters are applied to all SNPs regardless of their imputation status.
T_strings, F_strings, NA_strings: arguments passed to convert_impstatus.
...: in histogram_series, arguments passed to QC_histogram; in QC_histogram, arguments passed to hist.
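
Because breaks is handed to hist, the usual base-R forms of that argument should apply. The sketch below calls hist directly with plot = FALSE on simulated data (it does not call QC_histogram) to show a named rule, a suggested cell count, and an explicit vector of cell borders.

```r
# Base R only: the kinds of `breaks` specification that hist() accepts.
set.seed(42)
x <- rnorm(1000)

# Named rule: Sturges suggests ceiling(log2(n) + 1) cells; hist() then rounds
# the cell borders to "pretty" values, so the final count can differ.
n_sturges <- nclass.Sturges(x)
h_sturges <- hist(x, breaks = "Sturges", plot = FALSE)

# A single number is likewise only a suggestion for the number of cells.
h_50 <- hist(x, breaks = 50, plot = FALSE)

# A vector fixes the cell borders exactly; it must cover the range of x.
borders <- seq(floor(min(x)), ceiling(max(x)), by = 0.5)
h_fixed <- hist(x, breaks = borders, plot = FALSE)

n_sturges                   # 11
length(h_fixed$breaks) - 1  # number of cells defined by `borders`
```

An explicit vector is the option to use when several histograms should share identical cells, for example across the filter settings of a histogram series.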
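
The interplay of T_strings, F_strings, NA_strings and ignore_impstatus can be sketched as follows. parse_impstatus and filter_scope are hypothetical helpers written for illustration, not the code of convert_impstatus or HQ_filter. The sketch assumes that T_strings mark imputed SNPs and F_strings genotyped ones, that the allele-frequency filter applies to all SNPs, and that HWE p-values below the threshold fail.

```r
# Hypothetical sketch, NOT the QCGWAS implementation of convert_impstatus
# or HQ_filter.

# Map string-coded imputation status to TRUE (imputed), FALSE (genotyped)
# or NA, using the same default string sets as the usage above.
parse_impstatus <- function(x,
                            T_strings = c("1", "TRUE", "yes", "YES", "y", "Y"),
                            F_strings = c("0", "FALSE", "no", "NO", "n", "N"),
                            NA_strings = c(NA, "NA", ".", "-")) {
  x <- as.character(x)
  known <- x %in% c(T_strings, F_strings, NA_strings)
  if (!all(known)) {
    stop("unrecognised imputation status: ",
         paste(unique(x[!known]), collapse = ", "))
  }
  out <- rep(NA, length(x))       # NA_strings stay NA
  out[x %in% T_strings] <- TRUE   # assumed: imputed
  out[x %in% F_strings] <- FALSE  # assumed: genotyped
  out
}

# Which SNPs a filter is applied to (assumes no missing status values).
filter_scope <- function(imputed, filter, ignore_impstatus = FALSE) {
  if (ignore_impstatus) return(rep(TRUE, length(imputed)))
  switch(filter,
         HWE = , cal = !imputed,            # genotyped SNPs only
         imp = imputed,                     # imputed SNPs only
         FRQ = rep(TRUE, length(imputed)),  # assumed: all SNPs
         stop("unknown filter: ", filter))
}

status <- parse_impstatus(c("0", "yes", "N", "1"))  # FALSE TRUE FALSE TRUE
hwe_p  <- c(1e-8, 1e-8, 0.5, 0.5)
fails  <- hwe_p < 1e-6  # assumed: p-values below the threshold fail

# SNPs excluded by the HWE filter
fails & filter_scope(status, "HWE")                           # TRUE FALSE FALSE FALSE
fails & filter_scope(status, "HWE", ignore_impstatus = TRUE)  # TRUE TRUE FALSE FALSE
```

With ignore_impstatus = FALSE the imputed SNP with a low HWE p-value survives, because the HWE filter is not applied to imputed SNPs. Rejecting unrecognised strings with an error is a choice of this sketch; the package's handling of such values may differ.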

Value

Both functions return NULL invisibly.

Details

histogram_series accepts multiple filter values and passes them, one set at a time, to QC_histogram to generate a series of histograms. For example, specifying:

filter_FRQ = c(0.05, 0.10), filter_cal = c(0.90, 0.95)

will generate two histograms. The first excludes SNPs with allele frequency < 0.05 or callrate < 0.90; the second excludes SNPs with allele frequency < 0.10 or callrate < 0.95. The same principle applies to the filter_NA settings. If the vectors submitted to the filter arguments are of unequal length, the shorter vector is recycled to the length of the longer one (if possible). To filter missing values only, set the filter to NA and the corresponding NA-filter argument to TRUE. Setting the filter argument to NULL disables the filter entirely, regardless of the NA-filter setting.
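
The way histogram_series lines up vector-valued filter arguments can be pictured as a table with one row per histogram. expand_filter_settings below is a hypothetical helper, not part of QCGWAS. It drops NULL (disabled) filters and recycles shorter vectors; it refuses lengths that do not divide the longest one, which is this sketch's reading of "if possible". The file names mimic the pattern of the default save_name.

```r
# Hypothetical helper, not part of QCGWAS: row i holds the filter settings
# for the i-th histogram.
expand_filter_settings <- function(...) {
  # NULL disables a filter entirely, so it gets no column.
  args <- Filter(Negate(is.null), list(...))
  lens <- lengths(args)
  n <- max(lens)
  # Recycle shorter vectors, but only when they divide the longest length.
  if (any(n %% lens != 0)) {
    stop("filter vectors cannot be recycled to a common length")
  }
  settings <- as.data.frame(lapply(args, rep_len, length.out = n))
  # Mirrors the default save_name pattern "dataset_F1", "dataset_F2", ...
  settings$save_name <- paste0("dataset_F", seq_len(n))
  settings
}

# The example from the Details text: two histograms, one NA setting for both
s <- expand_filter_settings(filter_FRQ = c(0.05, 0.10),
                            filter_cal = c(0.90, 0.95),
                            filter_NA  = TRUE)
s
```

Here filter_NA = TRUE is recycled to both rows, so both histograms exclude entries with missing filter values.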
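
The NA and NULL rules described above can be condensed into a single threshold filter. keep_by_threshold is a hypothetical helper, not HQ_filter itself, and it assumes that entries below the threshold fail, as in the callrate example; it returns TRUE for entries that are kept.

```r
# Hypothetical helper, not part of QCGWAS: one threshold filter with the
# NA and NULL rules from the Details text.
keep_by_threshold <- function(values, threshold, filter_NA = TRUE) {
  keep <- rep(TRUE, length(values))
  if (is.null(threshold)) return(keep)  # NULL: filter disabled entirely
  missing <- is.na(values)
  if (!is.na(threshold)) {
    # assumed: values below the threshold fail
    keep[!missing] <- values[!missing] >= threshold
  }
  # TRUE: exclude missing values; FALSE: missing values are ignored (kept)
  if (filter_NA) keep[missing] <- FALSE
  keep
}

callrate <- c(0.99, 0.91, NA, 0.97)
keep_by_threshold(callrate, 0.95, filter_NA = TRUE)   # TRUE FALSE FALSE TRUE
keep_by_threshold(callrate, 0.95, filter_NA = FALSE)  # TRUE FALSE TRUE TRUE
keep_by_threshold(callrate, NA, filter_NA = TRUE)     # TRUE TRUE FALSE TRUE
keep_by_threshold(callrate, NULL, filter_NA = TRUE)   # TRUE TRUE TRUE TRUE
```

The third call is the "filter missing values only" case: a threshold of NA removes nothing on its own, so only the NA rule takes effect.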

See Also

For creating QQ plots: QQ_plot.
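
Examples

# NOT RUN {
library(QCGWAS)
data("gwa_sample")

# Effect-size histogram, excluding SNPs with allele frequency < 0.01 or
# callrate < 0.95; missing filter values are ignored (filter_NA = FALSE)
QC_histogram(dataset = gwa_sample, data_col = "EFFECT",
             save_name = "sample_histogram",
             filter_FRQ = 0.01, filter_cal = 0.95,
             filter_NA = FALSE,
             graph_name = "Effect size histogram")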
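
# Three histograms: (1) no filtering, (2) threshold filters with missing
# values ignored, (3) threshold filters with missing values excluded.
# One file name is supplied per histogram.
histogram_series(dataset = gwa_sample, data_col = "EFFECT",
                 save_name = paste0("sample_histogram_F", 1:3),
                 filter_FRQ = c(NA, 0.01, 0.01),
                 filter_cal = c(NA, 0.95, 0.95),
                 filter_NA = c(FALSE, FALSE, TRUE))
# }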