prepare_ncv: Prepare NCV calculation

Description

Determine the best NCV denominator chromosomes, calculate NCV scores, and assess whether the control group scores are normally distributed using the Shapiro-Wilk test.

Usage

prepare_ncv(nipt_control_group, chr_focus, max_elements, exclude_chromosomes = NULL, include_chromosomes = NULL, use_test_train_set = T, size_of_train_set = 0.6)

Arguments

nipt_control_group

The NIPTControlGroup object used in the analysis

chr_focus

Integer. The chromosome of interest. Most commonly chromosome 13, 18 or 21, although any autosomal chromosome can be predicted.

max_elements

Integer. The maximum number of denominator chromosomes.

exclude_chromosomes

Integer. Which autosomal chromosomes to exclude as potential predictors? By default, the potential trisomic chromosomes 13, 18 and 21 are excluded.

include_chromosomes

Integer. Which of the potential trisomic chromosomes (13, 18 and 21) to include as potential predictors?

use_test_train_set

Boolean. Use a test and train set?

size_of_train_set

Double. The size of the train set, expressed as a decimal fraction. Default is 0.6 (60% of the control group samples).

Value

An NCVTemplate object

Details

The NCV method selects a subset of denominator chromosomes to calculate the chromosomal fractions. The 'best' subset is the set that yields the lowest coefficient of variation for the chromosomal fractions of the chromosome of interest in the control group. Because the best subset is found by a brute-force search, which can be computationally intensive, this method is divided into two functions, prepare_ncv and calculate_ncv. prepare_ncv returns a template object (NCVTemplate) for a given chromosome of interest and the control group used. This template can be reused for any number of analyses. If the control group or the chromosome of interest changes, a new template must be made.
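
The brute-force search described above can be sketched in base R. This is an illustration only, not the NIPTeR implementation: the `reads` matrix, the `best_denominators` helper and its arguments are hypothetical, and the sketch assumes that subsets of every size from 1 up to `max_elements` are tried.

```r
# Illustrative sketch only: not the NIPTeR implementation.
# `reads` is a hypothetical samples x chromosomes matrix of read counts.
set.seed(1)
reads <- matrix(rpois(20 * 6, lambda = 1000), nrow = 20,
                dimnames = list(NULL, paste0("chr", 1:6)))

# Hypothetical helper: exhaustive search for the denominator subset that
# gives the lowest coefficient of variation (CV) of the focus fraction.
best_denominators <- function(reads, focus, candidates, max_elements) {
  best <- list(cv = Inf, denominators = NULL)
  for (k in seq_len(min(max_elements, length(candidates)))) {
    # Every combination of k candidate chromosomes
    subsets <- combn(candidates, k, simplify = FALSE)
    for (s in subsets) {
      # Fraction = focus reads / summed reads of the denominator subset
      fractions <- reads[, focus] / rowSums(reads[, s, drop = FALSE])
      cv <- sd(fractions) / mean(fractions)
      if (cv < best$cv) best <- list(cv = cv, denominators = s)
    }
  }
  best
}

result <- best_denominators(reads, focus = "chr1",
                            candidates = paste0("chr", 2:6),
                            max_elements = 3)
```

Under the same assumption, the number of subsets grows quickly: with 19 candidate chromosomes and `max_elements = 9`, `sum(choose(19, 1:9))` gives 262143 subsets to evaluate, which is why the template is computed once and reused.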

The ncv_template object is a list containing:

Character denominators The set of denominator chromosomes
Character focus_chromosome The chromosome of interest used for this `NCVTemplate` object
Character nipt_sample_names The sample names of the test set samples
Character correction_status The correction status(es) of the control group samples
Data.frame control_group_Z_scores The NCV scores for the test set samples
Character potential_denominators The total pool of denominators the best denominators are selected from
Numeric control_group_statistics Named num of length 3, the first field being the mean (name mean), the second field is the standard deviation (name SD) and the third field is the P value of the Shapiro-Wilk test (name Shapiro_P_value)
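
As a rough sketch of what a named vector like `control_group_statistics` holds, the three values can be computed in base R with `mean`, `sd` and `shapiro.test`. The `fractions` vector below is simulated, and standardizing the fractions against their own mean and standard deviation is an assumption made for illustration, not a statement of how NIPTeR computes its scores internally.

```r
# Sketch only: `fractions` are hypothetical chromosomal fractions
# for the chromosome of interest in a 30-sample control group.
set.seed(42)
fractions <- rnorm(30, mean = 0.05, sd = 0.001)

# Standardize against the control group's own mean and SD (assumption)
scores <- (fractions - mean(fractions)) / sd(fractions)

# Named numeric vector of length 3, mirroring the documented field names
control_group_statistics <- c(
  mean = mean(scores),
  SD = sd(scores),
  Shapiro_P_value = shapiro.test(scores)$p.value
)
```

A small Shapiro-Wilk P value suggests the scores deviate from a normal distribution, in which case Z-score based interpretation deserves extra caution.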

If a test and train set is used, the ncv_template object also includes:

Character sample_names_train_set The names of the samples the model is trained on
Numeric train_set_statistics Mean, SD and Shapiro-Wilk test P value of the Z scores of the train set
Data.frame train_set_Zscores The Z scores of the train set
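
The idea behind `use_test_train_set` and `size_of_train_set` can be sketched as follows. The random sampling, the rounding of the train set size and all variable names are assumptions for illustration; the package may split the control group differently.

```r
# Sketch only: hypothetical control group of 40 samples with
# simulated chromosomal fractions for the chromosome of interest.
set.seed(7)
sample_names <- paste0("sample_", 1:40)
fractions <- setNames(rnorm(40, mean = 0.05, sd = 0.001), sample_names)

size_of_train_set <- 0.6
n_train <- round(size_of_train_set * length(sample_names))

# Random split into train and test samples (assumption)
train <- sample(sample_names, n_train)
test <- setdiff(sample_names, train)

# Estimate mean and SD on the train set, then score the held-out test set
train_mean <- mean(fractions[train])
train_sd <- sd(fractions[train])
test_scores <- (fractions[test] - train_mean) / train_sd
```

Scoring held-out samples against statistics estimated on the train set gives a less optimistic picture of the score distribution than scoring the same samples that were used to select the denominators.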

References

Sehnert et al.

Examples

## Not run: 
## Create an NCVTemplate for chromosome 13 with at most 9 denominators and default settings, so:
## all autosomal chromosomes are potential predictors,
## except the potential trisomic chromosomes 13, 18 and 21
new_ncv_template_13 <- prepare_ncv(nipt_control_group = control_group, 
                                   chr_focus = 13, max_elements = 9)
## End(Not run)
