perform_regression: Regression based Z score

Description

Builds multiple linear regression models and calculates a regression-based Z-score for the chromosome of interest

Usage

perform_regression(nipt_sample, nipt_control_group, chromo_focus, n_models = 4, n_predictors = 4, exclude_chromosomes = NULL, include_chromosomes = NULL, use_test_train_set = T, size_of_train_set = 0.6, overdispersion_rate = 1.15, force_practical_cv = F)

Arguments

nipt_sample

The NIPTSample object that is the focus of the analysis

nipt_control_group

The NIPTControlGroup object used in the analysis

chromo_focus

The chromosome of interest. Most commonly chromosome 13, 18 or 21. However, every autosomal chromosome can be predicted

n_models

Integer. Number of linear models to build. Default is 4

n_predictors

Integer. The number of predictors each model contains. Default is 4

exclude_chromosomes

Integer. Autosomal chromosomes to exclude as potential predictors. By default the potentially trisomic chromosomes 13, 18 and 21 are excluded

include_chromosomes

Integer. Potentially trisomic chromosomes to include as potential predictors. Options are chromosomes 13, 18 and 21

use_test_train_set

Use a test and train set to build the models? Default is TRUE

size_of_train_set

The size of the train set, expressed as a decimal fraction of the control group. Default is 0.6 (60% of the control samples)

overdispersion_rate

The standard error of the mean is multiplied by this factor. Default is 1.15

force_practical_cv

Boolean. Ignore the theoretical CV and always use the practical CV? Default is FALSE
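
As a rough illustration of what use_test_train_set and size_of_train_set describe, the sketch below splits a set of control sample names into a train and a test set. The sample names, the random sampling and the rounding are assumptions made for this example; the function's internal splitting procedure may differ.

```r
# Illustrative sketch only: hypothetical sample names, not NIPTeR internals.
set.seed(42)

control_names <- paste0("control_", 1:20)  # hypothetical control samples
size_of_train_set <- 0.6                   # same default as in Usage

# Number of control samples assigned to the train set (rounding is an assumption)
n_train <- round(length(control_names) * size_of_train_set)

# Randomly pick the train samples; the remainder forms the test set
train_idx <- sample(seq_along(control_names), n_train)
train_set <- control_names[train_idx]
test_set <- control_names[-train_idx]

length(train_set)  # 12
length(test_set)   # 8
```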

Value

RegressionResult object

Details

The regression based Z-score builds n models (n_models) with m predictors each (n_predictors) using stepwise regression with forward selection. The models are used to predict the chromosomal fraction of the chromosome of interest, for the sample and for the control group. The observed fractions are then divided by the expected fractions, and Z-scores are calculated over these ratios. The Z-score is calculated by subtracting one from the ratio of the sample and dividing the result by the coefficient of variation (CV). The CV can be either the Practical CV or the Theoretical CV. The Theoretical CV is the standard error of the mean multiplied by the overdispersion rate. Theoretically, the CV cannot be lower than the standard error of the mean. If the Practical CV is lower than the Theoretical CV, the Theoretical CV is used.
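
The calculation described above can be sketched in base R for a single model. This is a minimal illustration on simulated fractions, not the package's implementation: the predictors are fixed instead of chosen by forward selection, the Practical CV is taken as the standard deviation of the control ratios divided by their mean, and the standard error is a made-up constant. All variable names are hypothetical.

```r
# Illustrative sketch only: simulated data, not NIPTeR internals.
set.seed(1)

n_controls <- 50
# Hypothetical chromosomal fractions of two predictor chromosomes
controls <- data.frame(
  chr_a = rnorm(n_controls, mean = 0.050, sd = 0.0005),
  chr_b = rnorm(n_controls, mean = 0.040, sd = 0.0004)
)
# Fraction of the chromosome of interest, depending on the predictors plus noise
controls$focus <- 0.2 * controls$chr_a + 0.1 * controls$chr_b +
  rnorm(n_controls, mean = 0, sd = 0.00002)

# Hypothetical sample whose focus fraction is 1% higher than expected
sample_row <- data.frame(chr_a = 0.0501, chr_b = 0.0399)
sample_row$focus <- 1.01 * (0.2 * sample_row$chr_a + 0.1 * sample_row$chr_b)

# One linear model fitted on the controls (predictors fixed, no stepwise selection)
model <- lm(focus ~ chr_a + chr_b, data = controls)

# Observed fraction divided by expected (predicted) fraction
control_ratios <- controls$focus / as.numeric(predict(model, newdata = controls))
sample_ratio <- sample_row$focus / as.numeric(predict(model, newdata = sample_row))

# Practical CV (assumed definition: sd of the control ratios over their mean)
practical_cv <- sd(control_ratios) / mean(control_ratios)

# Theoretical CV: standard error times the overdispersion rate
overdispersion_rate <- 1.15
standard_error <- 0.0003  # hypothetical value, for illustration only
theoretical_cv <- overdispersion_rate * standard_error

# Use the Theoretical CV whenever the Practical CV falls below it
cv <- max(practical_cv, theoretical_cv)
cv_type <- if (practical_cv < theoretical_cv) "Theoretical_CV" else "Practical_CV"

# Z-score: (ratio - 1) / CV
z_score_sample <- (sample_ratio - 1) / cv
```

Because the ratio of a well-predicted sample is close to one, the mean of the ratios is replaced by one in the Z-score, which is why only the CV has to be estimated from the control group.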

The output of this function is an object of type RegressionResult, a named list containing:

prediction_statistics A data frame with 7 rows and a column for every model. The rows are:
- Z_score_sample The regression based Z score for the model
- CV The coefficient of variation for the model
- cv_types The CV type used to calculate the regression based Z score for the model. Either Practical_CV or Theoretical_CV
- P_value_shapiro The P value of the Shapiro-Wilk test for normality of the control group regression based Z scores for the model
- Predictor_chromosomes The predictor chromosomes used in the model
- Mean_test_set The mean of the test set. Note that for calculating the regression based Z scores the mean is replaced by one. The mean, however, can be seen as a quality metric for the model
- CV_train_set The CV of the train set. The difference between this CV and the CV of the test can be used as a measure to quantify overfit

control_group_Zscores A matrix containing the regression based Z-scores for the control samples

focus_chromosome The chromosome of interest. Most commonly chromosome 13, 18 or 21. However, every autosomal chromosome can be predicted

correction_status The correction status of the control group autosomes

control_group_sample_names The sample names of the test set group

models List of the summary.lm output for every model

potential_predictors The total pool of chromosomes where the predictors are selected from

all_control_group_Z_scores Z-scores for every sample using theoretical and practical CVs

additional_statistics Statistics for both the practical and theoretical CVs for every prediction set
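
Since the result is a named list, its elements can be read with the usual list and data frame accessors. The sketch below uses a hand-made stand-in for a RegressionResult with only three of the seven rows; the column names, the invented values and the choice to store them as character strings are assumptions for illustration and do not come from a real analysis.

```r
# Hypothetical stand-in for a RegressionResult; all values are invented.
mock_result <- list(
  prediction_statistics = data.frame(
    Prediction_set_1 = c("0.42", "0.0031", "Practical_CV"),
    Prediction_set_2 = c("-0.17", "0.0035", "Theoretical_CV"),
    row.names = c("Z_score_sample", "CV", "cv_types"),
    stringsAsFactors = FALSE
  ),
  focus_chromosome = "21"
)

stats <- mock_result$prediction_statistics

# Z-score of the sample for every model, converted to numeric
z_scores <- as.numeric(unlist(stats["Z_score_sample", ]))
names(z_scores) <- colnames(stats)

# Which CV type each model used
cv_types <- unlist(stats["cv_types", ])

# A simple summary across models
mean_z <- mean(z_scores)
```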

Examples


## Not run: 
regression_score_21 <- perform_regression(nipt_sample = sample_of_interest, 
                       nipt_control_group = control_group, chromo_focus = 21)
## End(Not run)
