NIPTeR (version 1.0.2)

perform_regression: Regression based Z score

Description

Builds multiple linear regression models and calculates a regression-based Z-score for the sample

Usage

perform_regression(nipt_sample, nipt_control_group, chromo_focus, n_models = 4, n_predictors = 4, exclude_chromosomes = NULL, include_chromosomes = NULL, use_test_train_set = T, size_of_train_set = 0.6, overdispersion_rate = 1.15, force_practical_cv = F)

Arguments

nipt_sample
The NIPTSample object that is the focus of the analysis
nipt_control_group
The NIPTControlGroup object used in the analysis
chromo_focus
The chromosome of interest. Most commonly chromosome 13, 18 or 21. However, every autosomal chromosome can be predicted
n_models
Integer. The number of linear models to build. Default is 4
n_predictors
Integer. The number of predictors each model contains. Default is 4
exclude_chromosomes
Integer. Exclude which autosomal chromosomes as potential predictors? By default, the potential trisomic chromosomes 13, 18 and 21 are excluded
include_chromosomes
Integer. Include potential trisomic chromosomes as predictors? Options are chromosomes 13, 18 and 21
use_test_train_set
Use a test and train set to build the models? Default is TRUE
size_of_train_set
The size of the train set expressed as a decimal fraction. Default is 0.6 (60% of the control samples)
overdispersion_rate
The standard error of the mean is multiplied by this factor. Default is 1.15
force_practical_cv
Boolean. Ignore the theoretical CV and always use the practical CV? Default is FALSE
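
To make the meaning of size_of_train_set concrete, the following is a minimal sketch of a 60/40 split of control samples. It is not NIPTeR's internal code: the number of control samples, the index vectors and the use of sample() for the split are assumptions made for illustration only.

```r
# Hypothetical illustration of a train/test split; not taken from NIPTeR.
set.seed(1)                # only to make the sketch reproducible
n_controls <- 50           # assumed number of control samples
size_of_train_set <- 0.6   # default value from the Usage section

control_ids <- seq_len(n_controls)

# Draw the train set at random; the remaining samples form the test set.
train_ids <- sample(control_ids, size = round(size_of_train_set * n_controls))
test_ids <- setdiff(control_ids, train_ids)

length(train_ids)  # 30
length(test_ids)   # 20
```

With the default of 0.6, 60% of the control samples would be used to fit the models and the remaining 40% to evaluate them.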

Value

RegressionResult object

Details

The regression-based Z-score builds n models (n_models) with m predictors each (n_predictors) using stepwise regression with forward selection. The models are used to predict the chromosomal fraction of interest, for the sample and for the control group. The observed fractions are then divided by the expected fractions, and Z-scores are calculated over these ratios. The Z-score is calculated by subtracting one from the ratio of the sample and dividing the result by the coefficient of variation (CV). The CV can be either the practical or the theoretical CV. The theoretical CV is the standard error multiplied by the overdispersion rate. Theoretically, the CV cannot be lower than the standard error of the mean. If the practical CV is lower than the theoretical CV, the theoretical CV is used.
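
The arithmetic described above can be sketched in a few lines of R. This is an illustration under stated assumptions, not the package's implementation: the ratios and the standard error are made-up placeholder values, and computing the practical CV as sd / mean of the control ratios is an assumption that this page does not confirm.

```r
# Hypothetical inputs; none of these values come from NIPTeR or real data.
set.seed(42)
control_ratios <- rnorm(30, mean = 1, sd = 0.004)  # observed / expected, control group
sample_ratio <- 1.02                               # observed / expected, sample of interest
standard_error <- 0.003                            # placeholder standard error
overdispersion_rate <- 1.15                        # default from the Usage section
force_practical_cv <- FALSE                        # default from the Usage section

# Practical CV: assumed here to be sd / mean of the control ratios.
practical_cv <- sd(control_ratios) / mean(control_ratios)

# Theoretical CV: standard error multiplied by the overdispersion rate.
theoretical_cv <- standard_error * overdispersion_rate

# Fall back to the theoretical CV when the practical CV is lower,
# unless the practical CV is forced.
use_theoretical <- !force_practical_cv && practical_cv < theoretical_cv
cv <- if (use_theoretical) theoretical_cv else practical_cv
cv_type <- if (use_theoretical) "Theoretical_CV" else "Practical_CV"

# Z-score: subtract one from the sample ratio and divide by the CV.
z_score_sample <- (sample_ratio - 1) / cv
```

Because the expected ratio is one by construction, the mean does not appear in the Z-score; only the CV scales the deviation of the sample ratio.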

The output of this function is an object of type RegressionResult, a named list containing:

  • prediction_statistics A data frame with 7 rows and one column for every model. The rows are:
    • Z_score_sample The regression based Z score for the model
    • CV The coefficient of variation for the model
    • cv_types The CV type used to calculate the regression based Z score for the model. Either Practical_CV or Theoretical_CV
    • P_value_shapiro The P value of the Shapiro-Wilk test for normality of the control group regression based Z scores for the model
    • Predictor_chromosomes The predictor chromosomes used in the model
    • Mean_test_set The mean of the test set. Note that for calculating the regression based Z scores the mean is replaced by one. The mean, however, can be seen as a quality metric for the model
    • CV_train_set The CV of the train set. The difference between this CV and the CV of the test can be used as a measure to quantify overfit

  • control_group_Zscores A matrix containing the regression based Z-scores for the control samples
  • focus_chromosome The chromosome of interest. Most commonly chromosome 13, 18 or 21. However, every autosomal chromosome can be predicted
  • correction_status The correction status of the control group autosomes
  • control_group_sample_names The sample names of the test set group
  • models List of the summary.lm output for every model
  • potential_predictors The total pool of chromosomes where the predictors are selected from
  • all_control_group_Z_scores Z-scores for every sample using theoretical and practical CVs
  • additional_statistics Statistics for both the practical and theoretical CVs for every prediction set
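
As an illustration of the P_value_shapiro row, the sketch below runs the Shapiro-Wilk test from base R on a vector of Z-scores. The Z-scores here are simulated stand-ins for one column of control_group_Zscores; whether NIPTeR calls shapiro.test() in exactly this way is an assumption.

```r
# Simulated control-group Z-scores for a single model; not real NIPT data.
set.seed(7)
control_z_scores <- rnorm(40)

# Shapiro-Wilk test for normality (stats::shapiro.test, part of base R).
shapiro_result <- shapiro.test(control_z_scores)
p_value_shapiro <- shapiro_result$p.value
```

A small P value would suggest that the control-group Z-scores for that model deviate from a normal distribution, which makes the Z-score of the sample harder to interpret.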

Examples

    ## Not run:
    # regression_score_21 <- perform_regression(nipt_sample = sample_of_interest,
    #                        nipt_control_group = control_group, chromo_focus = 21)
    ## End(Not run)
    
    
