
psfmi

The package provides functions for pooling and for backward and forward selection of linear, logistic and Cox regression models across multiply imputed data sets using Rubin’s Rules (RR). The D1, D2, D3, D4 and median p-values methods can be used to pool the significance of categorical variables (multiparameter tests). The model can contain continuous, dichotomous, categorical and restricted cubic spline predictors, as well as interaction terms between all these types of variables. Variables can also be forced into the model during selection.
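As a rough illustration of what pooling with Rubin's Rules involves, the base R sketch below combines one coefficient across five imputed datasets. The estimates and standard errors are made-up numbers, and the degrees of freedom use the classical Rubin formula; this is not psfmi's own code, and the package may apply a small-sample adjustment.

```r
# Hypothetical per-imputation results for a single coefficient (made-up numbers).
est <- c(0.52, 0.47, 0.55, 0.50, 0.46)  # estimates from m = 5 imputed datasets
se  <- c(0.20, 0.21, 0.19, 0.20, 0.22)  # corresponding standard errors

m       <- length(est)
q_bar   <- mean(est)              # pooled estimate: mean of the estimates
w       <- mean(se^2)             # within-imputation variance
b       <- var(est)               # between-imputation variance
t_var   <- w + (1 + 1 / m) * b    # total variance
se_pool <- sqrt(t_var)            # pooled standard error

# Classical Rubin degrees of freedom (assumption: no small-sample adjustment).
r  <- (1 + 1 / m) * b / w
df <- (m - 1) * (1 + 1 / r)^2

t_stat <- q_bar / se_pool
p_val  <- 2 * pt(-abs(t_stat), df)

c(estimate = q_bar, std.error = se_pool, df = df, p.value = p_val)
```

The pooled standard error is always at least as large as the average within-imputation standard error, because the between-imputation variance reflects the extra uncertainty due to the missing data.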

Prediction models can be validated with cross-validation or bootstrapping across multiply imputed data sets. Pooled model performance measures such as the AUC value, reclassification, R-square, the Hosmer and Lemeshow test and the scaled Brier score are generated, along with calibration plots. Functions are also available to externally validate logistic prediction models across multiply imputed data sets and to compare models in multiply imputed data.

Installation

You can install the released version of psfmi with:

install.packages("psfmi")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("mwheymans/psfmi")

Citation

Cite the package as:


Martijn W Heymans (2021). psfmi: Prediction Model Pooling, Selection and Performance Evaluation 
Across Multiply Imputed Datasets. R package version 1.1.0. https://mwheymans.github.io/psfmi/ 

Examples

This example shows how to pool a logistic regression model across 5 multiply imputed datasets. The model includes two restricted cubic spline variables and a categorical, a continuous and a dichotomous variable. The pooling method used is D1.

library(psfmi)

pool_lr <- psfmi_lr(data=lbpmilr, formula = Chronic ~ rcs(Pain, 3) + 
                      JobDemands + rcs(Tampascale, 3) + factor(Satisfaction) + 
                      Smoking, nimp=5, impvar="Impnr", method="D1")

pool_lr$RR_model
#> $`Step 1 - no variables removed -`
#>                            term      estimate  std.error  statistic        df
#> 1                   (Intercept) -21.374498123 7.96491209 -2.6835824  65.71094
#> 2                    JobDemands  -0.007500147 0.05525835 -0.1357288  38.94021
#> 3                       Smoking   0.072207184 0.51097303  0.1413131  47.98415
#> 4         factor(Satisfaction)2  -0.506544055 0.56499941 -0.8965391 139.35335
#> 5         factor(Satisfaction)3  -2.580503376 0.77963853 -3.3098715 100.66273
#> 6              rcs(Pain, 3)Pain  -0.090675006 0.50510774 -0.1795162  26.92182
#> 7             rcs(Pain, 3)Pain'   1.183787048 0.55697046  2.1254036  94.79276
#> 8  rcs(Tampascale, 3)Tampascale   0.583697990 0.22707747  2.5704796  77.83368
#> 9 rcs(Tampascale, 3)Tampascale'  -0.602128298 0.29484065 -2.0422160  31.45559
#>       p.value           OR    lower.EXP  upper.EXP
#> 1 0.009206677 5.214029e-10 6.460344e-17 0.00420815
#> 2 0.892734942 9.925279e-01 8.875626e-01 1.10990663
#> 3 0.888214212 1.074878e+00 3.847422e-01 3.00295282
#> 4 0.371511077 6.025744e-01 1.971829e-01 1.84141687
#> 5 0.001296125 7.573587e-02 1.612863e-02 0.35563604
#> 6 0.858876729 9.133145e-01 3.239353e-01 2.57503035
#> 7 0.036152843 3.266722e+00 1.081155e+00 9.87043962
#> 8 0.012063538 1.792655e+00 1.140659e+00 2.81733025
#> 9 0.049589266 5.476448e-01 3.002599e-01 0.99885104

pool_lr$multiparm
#> $`Step 1 - no variables removed -`
#>                      p-values D1 F-statistic
#> JobDemands           0.892487763  0.01842230
#> Smoking              0.887968553  0.01996939
#> factor(Satisfaction) 0.002611518  6.04422205
#> rcs(Pain,3)          0.014630986  4.84409246
#> rcs(Tampascale,3)    0.130741167  2.24870192
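
The OR, lower.EXP, upper.EXP and p.value columns in the RR_model output above can be related to the estimate, std.error and df columns. The sketch below does this for row 4, factor(Satisfaction)2, assuming a two-sided 95% interval based on the t distribution. That assumption is inferred from the printed numbers, not taken from the package source.

```r
# Values copied from row 4 (factor(Satisfaction)2) of the RR_model output.
estimate  <- -0.506544055
std_error <-  0.56499941
df        <-  139.35335

# Assumption: two-sided 95% interval using the t distribution with pooled df.
t_crit <- qt(0.975, df)

odds_ratio <- exp(estimate)                       # log-odds to odds ratio
lower      <- exp(estimate - t_crit * std_error)  # lower confidence limit
upper      <- exp(estimate + t_crit * std_error)  # upper confidence limit
p_value    <- 2 * pt(-abs(estimate / std_error), df)

round(c(OR = odds_ratio, lower = lower, upper = upper, p = p_value), 4)
```

Under that assumption the results should land close to the printed values for this row (OR about 0.60, interval about 0.20 to 1.84).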

This example shows how to apply forward selection to the above model using a p-value threshold of 0.05.

library(psfmi)

pool_lr <- psfmi_lr(data=lbpmilr, formula = Chronic ~ rcs(Pain, 3) + 
                      JobDemands + rcs(Tampascale, 3) + factor(Satisfaction) + 
                      Smoking, p.crit = 0.05, direction="FW", 
                      nimp=5, impvar="Impnr", method="D1")
#> Entered at Step 1 is - rcs(Pain,3)
#> Entered at Step 2 is - factor(Satisfaction)
#> 
#> Selection correctly terminated, 
#> No new variables entered the model

pool_lr$RR_model_final
#> $`Final model`
#>                    term   estimate std.error  statistic        df     p.value
#> 1           (Intercept) -3.6027668 1.5427414 -2.3353018  60.25659 0.022875170
#> 2 factor(Satisfaction)2 -0.4725289 0.5164342 -0.9149838 145.03888 0.361718841
#> 3 factor(Satisfaction)3 -2.3328994 0.7317131 -3.1882707 122.95905 0.001815476
#> 4      rcs(Pain, 3)Pain  0.6514983 0.4028728  1.6171315  51.09308 0.112008088
#> 5     rcs(Pain, 3)Pain'  0.4703811 0.4596490  1.0233483  75.29317 0.309419924
#>           OR   lower.EXP upper.EXP
#> 1 0.02724823 0.001245225 0.5962503
#> 2 0.62342367 0.224644070 1.7301016
#> 3 0.09701406 0.022793375 0.4129150
#> 4 1.91841309 0.854476033 4.3070942
#> 5 1.60060402 0.640677978 3.9987846

pool_lr$multiparm
#> $`Step 0 - selected - rcs(Pain,3)`
#>                        p-value D1
#> JobDemands           7.777737e-01
#> Smoking              9.371529e-01
#> factor(Satisfaction) 9.271071e-01
#> rcs(Pain,3)          3.282999e-07
#> rcs(Tampascale,3)    2.780012e-06
#> 
#> $`Step 1 - selected - factor(Satisfaction)`
#>                       p-value D1
#> JobDemands           0.952900908
#> Smoking              0.769394518
#> factor(Satisfaction) 0.004738608
#> rcs(Tampascale,3)    0.125280292
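
A comparable call for backward selection, with one variable forced to stay in the model, might look like the sketch below. It assumes that psfmi_lr accepts direction="BW" and a keep.predictors argument, as described in the package documentation; check ?psfmi_lr for the installed version before relying on it. The call is wrapped in a requireNamespace() guard so the snippet only runs when psfmi is installed.

```r
# Sketch only: backward selection with Smoking forced into the model.
# Assumes psfmi_lr() supports direction = "BW" and keep.predictors.
pool_bw <- NULL
if (requireNamespace("psfmi", quietly = TRUE)) {
  library(psfmi)

  pool_bw <- psfmi_lr(data = lbpmilr,
                      formula = Chronic ~ rcs(Pain, 3) + JobDemands +
                        rcs(Tampascale, 3) + factor(Satisfaction) + Smoking,
                      keep.predictors = "Smoking",  # never removed during selection
                      p.crit = 0.05, direction = "BW",
                      nimp = 5, impvar = "Impnr", method = "D1")

  # Pooled coefficients of the model that remains after selection.
  print(pool_bw$RR_model_final)
}
```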

More examples for logistic, linear and Cox regression models as well as internal and external validation of prediction models can be found on the package website or in the online book Applied Missing Data Analysis.


Monthly Downloads: 405

Version: 1.4.0

License: GPL (>= 2)

Maintainer: Martijn Heymans

Last Published: June 17th, 2023

Functions in psfmi (1.4.0)

bw_single

Predictor selection function for backward selection of Linear and Logistic regression models.
coxph_fw

Predictor selection function for forward selection of Cox regression models in single complete dataset.
clean_P

Function to clean variables
glm_fw

Function for forward selection of Linear and Logistic regression models.
glm_bw

Function for backward selection of Linear and Logistic regression models.
coxph_bw

Predictor selection function for backward selection of Cox regression models in single complete dataset.
cv_MI_RR

Cross-validation in Multiply Imputed datasets
day2_dataset4_mi

Dataset of low back pain patients with missing values
cv_MI

Cross-validation in Multiply Imputed datasets
chol_wide

Wide dataset of persons from the Amsterdam Growth and Health Longitudinal Study (AGHLS)
hipstudy

Dataset of elderly patients with a hip fracture
km_fit

Kaplan-Meier (KM) estimate at specific time point
hoslem_test

Calculates the Hosmer and Lemeshow goodness of fit test.
lbpmi_extval

Example dataset of Low Back Pain Patients for external validation
infarct

Data of a patient-control study regarding the relationship between myocardial infarction (MI) and smoking
hoorn_basic

Dataset of the Hoorn Study
hipstudy_external

External Dataset of elderly patients with a hip fracture
km_estimates

Kaplan-Meier risk estimates for Net Reclassification Index analysis
ipdna_md

Example dataset for the psfmi_mm function
lbpmicox

Example dataset for psfmi_coxr function
mivalext_lr

External Validation of logistic prediction models in multiply imputed datasets
miceImp

Wrapper function around mice
lbp_orig

Example dataset for psfmi_perform function, method boot_MI
nri_est

Calculation of Net Reclassification Index measures
nri_cox

Net Reclassification Index for Cox Regression Models
pool_D2

Combines the Chi Square statistics across Multiply Imputed datasets
men

Data of 613 patients with meningitis
mean_auc_log

Function to calculate mean AUC values
lbpmilr_dev

Example dataset for mivalext_lr function
lbpmilr

Example dataset for psfmi_lr function
pool_intadj

Provides pooled adjusted intercept after shrinkage of pooled coefficients in multiply imputed datasets
pool_compare_models

Compare the fit and performance of prediction models across Multiply Imputed data
mammaca

Data of a study among women with breast cancer
lungvolume

Data of the development of lung and heart volume of unborn babies
pool_RR

Function to combine estimates by using Rubin's Rules
pool_auc

Calculates the pooled C-statistic (Area Under the ROC Curve) across Multiply Imputed datasets
pool_D4

Pools the Likelihood Ratio tests across Multiply Imputed datasets (method D4)
pool_performance_internal

Pooling performance measures over multiply imputed datasets
psfmi_mm_multiparm

Multiparameter pooling methods called by psfmi_mm
pool_performance

Pooling performance measures across multiply imputed datasets
psfmi_mm

Pooling and Predictor selection function for multilevel models in multiply imputed datasets
psfmi_lr

Pooling and Predictor selection function for backward or forward selection of Logistic regression models across multiply imputed data.
rsq_nagel

Nagelkerke's R-square calculation for logistic regression / glm models
psfmi_stab

Function to evaluate bootstrap predictor and model stability in multiply imputed datasets.
psfmi_perform

Internal validation and performance of logistic prediction models across Multiply Imputed datasets
psfmi_lm_fw

Forward selection of Linear regression models across multiply imputed data.
rsq_surv

R-square calculation for Cox regression models
psfmi_lm

Pooling and Predictor selection function for backward or forward selection of Linear regression models across multiply imputed data.
psfmi_lm_bw

Backward selection of Linear regression models across multiply imputed data.
psfmi_coxr

Pooling and Predictor selection function for backward or forward selection of Cox regression models across multiply imputed data.
pool_reclassification

Function to pool NRI measures over Multiply Imputed datasets
psfmi_lr_bw

Backward selection of Logistic regression models in multiply imputed data.
psfmi_lr_fw

Forward selection of Logistic regression models in multiply imputed data.
sbp_qas

Dataset with blood pressure measurements
sbp_age

Dataset with blood pressure measurements
scaled_brier

Calculates the scaled Brier score
stab_single

Function to evaluate bootstrap predictor and model stability.
smoking

Survival data about smoking
psfmi_coxr_bw

Backward selection of Cox regression models in multiply imputed data.
psfmi_validate

Internal validation and performance of logistic prediction models across Multiply Imputed datasets
risk_coxph

Risk calculation at specific time point for Cox model
psfmi_coxr_fw

Forward selection of Cox regression models across multiply imputed data.
weight

Dataset of persons from the Amsterdam Growth and Health Longitudinal Study (AGHLS)
RR_diff_prop

Function to apply RR to pool difference of NRI and AUC values
MI_boot

Bootstrap validation in Multiply Imputed datasets
bmd

Data of a non-experimental study in more than 300 elderly women
aortadis

Dataset of patients with an aortic dissection
chol_long

Long dataset of persons from the Amsterdam Growth and Health Longitudinal Study (AGHLS)
anderson

Data from a placebo-controlled RCT with leukemia patients
chlrform

Data about the concentration of β2-microglobulin in urine as an indicator of possible kidney damage
boot_MI

Bootstrap validation in Multiply Imputed datasets
MI_cv_naive

Naive method for Cross-validation in Multiply Imputed datasets