pprof (version 1.0.2)

data_check: Data quality check function

Description

Conducts data quality checks on the outcome and covariates, covering missingness, variation, pairwise correlation, and variance inflation factors (VIF).

Usage

data_check(Y, Z, ProvID)

Value

No return value, called for side effects.

Arguments

Y

a numeric vector indicating the outcome variable.

Z

a matrix or data frame representing covariates.

ProvID

a numeric vector representing the provider identifier.

Details

The function performs the following checks:

  • Missingness: Checks for any missing values in the dataset and provides a summary of missing data.

  • Variation: Identifies covariates with zero or near-zero variance, which can affect model stability.

  • Correlation: Analyzes pairwise correlation among covariates and highlights highly correlated pairs.

  • VIF: Computes the Variance Inflation Factors to identify covariates with potential multicollinearity issues.

If issues arise when fitting models with logis_fe, linear_fe, or linear_re, call this function to check the quality of the input data.
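
To make the four checks concrete, here is a minimal base R sketch of what each one measures. It is not the pprof implementation of data_check: the function name sketch_checks, the cor_cutoff argument, and its default of 0.9 are hypothetical and chosen for illustration only, and the sketch assumes all covariates are numeric with syntactic column names.

```r
# Illustrative sketch only; NOT the pprof implementation of data_check().
# sketch_checks and cor_cutoff are hypothetical names introduced here.
sketch_checks <- function(Y, Z, ProvID, cor_cutoff = 0.9) {
  Z <- as.data.frame(Z)

  # Missingness: number of NA values in each variable
  n_missing <- colSums(is.na(data.frame(Y = Y, ProvID = ProvID, Z)))

  # Variation: variance of each covariate (zero means a constant column)
  variances <- vapply(Z, function(z) var(z, na.rm = TRUE), numeric(1))

  # Correlation: covariate pairs whose absolute correlation exceeds the cutoff
  cor_mat <- cor(Z, use = "pairwise.complete.obs")
  high <- which(abs(cor_mat) > cor_cutoff & upper.tri(cor_mat), arr.ind = TRUE)
  high_pairs <- data.frame(
    var1 = rownames(cor_mat)[high[, "row"]],
    var2 = colnames(cor_mat)[high[, "col"]],
    stringsAsFactors = FALSE
  )

  # VIF: 1 / (1 - R^2), where R^2 comes from regressing one covariate
  # on all of the others
  vif <- vapply(names(Z), function(v) {
    fit <- lm(reformulate(setdiff(names(Z), v), response = v), data = Z)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))

  list(n_missing = n_missing, variances = variances,
       high_pairs = high_pairs, vif = vif)
}

# Usage on simulated data: x2 is nearly a copy of x1, so the pair should be
# flagged by the correlation check and both should have a large VIF.
set.seed(1)
n <- 200
x1 <- rnorm(n)
sim_Z <- cbind(x1 = x1, x2 = x1 + rnorm(n, sd = 0.05), x3 = rnorm(n))
sim_Y <- rbinom(n, 1, 0.5)
sim_ProvID <- rep(1:10, each = 20)
res <- sketch_checks(sim_Y, sim_Z, sim_ProvID)
print(res$high_pairs)
print(round(res$vif, 1))
```

Returning a list rather than printing is a choice made for this sketch so the results can be inspected programmatically; data_check itself is documented as returning no value.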

Examples

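library(pprof)

# Load the example data set with a binary outcome
data(ExampleDataBinary)
outcome <- ExampleDataBinary$Y
covar <- ExampleDataBinary$Z
ProvID <- ExampleDataBinary$ProvID

# Run the data quality checks (called for its side effects)
data_check(outcome, covar, ProvID)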
