dbrobust (version 1.0.0)

Data_HC_no_contamination: High-correlation dataset without contamination

Description

Synthetic dataset generated from a multivariate normal distribution with a strong correlation structure (\(\rho = 0.8\)). It contains 500 observations and 10 variables of mixed type: four continuous, three categorical, two binary, and one column of observation weights. No contaminated cases were added, so the dataset represents a clean scenario with 0% contamination. The data follow the design in boj2024robustificationdbrobust.

Usage

Data_HC_no_contamination

Format

A data frame with 500 rows and 10 variables:

V1

Continuous variable 1

V2

Continuous variable 2

V3

Continuous variable 3

V4

Continuous variable 4

V5

Categorical variable 1 (3 categories, approx. balanced)

V6

Categorical variable 2 (3 categories, approx. balanced)

V7

Categorical variable 3 (4 categories, uniform distribution)

V8

Binary variable 1 (40% zeros, 60% ones)

V9

Binary variable 2 (60% zeros, 40% ones)

w_loop

Observation weights derived from the joint distribution of V5 and V8, following a proportional frequency-based scheme.
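
As a quick sanity check, the documented format can be compared against the loaded object. The sketch below assumes the dbrobust package is installed and that the dataset can be loaded with data(); the helper check_format is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of dbrobust): compares a data frame
# with the format documented on this page.
check_format <- function(df) {
  expected <- c(paste0("V", 1:9), "w_loop")
  stopifnot(
    is.data.frame(df),
    nrow(df) == 500,
    ncol(df) == 10,
    setequal(names(df), expected)
  )
  invisible(TRUE)
}

# Only run against the real data if the package is available.
if (requireNamespace("dbrobust", quietly = TRUE)) {
  data("Data_HC_no_contamination", package = "dbrobust")
  check_format(Data_HC_no_contamination)
  str(Data_HC_no_contamination)
  # Empirical category shares of V5 and V8, the two variables behind w_loop
  print(prop.table(table(Data_HC_no_contamination$V5,
                         Data_HC_no_contamination$V8)))
}
```

The check uses setequal() rather than identical() because the column order is not stated on this page.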

Details

  • Continuous variables were drawn directly from the multivariate normal sample.

  • Binary and categorical variables were obtained by discretizing normal margins using percentile-based thresholds.

  • Unlike other datasets in this collection, no artificial contamination was introduced here.

  • The weighting scheme (w_loop) prioritizes frequent combinations of the V5 and V8 categories.
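
The construction described above can be sketched in base R. This is an illustration under stated assumptions, not the code used to build the dataset: the equicorrelation matrix, the use of sample quantiles as thresholds, the category labels, the seed, and the exact weight formula (cell count normalised to sum to one) are all guesses consistent with the description.

```r
set.seed(1)   # arbitrary seed, only to make the sketch reproducible
n   <- 500
p   <- 9      # nine latent normal margins, one per variable V1..V9
rho <- 0.8

# Assumption: every pair of margins has the same correlation rho.
Sigma <- matrix(rho, p, p)
diag(Sigma) <- 1

# Multivariate normal draw via the Cholesky factor (base R only).
Z <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# Cut a margin at sample quantiles so categories hit the target shares.
discretize <- function(z, probs, labels) {
  cuts <- unname(quantile(z, cumsum(probs)[-length(probs)]))
  cut(z, breaks = c(-Inf, cuts, Inf), labels = labels)
}

sim <- data.frame(
  V1 = Z[, 1], V2 = Z[, 2], V3 = Z[, 3], V4 = Z[, 4],
  V5 = discretize(Z[, 5], rep(1 / 3, 3), c("A", "B", "C")),
  V6 = discretize(Z[, 6], rep(1 / 3, 3), c("A", "B", "C")),
  V7 = discretize(Z[, 7], rep(1 / 4, 4), c("A", "B", "C", "D")),
  # Factor codes are 1 and 2; subtracting 1 gives 0/1 indicators.
  V8 = as.integer(discretize(Z[, 8], c(0.4, 0.6), c("0", "1"))) - 1L,
  V9 = as.integer(discretize(Z[, 9], c(0.6, 0.4), c("0", "1"))) - 1L
)

# Assumed weighting: each row gets the count of its (V5, V8) cell,
# then weights are normalised to sum to one, so rows in frequent
# combinations receive larger weights.
cell_count <- ave(rep(1, n), sim$V5, sim$V8, FUN = length)
sim$w_loop <- cell_count / sum(cell_count)
```

Because the thresholds are sample quantiles, the category shares in sim match the targets almost exactly; thresholds taken from qnorm() would instead match them only in expectation.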

References

boj2024robustificationdbrobust