PrepareData: Prepare Data for Two-Condition Within-Subject Mediation (WsMed)

Description

PrepareData() transforms raw pre/post (two-occasion) data into the set of variables required by the WsMed workflow. It handles mediators, the outcome, within-subject controls, between-subject controls, moderators, and all necessary interaction terms, centering or dummy-coding variables automatically as needed.

Usage

PrepareData(
  data,
  M_C1,
  M_C2,
  Y_C1,
  Y_C2,
  C_C1 = NULL,
  C_C2 = NULL,
  C = NULL,
  C_type = NULL,
  W = NULL,
  W_type = NULL,
  center_W = TRUE,
  keep_W_raw = TRUE,
  keep_C_raw = TRUE
)

Value

A data frame containing at minimum:

Ydiff
Mi_diff, Mi_avg for each mediator
centered or dummy-coded Cb*, Cw*diff, Cw*avg
centered or dummy-coded W* and all int_* interaction terms

plus the attributes "W_info" and "C_info" described in Details.

Arguments

data: A data frame with the raw pre/post measures.
M_C1, M_C2: Character vectors: mediator column names at occasion 1 and occasion 2. The two vectors must have equal length, and element i of each refers to the same mediator.
Y_C1, Y_C2: Character scalars: outcome names at occasion 1 and 2.
C_C1, C_C2: Optional character vectors: within-subject control names.
C: Optional character vector: between-subject control names.
C_type: Optional vector of the same length as C. Each element is one of "continuous", "categorical", or "auto" (default). Ignored when C = NULL.
W: Optional character vector: moderator names (one or more).
W_type: Optional vector of the same length as W. Same coding as C_type. Ignored when W = NULL.
center_W: Logical. Whether to grand-mean center the moderator(s) in W (default TRUE).
keep_W_raw, keep_C_raw: Logical. If TRUE, keep the original W / C columns in the returned data.

Details

The function performs the following steps:

Outcome difference: Ydiff = Y_C2 - Y_C1.
Mediator variables for each pair (M_C1[i], M_C2[i]):
- Mi_diff = M_C2[i] - M_C1[i]
- Mi_avg is the mean-centered average of the two occasions.
Between-subject controls C:
- Continuous variables are grand-mean centered (Cb1, Cb2, ...).
- Categorical variables (binary or multi-level) are expanded into k - 1 dummy variables (Cb1_1, Cb2_1, Cb2_2, ...), using the first level as the reference.
Within-subject controls Cw: difference and centered-average versions (Cw1diff, Cw1avg, ...).
Moderators W (one or more):
- Continuous variables are grand-mean centered (W1, W2, ...).
- Categorical variables are dummy-coded in the same way as C.
Interaction terms between each moderator column and each mediator column:
- int_<Mi_diff>_<Wj>, int_<Mi_avg>_<Wj>.
Two attributes are added to the returned data:
- "W_info": raw names, dummy names, level mapping
- "C_info": same structure for between-subject controls.
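
The core transformations listed above can be sketched in base R for a single mediator and a single continuous moderator. This is only an illustration of the arithmetic, not the package's implementation: the column names (M1_pre, M1_post, Y_pre, Y_post, age) are hypothetical, and the interaction terms are assumed to be plain products of the mediator and moderator columns.

```r
set.seed(1)
n <- 20
raw_sketch <- data.frame(
  M1_pre = rnorm(n), M1_post = rnorm(n),   # one mediator, two occasions (hypothetical names)
  Y_pre  = rnorm(n), Y_post  = rnorm(n),   # outcome, two occasions
  age    = rnorm(n, mean = 40, sd = 10)    # continuous moderator
)

# Outcome and mediator differences: occasion 2 minus occasion 1
Ydiff   <- raw_sketch$Y_post  - raw_sketch$Y_pre
M1_diff <- raw_sketch$M1_post - raw_sketch$M1_pre

# Average of the two mediator occasions, then mean-centered
M1_avg_raw <- (raw_sketch$M1_pre + raw_sketch$M1_post) / 2
M1_avg     <- M1_avg_raw - mean(M1_avg_raw)

# Grand-mean centered continuous moderator
W1 <- raw_sketch$age - mean(raw_sketch$age)

# Interaction terms, assumed here to be simple products,
# named after the int_<Mi_diff>_<Wj> / int_<Mi_avg>_<Wj> pattern
sketch <- data.frame(
  Ydiff, M1_diff, M1_avg, W1,
  int_M1_diff_W1 = M1_diff * W1,
  int_M1_avg_W1  = M1_avg  * W1
)
head(sketch)
```

Centering the average and the moderator before forming the products keeps the lower-order coefficients interpretable at the sample mean, which is the usual motivation for this kind of preprocessing.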

Row counts are preserved even if input factors contain NA values (model.matrix is called with na.action = na.pass).
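
As an illustration of NA-preserving dummy coding, the sketch below expands a three-level factor into k - 1 = 2 dummy columns while keeping one row per input row. It is a base R sketch rather than the package's internal code: it sets na.action = na.pass on an explicit model.frame() and then passes that frame to model.matrix(), which is one common way to obtain this behavior. The factor f and its values are made up for the example.

```r
# Three-level factor with one missing value; "Low" is the first (reference) level
f <- factor(c("Low", "Med", NA, "High", "Low"),
            levels = c("Low", "Med", "High"))
df <- data.frame(f = f)

# Build the model frame without dropping rows that contain NA
mf <- model.frame(~ f, data = df, na.action = na.pass)

# Treatment-coded design matrix; drop the intercept to keep only the dummies
mm      <- model.matrix(~ f, data = mf)
dummies <- mm[, -1, drop = FALSE]

nrow(dummies)   # same number of rows as df
dummies         # the row for the NA input is NA in every dummy column
```

Note that factor() orders levels alphabetically unless levels is given, so for a factor built from "Low", "Med", and "High" without explicit levels, the reference level would be "High".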

Examples

set.seed(1)
raw <- data.frame(
  A1 = rnorm(50), A2 = rnorm(50),   # mediator 1
  B1 = rnorm(50), B2 = rnorm(50),   # mediator 2
  C1 = rnorm(50), C2 = rnorm(50),   # outcome
  D1 = rnorm(50), D2 = rnorm(50),   # within-subject control
  W_bin  = sample(0:1, 50, TRUE),   # between-subject binary C
  W_fac3 = factor(sample(c("Low","Med","High"), 50, TRUE)) # moderator W
)

prep <- PrepareData(
  data  = raw,
  M_C1  = c("A1","B1"), M_C2 = c("A2","B2"),
  Y_C1  = "C1",         Y_C2 = "C2",
  C_C1  = "D1",         C_C2 = "D2",
  C     = "W_bin",      C_type = "categorical",
  W     = "W_fac3",     W_type = "categorical"
)
head(prep)