Learn R Programming

LikertMakeR (version 1.4.0)

makeScalesRegression: Generate Data from Multiple-Regression Summary Statistics

Description

Generates synthetic rating-scale data that replicates reported regression results. This function is useful for reproducing analyses from published research where only summary statistics (standardised regression coefficients and R-squared) are reported.

Usage

makeScalesRegression(
  n,
  beta_std,
  r_squared,
  iv_cormatrix = NULL,
  iv_cor_mean = 0.3,
  iv_cor_variance = 0.01,
  iv_cor_range = c(-0.7, 0.7),
  iv_means,
  iv_sds,
  dv_mean,
  dv_sd,
  lowerbound_iv,
  upperbound_iv,
  lowerbound_dv,
  upperbound_dv,
  items_iv = 1,
  items_dv = 1,
  var_names = NULL,
  tolerance = 0.005
)

Value

A list containing:

data

Generated dataframe with k IVs and 1 DV

target_stats

List of target statistics provided

achieved_stats

List of achieved statistics from generated data

diagnostics

Comparison of target vs achieved

iv_dv_cors

Calculated correlations between IVs and DV

full_cormatrix

The complete (k+1) x (k+1) correlation matrix used

optimisation_info

If IV correlations were optimised, details about the optimisation

Arguments

n

Integer. Sample size

beta_std

Numeric vector of standardised regression coefficients (length k)

r_squared

Numeric. R-squared from regression (-1 to 1)

iv_cormatrix

k x k correlation matrix of independent variables. If missing (NULL), will be optimised.

iv_cor_mean

Numeric. Mean correlation among IVs when optimising (ignored if iv_cormatrix provided). Default = 0.3

iv_cor_variance

Numeric. Variance of correlations when optimising (ignored if iv_cormatrix provided). Default = 0.01

iv_cor_range

Numeric vector of length 2. Min and max constraints on correlations when optimising. Default = c(-0.7, 0.7)

iv_means

Numeric vector of means for IVs (length k)

iv_sds

Numeric vector of standard deviations for IVs (length k)

dv_mean

Numeric. Mean of dependent variable

dv_sd

Numeric. Standard deviation of dependent variable

lowerbound_iv

Numeric vector of lower bounds for each IV scale (or single value for all)

upperbound_iv

Numeric vector of upper bounds for each IV scale (or single value for all)

lowerbound_dv

Numeric. Lower bound for DV scale

upperbound_dv

Numeric. Upper bound for DV scale

items_iv

Integer vector of number of items per IV scale (or single value for all). Default = 1

items_dv

Integer. Number of items in DV scale. Default = 1

var_names

Character vector of variable names (length k+1: IVs then DV)

tolerance

Numeric. Acceptable deviation from target R-squared (default 0.005)

Details

Generate regression data from summary statistics

The function can operate in two modes:

Mode 1: With IV correlation matrix provided

When iv_cormatrix is provided, the function uses the given correlation structure among independent variables and calculates the implied IV-DV correlations from the regression coefficients.

Mode 2: With optimisation (IV correlation matrix not provided)

When iv_cormatrix = NULL, the function optimises to find a plausible correlation structure among independent variables that matches the reported regression statistics. Initial correlations are sampled using Fisher's z-transformation to ensure proper distribution, then iteratively adjusted to match the target R-squared.

The function generates Likert-scale data (not individual items) using lfast() for each variable with specified moments, then correlates them using lcor(). Generated data are verified by running a regression and comparing achieved statistics with targets.

See Also

lfast for generating individual rating-scale vectors with exact moments.

lcor for rearranging values to achieve target correlations.

makeCorrAlpha for generating correlation matrices from Cronbach's Alpha.

Examples

Run this code

# Example 1: With provided IV correlation matrix
set.seed(123)
iv_corr <- matrix(c(1.0, 0.3, 0.3, 1.0), nrow = 2)

result1 <- makeScalesRegression(
  n = 64,
  beta_std = c(0.4, 0.3),
  r_squared = 0.35,
  iv_cormatrix = iv_corr,
  iv_means = c(3.0, 3.5),
  iv_sds = c(1.0, 0.9),
  dv_mean = 3.8,
  dv_sd = 1.1,
  lowerbound_iv = 1,
  upperbound_iv = 5,
  lowerbound_dv = 1,
  upperbound_dv = 5,
  items_iv = 4,
  items_dv = 4,
  var_names = c("Attitude", "Intention", "Behaviour")
)

print(result1)
head(result1$data)


# Example 2: With optimisation (no IV correlation matrix)
set.seed(456)
result2 <- makeScalesRegression(
  n = 128,
  beta_std = c(0.3, 0.25, 0.2),
  r_squared = 0.40,
  iv_cormatrix = NULL, # Will be optimised
  iv_cor_mean = 0.3,
  iv_cor_variance = 0.02,
  iv_means = c(3.0, 3.2, 2.8),
  iv_sds = c(1.0, 0.9, 1.1),
  dv_mean = 3.5,
  dv_sd = 1.0,
  lowerbound_iv = 1,
  upperbound_iv = 5,
  lowerbound_dv = 1,
  upperbound_dv = 5,
  items_iv = 4,
  items_dv = 5
)

# View optimised correlation matrix
print(result2$target_stats$iv_cormatrix)
print(result2$optimisation_info)

Run the code above in your browser using DataLab