
pECV (version 1.0.1)

pECV.miss: Entrywise Splitting Cross-Validation with Missing Data

Description

Uses (Penalized) Entrywise Splitting Cross-Validation to estimate the number of latent factors in generalized factor models when the data contain missing values.

Usage

pECV.miss(
  resp,
  C = 5,
  qmax = 8,
  fold = 5,
  tol_val = 0.01,
  theta0 = NULL,
  A0 = NULL,
  seed = 1,
  data_type = NULL
)

Value

A named list with components:

ECV

Integer. Number of factors selected by standard ECV.

p1ECV

Integer. Number of factors selected by ECV with penalty 1.

p2ECV

Integer. Number of factors selected by ECV with penalty 2.

p3ECV

Integer. Number of factors selected by ECV with penalty 3.

p4ECV

Integer. Number of factors selected by ECV with penalty 4.

ECV_loss

Numeric vector. Cross-validation loss for each candidate factor number (typically of length qmax).

data_type

Character. The detected/used data type: "continuous", "count", or "binary".

miss_percent

Numeric scalar. Percentage of missing entries in resp.

The return value uses base R types (no special S3/S4 class).
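
Because the result is a plain named list, its components can be read with `$` or `[[`. The sketch below shows one way to gather the five selections and the loss curve for inspection. It uses a hand-built placeholder list (`result_mock`, a hypothetical name) rather than real output from `pECV.miss()`, and it assumes that the entries of `ECV_loss` correspond to candidate factor numbers 1, 2, ..., which this page suggests but does not state.

```r
# Placeholder list mimicking the documented structure. The numbers are
# invented for illustration and are NOT output from pECV.miss().
result_mock <- list(
  ECV = 3L, p1ECV = 2L, p2ECV = 2L, p3ECV = 2L, p4ECV = 2L,
  ECV_loss = c(1.9, 1.4, 1.3, 1.5),
  data_type = "count",
  miss_percent = 5
)

# Collect the five selected factor numbers into one named integer vector.
criteria <- c("ECV", "p1ECV", "p2ECV", "p3ECV", "p4ECV")
selected <- vapply(result_mock[criteria], as.integer, integer(1))

# Pair each loss value with its candidate factor number
# (assumption: candidates are numbered from 1 upward).
loss_table <- data.frame(
  q = seq_along(result_mock$ECV_loss),
  loss = result_mock$ECV_loss
)

print(selected)
print(loss_table)
```

With a real fit, replacing `result_mock` by the object returned from `pECV.miss()` should work the same way, provided the component names match those listed above.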

Arguments

resp

Observed data matrix (n x p) with missing values coded as NA; entries can be continuous, count, or binary.

C

Constraint constant, default is 5.

qmax

Maximum number of factors to consider, default is 8.

fold

Number of folds in cross-validation, default is 5.

tol_val

Convergence tolerance, default is 0.01. The value is divided by the number of estimated elements before use.

theta0

Optional initial matrix for the factors; sampled from a uniform distribution if not provided.

A0

Optional initial matrix for the loadings; sampled from a uniform distribution if not provided.

seed

Random seed, default is 1.

data_type

Data type, one of "continuous", "count", "binary". If not specified, it is auto-detected.
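
This page does not describe how auto-detection of `data_type` works. The sketch below is one plausible rule, written only to illustrate the idea; `guess_data_type()` is a hypothetical helper, not part of pECV, and the package's actual rule may differ. When in doubt, passing `data_type` explicitly avoids relying on detection.

```r
# Hypothetical helper: guess the data type from the observed (non-NA) entries.
# This is an assumed rule for illustration, not the package's implementation.
guess_data_type <- function(resp) {
  vals <- resp[!is.na(resp)]          # ignore missing entries
  if (all(vals %in% c(0, 1))) {
    "binary"                          # only zeros and ones observed
  } else if (all(vals >= 0 & vals == round(vals))) {
    "count"                           # non-negative whole numbers
  } else {
    "continuous"                      # anything else
  }
}

binary_toy     <- matrix(c(0, 1, NA, 1), nrow = 2)
count_toy      <- matrix(c(0, 3, NA, 7), nrow = 2)
continuous_toy <- matrix(c(-0.5, 1.2, NA, 3), nrow = 2)

guess_data_type(binary_toy)
guess_data_type(count_toy)
guess_data_type(continuous_toy)
```

Note that under this rule a count matrix that happens to contain only zeros and ones would be labelled binary, which is one reason to set `data_type` by hand for sparse count data.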

Details

The example below may take more than 5 seconds on some machines and is therefore not run during routine checks.

Examples

# \donttest{
set.seed(123)
# Generate count data with missing values
n <- 50; p <- 50; q <- 2
theta_true <- cbind(1, matrix(runif(n * q, -2, 2), n, q))
A_true <- matrix(runif(p * (q + 1), -2, 2), p, (q + 1))
lambda <- exp(theta_true %*% t(A_true))
resp <- matrix(
  rpois(length(lambda), lambda = as.vector(lambda)),
  nrow = nrow(lambda), ncol = ncol(lambda)
)
# Introduce 5% missing values
miss_idx <- sample(seq_len(n * p), size = round(0.05 * n * p))
resp[miss_idx] <- NA
result <- pECV.miss(resp, C = 4, qmax = 4, fold = 5)
print(result)
# }
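
Before running the cross-validation it can be worth confirming that the masking step removed the intended share of entries. The sketch below repeats the masking on a stand-in Poisson matrix (`resp_check`, a hypothetical name, with an arbitrary rate of 3) using base R only. It assumes that `miss_percent` is reported on a 0 to 100 scale, as the word "percentage" suggests.

```r
set.seed(123)
n <- 50; p <- 50

# Stand-in count matrix; the rate 3 is arbitrary and chosen for illustration.
resp_check <- matrix(rpois(n * p, lambda = 3), nrow = n, ncol = p)

# Mask 5% of the entries, sampling positions without replacement.
n_miss <- round(0.05 * n * p)
miss_idx <- sample(seq_len(n * p), size = n_miss)
resp_check[miss_idx] <- NA

# Share of missing entries, expressed as a percentage.
miss_pct <- 100 * mean(is.na(resp_check))
print(miss_pct)
```

The same expression, `100 * mean(is.na(resp))`, can be compared against the `miss_percent` component of a fitted result.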
