clean_epmfd: Remove misfitting persons from an epmfd_misfit object

Description

clean_epmfd() removes individuals flagged as misfitting according to a chosen decision rule and returns a cleaned dataset that can be passed directly to scale_epmfd().

Usage

clean_epmfd(misfit, criterion = c("union", "intersection"), clean_item = FALSE)

Value

An epmfd_clean list with:

raw: An epmfd_raw object containing only the retained persons and items, directly usable in scale_epmfd().
clean_data: The cleaned raw data frame (persons × kept items).
n_removed: Number of persons removed.
criterion: The applied decision rule.
misfit: The original epmfd_misfit object (as provided).

Arguments

misfit: An epmfd_misfit object returned by misfit_epmfd().
criterion: Character string, either "union" (default) or "intersection".
clean_item: Logical. If TRUE, the function can also clean items. The default is FALSE.

Criterion

"union" (default): A person is removed if at least one statistic (e.g., Gnp, U3p, lpz) flags them as misfitting. This is the stricter rule: it removes every person that "intersection" removes, and possibly more.
"intersection": A person is removed only if all statistics flag them as misfitting. This is the more lenient rule.
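
The difference between the two rules can be sketched in base R on a toy flag table. This is only an illustration, not the package's internal code: the flags data frame and its values are made up, and the column names simply reuse the statistic names mentioned above.

```r
# Hypothetical flag table: one row per person, one logical column per statistic.
flags <- data.frame(
  Gnp = c(TRUE, FALSE, TRUE, FALSE),
  U3p = c(TRUE, FALSE, FALSE, FALSE),
  lpz = c(TRUE, TRUE, FALSE, FALSE)
)

# Number of statistics that flag each person.
n_flagged <- rowSums(flags)

# "union": removed if at least one statistic flags the person.
remove_union <- n_flagged >= 1

# "intersection": removed only if every statistic flags the person.
remove_intersection <- n_flagged == ncol(flags)

which(remove_union)         # persons dropped under "union"
which(remove_intersection)  # persons dropped under "intersection"
```

In this toy table, "union" drops persons 1 to 3, while "intersection" drops only person 1.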

Details

The function uses logical misfit indicators stored in misfit$table, including:

misfit_any: TRUE if at least one statistic flagged the person.
Statistic-specific columns (e.g., Gnp, U3p, lpz) indicating per-statistic misfit decisions.

The set of statistics actually considered is taken from misfit$stats. Under the "intersection" rule, a person is removed only if the indicators for all of those statistics are TRUE. Internally, rowSums(..., na.rm = TRUE) is used so that NA values do not force removal (i.e., NA behaves as “not flagged” in the intersection count).
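
A minimal sketch of the NA handling described above, again on made-up data. The comparison against length(stats) is an assumption about how the count is turned into a decision, and the stats vector here only stands in for misfit$stats.

```r
# Hypothetical flag table with missing decisions for some persons.
flags <- data.frame(
  Gnp = c(TRUE, TRUE, NA),
  U3p = c(TRUE, NA, NA),
  lpz = c(TRUE, TRUE, FALSE)
)
stats <- c("Gnp", "U3p", "lpz")  # stands in for misfit$stats

# With na.rm = TRUE an NA contributes 0, i.e. it counts as "not flagged".
n_true <- rowSums(flags[, stats, drop = FALSE], na.rm = TRUE)

# Assumed intersection rule: every considered statistic must be TRUE.
remove_intersection <- n_true == length(stats)

# Without na.rm the count itself becomes NA for persons 2 and 3.
n_true_raw <- rowSums(flags[, stats, drop = FALSE])
```

Here only person 1 is removed. Person 2 has two TRUE flags and one NA, so the count stays below three and the person is retained.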

Only items listed in misfit$scaled$kept are retained in the output. Person identifiers from the original raw object are preserved for the kept rows.
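
The row and column filtering can be illustrated with plain data-frame subsetting. This sketch assumes that person identifiers are stored as row names, which may not match how the package stores them. The raw_df, kept, and remove objects are hypothetical stand-ins for the raw data, misfit$scaled$kept, and the removal decision.

```r
# Hypothetical raw responses: 4 persons x 3 items, ids stored as row names.
raw_df <- data.frame(
  item1 = c(1, 2, 3, 4),
  item2 = c(2, 2, 1, 3),
  item3 = c(4, 1, 2, 2),
  row.names = c("p01", "p02", "p03", "p04")
)
kept   <- c("item1", "item3")           # stands in for misfit$scaled$kept
remove <- c(FALSE, TRUE, FALSE, FALSE)  # persons flagged for removal

# Drop flagged persons and keep only the retained items.
# drop = FALSE keeps a data frame even if a single item remains.
clean_df <- raw_df[!remove, kept, drop = FALSE]

n_removed <- sum(remove)
rownames(clean_df)  # original ids of the retained persons
```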

Examples

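library(epmfd)

# Load the sample data, scale it, and compute person-misfit statistics
data <- load_epmfd(sampledata)
scaling_data <- scale_epmfd(data)
misfit_result <- misfit_epmfd(scaling_data)

# Remove misfitting persons using the default "union" rule
clean_data <- clean_epmfd(misfit_result)
head(clean_data$clean_data)

# Compare dimensions before and after cleaning
dim(data$data)              # dimensions of the raw data
dim(clean_data$clean_data)  # dimensions of the cleaned data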
