slm: Simple Longitudinal Mean (SLM)

Description

This function detects influential subjects in longitudinal data based on their mean response values. It flags subjects whose mean response lies more than k standard deviations from the overall mean. The function returns the influential subjects, splits the data into influential and non-influential subsets, calculates influence scores, and visualizes the results using ggplot2.

Usage

slm(data, subject_id, time, response, k = 2, verbose = FALSE)

Value

A list containing:

influential_subjects: A vector of subject IDs identified as influential.
influential_data: A data frame containing data for influential subjects.
non_influential_data: A data frame containing data for non-influential subjects.
influence_scores: A data frame with subject IDs, mean response, IS (Influence Score), and PIS (Proportional Influence Score).
mean_plot: A ggplot object showing mean responses per subject with influential subjects highlighted.
longitudinal_plot: A ggplot object visualizing longitudinal response trends, with influential subjects highlighted.
IS_table: A data frame containing the Influence Score (IS) and the Proportional Influence Score (PIS) values for each subject.

Arguments

data: A data frame containing longitudinal data.
subject_id: A character string giving the name of the column that contains subject identifiers.
time: A character string giving the name of the column that contains the time points at which observations were measured.
response: A character string giving the name of the column that contains response values.
k: A numeric value giving the threshold, in standard deviations from the mean, beyond which a subject is classified as influential. Defaults to 2.
verbose: Logical; if TRUE, prints informative messages during execution. Defaults to FALSE.
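
To make the role of k concrete, the base R sketch below computes the cutoff interval for a few values of k. It assumes the interval is the overall mean plus or minus k standard deviations, as described under Details. The vector `response` is made-up data and the helper `cutoff_interval` is hypothetical; neither is part of the package.

```r
# Hypothetical response values; not taken from infsdata.
response <- c(10, 12, 11, 13, 9, 30)

# Illustrative helper (not part of the package): the interval outside
# which a subject mean would be classified as influential for a given k.
cutoff_interval <- function(x, k = 2) {
  centre <- mean(x)
  spread <- sd(x)
  c(lower = centre - k * spread, upper = centre + k * spread)
}

# One column per value of k; rows are the lower and upper bounds.
intervals <- sapply(c(1, 2, 3), function(k) cutoff_interval(response, k))
colnames(intervals) <- paste0("k=", c(1, 2, 3))
intervals
```

A larger k widens the interval, so fewer subjects fall outside it and are flagged.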

Details

The function follows these steps:

1. Calculates the mean and standard deviation of the response variable across all subjects.
2. Determines the threshold for influence based on k standard deviations from the mean.
3. Identifies subjects whose mean response falls outside this threshold.
4. Calculates the Influence Score (IS) for each subject as the absolute deviation of their mean from the overall mean.
5. Calculates the Proportional Influence Score (PIS) for each subject as IS divided by the overall standard deviation.
6. Separates data into influential and non-influential subjects.
7. Visualizes the distribution of responses and highlights influential subjects.

This method is useful for detecting outliers and understanding the impact of extreme values in longitudinal studies.
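
The steps above can be approximated in base R as follows. This is a minimal sketch, not the package's implementation: the data frame `toy` is made up, and the sketch assumes the overall mean and standard deviation are pooled over all rows of the response column. The plotting step is omitted.

```r
# Toy longitudinal data: 8 hypothetical subjects, 3 time points each.
# Subjects 1-7 share the same responses; subject 8 is much higher.
toy <- data.frame(
  subject_id = rep(1:8, each = 3),
  time       = rep(1:3, times = 8),
  response   = c(rep(c(10, 11, 12), times = 7), 40, 41, 42)
)
k <- 2  # threshold in standard deviations

# Overall mean and SD of the response (assumption: pooled over all rows).
overall_mean <- mean(toy$response)
overall_sd   <- sd(toy$response)

# Mean response per subject.
scores <- aggregate(response ~ subject_id, data = toy, FUN = mean)
names(scores)[2] <- "mean_response"

# IS: absolute deviation of the subject mean from the overall mean.
# PIS: IS divided by the overall standard deviation.
scores$IS  <- abs(scores$mean_response - overall_mean)
scores$PIS <- scores$IS / overall_sd

# A subject mean lies outside mean +/- k * SD exactly when PIS > k.
influential_subjects <- scores$subject_id[scores$PIS > k]

# Split the rows into influential and non-influential subjects.
is_influential       <- toy$subject_id %in% influential_subjects
influential_data     <- toy[is_influential, ]
non_influential_data <- toy[!is_influential, ]
```

On this toy data only subject 8 is flagged. Because a single extreme subject also inflates the pooled standard deviation, a small sample with one outlier may not exceed k = 2 at all, which is worth keeping in mind when choosing k.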

Examples

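# Load the example data set
data(infsdata)
# Use only the first five rows for this example
infsdata <- infsdata[1:5, ]
# Column names are passed as character strings
result <- slm(infsdata, "subject_id", "time", "response", k = 2)
# Inspect the components of the returned list
print(result$influential_subjects)
head(result$influential_data)
head(result$non_influential_data)
head(result$influence_scores)
# Display the ggplot objects
print(result$mean_plot)
print(result$longitudinal_plot)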