dynEGA: Dynamic Exploratory Graph Analysis

Description

Estimates dynamic communities in multivariate time series (e.g., panel data, longitudinal data, intensive longitudinal data) at multiple time scales and at different levels of analysis: individuals (intraindividual structure), groups, and population (interindividual structure)

Usage

dynEGA(
  data,
  id = NULL,
  group = NULL,
  n.embed = 5,
  n.embed.optimize = FALSE,
  tau = 1,
  delta = 1,
  use.derivatives = 1,
  na.derivative = c("none", "kalman", "rowwise", "skipover"),
  zero.jitter = 0.001,
  level = c("individual", "group", "population"),
  corr = c("auto", "cor_auto", "pearson", "spearman"),
  na.data = c("pairwise", "listwise"),
  model = c("BGGM", "glasso", "TMFG"),
  algorithm = c("leiden", "louvain", "walktrap"),
  uni.method = c("expand", "LE", "louvain"),
  ncores,
  seed = NULL,
  verbose = TRUE,
  ...
)

Value

A list containing:

Derivatives

A list containing:

Estimates --- A list the length of the unique IDs containing data frames of zero- to second-order derivatives for each ID in data
EstimatesDF --- A data frame of derivatives across all IDs containing columns of the zero- to second-order derivatives as well as id and group variables (group is automatically set to 1 for all if no group is provided)

dynEGA

A list containing:

population --- If level includes "populaton", then the EGA results for the entire sample
group --- If level includes "group", then a list containing the EGA results for each group
individual --- If level includes "individual", then a list containing the EGA results for each id

Arguments

data

Matrix or data frame. Participants and variable should be in long format such that row t represents observations for all variables at time point t for a participant. The next row, t + 1, represents the next measurement occasion for that same participant. The next participant's data should immediately follow, in the same pattern, after the previous participant

data should have an ID variable labeled "ID"; otherwise, it is assumed that the data represent the population

For groups, data should have a Group variable labeled "Group"; otherwise, it is assumed that there are no groups in data

Arguments id and group can be specified to tell the function which column in data it should use as the ID and Group variable, respectively

A measurement occasion variable is not necessary and should be removed from the data before proceeding with the analysis

id

Numeric or character (length = 1). Number or name of the column identifying each individual. Defaults to NULL

group

Numeric or character (length = 1). Number of the column identifying group membership. Defaults to NULL

n.embed

Numeric (length = 1 or more). Defaults to 5. Number of embedded dimensions (the number of observations to be used in the Embed function). For example, an "n.embed = 5" will use five consecutive observations to estimate a single derivative.

If more than one value is provided, then the number of embeddings will be optimized over using tefi to determine the optimal length of the embedding dimensions for each individual in the sample

n.embed.optimize

Boolean (length = 1). If TRUE, performs optimization of n.embed for each individual, then constructs the population based on optimized derivatives. When TRUE, individual networks are considered of interest and will always be output. Defaults to FALSE

tau

Numeric (length = 1). Defaults to 1. Number of observations to offset successive embeddings in the Embed function. Generally recommended to leave "as is"

delta

Numeric (length = 1). Defaults to 1. The time between successive observations in the time series (i.e, lag). Generally recommended to leave "as is"

use.derivatives

Numeric (length = 1). Defaults to 1. The order of the derivative to be used in the analysis. Available options:

0 --- No derivatives; consistent with moving average
1 --- First-order derivatives; interpreted as "velocity" or rate of change over time
2 --- Second-order derivatives; interpreted as "acceleration" or rate of the rate of change over time

Generally recommended to leave "as is"

na.derivative

Character (length = 1). How should missing data in the embeddings be handled? Available options (see Boker et al. (2018) in glla references for more details):

"none" (default) --- does nothing and leaves NAs in data
"kalman" --- uses Kalman smoothing (KalmanSmooth) with structural time series models (StructTS) to impute missing values. This approach models the underlying temporal dependencies (trend, seasonality, autocorrelation) to generate estimates for missing observations while preserving the original time scale. More computationally intensive than the other methods but typically provides the most accurate imputation by respecting the stochastic properties of the time series
"rowwise" --- adjusts time interval with respect to each embedding ensuring time intervals are adaptive to the missing data (tends to be more accurate than "none")
"skipover" --- "skips over" missing data and treats the non-missing points as continuous points in time (note that the time scale shifts to the "per mean time interval," which is different and larger than the original scale)

zero.jitter

Numeric (length = 1). Small amount of Gaussian noise added to zero variance derivatives to prevent estimation failures. For more than one variable, noise is generated multivariate normal distribution to ensure orthogonal noise is added. The jitter preserves the overall structure but avoids singular covariance matrices during network estimation. Defaults to 0.001

level

Character vector (up to length of 3). A character vector indicating which level(s) to estimate:

"individual" --- Estimates EGA for each individual in data (intraindividual structure; requires an "ID" column, see data)
"group" --- Estimates EGA for each group in data (group structure; requires a "Group" column, see data)
"population" --- Estimates EGA across all data (interindividual structure)

corr

Character (length = 1). Method to compute correlations. Defaults to "auto". Available options:

"auto" --- Automatically computes appropriate correlations for the data using Pearson's for continuous, polychoric for ordinal, tetrachoric for binary, and polyserial/biserial for ordinal/binary with continuous. To change the number of categories that are considered ordinal, use ordinal.categories (see polychoric.matrix for more details)
"cor_auto" --- Uses cor_auto to compute correlations. Arguments can be passed along to the function
"pearson" --- Pearson's correlation is computed for all variables regardless of categories
"spearman" --- Spearman's rank-order correlation is computed for all variables regardless of categories

For other similarity measures, compute them first and input them into data with the sample size (n)

na.data

Character (length = 1). How should missing data be handled? Defaults to "pairwise". Available options:

"pairwise" --- Computes correlation for all available cases between two variables
"listwise" --- Computes correlation for all complete cases in the dataset

model

Character (length = 1). Defaults to "glasso". Available options:

"BGGM" --- Computes the Bayesian Gaussian Graphical Model. Set argument ordinal.categories to determine levels allowed for a variable to be considered ordinal. See ?BGGM::estimate for more details
"glasso" --- Computes the GLASSO with EBIC model selection. See EBICglasso.qgraph for more details
"TMFG" --- Computes the TMFG method. See TMFG for more details

algorithm

Character or igraph cluster_* function (length = 1). Defaults to "walktrap". Three options are listed below but all are available (see community.detection for other options):

"leiden" --- See cluster_leiden for more details
"louvain" --- By default, "louvain" will implement the Louvain algorithm using the consensus clustering method (see community.consensus for more information). This function will implement consensus.method = "most_common" and consensus.iter = 1000 unless specified otherwise
"walktrap" --- See cluster_walktrap for more details

uni.method

Character (length = 1). What unidimensionality method should be used? Defaults to "louvain". Available options:

"expand" --- Expands the correlation matrix with four variables correlated 0.50. If number of dimension returns 2 or less in check, then the data are unidimensional; otherwise, regular EGA with no matrix expansion is used. This method was used in the Golino et al.'s (2020) Psychological Methods simulation
"LE" --- Applies the Leading Eigenvector algorithm (cluster_leading_eigen) on the empirical correlation matrix. If the number of dimensions is 1, then the Leading Eigenvector solution is used; otherwise, regular EGA is used. This method was used in the Christensen et al.'s (2023) Behavior Research Methods simulation
"louvain" --- Applies the Louvain algorithm (cluster_louvain) on the empirical correlation matrix. If the number of dimensions is 1, then the Louvain solution is used; otherwise, regular EGA is used. This method was validated Christensen's (2022) PsyArXiv simulation. Consensus clustering can be used by specifying either "consensus.method" or "consensus.iter"

ncores

Numeric (length = 1). Number of cores to use in computing results. Defaults to ceiling(parallel::detectCores() / 2) or half of your computer's processing power. Set to 1 to not use parallel computing

If you're unsure how many cores your computer has, then type: parallel::detectCores()

seed

Numeric (length = 1). Defaults to NULL or random results. Set for reproducible results. See Reproducibility and PRNG for more details on random number generation in EGAnet

verbose

Boolean (length = 1). Should progress be displayed? Defaults to TRUE. Set to FALSE to not display progress

...

Additional arguments to be passed on to auto.correlate, network.estimation, community.detection, community.consensus, and EGA

Author

Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <alexpaulchristensen@gmail.com>

Details

Derivatives for each variable's time series for each participant are estimated using generalized local linear approximation (see glla). EGA is then applied to these derivatives to model how variables are changing together over time. Variables that change together over time are detected as communities

References

Generalized local linear approximation
Boker, S. M., Deboeck, P. R., Edler, C., & Keel, P. K. (2010) Generalized local linear approximation of derivatives from time series. In S.-M. Chow, E. Ferrer, & F. Hsieh (Eds.), The Notre Dame series on quantitative methodology. Statistical methods for modeling human dynamics: An interdisciplinary dialogue, (p. 161-178). Routledge/Taylor & Francis Group.

Deboeck, P. R., Montpetit, M. A., Bergeman, C. S., & Boker, S. M. (2009) Using derivative estimates to describe intraindividual variability at multiple time scales. Psychological Methods, 14(4), 367-386.

Original dynamic EGA implementation
Golino, H., Christensen, A. P., Moulder, R. G., Kim, S., & Boker, S. M. (2021). Modeling latent topics in social media using Dynamic Exploratory Graph Analysis: The case of the right-wing and left-wing trolls in the 2016 US elections. Psychometrika.

Time delay embedding procedure
Savitzky, A., & Golay, M. J. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8), 1627-1639.

Examples

Run this code

# Population structure
simulated_population <- dynEGA(
  data = sim.dynEGA, level = "population"
  # uses simulated data in package
  # useful to understand how data should be structured
)

if (FALSE) {
# Group structure
simulated_group <- dynEGA(
  data = sim.dynEGA, level = "group"
  # uses simulated data in package
  # useful to understand how data should be structured
)

# Individual structure
simulated_individual <- dynEGA(
  data = sim.dynEGA, level = "individual",
  ncores = 2, # use more for quicker results
  verbose = TRUE # progress bar
)

# Population, group, and individual structure
simulated_all <- dynEGA(
  data = sim.dynEGA,
  level = c("individual", "group", "population"),
  ncores = 2, # use more for quicker results
  verbose = TRUE # progress bar
)

# Plot population
plot(simulated_all$dynEGA$population)

# Plot groups
plot(simulated_all$dynEGA$group)

# Plot individual
plot(simulated_all$dynEGA$individual, id = 1)

# Step through all plots
# Unless `id` is specified, 4 random IDs
# will be drawn from individuals
plot(simulated_all)

# Optimize over multiple embeddings
optimized_all <- dynEGA(
  data = sim.dynEGA,
  level = c("individual", "group", "population"),
  n.embed = 3:10, # set number of dimensions to search over
  n.embed.optimize = TRUE, # set to TRUE to optimize
  ncores = 2, # use more for quicker results
  verbose = TRUE # progress bar
)}

Run the code above in your browser using DataLab

Description

Usage

Value

Arguments

Author

Details

References

See Also

Examples