SCDA (version 0.0.1)

SCSR_InfoCrit: Automatically select the optimal number of clusters based on likelihood information criteria (i.e., AIC, BIC and HQC) for a given SCSR model.

Description

Computes the likelihood-based information criteria (i.e., Akaike's IC, Bayesian IC, and Hannan–Quinn IC) for every SCSR model defined by a combination of the values of G and Phi supplied in the G.set and Phi.set inputs. Applying the minimization rule, SCSR_InfoCrit then identifies the optimal combination of G and Phi (and hence the optimal number of clusters) for each criterion.

Usage

SCSR_InfoCrit(
  Formula,
  Data_sf,
  listW,
  Phi.set = c(0.5, 1),
  G.set = c(2, 3, 4),
  Type = c("SCLM", "SCSAR", "SCSEM", "SCSLX"),
  CenterVars = TRUE,
  ScaleVars = TRUE,
  Maxitr = 200,
  RelTol = 10^-6,
  AbsTol = 10^-5,
  Verbose = TRUE,
  Seed = 123456789
)

Value

A list object containing the following outputs:

  • IC: a data.frame object containing one row for each combination of the supplied vectors G.set and Phi.set and 5 columns (G,Phi,AIC,BIC,HQC).

  • OptimPars: a data.frame object with 3 rows (criteria) and 2 columns (Parameters) with the optimal combination of G and Phi for every criterion.
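The minimization rule behind the IC and OptimPars outputs can be illustrated with a small base-R sketch. This is only an analogy: the candidates below are ordinary lm(...) fits on the built-in mtcars data, not SCSR models, and the helper hqc as well as the objects fits, ic and best are hypothetical names introduced for illustration. The formulas are the standard definitions of AIC, BIC and HQC; how SCDA counts parameters internally is not stated here.

```r
# Toy illustration with base R only: plain lm() fits, NOT SCSR models.
fits <- list(
  m1 = lm(mpg ~ wt, data = mtcars),
  m2 = lm(mpg ~ wt + hp, data = mtcars),
  m3 = lm(mpg ~ wt + hp + qsec, data = mtcars)
)

# Hannan-Quinn criterion: -2 * logLik + 2 * k * log(log(n))
# (hqc is a hypothetical helper, not a function of the SCDA package)
hqc <- function(fit) {
  ll <- logLik(fit)
  k <- attr(ll, "df")  # number of estimated parameters, error variance included
  n <- nobs(fit)
  -2 * as.numeric(ll) + 2 * k * log(log(n))
}

# One row per candidate model, one column per criterion
ic <- data.frame(
  Model = names(fits),
  AIC = sapply(fits, AIC),
  BIC = sapply(fits, BIC),
  HQC = sapply(fits, hqc),
  stringsAsFactors = FALSE
)

# Minimization rule: for each criterion, keep the candidate with the smallest value
best <- sapply(ic[c("AIC", "BIC", "HQC")], function(x) ic$Model[which.min(x)])

print(ic)
print(best)
```

The three criteria differ only in the penalty attached to the number of parameters, so they need not agree on the selected candidate.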

Arguments

Formula

A symbolic description of the regression model to be fitted. Details of model specification are given in the documentation of lm(...).

Data_sf

A data.frame object of class sf with n rows (each one corresponding to a location/polygon) and a user-defined number of columns. The data frame must contain the response variable and all the covariates to be used in the model. It must also include the geometry feature for spatial modelling and representation. Typically, sf data frames are built using the st_as_sf(...) command from the sf package (see its documentation for details).

listW

A listw object containing the spatial weights for the spatial autoregressive component. Typically, listW is built using the nb2listw(...) command from the spdep package (see its documentation for details). We suggest adopting one of the matrix styles available in the spdep package, such as W (row-standardized) or B (binary). We also suggest setting zero.policy = TRUE to allow the computation of groups/clusters with isolated units. In this regard, recall that combining zero.policy = FALSE with Type = "SCSAR" causes SCSR_Estim(...) to terminate with an error. See the spatialreg package for details on the zero.policy input.

Phi.set

Non-negative (>= 0) real-valued vector. Sequence of values of the spatial penalty parameter to be considered. Default is Phi.set = c(0.5, 1).

G.set

Integer vector. Sequence of numbers of clusters to be considered. Default is G.set = c(2, 3, 4).

Type

Character. Declares which model specification has to be estimated. Admitted strings are:

  • "SCLM" for linear regression model without spatial effects (LM);

  • "SCSAR" for spatial autoregressive (SAR) model;

  • "SCSEM" for linear regression model with spatial autoregressive error term or spatial Durbin model (SEM);

  • "SCSLX" for linear regression model with spatially-lagged response variable and covariates (SLX).

CenterVars

Logical value (TRUE or FALSE) stating whether the response variable and the covariates have to be centered around the mean in the iterative algorithm that updates memberships and group-wise parameters. Centering is only used in the iterative procedure, while the final estimates provided to the user are computed on the original (i.e., non-centered) variables.

ScaleVars

Logical value (TRUE or FALSE) stating whether the response variable and the covariates have to be scaled with respect to their standard deviation in the iterative algorithm that updates memberships and group-wise parameters. Scaling is only used in the iterative procedure, while the final estimates provided to the user are computed on the original (i.e., non-scaled) variables.
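As a rough illustration of what centering and scaling mean for the variables entering the iterative procedure, the sketch below standardizes a few columns with base R's scale(...). It is an assumption that the transformation is equivalent to this call; the package's internal implementation is not described here, and the objects X and X_std are hypothetical names using the built-in mtcars data as a stand-in.

```r
# Toy data standing in for the response and covariates (built-in dataset)
X <- mtcars[, c("mpg", "wt", "hp")]

# Center around the column means and scale by the column standard deviations
X_std <- scale(X, center = TRUE, scale = TRUE)

# After the transformation, column means are numerically zero
# and column standard deviations are equal to one
print(round(colMeans(X_std), 10))
print(apply(X_std, 2, sd))
```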

Maxitr

Integer value. Maximum number of iterations for the iterative algorithm. Convergence criterion is fixed to \(\varepsilon\) = 10^(-5).

RelTol

Tolerance for the relative improvement in the log-likelihood (exit criterion) from iteration k to k+1. Default is \(\varepsilon_{Rel}\) = 10^-6.

AbsTol

Tolerance for the absolute improvement in the log-likelihood (exit criterion) from iteration k to k+1. Default is \(\varepsilon_{Abs}\) = 10^-5.

Verbose

Logical value (TRUE or FALSE). Toggles warnings and messages. If Verbose = TRUE (default), the function prints messages describing the progress of the tasks. If Verbose = FALSE, all progress messages are suppressed.

Seed

Integer value. Sets the state of R's random number generator (RNG). Default is Seed = 123456789.

Details

Given the vectors G.set = c(2, 3, 4) and Phi.set = c(0.5, 1), the function SCSR_InfoCrit will estimate 3 x 2 = 6 models, one for each combination of G and Phi. For computational details on the spatially-clustered models, we refer to Cerqueti, R., Maranzano, P. & Mattera, R. "Spatially-clustered spatial autoregressive models with application to agricultural market concentration in Europe". arXiv preprint (<doi:10.48550/arXiv.2407.15874>).
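The count of candidate models can be checked with a minimal base-R sketch that enumerates every pairing of the two input vectors. The object name grid is hypothetical, and the order in which SCSR_InfoCrit visits the combinations is an assumption not documented here.

```r
# Default candidate values from the Usage section
G.set <- c(2, 3, 4)
Phi.set <- c(0.5, 1)

# Every (G, Phi) pair: one row per model to be estimated
grid <- expand.grid(G = G.set, Phi = Phi.set)

print(nrow(grid))  # length(G.set) * length(Phi.set)
print(grid)
```

Because the number of models grows multiplicatively with the lengths of G.set and Phi.set, long candidate vectors increase the running time accordingly.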

Examples

# \donttest{
data(Data_RC_PM_RM_JABES2024, package="SCDA")
SCSAR_IC <- SCSR_InfoCrit(Formula = "Gini_SO ~ GDPPC_PPS2020 + Share_AgroEmp",
                          Data_sf = Data2020, listW=listW, Type="SCSAR",
                          Maxitr = 100, Phi.set = c(0.50,1), G.set=c(2,3))
# }