CDatanet (version 0.0.1)

CDnetNPL: Estimate Count Data Model with Social Interactions using NPL Method

Description

Estimates a count data model with social interactions using the NPL method.

Usage

CDnetNPL(
  formula,
  contextual,
  Glist,
  theta0 = NULL,
  yb0 = NULL,
  optimizer = "optim",
  npl.ctr = list(),
  opt.ctr = list(),
  data
)

Arguments

formula

an object of class formula: a symbolic description of the model. The formula should take the form, for example, y ~ x1 + x2 | x1 + x2, where y is the endogenous vector, the variables listed before the pipe (x1, x2) are the individual exogenous variables, and the variables listed after the pipe (x1, x2) are the contextual observable variables. Other possible formulas are y ~ x1 + x2 for the model without contextual effects, y ~ -1 + x1 + x2 | x1 + x2 for the model without intercept, and y ~ x1 + x2 | x2 + x3 to allow the contextual variables to differ from the individual variables.

contextual

(optional) logical; if TRUE, all individual variables are also used as contextual variables. Setting the formula to y ~ x1 + x2 and contextual to TRUE is equivalent to setting the formula to y ~ x1 + x2 | x1 + x2.

Glist

the adjacency matrix, or a list of sub-adjacency matrices (one per sub-network).

theta0

(optional) starting value of \(\theta = (\lambda, \beta, \gamma, \sigma)\). The parameter \(\gamma\) should be removed if the model does not contain contextual effects (see details).

yb0

(optional) expectation of y.

optimizer

either "nlm" (referring to the nlm function) or "optim" (referring to the optim function). At every step of the NPL method, the estimation is performed using nlm or optim. Other arguments of these functions, such as control and method, can be set through the argument opt.ctr.

npl.ctr

list of controls for the NPL method (see details).

opt.ctr

list of arguments of nlm or optim (whichever is set in optimizer), such as control and method.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which CDnetNPL is called.
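The shapes expected for formula and opt.ctr can be illustrated in base R without calling the package. The sketch below only inspects a two-part formula and forwards a list of optimizer arguments to optim with do.call. Forwarding with do.call is an assumption about how such a list could be passed on and is not taken from the CDnetNPL source; the objective toy_obj is hypothetical.

```r
# A two-part formula: individual variables before the pipe, contextual after it
f <- y ~ x1 + x2 | x2 + x3

rhs        <- f[[3]]               # right-hand side, parsed as a call to `|`
individual <- all.vars(rhs[[2]])   # variables before the pipe
contextual <- all.vars(rhs[[3]])   # variables after the pipe
print(individual)
print(contextual)

# A list of optimizer arguments in the shape described for opt.ctr
opt.ctr <- list(method = "BFGS", control = list(maxit = 1000))

# Hypothetical objective with its minimum at (1, 2); NOT the model's likelihood
toy_obj <- function(p) sum((p - c(1, 2))^2)

# Assumed forwarding mechanism: append the list to the core optim arguments
fit <- do.call(optim, c(list(par = c(0, 0), fn = toy_obj), opt.ctr))
print(fit$par)
```

With optimizer = "nlm", the entries of opt.ctr would instead need to be valid nlm arguments.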

Value

A list consisting of:

M

number of sub-networks.

n

number of individuals in each network.

iteration

number of iterations performed by the NPL algorithm.

estimate

NPL estimator.

likelihood

pseudo-likelihood value.

yb

ybar (see details), expectation of y.

Gyb

average of the expectation of y among friends.

steps

step-by-step output as returned by the optimizer.

codedata

a list containing the formula, the name of the object Glist, the number of friends in the network, and the name of the object data (see details).
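To make the meaning of Gyb concrete, here is a minimal sketch that assumes a row-normalized adjacency matrix, as built in the Examples section. The three-person network and the values of yb are hypothetical.

```r
# Hypothetical 3-person network: row i marks the friends of individual i
G <- matrix(c(0, 1, 1,
              1, 0, 0,
              0, 0, 0), nrow = 3, byrow = TRUE)

# Row-normalize; individuals without friends keep a row of zeros
rs <- rowSums(G)
rs[rs == 0] <- 1
G  <- G / rs

# Hypothetical expected outcomes
yb  <- c(2, 4, 6)

# With a row-normalized G, G %*% yb is the average of yb among friends
Gyb <- as.vector(G %*% yb)
print(Gyb)  # 5 2 0
```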

Details

Model

Following Houndetoungan (2020), the count data \(\mathbf{y}\) is generated from a latent variable \(\mathbf{y}^*\). The latent variable is given for all i as $$y_i^* = \lambda \mathbf{g}_i \bar{\mathbf{y}} + \mathbf{x}_i'\beta + \mathbf{g}_i\mathbf{X}\gamma + \epsilon_i,$$ where \(\epsilon_i \sim N(0, \sigma^2)\). The count variable \(y_i\) is then defined as the smallest non-negative integer greater than or equal to \(y_i^*\); that is, \(y_i = 0\) if \(y_i^* \leq 0\) and \(y_i = q + 1\) if \(q < y_i^* \leq q + 1\), where \(q\) is a non-negative integer.
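The mapping from the latent variable to the count can be sketched in a few lines of base R. The helper names latent_to_count and count_prob, and the values of mu and sigma, are hypothetical; mu stands for the deterministic part of \(y_i^*\) for one individual. The probabilities follow directly from the normality of \(\epsilon_i\) and the thresholds given above.

```r
# Map latent values to counts: 0 if ystar <= 0, otherwise the ceiling of ystar
latent_to_count <- function(ystar) pmax(ceiling(ystar), 0)

counts <- latent_to_count(c(-0.7, 0, 0.2, 1, 2.4))
print(counts)  # 0 0 1 1 3

# Implied probabilities under normal errors:
#   P(y = 0) = pnorm(-mu / sigma)
#   P(y = k) = pnorm((k - mu) / sigma) - pnorm((k - 1 - mu) / sigma), k >= 1
count_prob <- function(k, mu, sigma) {
  ifelse(k == 0,
         pnorm(-mu / sigma),
         pnorm((k - mu) / sigma) - pnorm((k - 1 - mu) / sigma))
}

mu    <- 1.3   # hypothetical deterministic part of the latent variable
sigma <- 1.5   # hypothetical standard deviation of the error term
probs <- count_prob(0:50, mu, sigma)
print(sum(probs))  # close to 1, since the tail beyond 50 is negligible here
```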

npl.ctr

The model parameters are estimated using the Nested Partial Likelihood (NPL) method. This approach starts with a guess of \(\theta\) and \(\bar{y}\) and iteratively constructs a sequence of \(\theta\) and \(\bar{y}\). The algorithm is considered to have converged when the \(L_1\) distance between two consecutive values of \(\theta\) and \(\bar{y}\) is less than a tolerance. The argument npl.ctr is an optional list that may contain:

  • tol the tolerance of the NPL algorithm (default 1e-4),

  • maxit the maximal number of iterations allowed (default 500),

  • print a boolean indicating if the estimate should be printed at each step.
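The role of tol, maxit and print can be sketched with a generic fixed-point loop. This is an illustration under stated assumptions, not the package's implementation: update_step is a hypothetical stand-in for one NPL step (the real step maximizes a pseudo-likelihood), and summing the two \(L_1\) distances into a single stopping criterion is also an assumption.

```r
# Controls in the shape described above
npl.ctr <- list(tol = 1e-4, maxit = 500, print = FALSE)

# Hypothetical stand-in for one step: update theta given ybar, then ybar given theta
update_step <- function(state) {
  theta <- 0.5 * state$ybar + 1             # toy "estimation" step
  ybar  <- 0.4 * state$ybar + 0.3 * theta   # toy "expectation" update
  list(theta = theta, ybar = ybar)
}

state <- list(theta = 0, ybar = 0)          # initial guess
for (it in seq_len(npl.ctr$maxit)) {
  new   <- update_step(state)
  # L1 distance between consecutive iterates (assumed combined criterion)
  dist  <- sum(abs(new$theta - state$theta)) + sum(abs(new$ybar - state$ybar))
  state <- new
  if (npl.ctr$print) cat("step", it, "theta", state$theta, "distance", dist, "\n")
  if (dist < npl.ctr$tol) break
}
print(it)
print(unlist(state))
```

In this toy mapping the fixed point is theta = 4/3 and ybar = 2/3, so the loop stops well before maxit. In an actual call, the same list would be passed as npl.ctr = list(tol = ..., maxit = ..., print = ...).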

codedata

The output of this function is of class CDnetNPL. This class has summary and print methods to summarize and print the results. The adjacency matrix and the data are needed to summarize the results. However, to save memory, the function does not return these objects. Instead, it returns codedata, which contains, among other things, the formula and the names of the objects passed through the arguments Glist and data (if provided). codedata is used to access the adjacency matrix and the data. It is therefore important to keep the adjacency matrix and the data (or the variables) available in .GlobalEnv. Otherwise, they must be provided to the summary function.
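The idea of storing an object's name rather than the object itself can be sketched in base R. The helper remember_name and the object G_demo are hypothetical, and the lookup shown is only one plausible mechanism; it is not taken from the package source.

```r
# Hypothetical helper: record the name of the object passed as argument
remember_name <- function(obj) deparse(substitute(obj))

# Toy stand-in for an adjacency matrix, placed in the global environment
assign("G_demo", diag(2), envir = .GlobalEnv)
stored <- remember_name(G_demo)   # stores the string "G_demo", not the matrix

# Later: recover the object by name, failing clearly if it has been removed
if (exists(stored, envir = .GlobalEnv)) {
  G_back <- get(stored, envir = .GlobalEnv)
} else {
  stop("object '", stored, "' not found; it must be supplied explicitly")
}
print(stored)
print(G_back)
```

This is why removing or renaming Glist or data between estimation and summary would break a lookup of this kind.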

See Also

simCDnet, SARML and SARTML.

Examples

# NOT RUN {
library(CDatanet)

# Groups' size
M      <- 5 # Number of sub-groups
nvec   <- round(runif(M, 100, 1000))
n      <- sum(nvec)

# Parameters
lambda <- 0.4
beta   <- c(2, -1.9, 0.8)
gamma  <- c(1.5, -1.2)
sigma  <- 1.5
theta  <- c(lambda, beta, gamma, sigma)

# X
X      <- cbind(rnorm(n, 1, 1), rexp(n, 0.4))

# Network
Glist  <- list()

for (m in 1:M) {
  nm           <- nvec[m]
  Gm           <- matrix(0, nm, nm)
  max_d        <- 30
  for (i in 1:nm) {
    tmp        <- sample((1:nm)[-i], sample(0:max_d, 1))
    Gm[i, tmp] <- 1
  }
  rs           <- rowSums(Gm); rs[rs == 0] <- 1
  Gm           <- Gm/rs
  Glist[[m]]   <- Gm
}


# data
data    <- data.frame(x1 = X[,1], x2 =  X[,2])

rm(list = ls()[!(ls() %in% c("Glist", "data", "theta"))])

ytmp    <- simCDnet(formula = ~ x1 + x2 | x1 + x2, Glist = Glist, theta = theta, data = data)

y       <- ytmp$y

# plot histogram
hist(y, breaks = max(y))

data    <- data.frame(yt = y, x1 = data$x1, x2 = data$x2)
rm(list = ls()[!(ls() %in% c("Glist", "data"))])

out   <- CDnetNPL(formula = yt ~ x1 + x2, contextual = TRUE, Glist = Glist, data = data)
summary(out)
# }