cdnet {CDatanet} | R Documentation |
cdnet
estimates count data models with social interactions under rational expectations using the NPL algorithm (see Houndetoungan, 2024).
cdnet(
formula,
Glist,
group,
Rmax,
Rbar,
starting = list(lambda = NULL, Gamma = NULL, delta = NULL),
Ey0 = NULL,
ubslambda = 1L,
optimizer = "fastlbfgs",
npl.ctr = list(),
opt.ctr = list(),
cov = TRUE,
data
)
formula |
an object of class formula: a symbolic description of the model. |
Glist |
adjacency matrix. For networks consisting of multiple subnets, |
group |
the vector indicating the individual groups. The default assumes a common group. For 2 groups; that is, |
Rmax |
an integer indicating the theoretical upper bound of |
Rbar |
an |
starting |
(optional) a starting value for |
Ey0 |
(optional) a starting value for |
ubslambda |
a positive value indicating the upper bound of |
optimizer |
is either |
npl.ctr |
a list of controls for the NPL method (see details). |
opt.ctr |
a list of arguments to be passed in |
cov |
a Boolean indicating if the covariance should be computed. |
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables
in the model. If not found in data, the variables are taken from |
The count variable y_i
takes the value r
with probability
P_{ir} = F(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s} + \mathbf{z}_i'\Gamma - a_{h(i),r}) - F(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s} + \mathbf{z}_i'\Gamma - a_{h(i),r + 1}).
In this equation, \mathbf{z}_i
is a vector of control variables; F
is the distribution function of the standard normal distribution;
\bar{y}_i^{e,s}
is the average of E(y)
among peers using the s
-th network definition;
a_{h(i),r}
is the r
-th cut-point in the cost group h(i)
.
The following identification conditions have been introduced: \sum_{s = 1}^S \lambda_s > 0
, a_{h(i),0} = -\infty
, a_{h(i),1} = 0
, and
a_{h(i),r} = \infty
for any r \geq R_{\text{max}} + 1
. The last condition implies that P_{ir} = 0
for any r \geq R_{\text{max}} + 1
.
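As a minimal base-R sketch of the probability formula above, the snippet below evaluates P_{ir} for a single individual and checks that the probabilities sum to one. The index value, R_max, and the interior cut-points are made-up illustrative numbers, not package output; only the normalizations a_0 = -\infty, a_1 = 0, and a_{R_max + 1} = \infty come from the text.

```r
# Hypothetical inputs for one individual (illustrative values only)
index <- 1.3                  # stands for sum_s lambda_s * ybar_i^{e,s} + z_i' Gamma
Rmax  <- 4                    # assumed upper bound of the count variable
a_mid <- c(0, 0.9, 1.7, 2.4)  # a_1 = 0 < a_2 < ... < a_Rmax (made-up cut-points)

# Full cut-point vector a_0, a_1, ..., a_{Rmax + 1}
a <- c(-Inf, a_mid, Inf)

# P_r = F(index - a_r) - F(index - a_{r + 1}) for r = 0, ..., Rmax,
# where F is the standard normal distribution function (pnorm)
P <- pnorm(index - a[1:(Rmax + 1)]) - pnorm(index - a[2:(Rmax + 2)])
names(P) <- 0:Rmax

P
sum(P)  # the differences telescope to F(Inf) - F(-Inf)
```

Because consecutive terms cancel, the sum reduces to F(\infty) - F(-\infty), which is why the infinite end cut-points make the probabilities over r = 0, ..., R_max add up to one.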
For any r \geq 1
, the distance between two cut-points is a_{h(i),r+1} - a_{h(i),r} = \delta_{h(i),r} + \sum_{s = 1}^S \lambda_s.
As the number of cut-points can be large, a quadratic cost function is considered for r \geq \bar{R}_{h(i)}
, where \bar{R} = (\bar{R}_{1}, ..., \bar{R}_{L})
.
With the semi-parametric cost-function,
a_{h(i),r + 1} - a_{h(i),r}= \bar{\delta}_{h(i)} + \sum_{s = 1}^S \lambda_s
.
The model parameters are: \lambda = (\lambda_1, ..., \lambda_S)'
, \Gamma
, and \delta = (\delta_1', ..., \delta_L')'
,
where \delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l}, \bar{\delta}_l)'
for l = 1, ..., L
.
The number of single parameters in \delta_l
depends on R_{\text{max}}
and \bar{R}_{l}
. The components \delta_{l,2}, ..., \delta_{l,\bar{R}_l}
and/or
\bar{\delta}_l
must be removed in certain cases.
If R_{\text{max}} = \bar{R}_{l} \geq 2
, then \delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l})'
.
If R_{\text{max}} = \bar{R}_{l} = 1
(binary models), then \delta_l
must be empty.
If R_{\text{max}} > \bar{R}_{l} = 1
, then \delta_l = \bar{\delta}_l
.
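The cases above can be summarized by a small counting rule: \delta_l holds \bar{R}_l - 1 free components \delta_{l,2}, ..., \delta_{l,\bar{R}_l}, plus \bar{\delta}_l whenever R_max > \bar{R}_l. The helper below is a hypothetical illustration of that rule (`n_delta` is not a CDatanet function) and may be handy for sizing a starting value for \delta.

```r
# Hypothetical helper (not part of CDatanet): number of parameters in delta_l
# implied by the cases listed above, for one group with bounds Rmax and Rbar.
n_delta <- function(Rmax, Rbar) {
  stopifnot(Rbar >= 1, Rmax >= Rbar)
  # (Rbar - 1) components delta_{l,2}, ..., delta_{l,Rbar},
  # plus bar(delta)_l only when Rmax exceeds Rbar
  (Rbar - 1) + as.integer(Rmax > Rbar)
}

n_delta(Rmax = 1,  Rbar = 1)  # binary model: delta_l is empty
n_delta(Rmax = 5,  Rbar = 1)  # only bar(delta)_l remains
n_delta(Rmax = 5,  Rbar = 5)  # delta_{l,2}, ..., delta_{l,5}
n_delta(Rmax = 10, Rbar = 5)  # delta_{l,2}, ..., delta_{l,5} and bar(delta)_l
```

With \bar{R}_l = 5 and a larger R_max, the rule gives five components per group, which matches the length of the per-group block of `delta` used in the example at the end of this page.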
npl.ctr
The model parameters are estimated using the Nested Partial Likelihood (NPL) method. This approach
starts with a guess of \theta
and E(y)
and iteratively constructs a sequence
of \theta
and E(y)
. The algorithm is considered to have converged when the \ell_1
-distance
between two consecutive values of \theta
and E(y)
is less than a tolerance.
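The structure of this iteration can be sketched on a toy problem. The code below is not the estimator implemented in cdnet: it swaps the count-data likelihood for a linear model fitted by least squares, so that each step is a single call to `lm`. The simulated network, the parameter values, and the choice to add the two \ell_1-distances into one stopping criterion are all assumptions made for illustration.

```r
set.seed(1)
n <- 200

# Toy row-normalized network: every individual has exactly three friends
G <- matrix(0, n, n)
for (i in 1:n) G[i, sample((1:n)[-i], 3)] <- 1 / 3

# Toy linear model under rational expectations:
# E(y) = lambda * G E(y) + b0 + b1 * x, and y = E(y) + noise
x       <- rnorm(n)
Ey_true <- solve(diag(n) - 0.4 * G, 1 + 2 * x)
y       <- Ey_true + rnorm(n)

tol   <- 1e-4   # tolerance (same default as stated for npl.ctr)
maxit <- 500    # maximal number of iterations
theta <- rep(0, 3)  # initial guess of (lambda, b0, b1)
Ey    <- y          # initial guess of E(y)

for (k in 1:maxit) {
  GEy <- as.numeric(G %*% Ey)
  # Step 1: update theta, holding the current guess of E(y) fixed
  theta_new <- unname(coef(lm(y ~ GEy + x)))[c(2, 1, 3)]
  # Step 2: update E(y), using the new theta
  Ey_new <- theta_new[1] * GEy + theta_new[2] + theta_new[3] * x
  # l1-distance between consecutive (theta, E(y)); summing both is an assumption
  dist  <- sum(abs(theta_new - theta)) + sum(abs(Ey_new - Ey))
  theta <- theta_new
  Ey    <- Ey_new
  if (dist < tol) break
}
converged <- dist < tol
c(iterations = k, lambda = theta[1], b0 = theta[2], b1 = theta[3])
```

The point of the sketch is the alternation: the parameter update never solves for E(y) internally, which is what keeps each step cheap, and the fixed point is reached only through repeated updates.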
The argument npl.ctr
must include
the tolerance of the NPL algorithm (default 1e-4),
the maximum number of iterations allowed (default 500),
a Boolean indicating whether the estimate should be printed at each step, and
the number of simulations used to compute the integral in the covariance by importance sampling.
A list consisting of:
info |
a list of general information about the model. |
estimate |
the NPL estimator. |
Ey |
|
GEy |
the average of |
cov |
a list including (if |
details |
step-by-step output as returned by the optimizer. |
Houndetoungan, E. A. (2024). Count Data Models with Social Interactions under Rational Expectations. Available at SSRN 3721250, doi:10.2139/ssrn.3721250.
set.seed(123)
M <- 5 # Number of sub-groups
nvec <- round(runif(M, 100, 200))
n <- sum(nvec)
# Adjacency matrix
A <- list()
for (m in 1:M) {
  nm    <- nvec[m]
  Am    <- matrix(0, nm, nm)
  max_d <- 30 # maximum number of friends
  for (i in 1:nm) {
    tmp <- sample((1:nm)[-i], sample(0:max_d, 1))
    Am[i, tmp] <- 1
  }
  A[[m]] <- Am
}
Anorm <- norm.network(A) #Row-normalization
# X
X <- cbind(rnorm(n, 1, 3), rexp(n, 0.4))
# Two groups:
group <- 1*(X[,1] > 0.95)
# Networks
# length(unique(group)) = 2 and unique(sort(group)) = c(0, 1)
# The networks must be defined so as to capture:
# peer effects of `0` on `0`, peer effects of `1` on `0`,
# peer effects of `0` on `1`, and peer effects of `1` on `1`
G <- list()
cums <- c(0, cumsum(nvec))
for (m in 1:M) {
  tp <- group[(cums[m] + 1):(cums[m + 1])]
  Am <- A[[m]]
  G[[m]] <- norm.network(list(Am * ((1 - tp) %*% t(1 - tp)),
                              Am * ((1 - tp) %*% t(tp)),
                              Am * (tp %*% t(1 - tp)),
                              Am * (tp %*% t(tp))))
}
# Parameters
lambda <- c(0.2, 0.3, -0.15, 0.25)
Gamma <- c(4.5, 2.2, -0.9, 1.5, -1.2)
delta <- rep(c(2.6, 1.47, 0.85, 0.7, 0.5), 2)
# Data
data <- data.frame(X, peer.avg(Anorm, cbind(x1 = X[,1], x2 = X[,2])))
colnames(data) <- c("x1", "x2", "gx1", "gx2")
ytmp <- simcdnet(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2),
                 lambda = lambda, Gamma = Gamma, delta = delta, group = group,
                 data = data)
y <- ytmp$y
hist(y, breaks = max(y) + 1)
table(y)
# Estimation
est <- cdnet(formula = y ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2), group = group,
             optimizer = "fastlbfgs", data = data,
             opt.ctr = list(maxit = 5e3, eps_f = 1e-11, eps_g = 1e-11))
summary(est)
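The group-specific networks in the example are built by masking the adjacency matrix with outer products of the group indicator. The toy snippet below, using a made-up four-person network and base R only, shows what each mask keeps and confirms that the four pieces partition the original links. The names `A00`, `A01`, `A10`, and `A11` are illustrative.

```r
tp <- c(0, 1, 1, 0)          # group label of each of four individuals (toy data)
Am <- matrix(1, 4, 4)        # toy complete network ...
diag(Am) <- 0                # ... without self-links

# Entry [i, j] of each piece is kept only if i and j belong to the stated groups
A00 <- Am * ((1 - tp) %*% t(1 - tp))  # i in `0`, friend j in `0`
A01 <- Am * ((1 - tp) %*% t(tp))      # i in `0`, friend j in `1`
A10 <- Am * (tp %*% t(1 - tp))        # i in `1`, friend j in `0`
A11 <- Am * (tp %*% t(tp))            # i in `1`, friend j in `1`

A01                                   # rows of group `1` are zero
all(A00 + A01 + A10 + A11 == Am)      # the four pieces add back to Am
```

Each link falls into exactly one piece because the four masks sum to one for every pair (i, j), so the four \lambda coefficients in the example correspond to disjoint sets of links.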