pagfl {PAGFL} | R Documentation |
The pairwise adaptive group fused lasso (PAGFL) by Mehrabani (2023) jointly estimates the latent group structure and group-specific slope parameters in a panel data model. It can handle static and dynamic panels, either with or without endogenous regressors.
pagfl(
formula,
data,
index = NULL,
n_periods = NULL,
lambda,
method = "PLS",
Z = NULL,
min_group_frac = 0.05,
bias_correc = FALSE,
kappa = 2,
max_iter = 5000,
tol_convergence = 1e-08,
tol_group = 0.001,
rho = 0.07 * log(N * n_periods)/sqrt(N * n_periods),
varrho = max(sqrt(5 * N * n_periods * p)/log(N * n_periods * p) - 7, 1),
verbose = TRUE,
parallel = TRUE,
...
)
## S3 method for class 'pagfl'
print(x, ...)
## S3 method for class 'pagfl'
formula(x, ...)
## S3 method for class 'pagfl'
df.residual(object, ...)
## S3 method for class 'pagfl'
summary(object, ...)
## S3 method for class 'pagfl'
coef(object, ...)
## S3 method for class 'pagfl'
residuals(object, ...)
## S3 method for class 'pagfl'
fitted(object, ...)
formula |
a formula object describing the model to be estimated. |
data |
a |
index |
a character vector holding two strings specifying the variable names that identify the cross-sectional unit and time period for each observation. The first string denotes the individual unit, while the second string represents the time period. In case of a balanced panel data set that is ordered in the long format, |
n_periods |
the number of observed time periods |
lambda |
the tuning parameter. |
method |
the estimation method. Options are
Default is |
Z |
a |
min_group_frac |
the minimum group size as a fraction of the total number of individuals |
bias_correc |
logical. If |
kappa |
a non-negative weight placed on the adaptive penalty weights. Default is 2. |
max_iter |
the maximum number of iterations for the ADMM estimation algorithm. Default is 5000. |
tol_convergence |
the tolerance limit for the stopping criterion of the iterative ADMM estimation algorithm. Default is |
tol_group |
the tolerance limit for within-group differences. Two individuals |
rho |
the tuning parameter balancing the fitness and penalty terms in the IC that determines the penalty parameter |
varrho |
the non-negative Lagrangian ADMM penalty parameter. For PLS, the |
verbose |
logical. If |
parallel |
logical. If |
... |
ellipsis |
x |
of class |
object |
of class |
Consider the grouped panel data model
y_{it} = \gamma_i + \beta^\prime_{i} x_{it} + \epsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,
where y_{it}
is the scalar dependent variable, \gamma_i
is an individual fixed effect, x_{it}
is a p \times 1
vector of explanatory variables, and \epsilon_{it}
is a zero mean error.
The coefficient vector \beta_i
is subject to the latent group pattern
\beta_i = \sum_{k = 1}^K \alpha_k \bold{1} \{i \in G_k \},
with \cup_{k = 1}^K G_k = \{1, \dots, N\}
, G_k \cap G_j = \emptyset
and \| \alpha_k \| \neq \| \alpha_j \|
for any k \neq j
.
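As an illustration of the group pattern, the following base R sketch builds the N \times p matrix of individual slopes from K group-specific vectors. The group assignment and coefficient values are made up for this example and are not taken from the package.

```r
# Hypothetical example: N = 6 individuals, p = 2 regressors, K = 3 groups
alpha <- rbind(c( 1.0, -0.5),   # alpha_1
               c( 0.2,  0.8),   # alpha_2
               c(-1.5,  0.0))   # alpha_3
groups <- c(1, 1, 2, 3, 3, 2)   # group membership of each individual i

# beta_i = alpha_k for i in G_k: index the rows of alpha by group label
beta <- alpha[groups, , drop = FALSE]

# Individuals in the same group share an identical slope vector
stopifnot(identical(beta[1, ], beta[2, ]))
```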
The PLS method jointly estimates the latent group structure and group-specific coefficients by minimizing the following criterion:
\frac{1}{T} \sum^N_{i=1} \sum^{T}_{t=1}(\tilde{y}_{it} - \beta^\prime_i \tilde{x}_{it})^2 + \frac{\lambda}{N} \sum_{1 \leq i} \sum_{i<j \leq N} \dot{w}_{ij} \| \beta_i - \beta_j \|,
where \tilde{y}_{it}
is the demeaned scalar dependent variable, \tilde{x}_{it}
denotes a p \times 1
vector of demeaned weakly exogenous explanatory variables, \lambda
is the penalty tuning parameter and \dot{w}_{ij}
reflects adaptive penalty weights (see Mehrabani, 2023, eq. 2.6). \| \cdot \|
denotes the Frobenius norm.
The adaptive weights \dot{w}_{ij}
are obtained by a preliminary individual least squares estimation.
The solution \hat{\bold{\beta}}
is computed via an iterative alternating direction method of multipliers (ADMM) algorithm (see Mehrabani, 2023, sec. 5.1).
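The PLS criterion can be evaluated directly for a candidate coefficient matrix, which may help when checking intuition about the penalty. The sketch below is base R only and is not the package's ADMM implementation. The function names are hypothetical, and the weight formula \| \dot{\beta}_i - \dot{\beta}_j \|^{-\kappa} is an assumption about the form of eq. 2.6 in Mehrabani (2023).

```r
# Assumed form of the adaptive weights: inverse pairwise distance between
# preliminary individual estimates (N x p matrix), raised to the power kappa
adaptive_weights <- function(beta_prelim, kappa = 2) {
  d <- as.matrix(dist(beta_prelim))  # pairwise Euclidean distances
  w <- d^(-kappa)
  diag(w) <- 0                       # no penalty of an individual with itself
  w
}

# Evaluate the PLS criterion for a candidate coefficient matrix `beta` (N x p).
# y_tilde: N x T matrix of demeaned outcomes
# x_tilde: N x T x p array of demeaned regressors
# w: N x N matrix of adaptive weights (only the upper triangle is used)
pls_criterion <- function(beta, y_tilde, x_tilde, w, lambda) {
  N <- nrow(y_tilde)
  n_periods <- ncol(y_tilde)
  fit <- 0
  for (i in seq_len(N)) {
    x_i <- matrix(x_tilde[i, , ], nrow = n_periods)  # T x p
    fit <- fit + sum((y_tilde[i, ] - drop(x_i %*% beta[i, ]))^2)
  }
  penalty <- 0
  for (i in seq_len(N - 1)) {
    for (j in (i + 1):N) {
      penalty <- penalty + w[i, j] * sqrt(sum((beta[i, ] - beta[j, ])^2))
    }
  }
  fit / n_periods + lambda / N * penalty
}
```

When all rows of `beta` are identical, the penalty term vanishes and the criterion reduces to the scaled sum of squared residuals.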
PGMM employs a set of instruments \bold{Z}
to control for endogenous regressors. Using PGMM, \bold{\beta} = (\beta_1^\prime, \dots, \beta_N^\prime)^\prime
is estimated by minimizing:
\sum^N_{i = 1} \left[ \frac{1}{T} \sum_{t=1}^T z_{it} (\Delta y_{it} - \beta^\prime_i \Delta x_{it}) \right]^\prime W_i \left[\frac{1}{T} \sum_{t=1}^T z_{it}(\Delta y_{it} - \beta^\prime_i \Delta x_{it}) \right] + \frac{\lambda}{N} \sum_{1 \leq i} \sum_{i<j \leq N} \ddot{w}_{ij} \| \beta_i - \beta_j \|.
\ddot{w}_{ij}
are obtained by an initial GMM estimation. \Delta
denotes the first-difference operator, \Delta y_{it} = y_{it} - y_{i,t-1}
. W_i
represents a data-driven q \times q
weight matrix. See Mehrabani (2023, eq. 2.10) for more details.
\bold{\beta}
is again estimated employing an efficient ADMM algorithm (Mehrabani, 2023, sec. 5.2).
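The GMM part of the PGMM criterion is a sum of individual quadratic forms in the averaged moment conditions. The following base R sketch computes the contribution of a single individual. The function name and argument layout are hypothetical, and the weight matrix is simply taken as given rather than constructed as in the paper.

```r
# Contribution of one individual to the GMM part of the PGMM criterion.
# y_i: outcome series of length T + 1
# x_i: (T + 1) x p matrix of regressors
# z_i: T x q matrix of instruments, aligned with the differenced periods
# W_i: q x q weight matrix
gmm_term <- function(beta_i, y_i, x_i, z_i, W_i) {
  d_y <- diff(y_i)                   # Delta y_it = y_it - y_i,t-1
  d_x <- diff(as.matrix(x_i))        # row-wise first differences
  u <- d_y - drop(d_x %*% beta_i)    # differenced residuals
  g <- colMeans(as.matrix(z_i) * u)  # (1/T) sum_t z_it * u_it, length q
  drop(t(g) %*% W_i %*% g)           # quadratic form g' W_i g
}
```

Summing `gmm_term()` over all individuals and adding the fusion penalty would give the full criterion for a candidate \bold{\beta}.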
Two individuals are assigned to the same group if \| \hat{\beta}_i - \hat{\beta}_j \| \leq \epsilon_{\text{tol}}
, where \epsilon_{\text{tol}}
is given by tol_group
. Subsequently, the number of groups follows as the number of distinct elements in \hat{\bold{\beta}}
. Given an estimated group structure, it is straightforward to obtain post-Lasso estimates using least squares.
We suggest identifying a suitable \lambda
parameter by passing a logarithmically spaced grid of candidate values with a lower limit close to 0 and an upper limit that leads to a fully homogeneous panel. A BIC-type information criterion then selects the best fitting \lambda
value.
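A logarithmically spaced grid can be built in base R as sketched below. The bounds and grid length are placeholder values. In practice the upper limit would be increased until the estimate collapses to a single group.

```r
# Hypothetical bounds: strictly positive lower limit, upper limit chosen
# large enough that all individuals are fused into one group
lambda_min <- 1e-4
lambda_max <- 10

# Equally spaced on the log scale, then mapped back
lambda_grid <- exp(seq(log(lambda_min), log(lambda_max), length.out = 25))
```

The grid would then be supplied through the `lambda` argument, assuming a vector of candidate values is accepted as the text above indicates.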
An object of class pagfl
holding
model |
a |
coefficients |
a |
groups |
a |
residuals |
a vector of residuals of the demeaned model, |
fitted |
a vector of fitted values of the demeaned model, |
args |
a |
IC |
a |
convergence |
a |
call |
the function call. |
A pagfl
object has print
, summary
, fitted
, residuals
, formula
, df.residual
, and coef
S3 methods.
Paul Haimerl
Dhaene, G., & Jochmans, K. (2015). Split-panel jackknife estimation of fixed-effect models. The Review of Economic Studies, 82(3), 991-1030. doi:10.1093/restud/rdv007.
Mehrabani, A. (2023). Estimation and identification of latent group structures in panel data. Journal of Econometrics, 235(2), 1464-1482. doi:10.1016/j.jeconom.2022.12.002.
# Simulate a panel with a group structure
sim <- sim_DGP(N = 20, n_periods = 80, p = 2, n_groups = 3)
y <- sim$y
X <- sim$X
df <- cbind(y = c(y), X)
# Run the PAGFL procedure
estim <- pagfl(y ~ ., data = df, n_periods = 80, lambda = 0.5, method = "PLS")
summary(estim)
# Let's pass a panel data set with explicit cross-sectional and time indicators
i_index <- rep(1:20, each = 80)
t_index <- rep(1:80, 20)
df <- data.frame(y = c(y), X, i_index = i_index, t_index = t_index)
estim <- pagfl(
y ~ ., data = df, index = c("i_index", "t_index"),
lambda = 0.5, method = "PLS"
)
summary(estim)