stan_sar {geostan} | R Documentation |
Fit data to an spatial Gaussian SAR (spatial error) model, or model a vector of spatially-autocorrelated parameters using a SAR prior model.
stan_sar(
formula,
slx,
re,
data,
C,
sar_parts = prep_sar_data(C),
family = auto_gaussian(),
prior = NULL,
ME = NULL,
centerx = FALSE,
prior_only = FALSE,
censor_point,
chains = 4,
iter = 2000,
refresh = 500,
keep_all = FALSE,
pars = NULL,
slim = FALSE,
drop = NULL,
control = NULL,
quiet = FALSE,
...
)
formula |
A model formula, following the R |
slx |
Formula to specify any spatially-lagged covariates. As in, |
re |
To include a varying intercept (or "random effects") term, alpha_re ~ N(0, alpha_tau) alpha_tau ~ Student_t(d.f., location, scale). With the SAR model, any |
data |
A |
C |
Spatial weights matrix (conventionally referred to as |
sar_parts |
Optional. If not provided, then |
family |
The likelihood function for the outcome variable. Current options are |
prior |
A named list of parameters for prior distributions (see
.
|
ME |
To model observational uncertainty (i.e. measurement or sampling error) in any or all of the covariates, provide a list of data as constructed by the |
centerx |
To center predictors on their mean values, use |
prior_only |
Logical value; if |
censor_point |
Integer value indicating the maximum censored value; this argument is for modeling censored (suppressed) outcome data, typically disease case counts or deaths. |
chains |
Number of MCMC chains to use. |
iter |
Number of samples per chain. |
refresh |
Stan will print the progress of the sampler every |
keep_all |
If |
pars |
Optional; specify any additional parameters you'd like stored from the Stan model. |
slim |
If |
drop |
Provide a vector of character strings to specify the names of any parameters that you do not want MCMC samples for. Dropping parameters in this way can improve sampling speed and reduce memory usage. The following parameter vectors can potentially be dropped from SAR models:
Using |
control |
A named list of parameters to control the sampler's behavior. See |
quiet |
Controls (most) automatic printing to the console. By default, any prior distributions that have not been assigned by the user are printed to the console. If |
... |
Other arguments passed to |
Discussions of SAR models may be found in Cliff and Ord (1981), Cressie (2015, Ch. 6), LeSage and Pace (2009), and LeSage (2014). The Stan implementation draws from Donegan (2021).
The general scheme of the SAR model for numeric vector y
is
y = \mu + ( I - \rho W)^{-1} \epsilon
\epsilon \sim Gauss(0, \sigma^2 I)
where W
is the spatial weights matrix, I
is the n-by-n identity matrix, and \rho
is a spatial autocorrelation parameter. In words, the errors of the regression equation are spatially autocorrelated.
Re-arranging terms, the model can also be written as follows:
y = \mu + \rho W (y - \mu) + \epsilon
which perhaps shows more intuitively the implicit spatial trend component, \rho W (y - \mu)
.
Most often, this model is applied directly to observations (referred to below as the auto-Gaussian model). The SAR model can also be applied to a vector of parameters inside a hierarchical model. The latter enables spatial autocorrelation to be modeled when the observations are discrete counts (e.g., disease incidence data).
A note on terminology: the spatial statistics literature conceptualizes the simultaneously-specified spatial autoregressive model (SAR) in relation to the conditionally-specified spatial autoregressive model (CAR) (see stan_car) (see Cliff and Ord 1981). The spatial econometrics literature, by contrast, refers to the simultaneously-specified spatial autoregressive (SAR) model as the spatial error model (SEM), and they contrast the SEM with the spatial lag model (which contains a spatially-lagged dependent variable on the right-hand-side of the regression equation) (see LeSage 2014).
When family = auto_gaussian()
, the SAR model is specified as follows:
y \sim Gauss(\mu, \Sigma)
\Sigma = \sigma^2 (I - \rho W)^{-1}(I - \rho W')^{-1}
where \mu
is the mean vector (with intercept, covariates, etc.), W
is a spatial weights matrix (usually row-standardized), and \sigma
is a scale parameter.
The SAR model contains an implicit spatial trend (i.e., spatial autocorrelation) component \phi
which is calculated as follows:
\phi = \rho W (y - \mu)
This term can be extracted from a fitted auto-Gaussian model using the spatial method.
When applied to a fitted auto-Gaussian model, the residuals.geostan_fit method returns 'de-trended' residuals R
by default. That is,
R = y - \mu - \rho W (y - \mu).
To obtain "raw" residuals (y - \mu
), use residuals(fit, detrend = FALSE)
. Similarly, the fitted values obtained from the fitted.geostan_fit will include the spatial trend term by default.
For family = poisson()
, the model is specified as:
y \sim Poisson(e^{O + \lambda})
\lambda \sim Gauss(\mu, \Sigma)
\Sigma = \sigma^2 (I - \rho W)^{-1}(I - \rho W')^{-1}.
If the raw outcome consists of a rate \frac{y}{p}
with observed counts y
and denominator p
(often this will be the size of the population at risk), then the offset term O=log(p)
is the log of the denominator.
This is often written (equivalently) as:
y \sim Poisson(e^{O + \mu + \phi})
\phi \sim Gauss(0, \Sigma)
\Sigma = \sigma^2 (I - \rho W)^{-1}(I - \rho W')^{-1}.
For Poisson models, the spatial method returns the parameter vector \phi
.
In the Poisson SAR model, \phi
contains a latent spatial trend as well as additional variation around it. If you would like to extract the latent/implicit spatial trend from \phi
, you can do so by calculating:
\rho W \phi.
For family = binomial()
, the model is specified as:
y \sim Binomial(N, \lambda)
logit(\lambda) \sim Gauss(\mu, \Sigma)
\Sigma = \sigma^2 (I - \rho W)^{-1}(I - \rho W')^{-1}.
where outcome data y
are counts, N
is the number of trials, and \lambda
is the rate of 'success'. Note that the model formula should be structured as: cbind(sucesses, failures) ~ 1
(for an intercept-only model), such that trials = successes + failures
.
For fitted Binomial models, the spatial
method will return the parameter vector phi
, equivalent to:
\phi = logit(\lambda) - \mu.
As is also the case for the Poisson model, \phi
contains a latent spatial trend as well as additional variation around it. If you would like to extract the latent/implicit spatial trend from \phi
, you can do so by calculating:
\rho W \phi.
The SAR models can also incorporate spatially-lagged covariates, measurement/sampling error in covariates (particularly when using small area survey estimates as covariates), missing outcome data, and censored outcomes (such as arise when a disease surveillance system suppresses data for privacy reasons). For details on these options, please see the Details section in the documentation for stan_glm.
An object of class class geostan_fit
(a list) containing:
Summaries of the main parameters of interest; a data frame.
Widely Applicable Information Criteria (WAIC) with a measure of effective number of parameters (eff_pars
) and mean log pointwise predictive density (lpd
), and mean residual spatial autocorrelation as measured by the Moran coefficient.
an object of class stanfit
returned by rstan::stan
a data frame containing the model data
the user-provided or default family
argument used to fit the model
The model formula provided by the user (not including CAR component)
The slx
formula
A list containing re
, the varying intercepts (re
) formula if provided, and
Data
a data frame with columns id
, the grouping variable, and idx
, the index values assigned to each group.
Prior specifications.
If covariates are centered internally (centerx = TRUE
), then x_center
is a numeric vector of the values on which covariates were centered.
A data frame with the name of the spatial component parameter (either "phi" or, for auto Gaussian models, "trend") and method ("SAR")
A list indicating if the object contains an ME model; if so, the user-provided ME list is also stored here.
Spatial weights matrix (in sparse matrix format).
Connor Donegan, connor.donegan@gmail.com
Cliff, A D and Ord, J K (1981). Spatial Processes: Models and Applications. Pion.
Cressie, Noel (2015 (1993)). Statistics for Spatial Data. Wiley Classics, Revised Edition.
Cressie, Noel and Wikle, Christopher (2011). Statistics for Spatio-Temporal Data. Wiley.
Donegan, Connor (2021). Building spatial conditional autoregressive (CAR) models in the Stan programming language. OSF Preprints. doi:10.31219/osf.io/3ey65.
LeSage, James (2014). What Regional Scientists Need to Know about Spatial Econometrics. The Review of Regional Science 44: 13-32 (2014 Southern Regional Science Association Fellows Address).
LeSage, James, & Pace, Robert Kelley (2009). Introduction to Spatial Econometrics. Chapman and Hall/CRC.
# model mortality risk
data(georgia)
W <- shape2mat(georgia, style = "W")
fit <- stan_sar(log(rate.male) ~ 1,
C = W,
data = georgia,
chains = 1, # for ex. speed only
iter = 700
)
rstan::stan_rhat(fit$stanfit)
rstan::stan_mcse(fit$stanfit)
print(fit)
plot(fit)
sp_diag(fit, georgia)
# a more appropriate model for count data:
fit2 <- stan_sar(deaths.male ~ offset(log(pop.at.risk.male)),
C = W,
data = georgia,
family = poisson(),
chains = 1, # for ex. speed only
iter = 700
)
sp_diag(fit2, georgia)