bfi {BFI} | R Documentation |
Bayesian Federated Inference
Description
bfi
function can be used (in the central server) to combine inference results from separate data sets (without combining the data) to approximate what would have been inferred had the data sets been merged. For now the function can handle linear and logistic regression models, but code for more models will be available in the near future.
bfi command
Usage
bfi(theta_hats = NULL, A_hats, Lambda, stratified = FALSE,
strat_par = NULL, center_spec = NULL)
Arguments
theta_hats |
a list of |
A_hats |
a list of |
Lambda |
a list of |
stratified |
logical flag for performing the stratified analysis. If |
strat_par |
a one- or two-element integer vector for indicating the stratification parameter(s). The values |
center_spec |
a vector of |
Details
bfi
function implements the BFI approach described in the paper Jonker et. al. (2023) given in the references.
The inference results gathered from different (L
) centers are combined, and the BFI estimates of the model parameters and curvature matrix evaluated at that point are returned.
The inference result from each center must be obtained using the MAP.estimation
function separately, and then all of these results (coming from different centers) should be compiled into a list to be used as an input of bfi()
.
The models in the different centers should be defined in exactly the same way; among others, exactly the same covariates should be included in the models. The parameter vectors should be defined exactly the same, so that the L
vectors and matrices in the input lists theta_hat
's and A_hat
's are defined in the same way (e.g., the covariates need to be included in the models in the same order).
Note that the order of the elements in the lists theta_hats
, A_hats
and Lambda
, must be the same with respect to the centers, so that in every list the element at the \ell^{\th}
position is from the center \ell
. This should also be the case for the vector center_spec
.
If for the locations intercept = FALSE
, the stratified analysis is not possible anymore for the binomial
family.
If stratified = FALSE
, both strat_par
and center_spec
must be NULL
(the defaults), while if stratified = TRUE
only one of the two must be NULL
.
If stratified = FALSE
and all the L+1
matrices are equal, it is sufficient to give a (list of) one matrix only.
In both cases of the stratified
argument (TRUE
or FALSE
), if only the first L
matrices are equal, the argument Lambda
can be a list of two matrices, so that the fist matrix represents the chosen variance-covariance matrix for local centers and the second one is the chosen matrix for the combined data set.
The last matrix of the list in the argument Lambda
can be built by the function inv.prior.cov()
.
If the data type used in the argument center_spec
is continuous, one can use stratified = TRUE
and center_spec = NULL
, and set strat_par
not to NULL
(i.e., to 1
, 2
or both (1, 2)
). Indeed, in this case, the stratification parameter(s) given in the argument strat_par
are assumed to be different across the centers.
Value
bfi
returns a list containing the following components:
theta_hat |
the vector of estimates obtained by combining the inference results from the |
A_hat |
minus the curvature (or Hessian) matrix obtained by the |
sd |
the vector of standard deviation of the estimates in |
Author(s)
Hassan Pazira
Maintainer: Hassan Pazira hassan.pazira@radboudumc.nl
References
Jonker M.A., Pazira H. and Coolen A.C.C. (2024). Bayesian federated inference for estimating statistical models based on non-shared multicenter data sets, Statistics in Medicine, 1-18. <https://doi.org/10.1002/sim.10072>
See Also
MAP.estimation
and inv.prior.cov
Examples
###############################################
## Example 1: y ~ Binomial (L=2 centers) ##
###############################################
#------------------
# Model Assumption:
#------------------
beta <- 1:4 # regression coefficients (beta[1] is the intercept)
#-----------------------------------
# Data Simulation for Local Center 1
#-----------------------------------
n1 <- 30 # sample size of center 1
X1 <- data.frame(x1=rnorm(n1), # continuous variable
x2=sample(0:2, n1, replace=TRUE)) # categorical variable
# make dummy variables
X1x2_1 <- ifelse(X1$x2 == '1', 1, 0)
X1x2_2 <- ifelse(X1$x2 == '2', 1, 0)
X1$x2 <- as.factor(X1$x2)
# linear predictor:
eta1 <- beta[1] + X1$x1 * beta[2] + X1x2_1 * beta[3] + X1x2_2 * beta[4]
# inverse of the link function ( g^{-1}(\eta) = \mu ):
mu1 <- binomial()$linkinv(eta1)
y1 <- rbinom(n1, 1, mu1)
#-----------------------------------
# Data Simulation for Local Center 2
#-----------------------------------
n2 <- 50 # sample size of center 2
X2 <- data.frame(x1=rnorm(n2), # continuous variable
x2=sample(0:2, n2, replace=TRUE)) # categorical variable
# make dummy variables:
X2x2_1 <- ifelse(X2$x2 == '1', 1, 0)
X2x2_2 <- ifelse(X2$x2 == '2', 1, 0)
X2$x2 <- as.factor(X2$x2)
# linear predictor:
eta2 <- beta[1] + X2$x1 * beta[2] + X2x2_1 * beta[3] + X2x2_2 * beta[4]
# inverse of the link function:
mu2 <- binomial()$linkinv(eta2)
y2 <- rbinom(n2, 1, mu2)
#---------------------
# Load the BFI package
#---------------------
library(BFI)
#--------------------------
# MAP Estimates at Center 1
#--------------------------
# assume the same inverse covariance matrix (Lambda) for both centers:
Lambda <- inv.prior.cov(X1, lambda=0.01, family=binomial)
fit1 <- MAP.estimation(y1, X1, family=binomial, Lambda)
theta_hat1 <- fit1$theta_hat # intercept and coefficient estimates
A_hat1 <- fit1$A_hat # minus the curvature matrix
#--------------------------
# MAP Estimates at Center 2
#--------------------------
fit2 <- MAP.estimation(y2, X2, family=binomial, Lambda)
theta_hat2 <- fit2$theta_hat
A_hat2 <- fit2$A_hat
#----------------------
# BFI at Central Center
#----------------------
A_hats <- list(A_hat1, A_hat2)
theta_hats <- list(theta_hat1, theta_hat2)
bfi <- bfi(theta_hats, A_hats, Lambda)
class(bfi)
summary(bfi, cur_mat=TRUE)
#--------------------
# Stratified Analysis
#--------------------
# By running the following line an error appears because when stratified = TRUE,
# both 'strat_par' and 'center_spec' can not be NULL:
Just4check1 <- try(bfi(theta_hats, A_hats, Lambda, stratified=TRUE), TRUE)
class(Just4check1) # By default, both 'strat_par' and 'center_spec' are NULL!
# By running the following line an error appears because when stratified = TRUE,
# last matrix in 'Lambda' should not have the same dim. as the other local matrices:
Just4check2 <- try(bfi(theta_hats, A_hats, Lambda, stratified=TRUE, strat_par=1), TRUE)
class(Just4check2) # All matices in Lambda have the same dimension!
# Stratified analysis when 'intercept' varies across two centers:
newLam <- inv.prior.cov(X1, lambda=c(0.1, 0.3), family=binomial, stratified=TRUE,
strat_par = 1)
bfi <- bfi(theta_hats, A_hats, list(Lambda, newLam), stratified=TRUE, strat_par=1)
summary(bfi, cur_mat=TRUE)
###############################################
## Example 2: y ~ Gaussian (L=3 centers) ##
###############################################
#-------------------
# Model Assumptions:
#-------------------
p <- 3 # number of coefficients without 'intercept'
theta <- c(1, rep(2, p), 1.5) # reg. coefficients (theta[1] is 'intercept') & 'sigma2' = 1.5
#-----------------------------------
# Data Simulation for Local Center 1
#-----------------------------------
n1 <- 30 # sample size of center 1
X1 <- data.frame(matrix(rnorm(n1 * p), n1, p)) # continuous variables
# linear predictor:
eta1 <- theta[1] + as.matrix(X1) %*% theta[2:4]
# inverse of the link function ( g^{-1}(\eta) = \mu ):
mu1 <- gaussian()$linkinv(eta1)
y1 <- rnorm(n1, mu1, sd = sqrt(theta[5]))
#-----------------------------------
# Data Simulation for Local Center 2
#-----------------------------------
n2 <- 40 # sample size of center 2
X2 <- data.frame(matrix(rnorm(n2 * p), n2, p)) # continuous variables
# linear predictor:
eta2 <- theta[1] + as.matrix(X2) %*% theta[2:4]
# inverse of the link function:
mu2 <- gaussian()$linkinv(eta2)
y2 <- rnorm(n2, mu2, sd = sqrt(theta[5]))
#-----------------------------------
# Data Simulation for Local Center 3
#-----------------------------------
n3 <- 50 # sample size of center 3
X3 <- data.frame(matrix(rnorm(n3 * p), n3, p)) # continuous variables
# linear predictor:
eta3 <- theta[1] + as.matrix(X3) %*% theta[2:4]
# inverse of the link function:
mu3 <- gaussian()$linkinv(eta3)
y3 <- rnorm(n3, mu3, sd = sqrt(theta[5]))
#---------------------------
# Inverse Covariance Matrix
#---------------------------
# Creating the inverse covariance matrix for the Gaussian prior distribution:
Lambda <- inv.prior.cov(X1, lambda=0.05, family=gaussian) # the same for both centers
#--------------------------
# MAP Estimates at Center 1
#--------------------------
fit1 <- MAP.estimation(y1, X1, family=gaussian, Lambda)
theta_hat1 <- fit1$theta_hat # intercept and coefficient estimates
A_hat1 <- fit1$A_hat # minus the curvature matrix
#--------------------------
# MAP Estimates at Center 2
#--------------------------
fit2 <- MAP.estimation(y2, X2, family=gaussian, Lambda)
theta_hat2 <- fit2$theta_hat
A_hat2 <- fit2$A_hat
#--------------------------
# MAP Estimates at Center 3
#--------------------------
fit3 <- MAP.estimation(y3, X3, family=gaussian, Lambda)
theta_hat3 <- fit3$theta_hat
A_hat3 <- fit3$A_hat
#----------------------
# BFI at Central Center
#----------------------
A_hats <- list(A_hat1, A_hat2, A_hat3)
theta_hats <- list(theta_hat1, theta_hat2, theta_hat3)
bfi <- bfi(theta_hats, A_hats, Lambda)
summary(bfi, cur_mat=TRUE)
#--------------------
# Stratified Analysis
#--------------------
# Stratified analysis when 'intercept' varies across two centers:
newLam1 <- inv.prior.cov(X1, lambda=c(0.1,0.3), family=gaussian, stratified=TRUE,
strat_par = 1, L=3)
# 'newLam1' is used the prior for combined data and 'Lambda' is used the prior for locals
bfi1 <- bfi(theta_hats, A_hats, list(Lambda, newLam1), stratified=TRUE, strat_par=1)
summary(bfi1, cur_mat=TRUE)
# Stratified analysis when 'sigma2' varies across two centers:
newLam2 <- inv.prior.cov(X1, lambda=c(0.1,0.3), family=gaussian, stratified=TRUE,
strat_par = 2, L=3)
# 'newLam2' is used the prior for combined data and 'Lambda' is used the prior for locals
bfi2 <- bfi(theta_hats, A_hats, list(Lambda, newLam2), stratified=TRUE, strat_par=2)
summary(bfi2, cur_mat=TRUE)
# Stratified analysis when 'intercept' and 'sigma2' vary across 2 centers:
newLam3 <- inv.prior.cov(X1, lambda=c(0.1,0.2,0.3), family=gaussian, stratified=TRUE,
strat_par = c(1, 2), L=3)
# 'newLam3' is used the prior for combined data and 'Lambda' is used the prior for locals
bfi3 <- bfi(theta_hats, A_hats, list(Lambda, newLam3), stratified=TRUE, strat_par=1:2)
summary(bfi3, cur_mat=TRUE)
#---------------------------
# Center Specific Covariates
#---------------------------
# Assume the first and third centers have the same center-specific covariate value of '3',
# while this value for the second center is '1', i.e., center_spec = c(3,1,3)
newLam4 <- inv.prior.cov(X1, lambda=c(0.1, 0.2, 0.3), family=gaussian, stratified=TRUE,
center_spec = c(3,1,3), L=3)
# 'newLam4' is used the prior for combined data and 'Lambda' is used the prior for locals
bfi4 <- bfi(theta_hats, A_hats, list(Lambda, newLam4), stratified=TRUE, center_spec = c(3,1,3))
summary(bfi4, cur_mat=TRUE)