computeThreePointInfo {miic} | R Documentation |
Compute (conditional) three-point information
Description
Three-point information is defined and computed as the difference between mutual information and conditional mutual information, e.g.
I(X;Y;Z|U) = I(X;Y|U) - I(X;Y|U,Z)
For discrete or categorical variables, the three-point information is computed with the empirical frequencies minus a complexity cost (computed as BIC or with the Normalized Maximum Likelihood).
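For intuition, the "empirical frequencies" part of the definition can be sketched in base R for discrete variables. This is a minimal illustration, not the estimator used by the package: it applies no complexity cost, works in nats (an assumption made here), and the helper names joint_entropy, mutual_info, cond_mutual_info and three_point_info are hypothetical.

```r
# Plug-in (empirical-frequency) entropy of the joint distribution of the
# given discrete vectors, in nats. Hypothetical helper, not part of miic.
joint_entropy <- function(...) {
  counts <- table(...)
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# I(X;Y) = H(X) + H(Y) - H(X,Y)
mutual_info <- function(x, y) {
  joint_entropy(x) + joint_entropy(y) - joint_entropy(x, y)
}

# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
cond_mutual_info <- function(x, y, z) {
  joint_entropy(x, z) + joint_entropy(y, z) -
    joint_entropy(x, y, z) - joint_entropy(z)
}

# I(X;Y;Z) = I(X;Y) - I(X;Y|Z), without any complexity cost
three_point_info <- function(x, y, z) {
  mutual_info(x, y) - cond_mutual_info(x, y, z)
}

# Balanced XOR data: X and Y are independent, but dependent given Z
x <- rep(c(0, 0, 1, 1), times = 25)
y <- rep(c(0, 1, 0, 1), times = 25)
z <- as.integer(xor(x, y))
i3_xor <- three_point_info(x, y, z)

# Degenerate common-cause data: X and Y are both exact copies of Z
z2 <- rep(c(0, 1), times = 50)
i3_copy <- three_point_info(z2, z2, z2)
```

On the XOR data the plug-in value is negative, and on the copy data it is positive, which matches the sign convention illustrated in the Examples section.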
Usage
computeThreePointInfo(
x,
y,
z,
df_conditioning = NULL,
maxbins = NULL,
cplx = c("nml", "bic"),
n_eff = -1,
sample_weights = NULL,
is_continuous = NULL
)
Arguments
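x |
[a vector]
The observations of the first variable, X. |
y |
[a vector]
The observations of the second variable, Y. |
z |
[a vector]
The observations of the third variable, Z. |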
df_conditioning |
[a data frame]
The data frame of the observations of the set of conditioning variables
|
maxbins |
[an integer] When the data contain continuous variables, the maximum number of bins allowed during the discretization. A smaller number makes the computation faster, a larger number allows finer discretization. |
cplx |
[a string] The complexity model: "nml" for the Normalized Maximum Likelihood or "bic" for the Bayesian Information Criterion.
|
n_eff |
[an integer] The effective number of samples. When there is significant autocorrelation between successive samples, you may want to specify an effective number of samples that is lower than the total number of samples. |
sample_weights |
[a vector of floats] Individual weights for each sample, used for the same reason as the effective number of samples but with individual weights. |
is_continuous |
[a vector of booleans]
Specify if each variable is to be treated as continuous (TRUE) or discrete
(FALSE). Must be of length ncol(df_conditioning) + 3, in the order
x, y, z, followed by the columns of df_conditioning.
|
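As an illustration of the df_conditioning argument, a call with one conditioning variable might look like the sketch below. The simulated data, the seed and the variable name res_cond are assumptions made for this example. The block is guarded so that it does nothing when miic is not installed.

```r
# Sketch only: requires the miic package; skipped when it is not installed.
res_cond <- NULL
if (requireNamespace("miic", quietly = TRUE)) {
  set.seed(1)  # arbitrary seed, only to make the simulated data reproducible
  N <- 1000
  U <- runif(N)                      # conditioning variable
  Z <- runif(N)
  X <- U + Z + rnorm(N, sd = 0.2)
  Y <- U + Z + rnorm(N, sd = 0.2)

  # Conditional three-point information I(X;Y;Z|U)
  res_cond <- miic::computeThreePointInfo(
    X, Y, Z,
    df_conditioning = data.frame(U = U),
    cplx = "nml"
  )
  message("I(X;Y;Z|U)  = ", res_cond$i3)
  message("Ik(X;Y;Z|U) = ", res_cond$i3k)
}
```

All vectors here are numeric, so is_continuous is left at its default rather than set by hand.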
Details
For variables X, Y, Z and a set of conditioning variables U, the conditional three-point information is defined as
Ik(X;Y;Z|U) = Ik(X;Y|U) - Ik(X;Y|U,Z)
where Ik is the shifted or regularized conditional mutual information.
See computeMutualInfo for the definition of Ik.
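To see how the sign of the three-point information can be read, consider the two idealized structures used in the Examples section. The lines below are a sketch for exact, unregularized quantities with no conditioning set. They assume that each arrow carries a genuine dependence, and they ignore both estimation noise and the complexity cost.

```latex
\begin{aligned}
X \leftarrow Z \rightarrow Y:\quad & I(X;Y) > 0,\ I(X;Y|Z) = 0
  \;\Rightarrow\; I(X;Y;Z) = I(X;Y) - I(X;Y|Z) = I(X;Y) > 0 \\
X \rightarrow Z \leftarrow Y:\quad & I(X;Y) = 0,\ I(X;Y|Z) > 0
  \;\Rightarrow\; I(X;Y;Z) = I(X;Y) - I(X;Y|Z) = -I(X;Y|Z) < 0
\end{aligned}
```

In this idealized setting, a positive value corresponds to the common-cause pattern and a negative value to the collider pattern.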
Value
A list that contains:
i3: The estimation of (conditional) three-point information without the complexity cost.
i3k: The estimation of (conditional) three-point information with the complexity cost (i3k = i3 - cplx).
i2: For reference, the estimation of (conditional) mutual information I(X;Y|U) used in the estimation of i3.
i2k: For reference, the estimation of regularized (conditional) mutual information Ik(X;Y|U) used in the estimation of i3k.
References
Cabeli et al., PLoS Comput. Biol. 2020, Learning clinical networks from medical records based on information estimates in mixed-type data
Affeldt et al., UAI 2015, Robust Reconstruction of Causal Graphical Models based on Conditional 2-point and 3-point Information
Examples
library(miic)
N <- 1000
# Dependence, conditional independence : X <- Z -> Y
Z <- runif(N)
X <- Z * 2 + rnorm(N, sd = 0.2)
Y <- Z * 2 + rnorm(N, sd = 0.2)
res <- computeThreePointInfo(X, Y, Z)
message("I(X;Y;Z) = ", res$i3)
message("Ik(X;Y;Z) = ", res$i3k)
# Independence, conditional dependence : X -> Z <- Y
X <- runif(N)
Y <- runif(N)
Z <- X + Y + rnorm(N, sd = 0.1)
res <- computeThreePointInfo(X, Y, Z)
message("I(X;Y;Z) = ", res$i3)
message("Ik(X;Y;Z) = ", res$i3k)