HPLB {HPLB} | R Documentation |
Implementations of different HPLBs for TV as described in (Michel et al., 2020).
HPLB(
t,
rho,
s = 0.5,
estimator.type = "adapt",
alpha = 0.05,
tv.seq = seq(from = 0, to = 1, by = 1/length(t)),
custom.bounding.seq = NULL,
direction = rep("left", length(s)),
cutoff = 0.5,
verbose.plot = FALSE,
seed = 0,
...
)
t |
a numeric vector value corresponding to a natural ordering of the observations. For a two-sample test 0-1 numeric values values should be provided. |
rho |
a numeric vector value providing an ordering. This could be a binary classifier, a regressor, a witness function from a MMD kernel or anything else that would witness a distributional difference. |
s |
a numeric vector value giving split points on t. |
estimator.type |
a character value indicating which estimator to use. One option out of:
|
alpha |
a numeric value giving the overall type-I error control level. |
tv.seq |
a sequence of values between 0 and 1 used as the grid search for the total variation distance in case of tv-search. |
custom.bounding.seq |
a list of bounding functions respecting the order of tv.seq used in case of estimator.type "custom-tv-search". |
direction |
a character vector value made of "left" or "right" giving which distribution witness count to estimate (t<=s or t>s?). |
cutoff |
a numeric value. This is the cutoff used if bayes estimators are used. The theory suggests to use 1/2 but this can be changed. |
verbose.plot |
a boolean value for additional plots. |
seed |
an integer value. The seed for reproducibility. |
... |
additional parameters for the function |
a list
containing the relevant lower bounds estimates. For the total variation distance the relevant entry is tvhat
.
Loris Michel, Jeffrey Naef
L. Michel, J. Naef and N. Meinshausen (2020). High-Probability Lower Bounds for the Total Variation Distance
## libs
library(HPLB)
library(ranger)
library(distrEx)
## reproducibility
set.seed(0)
## Example 1: TV lower bound based on two samples (bayes estimator), Gaussian mean-shift example
n <- 100
means <- rep(c(0,2), each = n / 2)
x <- stats::rnorm(n, mean = means)
t <- rep(c(0,1), each = n / 2)
bayesRate <- function(x) {
return(stats::dnorm(x, mean = 2) /
(stats::dnorm(x, mean = 2) + stats::dnorm(x, mean = 0)))
}
# estimated HPLB
tvhat <- HPLB(t = t, rho = bayesRate(x), estimator.type = "bayes")
# true TV
TotalVarDist(e1 = Norm(2,1), e2 = Norm(0,1))
## Example 2: optimal mixture detection (adapt estimator), Gaussian mean-shift example
n <- 100
mean.shift <- 2
t.train <- runif(n, 0 ,1)
x.train <- ifelse(t.train>0.5, stats::rnorm(n, mean.shift), stats::rnorm(n))
rf <- ranger::ranger(t~x, data.frame(t=t.train,x=x.train))
n <- 100
t.test <- runif(n, 0 ,1)
x.test <- ifelse(t.test>0.5, stats::rnorm(n, mean.shift), stats::rnorm(n))
rho <- predict(rf, data.frame(t=t.test,x=x.test))$predictions
## out-of-sample
tv.oos <- HPLB(t = t.test, rho = rho, s = seq(0.1,0.9,0.1), estimator.type = "adapt")
## total variation values
tv <- c()
for (s in seq(0.1,0.9,0.1)) {
if (s<=0.5) {
D.left <- Norm(0,1)
} else {
D.left <- UnivarMixingDistribution(Dlist = list(Norm(0,1),Norm(mean.shift,1)),
mixCoeff = c(ifelse(s<=0.5, 1, 0.5/s), ifelse(s<=0.5, 0, (s-0.5)/s)))
}
if (s < 0.5) {
D.right <- UnivarMixingDistribution(Dlist = list(Norm(0,1),Norm(mean.shift,1)),
mixCoeff = c(ifelse(s<=0.5, (0.5-s)/(1-s), 0), ifelse(s<=0.5, (0.5/(1-s)), 1)))
} else {
D.right <- Norm(mean.shift,1)
}
tv <- c(tv, TotalVarDist(e1 = D.left, e2 = D.right))
}
## plot
oldpar <- par(no.readonly =TRUE)
par(mfrow=c(2,1))
plot(t.test,x.test,pch=19,xlab="t",ylab="x")
plot(seq(0.1,0.9,0.1), tv.oos$tvhat,type="l",ylim=c(0,1),xlab="t", ylab="TV")
lines(seq(0.1,0.9,0.1), tv, col="red",type="l")
par(oldpar)