mmd_est {regMMD} | R Documentation |
Fits a statistical models to the data, using the robust procedure based on maximum mean discrepancy (MMD) minimization introduced and studied in Briol et al. (2019); Chérief-Abdellatif and Alquier (2022).
mmd_est(x, model, par1, par2, kernel, bdwth, control= list())
x |
Data. Must be a vector for univariate models, a matrix of dimension n by d, where n is the sample size and d the dimension of the model. |
model |
Parametric model to be fitted to the data. No default. See details for the list of available models. |
par1 |
First parameter of the model. In models where the first parameter is fixed, it is necessary to provide a value for |
par2 |
Second parameter of the model (if any). In models where the second parameter is fixed, it is necessary to provide a value for |
kernel |
Kernel to be used in the MMD. Available options for |
bdwth |
Bandwidth parameter for the kernel. |
control |
A |
Available options for model
are:
beta
"Beta distribution with pdf ~x^{a-1}(1-x)^(b-1)
on [0,1]
, par1
=a
and par2
=b
are both estimated.
binomial
"Binomial distribution with pmf ~p^{x}(1-p)^{N-x}
on \{0,1,...,N\}
, par1
=N
and par2
=p
are both estimated. Note that in this case, if the user specifies a value for N
, it is used as an upper bound rather than an initialization.
binomial.prob
"Binomial distribution with pmf ~p^{x}(1-p)^{N-x}
on \{0,1,...,N\}
, par1
=N
is fixed and must be specified by the user while par2
=p
is estimated.
binomial.size
"Binomial distribution with pmf ~p^{x}(1-p)^{N-x}
on \{0,1,...,N\}
, par1
=N
is estimated while par2
=p
fixed and must be specified by the user. Note that in this case, if the user specifies a value for N
, it is used as an upper bound rather than an initialization.
Cauchy
"Cauchy distribution with pdf ~1/(1+(x-m)^2)
, par1
=m
is estimated.
continuous.uniform.loc
"Uniform distribution with pdf ~1
on [m-L/2,m+L/2]
, par1
=m
is estimated while par2
=L
is fixed and must be specified by the user.
continuous.uniform.upper
"Uniform distribution with pdf ~1
on [a,b]
, par1
=a
is fixed and must be specified by the user while par2
=b
is estimated.
continuous.uniform.lower.upper
"Uniform distribution with pdf ~1
on [a,b]
, par1
=a
and par2
=b
are estimated.
Dirac
"Dirac mass at point a
on the reals, par1
=a
is estimated.
discrete.uniform
"Uniform distribution with pmf ~1
on \{1,2,..,M\}
, par1
=M
is estimated. Note that in this case, if the user specifies a value for M
, it is used as an upper bound rather than an initialization.
exponential
"Exponential distribution with pdf ~\exp(-b x)
on positive reals R_+
, par1
=b
is estimated.
gamma
"Gamma distribution with pdf ~x^{a-1}\exp(-b x)
on positive reals R_+
, par1
=a>=0.5
and par2
=b
are estimated.
gamma.shape
"Gamma distribution with pdf ~x^{a-1}\exp(-b x)
on positive reals R_+
, par1
=a>=0.5
is estimated while par2
=b
is fixed and must be specified by the user.
gamma.rate
"Gamma distribution with pdf ~x^{a-1}\exp(-b x)
on positive reals R_+
, par1
=a>=0.5
is fixed and must be specified by the user while par2
=b
is estimated.
Gaussian
"Gaussian distribution with pdf~\exp(-(x-m)^2/2s^2)
on reals R
, par1
=m
and par2
=s
are estimated.
Gaussian.loc
"Gaussian distribution with pdf ~\exp(-(x-m)^2/2s^2)
on reals R
, par1
=m
is estimated while par2
=s
is fixed and must be specified by the user.
Gaussian.scale
"Gaussian distribution with pdf ~\exp(-(x-m)^2/2s^2)
on reals R
, par1
=m
is fixed and must be specified by the user while par2
=s
is estimated.
geometric
"Geometric distribution with pmf ~p(1-p)^x
on \{0,1,2,...\}
, par1
=p
is estimated.
multidim.Dirac
"Dirac mass at point a
on R^d
, par1
=a
(d
-dimensional vector) is estimated.
multidim.Gaussian
"Gaussian distribution with pdf ~\exp(-(x-m)'U'U(x-m)
on R^d
, par1
=m
(d
-dimensional vector) and par2
=U
(d
-d
matrix) are estimated.
multidim.Gaussian.loc
"Gaussian distribution with pdf ~\exp(-\|x-m\|^2/2s^2)
on R^d
, par1
=m
(d
-dimensional vector) is estimated while par2
=s
is fixed.
multidim.Gaussian.scale
"Gaussian distribution with pdf ~\exp(-(x-m)'U'U(x-m)
on R^d
, par1
=m
(d
-dimensional vector) is fixed and must be specified by the user while and par2
=U
(d
-d
matrix) is estimated.
Pareto
"Pareto distribution with pmf ~1/x^{a+1}
on the reals >1
, par1
=a
is estimated.
Poisson
"Poisson distribution with pmf ~b^x/x!
on \{0,1,2,...\}
, par1
=b
is estimated.
The control
argument is a list that can supply any of the following components:
Length of the burn-in period in GD or SGD. burnin
must be a non-negative integer and defaut burnin
==500
.
Number of iterations performed after the burn-in period in GD or SGD. nsetps
must be an integer strictly larger than 2 and by default nsteps
=1000
Stepsize parameter. An adaptive gradient step is used (adagrad), but it is possible to pre-multiply it by stepsize
. It must be strictly positive number and by default stepsize
=1
Parameter used in adagrad to avoid numerical errors in the computation of the step-size. epsilon
must be a strictly positive real number and by default epsilon
=10^{-4}
.
Optimization method to be used: "EXACT"
for exact, "GD"
for gradient descent and "SGD"
for stochastic gradient descent. Not all methods are available for all models. By default, exact is preferred to GD which is prefered to SGD.
MMD_est
returns an object of class
"estMMD"
.
The functions summary
can be used to print the results.
An object of class estMMD
is a list containing the following components:
model |
Model estimated |
par1 |
In models where the first parameter is fixed, this is the value |
par2 |
In models where the second parameter is fixed, this is the value |
kernel |
Kernel used in the MMD |
bdwth |
Bandwidth used. That is, either the value specified by the user, either the bandwidth computedby the median heuristic |
burnin |
Number of steps in the burnin of the GD or SGD algorithm |
nstep |
Number of steps in the GD or SGD algorithm |
stepsize |
Stepize parameter in GD or SGD |
epsilon |
Parameter used in adagrad to avoid numerical errors in the computation of the step-size |
method |
Optimization method used |
error |
Error message (if any) |
estimator |
Estimated parameter(s) |
type |
Takes the value " |
Briol F, Barp A, Duncan AB, Girolami M (2019).
“Statistical inference for generative models with maximum mean discrepancy.”
arXiv preprint arXiv:1906.05944.
Chérief-Abdellatif B, Alquier P (2022).
“Finite Sample Properties of Parametric MMD Estimation: Robustness to Misspecification and Dependence.”
Bernoulli, 28(1), 181-213.
Garreau D, Jitkrittum W, Kanagawa M (2017).
“Large sample analysis of the median heuristic.”
arXiv preprint arXiv:1707.07269.
#simulate data
x = rnorm(50,0,1.5)
# estimate the mean and variance (assuming the data is Gaussian)
Est = mmd_est(x, model="Gaussian")
# print a summary
summary(Est)
# estimate the mean (assuming the data is Gaussian with known standard deviation =1.5)
Est2 = mmd_est(x, model="Gaussian.loc", par2=1.5)
# print a summary
summary(Est2)
# estimate the standard deviation (assuming the data is Gaussian with known mean = 0)
Est3 = mmd_est(x, model="Gaussian.scale", par1=0)
# print a summary
summary(Est3)
# test of the robustness
x[42] = 100
mean(x)
# estimate the mean and variance (assuming the data is Gaussian)
Est4 = mmd_est(x, model="Gaussian")
summary(Est4)