simvcd {scR} | R Documentation |
Estimate the Vapnik-Chervonenkis (VC) dimension of an arbitrary binary classification algorithm.
simvcd(
model,
dim,
packages = list(),
m = 1000,
k = 1000,
maxn = 5000,
parallel = TRUE,
coreoffset = 0,
predictfn = NULL,
a = 0.16,
a1 = 1.2,
a11 = 0.14927,
...
)
model |
A binary classification model supplied by the user. Must take arguments |
dim |
A positive integer giving the dimension (number of input features) of the model. |
packages |
A |
m |
A positive integer giving the number of simulations to be performed at each design point (sample size value). Higher values give more accurate results but increase computation time. |
k |
A positive integer giving the number of design points (sample size values) for which the bounding function is to be estimated. Higher values give more accurate results but increase computation time. |
maxn |
A positive integer giving the number of observations (rows) of data to be generated. |
parallel |
Logical indicating whether to use parallel processing. |
coreoffset |
If |
predictfn |
An optional user-defined function giving a custom predict method. If also using a user-defined model, the |
a |
Scaling coefficient for the bounding function. Defaults to the value given by Vapnik, Levin and Le Cun 1994. |
a1 |
Scaling coefficient for the bounding function. Defaults to the value given by Vapnik, Levin and Le Cun 1994. |
a11 |
Scaling coefficient for the bounding function. Defaults to the value given by Vapnik, Levin and Le Cun 1994. |
... |
Additional arguments that need to be passed to |
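The defaults of a, a1 and a11 match the constants of the bounding function in Vapnik, Levin and Le Cun (1994). As a rough illustration only, the sketch below writes that function in R. The helper name vlc_bound is hypothetical and not part of scR, and the mapping of a, a1 and a11 onto the three constants is an assumption based on their default values, not something this page confirms.

```r
# Hypothetical helper, not part of scR: bounding function in the form given by
# Vapnik, Levin and Le Cun (1994), where tau = n / h is the sample size
# relative to a candidate VC dimension h.
vlc_bound <- function(tau, a = 0.16, a1 = 1.2, a11 = 0.14927) {
  stopifnot(is.numeric(tau), all(tau > 0))
  out <- rep(1, length(tau))   # the bound equals 1 for tau < 0.5
  big <- tau >= 0.5
  t <- tau[big]
  lg <- log(2 * t) + 1
  out[big] <- a * lg / (t - a11) * (sqrt(1 + a1 * (t - a11) / lg) + 1)
  out
}

# Evaluate the bound at a few sample sizes for an assumed VC dimension of 20.
n <- c(10, 50, 100, 500)
h <- 20
bound_values <- vlc_bound(n / h)
print(round(bound_values, 3))
```

With these defaults the two branches meet at tau = 0.5, where the second branch evaluates to approximately 1, and the bound decreases as tau grows.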
A real number giving the estimated value of the VC dimension of the supplied model.
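To give an intuition for how a single number can be recovered from values observed at several design points, the sketch below fits a candidate VC dimension h by least squares against the bounding function of Vapnik, Levin and Le Cun (1994). It is an assumption about the general approach, not a description of scR's internals: the names phi, n_grid, xi and sse are hypothetical, and the "observed" values are noise-free stand-ins generated from the bound itself rather than simulation output.

```r
# Hypothetical helper, not part of scR: bounding function with tau = n / h.
phi <- function(tau, a = 0.16, a1 = 1.2, a11 = 0.14927) {
  t <- pmax(tau, 0.5)   # keep the formula away from tau < 0.5
  lg <- log(2 * t) + 1
  val <- a * lg / (t - a11) * (sqrt(1 + a1 * (t - a11) / lg) + 1)
  ifelse(tau < 0.5, 1, val)
}

n_grid <- seq(10, 500, length.out = 10)  # design points (sample sizes)
h_true <- 20                             # VC dimension used to fake the data
xi <- phi(n_grid / h_true)               # noise-free stand-in for simulated values

# Sum of squared differences between the stand-in values and the bound at h.
sse <- function(h) sum((xi - phi(n_grid / h))^2)

fit <- optimize(sse, interval = c(1, 200))
h_hat <- fit$minimum
print(h_hat)
```

Because the stand-in data contain no noise, the fitted value should land very close to h_true; with real simulation output the fit would be noisier, which is why larger m and k give more accurate results.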
scb(), to calculate sample complexity bounds given the estimated VC dimension.
# User-defined model: a logistic regression fitted with glm().
mylogit <- function(formula, data) {
  m <- structure(
    glm(formula = formula, data = data, family = binomial(link = "logit")),
    class = c("svrclass", "glm") # IMPORTANT: must use the class svrclass to work correctly
  )
  return(m)
}

# Custom predict method returning the predicted classes as a factor.
mypred <- function(m, newdata) {
  out <- predict.glm(m, newdata, type = "response")
  # Important: levels must be specified to account for the possibility of all
  # observations being classified into the same class in smaller samples
  out <- factor(ifelse(out > 0.5, 1, 0), levels = c("0", "1"))
  return(out)
}

library(parallel) # provides detectCores()
vcd <- simvcd(
  model = mylogit, dim = 7, m = 10, k = 10, maxn = 50, predictfn = mypred,
  coreoffset = (detectCores() - 2)
)
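Before running a long simulation, it can help to check the model and predict wrappers on a small data set. The sketch below repeats the two wrappers from the example so that it runs on its own, then applies them to toy data. The data frame toy and its column names are hypothetical and chosen only for illustration; they are not necessarily the format simvcd generates internally.

```r
# Same wrappers as in the example, repeated so this block is self-contained.
mylogit <- function(formula, data) {
  structure(
    glm(formula = formula, data = data, family = binomial(link = "logit")),
    class = c("svrclass", "glm")
  )
}
mypred <- function(m, newdata) {
  p <- predict.glm(m, newdata, type = "response")
  factor(ifelse(p > 0.5, 1, 0), levels = c("0", "1"))
}

# Hypothetical toy data: a binary outcome and two numeric features.
set.seed(1)
toy <- data.frame(y = rbinom(40, 1, 0.5), x1 = rnorm(40), x2 = rnorm(40))

fit <- mylogit(y ~ x1 + x2, data = toy)
pred <- mypred(fit, toy)

# The fitted object carries the svrclass class, and the predictions are a
# factor with both levels present even if only one class is predicted.
print(class(fit))
print(levels(pred))
```

Fixing the factor levels in mypred is what keeps the output shape stable when a small sample happens to produce predictions from a single class.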