gendata {scR} | R Documentation
Simulate data with appropriate structure to be used in estimating sample complexity bounds
gendata(model, dim, maxn, predictfn = NULL, varnames = NULL, ...)
model | A binary classification model supplied by the user. Must take arguments
dim | Gives the horizontal dimension of the data (number of predictor variables) to be generated.
maxn | Gives the vertical dimension of the data (number of observations) to be generated.
predictfn | An optional user-defined function giving a custom predict method. If also using a user-defined model, the
varnames | An optional character vector giving the names of variables to be used for the generated data.
... | Additional arguments that need to be passed to
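As a quick check of how dim and varnames relate in the example call for this function, the following sketch uses only base R and does not call gendata itself. It shows that all.vars() applied to the example formula returns eight names, the outcome followed by seven predictors, while dim in that call is 7. The object names f, vars, and n_predictors are illustrative only.

```r
# Formula taken from the example for this function
f <- two_year_recid ~
  race + sex + age + juv_fel_count +
  juv_misd_count + priors_count + charge_degree..misd.fel.

# all.vars() lists the variable names in order of appearance,
# so the outcome comes first and the predictors follow
vars <- all.vars(f)

# Number of predictors implied by the formula (everything but the outcome)
n_predictors <- length(vars) - 1

print(vars)
print(n_predictors)
```

In the example call, the value passed as dim matches n_predictors and the vector passed as varnames is vars.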
A data.frame containing the simulated data.
estimate_accuracy(), to estimate sample complexity bounds given the generated data.
# User-defined model: a logistic regression wrapped in the "svrclass" class
mylogit <- function(formula, data) {
  m <- structure(
    glm(formula = formula, data = data, family = binomial(link = "logit")),
    class = c("svrclass", "glm")  # IMPORTANT - must use the class svrclass to work correctly
  )
  return(m)
}

# Custom predict method: converts fitted probabilities to 0/1 class labels
mypred <- function(m, newdata) {
  out <- predict.glm(m, newdata, type = "response")
  # Important - must specify levels to account for the possibility of all
  # observations being classified into the same class in smaller samples
  out <- factor(ifelse(out > 0.5, 1, 0), levels = c("0", "1"))
  return(out)
}

formula <- two_year_recid ~
  race + sex + age + juv_fel_count +
  juv_misd_count + priors_count + charge_degree..misd.fel.

# 7 predictor variables, 7214 observations, names taken from the formula
dat <- gendata(mylogit, 7, 7214, mypred, all.vars(formula))

library(parallel)  # provides detectCores()
results <- estimate_accuracy(formula, mylogit, dat, predictfn = mypred,
                             nsample = 10,
                             steps = 10,
                             coreoffset = (detectCores() - 2))
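
The comment in mypred about specifying levels can be illustrated with base R alone. This is a minimal sketch with made-up probabilities (p and truth are hypothetical values, not output from any model above) in which every observation falls below the 0.5 cutoff.

```r
# Hypothetical predicted probabilities that all fall below the 0.5 cutoff
p <- c(0.10, 0.25, 0.40)

# Without explicit levels, the unused class "1" is dropped from the factor
without_levels <- factor(ifelse(p > 0.5, 1, 0))

# With explicit levels, both classes are kept even though only "0" occurs
with_levels <- factor(ifelse(p > 0.5, 1, 0), levels = c("0", "1"))

print(levels(without_levels))
print(levels(with_levels))

# A cross-tabulation against hypothetical true labels stays 2 x 2
# only when the predicted factor carries both levels
truth <- factor(c(0, 1, 0), levels = c("0", "1"))
print(table(truth, with_levels))
```

Fixing the levels keeps the predicted factor comparable with the observed outcome even when a small sample yields predictions from a single class.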