experimental_design |
(required) Defines the structure of the experiment, e.g.
cv(bt(fs,20)+mb,3,2)+ev for 3-fold cross-validation repeated 2 times, with
nested feature selection on 20 bootstraps, followed by model building and
external validation. The basic workflow components are:
- fs: (required) feature selection step.
- mb: (required) model building step.
- ev: (optional) external validation. Note that internal validation is
always conducted if the subsampling methods create any validation data
sets.
The different components are linked using +.
Different subsampling methods can be used in conjunction with the basic
workflow components:
- bs(x,n): (stratified) .632 bootstrap, with n the number of
bootstraps. In contrast to bt, feature pre-processing parameters and
hyperparameter optimisation are determined for each individual bootstrap.
- bt(x,n): (stratified) .632 bootstrap, with n the number of
bootstraps. Unlike bs and other subsampling methods, no separate
pre-processing parameters or optimised hyperparameters are determined
for each bootstrap.
- cv(x,n,p): (stratified) n-fold cross-validation, repeated p times.
Pre-processing parameters are determined for each iteration.
- lv(x): leave-one-out cross-validation. Pre-processing parameters are
determined for each iteration.
- ip(x): imbalance partitioning for addressing class imbalances in the
data set. Pre-processing parameters are determined for each partition. The
number of partitions generated depends on the imbalance correction method
(see the imbalance_correction_method parameter). Imbalance partitioning
does not generate validation sets.
As shown in the example above, sampling algorithms can be nested.
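To make the nesting concrete, the following is a minimal Python sketch that reads a design string into a nested structure and counts how many subsamples enclose each workflow component. It is an illustration only, not the parser used by the package: the function names parse_design and count_subsamples are hypothetical, the grammar is inferred from the description above, and lv and ip are not counted because their number of iterations depends on the data set and the imbalance correction method.

```python
import re

COMPONENTS = {"fs", "mb", "ev"}
SUBSAMPLERS = {"bs", "bt", "cv", "lv", "ip"}


def parse_design(design):
    """Parse a design string into nested lists and (name, body, args) tuples."""
    text = design.replace(" ", "")
    tokens = re.findall(r"[a-z]+|\d+|[(),+]", text)
    if "".join(tokens) != text:
        raise ValueError(f"unexpected characters in {design!r}")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"unexpected token {token!r}")
        pos += 1
        return token

    def sequence():
        # One or more terms linked by "+".
        items = [term()]
        while peek() == "+":
            take("+")
            items.append(term())
        return items

    def term():
        # Either a basic component or a subsampling method wrapping a sequence.
        name = take()
        if name in COMPONENTS:
            return name
        if name not in SUBSAMPLERS:
            raise ValueError(f"unknown element {name!r}")
        take("(")
        body = sequence()
        args = []
        while peek() == ",":
            take(",")
            args.append(int(take()))
        take(")")
        return (name, body, args)

    tree = sequence()
    if peek() is not None:
        raise ValueError(f"trailing input starting at {peek()!r}")
    return tree


def count_subsamples(tree, multiplier=1, counts=None):
    """Count the subsamples that enclose each component (bs, bt and cv only)."""
    counts = {} if counts is None else counts
    for item in tree:
        if isinstance(item, str):
            counts[item] = counts.get(item, 0) + multiplier
            continue
        name, body, args = item
        if name in ("bs", "bt") and len(args) == 1:
            factor = args[0]            # n bootstraps
        elif name == "cv" and len(args) == 2:
            factor = args[0] * args[1]  # n folds, repeated p times
        else:
            raise ValueError(f"cannot count iterations for {name!r}")
        count_subsamples(body, multiplier * factor, counts)
    return counts


tree = parse_design("cv(bt(fs,20)+mb,3,2)+ev")
print(tree)
print(count_subsamples(tree))
```

For cv(bt(fs,20)+mb,3,2)+ev, the sketch reports 120 for fs (20 bootstraps in each of the 3 x 2 = 6 cross-validation training sets), 6 for mb, and 1 for ev, which sits outside any subsampling.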
The simplest valid experimental design is fs+mb, which corresponds to a
TRIPOD type 1a analysis. Type 1b analyses are only possible using
bootstraps, e.g. bt(fs+mb,100). Type 2a analyses can be conducted using
cross-validation, e.g. cv(bt(fs,100)+mb,10,1). Depending on the origin of
the external validation data, designs such as fs+mb+ev or
cv(bt(fs,100)+mb,10,1)+ev constitute type 2b or type 3 analyses. Type 4
analyses can be done by obtaining one or more familiarModel objects from
others and applying them to your own data set.
Alternatively, the experimental_design parameter may be used to provide a
path to a file containing iterations, which is named ####_iterations.RDS
by convention. This path can be either relative to the directory of the
current experiment (experiment_dir) or absolute. An absolute path may
therefore also point to a file from a different experiment.
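The relative-versus-absolute rule can be sketched as follows. This is a hedged Python illustration of the rule as described here, not the package's own implementation; the function name resolve_iterations_path, the directory names, and the file name example_iterations.RDS are hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def resolve_iterations_path(experimental_design, experiment_dir):
    """Return the iterations file path implied by the rule described above."""
    path = PurePosixPath(experimental_design)
    if path.is_absolute():
        # An absolute path is used as given and may belong to another experiment.
        return path
    # A relative path is interpreted relative to the experiment directory.
    return PurePosixPath(experiment_dir) / path


# Hypothetical locations, for illustration only.
print(resolve_iterations_path("example_iterations.RDS", "/data/experiment_a"))
print(resolve_iterations_path("/data/experiment_b/example_iterations.RDS", "/data/experiment_a"))
```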
|