get_folds {aifeducation} | R Documentation
Creates cross-validation samples and ensures that the relative frequency of every category/label within a fold equals its relative frequency in the initial data.
get_folds(target, k_folds)
target:
Named
k_folds:
Returns a list with the following components:
val_sample: vector of strings containing the names of the cases in the validation sample.
train_sample: vector of strings containing the names of the cases in the train sample.
n_folds: int. Number of realized folds.
unlabeled_cases: vector of strings containing the names of the unlabeled cases.
The parameter target allows cases with missing categories/labels. These should be declared with NA. All such cases are ignored when the folds are created. Their names are saved in the component unlabeled_cases. These cases can be used for pseudo-labeling.
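The handling of unlabeled cases and the stratified assignment to folds can be illustrated with a minimal base R sketch. It does not call get_folds and is not the package's implementation. The toy data, the case names, and the round-robin assignment within each category are assumptions made for illustration only.

```r
# Hypothetical toy data: a named factor with two missing labels.
target <- factor(c("a", "a", "a", "a", "b", "b", "b", "b", NA, NA))
names(target) <- paste0("case_", seq_along(target))

# Cases without a label are set aside and do not enter any fold.
unlabeled_cases <- names(target)[is.na(target)]
labeled <- target[!is.na(target)]

# Stratified assignment (assumed scheme): within each category,
# shuffle the cases and deal them out to the folds in turn.
k_folds <- 2
set.seed(1)
fold_id <- integer(length(labeled))
names(fold_id) <- names(labeled)
for (lvl in levels(labeled)) {
  idx <- which(labeled == lvl)
  idx <- idx[sample.int(length(idx))]
  fold_id[idx] <- rep_len(seq_len(k_folds), length(idx))
}

# Names of the validation and training cases for every fold.
val_sample <- lapply(seq_len(k_folds), function(i) names(fold_id)[fold_id == i])
train_sample <- lapply(seq_len(k_folds), function(i) names(fold_id)[fold_id != i])
```

In this sketch every fold's validation sample contains two cases of each category, which matches the 50/50 split of the labeled data, and the two cases with NA appear only in unlabeled_cases.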
The function checks the absolute frequency of every category/label. If a frequency is not sufficient to ensure at least four cases in every fold, the number of folds is adjusted and a warning is printed to the console. At least four cases per fold are necessary to ensure that the training of TextEmbeddingClassifierNeuralNet works well with all options turned on.
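The frequency check can be sketched as follows. The exact rule get_folds applies is not stated on this page, so the sketch assumes that the number of folds is capped at the smallest category frequency divided by four, rounded down. The helper adjust_k_folds and its argument min_cases are hypothetical and not part of aifeducation.

```r
# Hypothetical helper, not part of aifeducation.
# Assumed rule: every fold needs at least `min_cases` cases of each category.
adjust_k_folds <- function(target, k_folds, min_cases = 4) {
  freq <- table(target)  # table() drops NA entries by default
  k_max <- floor(min(freq) / min_cases)
  if (k_max < k_folds) {
    warning("Reducing the number of folds from ", k_folds, " to ", k_max, ".")
    return(k_max)
  }
  k_folds
}

# Toy data: 40 cases of "a", 12 cases of "b", and one unlabeled case.
target <- factor(c(rep("a", 40), rep("b", 12), NA))

# Five folds were requested, but category "b" supports only floor(12 / 4) = 3.
k <- suppressWarnings(adjust_k_folds(target, k_folds = 5))
```

Calling the helper without suppressWarnings shows the warning in the console, similar in spirit to the behavior described above.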
Other Auxiliary Functions:
array_to_matrix(), calc_standard_classification_measures(), check_embedding_models(), clean_pytorch_log_transformers(), create_iota2_mean_object(), create_synthetic_units(), generate_id(), get_coder_metrics(), get_n_chunks(), get_stratified_train_test_split(), get_synthetic_cases(), get_train_test_split(), is.null_or_na(), matrix_to_array_c(), split_labeled_unlabeled(), summarize_tracked_sustainability(), to_categorical_c()