predictLayerHVT {HVT} | R Documentation |
Predict which cell and what level each point in the test dataset belongs to
predictLayerHVT(
data,
hvt_mapA,
hvt_mapB,
hvt_mapC,
mad.threshold = 0.2,
normalize = TRUE,
seed = 300,
distance_metric = "L1_Norm",
error_metric = "max",
child.level = 1,
line.width = c(0.6, 0.4, 0.2),
color.vec = c("#141B41", "#6369D1", "#D8D2E1"),
yVar = NULL,
...
)
| Argument | Description |
|---|---|
| data | Data frame. The test dataset. It must contain at least one of the variables used during training. Variables from this dataset can also be overlaid as a heatmap. |
| hvt_mapA | A list (hvt.results.model) returned by the HVT function after performing hierarchical vector quantization on the training data. |
| hvt_mapB | A list (hvt.results.model) returned by the HVT function after performing hierarchical vector quantization on the novelty data, i.e. the training records that fall in the identified novelty cell(s). |
| hvt_mapC | A list (hvt.results.model) returned by the HVT function after performing hierarchical vector quantization on the training data with the novelty cell(s) removed. |
| mad.threshold | Numeric. The permissible Mean Absolute Deviation (MAD). Default is 0.2. |
| normalize | Logical. Whether the columns of the dataset should be normalized. Default is TRUE. |
| seed | Numeric. Random seed. Default is 300. |
| distance_metric | Character. The distance metric, either "L1_Norm" (Manhattan) or "L2_Norm" (Euclidean). Default is "L1_Norm". |
| error_metric | Character. The error metric, either "mean" or "max". Default is "max". |
| child.level | Numeric. The level for which the heatmap is plotted (only used if hmap.cols is not NULL). Default is 1. |
| line.width | Numeric vector. Line widths used for the tessellation plot. |
| color.vec | Character vector. Colors used for the tessellation plot. |
| yVar | Character. Name(s) of the dependent variable(s). Default is NULL. |
| ... | Additional arguments; color.vec and line.width can also be passed here. |
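To illustrate what `distance_metric`, `error_metric` and `mad.threshold` control, here is a minimal standalone R sketch. It assumes that a (normalized) test point is assigned to its nearest centroid under the chosen norm, that its per-variable absolute deviations from that centroid are summarized with `mean` or `max`, and that this summary is compared with `mad.threshold`. The helpers `point_to_centroids` and `score_point` are hypothetical and are not the HVT package's internal implementation.

```r
# Hypothetical helpers, NOT part of the HVT package API.
# Distance from point x to every row of a centroid matrix.
point_to_centroids <- function(x, centroids, distance_metric = "L1_Norm") {
  diffs <- sweep(centroids, 2, x)  # each centroid row minus x
  switch(distance_metric,
    L1_Norm = rowSums(abs(diffs)),     # Manhattan distance
    L2_Norm = sqrt(rowSums(diffs^2)),  # Euclidean distance
    stop("distance_metric must be 'L1_Norm' or 'L2_Norm'")
  )
}

# Assign x to its nearest centroid and summarize its deviation from it.
# Assumption: the summary is checked against mad.threshold with "<=".
score_point <- function(x, centroids, distance_metric = "L1_Norm",
                        error_metric = "max", mad.threshold = 0.2) {
  d <- point_to_centroids(x, centroids, distance_metric)
  cell <- which.min(d)
  abs_dev <- abs(x - centroids[cell, ])
  err <- switch(error_metric,
    mean = mean(abs_dev),
    max = max(abs_dev),
    stop("error_metric must be 'mean' or 'max'")
  )
  list(cell = cell, error = err, within_threshold = err <= mad.threshold)
}

# Toy example: two normalized variables, three centroids
centroids <- matrix(c( 0,   0,
                       1,   1,
                      -1, 0.5), ncol = 2, byrow = TRUE)
res <- score_point(c(0.9, 1.1), centroids)
str(res)
```

With `error_metric = "max"` a single badly fitting variable is enough to exceed the threshold, whereas `"mean"` averages the deviation over all variables.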
A data frame containing the scored test data with the predicted layer output.
Shubhra Prakash <shubhra.prakash@mu-sigma.com>, Sangeet Moy Das <sangeet.das@mu-sigma.com>, Shantanu Vaidya <shantanu.vaidya@mu-sigma.com>, Somya Shambhawi <somya.shambhawi@mu-sigma.com>
library(HVT)
library(dplyr)
data(USArrests)
# Split in train and test
train <- USArrests[1:40, ]
test <- USArrests[41:50, ]
# Map A: HVT model trained on the full training data
hvt_mapA <- HVT(train,
  min_compression_perc = 70, quant.err = 0.2,
  distance_metric = "L1_Norm", error_metric = "mean",
  projection.scale = 10, normalize = TRUE,
  quant_method = "kmeans"
)
# Cells of map A identified as novelties
identified_Novelty_cells <- c(2, 10)
output_list <- removeNovelty(identified_Novelty_cells, hvt_mapA)
# Map B: HVT model trained on the novelty data only
data_with_novelty <- output_list[[1]] %>% dplyr::select(!c("Cell.ID", "Cell.Number"))
hvt_mapB <- HVT(data_with_novelty,
  n_cells = 3, quant.err = 0.2,
  distance_metric = "L1_Norm", error_metric = "mean",
  projection.scale = 10, normalize = TRUE,
  quant_method = "kmeans"
)
# Map C: HVT model trained on the data without novelties,
# reusing the scaling of map A (hence normalize = FALSE)
dataset_without_novelty <- output_list[[2]]
mapA_scale_summary <- hvt_mapA[[3]]$scale_summary
hvt_mapC <- HVT(dataset_without_novelty,
  n_cells = 15,
  depth = 2, quant.err = 0.2, distance_metric = "L1_Norm",
  error_metric = "max", quant_method = "kmeans",
  projection.scale = 10, normalize = FALSE, scale_summary = mapA_scale_summary
)
# Score the test data against the three maps
predictions <- predictLayerHVT(test, hvt_mapA, hvt_mapB, hvt_mapC)
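The example normalizes map A (normalize = TRUE) and then builds map C with normalize = FALSE while reusing map A's scale summary, so that all maps share one scaling. The base R sketch below shows what applying training-set scaling to test rows typically means. It assumes z-score normalization (centering by the training mean and dividing by the training standard deviation); this is an illustration, not a description of HVT's internal code.

```r
# Assumption: z-score normalization with training-set statistics.
data(USArrests)
train <- USArrests[1:40, ]
test <- USArrests[41:50, ]
# Fit the scaling on the training rows only
train_scaled <- scale(train)
centre <- attr(train_scaled, "scaled:center")
spread <- attr(train_scaled, "scaled:scale")
# Apply the training centre and spread to the test rows,
# so test points live in the same space as the trained centroids
test_scaled <- scale(test, center = centre, scale = spread)
# Training columns now have mean approximately 0 by construction
colMeans(train_scaled)
```

Scaling the test data with its own statistics instead would shift it relative to the trained centroids, which is why the training statistics are reused.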