GPTree {GPTreeO}    R Documentation
Tree structure storing all nodes containing local GPs
Description
The base class, which contains all nodes and in which all parameters are set. Here, all information on how and when the splitting is carried out is stored. wrapper and gp_control specify the Gaussian process (GP) implementation and its parameters. Moreover, minimum errors and calibration of the predictions are specified here, too.
Essential methods
The following three methods are essential for the package. The remaining ones are mostly not expected to be called by the user.
- GPTree$new(): Creates a new tree with the specified parameters.
- GPTree$update(): Adds the information from the input point to the tree and updates the local GPs.
- GPTree$joint_prediction(): Computes the joint prediction for a given input point.
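A minimal sketch of how these three methods fit together (the toy data below is made up for illustration; see the Examples section for a complete walk-through):

## Build a small tree, stream a few 1d points, and predict before each update
gptree <- GPTree$new(Nbar = 15, retrain_buffer_length = 15)
x_stream <- seq(0, 10, length.out = 20)
for (x in x_stream) {
  pred <- gptree$joint_prediction(x, return_std = TRUE)  # predict at the new point first
  gptree$update(x, y = sin(x), y_var = 0.01)             # then feed the true observation
}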
Brief package functionality overview
The tree collects the information from all GPNodes, which in turn contain the local GPs. Currently, GPs from the DiceKriging package (WrappedDiceKrigingGP) and the mlegp package (WrappedmlegpGP) are implemented. The user can create their own wrapper using WrappedGP.
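For example, the built-in wrappers are selected through the wrapper argument of GPTree$new(); writing a custom wrapper additionally requires implementing the WrappedGP interface, which is not shown here:

## Use the mlegp-based wrapper instead of the default DiceKriging one.
## gp_control is left empty because covtype = "matern3_2" is DiceKriging-specific.
gptree_mlegp <- GPTree$new(wrapper = "mlegp", gp_control = list())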
Public fields
Nbar
Maximum number of data points for each GP in a leaf before it is split. The default value is 1000.
retrain_buffer_length
Size of the retrain buffer. The buffer for each node collects data points and holds them until the buffer length is reached. Then the GP in the node is updated with the data in the buffer. For a fixed Nbar, higher values of retrain_buffer_length lead to faster run times (less frequent retraining), but the trade-off is a temporarily reduced prediction accuracy. We advise that the choice of retrain_buffer_length should depend on the chosen Nbar. By default, retrain_buffer_length is set equal to Nbar.
gradual_split
If TRUE, gradual splitting is used for node splitting. The default value is TRUE.
theta
Overlap ratio between two leaves in the split direction. The default value is 0.
wrapper
A string that indicates which GP implementation should be used. The current version includes wrappers for the packages "DiceKriging" and "mlegp". The default setting is "DiceKriging".
gp_control
A list of control parameters that is forwarded to the wrapper. Here, the covariance function is specified. DiceKriging allows for the following kernels, passed as a string: "gauss", "matern5_2", "matern3_2", "exp", "powexp", where "matern3_2" is the default.
split_direction_criterion
A string that indicates which splitting criterion to use. The options are:
- "max_spread": Split along the direction with the largest data spread.
- "min_lengthscale": Split along the direction with the smallest length-scale hyperparameter from the local GP.
- "max_spread_per_lengthscale": Split along the direction with the largest data spread relative to the corresponding GP length-scale hyperparameter.
- "max_corr": Split along the direction where the input data is most strongly correlated with the target variable.
- "principal_component": Split along the first principal component.
The default value is "max_spread_per_lengthscale".
split_position_criterion
A string indicating how the split position along the split direction should be set. Possible values are "median" and "mean". The default is "median".
shape_decay
A string specifying how the probability function for a point to be assigned to the left leaf should fall off in the overlap region. The available options are a linear shape ("linear"), an exponential shape ("exponential"), or a Gaussian shape ("gaussian"). Another option is to select no overlap region, which can be achieved by selecting "deterministic" or by setting theta to 0. The default is "linear".
use_empirical_error
If TRUE, the uncertainty is calibrated using recent data points. The default value is TRUE. The most recent 25 observations are used to ensure that the prediction uncertainty yields approximately 68 % coverage. This coverage is only achieved when theta = 0 is used (also together with gradual_split = TRUE). Nevertheless, the coverage will be closer to 68 % than it would be without calibration. The prediction uncertainties are conservative at the beginning and become less conservative as the number of input points increases.
use_reference_gp
If TRUE, the covariance parameters determined for the GP in node 0 will be used for all subsequent GPs. The default is FALSE.
min_abs_y_err
Minimum absolute error assumed for the y data. The default value is 0.
min_rel_y_err
Minimum relative error assumed for the y data. The default value is 100 * .Machine$double.eps.
min_abs_node_pred_err
Minimum absolute error on the prediction from a single node. The default value is 0.
min_rel_node_pred_err
Minimum relative error on the prediction from a single node. The default value is 100 * .Machine$double.eps.
prob_min_theta
Minimum probability after which the overlap shape gets truncated (either towards 0 or 1). The default value is 0.01.
add_buffer_in_prediction
If TRUE, points in the data buffers are added to the GP before prediction. They are added to a temporarily created GP that contains the not-yet-included points; the GP stored in the node is not updated at this stage. The default is FALSE.
x_dim
Dimensionality of the input points. It is set once the first point is received through the update() or joint_prediction() method. It needs to be specified by the user if min_ranges should differ from its default.
min_ranges
Smallest allowed input data spread (per dimension) before node splitting stops. It is set to its default min_ranges = rep(0.0, x_dim) once the first point is received through the update() method. x_dim needs to be specified by the user if min_ranges should differ from this default.
max_cond_num
Additional noise is added if the condition number of the covariance matrix exceeds this value. The default is NULL.
max_points
The maximum number of points the tree is allowed to store. The default value is Inf.
End of the user-defined input fields.
nodes
A hash to hold the GP tree, using string keys to identify nodes and their position in the tree ("0", "00", "01", "000", "001", "010", "011", etc.)
leaf_keys
Stores the keys ("0", "00", "01", "000", "001", "010", "011", etc.) for the leaves
n_points
Number of points in the tree
n_fed
Number of points fed to the tree
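Since GPTree is an R6 class, these fields can be read directly from a tree object; a small sketch (the exact contents of leaf_keys and nodes depend on how many points have been fed and how often the tree has split):

gptree <- GPTree$new(Nbar = 500, theta = 0.05)
gptree$Nbar        # 500
gptree$theta       # 0.05
gptree$n_points    # number of points currently stored in the tree
gptree$n_fed       # number of points fed to the tree so far
gptree$leaf_keys   # keys of the current leaves, e.g. "0" for the root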
Methods
Public methods
Method new()
Usage
GPTree$new(
  Nbar = 1000,
  retrain_buffer_length = Nbar,
  gradual_split = TRUE,
  theta = 0,
  wrapper = "DiceKriging",
  gp_control = list(covtype = "matern3_2"),
  split_direction_criterion = "max_spread_per_lengthscale",
  split_position_criterion = "median",
  shape_decay = "linear",
  use_empirical_error = TRUE,
  use_reference_gp = FALSE,
  min_abs_y_err = 0,
  min_rel_y_err = 100 * .Machine$double.eps,
  min_abs_node_pred_err = 0,
  min_rel_node_pred_err = 100 * .Machine$double.eps,
  prob_min_theta = 0.01,
  add_buffer_in_prediction = FALSE,
  x_dim = 0,
  min_ranges = NULL,
  max_cond_num = NULL,
  max_points = Inf
)
Arguments
Nbar
Maximum number of data points for each GP in a leaf before it is split. The default value is 1000.
retrain_buffer_length
Size of the retrain buffer. The buffer for each node collects data points and holds them until the buffer length is reached. Then the GP in the node is updated with the data in the buffer. For a fixed Nbar, higher values of retrain_buffer_length lead to faster run times (less frequent retraining), but the trade-off is a temporarily reduced prediction accuracy. We advise that the choice of retrain_buffer_length should depend on the chosen Nbar. By default, retrain_buffer_length is set equal to Nbar.
gradual_split
If TRUE, gradual splitting is used for node splitting. The default value is TRUE.
theta
Overlap ratio between two leaves in the split direction. The default value is 0.
wrapper
A string that indicates which GP implementation should be used. The current version includes wrappers for the packages "DiceKriging" and "mlegp". The default setting is "DiceKriging".
gp_control
A list of control parameters that is forwarded to the wrapper. Here, the covariance function is specified. DiceKriging allows for the following kernels, passed as a string: "gauss", "matern5_2", "matern3_2", "exp", "powexp", where "matern3_2" is the default.
split_direction_criterion
A string that indicates which splitting criterion to use. The options are:
- "max_spread": Split along the direction with the largest data spread.
- "min_lengthscale": Split along the direction with the smallest length-scale hyperparameter from the local GP.
- "max_spread_per_lengthscale": Split along the direction with the largest data spread relative to the corresponding GP length-scale hyperparameter.
- "max_corr": Split along the direction where the input data is most strongly correlated with the target variable.
- "principal_component": Split along the first principal component.
The default value is "max_spread_per_lengthscale".
split_position_criterion
A string indicating how the split position along the split direction should be set. Possible values are "median" and "mean". The default is "median".
shape_decay
A string specifying how the probability function for a point to be assigned to the left leaf should fall off in the overlap region. The available options are a linear shape ("linear"), an exponential shape ("exponential"), or a Gaussian shape ("gaussian"). Another option is to select no overlap region, which can be achieved by selecting "deterministic" or by setting theta to 0. The default is "linear".
use_empirical_error
If TRUE, the uncertainty is calibrated using recent data points. The default value is TRUE. The most recent 25 observations are used to ensure that the prediction uncertainty yields approximately 68 % coverage. This coverage is only achieved when theta = 0 is used (also together with gradual_split = TRUE). Nevertheless, the coverage will be closer to 68 % than it would be without calibration. The prediction uncertainties are conservative at the beginning and become less conservative as the number of input points increases.
use_reference_gp
If TRUE, the covariance parameters determined for the GP in node 0 will be used for all subsequent GPs. The default is FALSE.
min_abs_y_err
Minimum absolute error assumed for the y data. The default value is 0.
min_rel_y_err
Minimum relative error assumed for the y data. The default value is 100 * .Machine$double.eps.
min_abs_node_pred_err
Minimum absolute error on the prediction from a single node. The default value is 0.
min_rel_node_pred_err
Minimum relative error on the prediction from a single node. The default value is 100 * .Machine$double.eps.
prob_min_theta
Minimum probability after which the overlap shape gets truncated (either towards 0 or 1). The default value is 0.01.
add_buffer_in_prediction
If TRUE, points in the data buffers are added to the GP before prediction. They are added to a temporarily created GP that contains the not-yet-included points; the GP stored in the node is not updated at this stage. The default is FALSE.
x_dim
Dimensionality of the input points. It is set once the first point is received through the update() method. It needs to be specified by the user if min_ranges should differ from its default.
min_ranges
Smallest allowed input data spread (per dimension) before node splitting stops. It is set to its default min_ranges = rep(0.0, x_dim) once the first point is received through the update() method. x_dim needs to be specified by the user if min_ranges should differ from this default.
max_cond_num
Additional noise is added if the condition number of the covariance matrix exceeds this value. The default is NULL.
max_points
The maximum number of points the tree is allowed to store. The default value is Inf.
Returns
A new GPTree object. Tree-specific parameters are listed in this object. The field nodes contains a hash with all GPNodes and information related to the nodes. The nodes in turn contain the local GPs. Nodes that have been split no longer contain a GP.
Examples
set.seed(42)
## Use the 1d toy data set from Higdon (2002)
X <- as.matrix(sample(seq(0, 10, length.out = 31)))
y <- sin(2 * pi * X / 10) + 0.2 * sin(2 * pi * X / 2.5)
y_variance <- rep(0.1**2, 31)
## Initialize a tree with Nbar = 15, retrain_buffer_length = 15, use_empirical_error = FALSE,
## and default parameters otherwise
gptree <- GPTree$new(Nbar = 15, retrain_buffer_length = 15, use_empirical_error = FALSE)
## For the purpose of this example, we simulate the data stream through a simple for loop.
## In actual applications, the input stream comes from e.g. a differential evolutionary scanner.
## We follow the procedure in the associated paper, thus letting the tree make a prediction
## first before we update the tree with the point.
for (i in 1:nrow(X)) {
  y_pred_with_err <- gptree$joint_prediction(X[i,], return_std = TRUE)
  ## Update the tree with the true (X,y) pair
  gptree$update(X[i,], y[i], y_variance[i])
}
## In the following, we go over different initializations of the tree
## 1. The same tree as before, but using the package mlegp:
## Note: since the default for gp_control is gp_control = list(covtype = "matern3_2"),
## we set gp_control to an empty list when using mlegp.
gptree <- GPTree$new(Nbar = 15, retrain_buffer_length = 15, use_empirical_error = FALSE,
                     wrapper = "mlegp", gp_control = list())
## 2. Minimum working example:
gptree <- GPTree$new()
## 3. Fully specified example corresponding to the default settings
## Here, we choose to specify x_dim and min_ranges so that they correspond to the default values.
## If we do not specify them here, they will be automatically specified once
## the update or predict method is called.
gptree <- GPTree$new(Nbar = 1000, retrain_buffer_length = 1000,
                     gradual_split = TRUE, theta = 0, wrapper = "DiceKriging",
                     gp_control = list(covtype = "matern3_2"),
                     split_direction_criterion = "max_spread_per_lengthscale",
                     split_position_criterion = "median",
                     shape_decay = "linear", use_empirical_error = TRUE,
                     use_reference_gp = FALSE, min_abs_y_err = 0,
                     min_rel_y_err = 100 * .Machine$double.eps,
                     min_abs_node_pred_err = 0, min_rel_node_pred_err = 100 * .Machine$double.eps,
                     prob_min_theta = 0.01, add_buffer_in_prediction = FALSE, x_dim = ncol(X),
                     min_ranges = rep(0.0, ncol(X)), max_cond_num = NULL, max_points = Inf)
Method add_node()
Adds a new GPNode to the tree. This method is not expected to be called by the user.
Usage
GPTree$add_node(key)
Arguments
key
Key of the new leaf
Method get_marginal_point_prob()
Marginal probability for point x to belong to the node with the given key. This method is not expected to be called by the user.
Usage
GPTree$get_marginal_point_prob(x, key)
Arguments
x
Single input data point from the data stream; has to be a vector with length equal to x_dim
key
Key of the node
Returns
Returns the marginal probability for point x to belong to the node with the given key
Method update()
Assigns the given input point x with target variable y and associated variance y_var to a node and updates the tree accordingly
Usage
GPTree$update(x, y, y_var = 0, retrain_node = TRUE)
Arguments
x
Most recent single input data point from the data stream; has to be a vector with length equal to x_dim
y
Value of target variable at input point x; has to be a one-dimensional matrix or a vector; any further columns will be ignored
y_var
Variance of the target variable; has to be a one-dimensional matrix or vector
retrain_node
If TRUE, the GP node will be retrained after the point is added.
Details
The method takes care of both updating an existing node and splitting the parent node into two child nodes. It ensures that each child node has at least n_points_train_limit points in its GP. Further handling of duplicate points is also done here.
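A short usage sketch of update() with its optional arguments (the values below are illustrative and assume a 1d tree):

gptree <- GPTree$new(Nbar = 15, retrain_buffer_length = 15)
## Add a point with known noise variance but skip the retraining step for now
gptree$update(x = 2.5, y = 0.7, y_var = 0.1^2, retrain_node = FALSE)
## Add a second point with the default behaviour (retraining handled via the buffer)
gptree$update(x = 3.0, y = 0.4, y_var = 0.1^2)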
Method get_data_split_table()
Generates a table used to distribute data points from a node to two child nodes
Usage
GPTree$get_data_split_table(current_node)
Arguments
current_node
The GPNode whose data should be distributed
Returns
A matrix object
Method joint_prediction()
Compute the joint prediction from all relevant leaves for an input point x
Usage
GPTree$joint_prediction(x, return_std = TRUE)
Arguments
x
Single data point for which the predicted joint mean (and standard deviation) is computed; has to be a vector with length equal to x_dim
return_std
If TRUE, the standard error of the prediction is returned
Details
We follow Eqs. (5) and (6) in the associated paper.
Returns
The prediction (and its standard error) for input point x from this tree
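A small usage sketch; the tree is assumed to have been fed some points already (as in the Examples), and the returned object is simply captured and inspected here:

pred <- gptree$joint_prediction(x = 5.0, return_std = TRUE)        # joint mean and its standard error
print(pred)
pred_mean <- gptree$joint_prediction(x = 5.0, return_std = FALSE)  # joint mean only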
Method clone()
The objects of this class are cloneable with this method.
Usage
GPTree$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Examples
## ------------------------------------------------
## Method `GPTree$new`
## ------------------------------------------------
set.seed(42)
## Use the 1d toy data set from Higdon (2002)
X <- as.matrix(sample(seq(0, 10, length.out = 31)))
y <- sin(2 * pi * X / 10) + 0.2 * sin(2 * pi * X / 2.5)
y_variance <- rep(0.1**2, 31)
## Initialize a tree with Nbar = 15, retrain_buffer_length = 15, use_empirical_error = FALSE,
## and default parameters otherwise
gptree <- GPTree$new(Nbar = 15, retrain_buffer_length = 15, use_empirical_error = FALSE)
## For the purpose of this example, we simulate the data stream through a simple for loop.
## In actual applications, the input stream comes from e.g. a differential evolutionary scanner.
## We follow the procedure in the associated paper, thus letting the tree make a prediction
## first before we update the tree with the point.
for (i in 1:nrow(X)) {
  y_pred_with_err <- gptree$joint_prediction(X[i,], return_std = TRUE)
  ## Update the tree with the true (X,y) pair
  gptree$update(X[i,], y[i], y_variance[i])
}
## In the following, we go over different initializations of the tree
## 1. The same tree as before, but using the package mlegp:
## Note: since the default for gp_control is gp_control = list(covtype = "matern3_2"),
## we set gp_control to an empty list when using mlegp.
gptree <- GPTree$new(Nbar = 15, retrain_buffer_length = 15, use_empirical_error = FALSE,
wrapper = "mlegp", gp_control = list())
## 2. Minimum working example:
gptree <- GPTree$new()
## 3. Fully specified example corresponding to the default settings
## Here, we choose to specify x_dim and min_ranges so that they correspond to the default values.
## If we do not specify them here, they will be automatically specified once
## the update or predict method is called.
gptree <- GPTree$new(Nbar = 1000, retrain_buffer_length = 1000,
gradual_split = TRUE, theta = 0, wrapper = "DiceKriging",
gp_control = list(covtype = "matern3_2"),
split_direction_criterion = "max_spread_per_lengthscale", split_position_criterion = "median",
shape_decay = "linear", use_empirical_error = TRUE,
use_reference_gp = FALSE, min_abs_y_err = 0, min_rel_y_err = 100 * .Machine$double.eps,
min_abs_node_pred_err = 0, min_rel_node_pred_err = 100 * .Machine$double.eps,
prob_min_theta = 0.01, add_buffer_in_prediction = FALSE, x_dim = ncol(X),
min_ranges = rep(0.0, ncol(X)), max_cond_num = NULL, max_points = Inf)