rl_config_set {RLoptimal}    R Documentation
Configuration of Reinforcement Learning
Description
Mainly sets the arguments passed to RLlib's training() function. It is not compatible with the new API stack introduced in Ray 2.10.0.
Usage
rl_config_set(
iter = 1000L,
save_start_iter = NULL,
save_every_iter = NULL,
cores = 4L,
gamma = 1,
lr = 5e-05,
train_batch_size = 10000L,
model = rl_dnn_config(),
sgd_minibatch_size = 200L,
num_sgd_iter = 20L,
...
)
Arguments
iter: A positive integer value. Number of training iterations.
save_start_iter, save_every_iter: Integer values. Checkpoints are saved every save_every_iter iterations, starting at iteration save_start_iter or later.
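The wording "starting from save_start_iter or later" leaves open exactly which iterations are saved. The sketch below is a hypothetical helper, `checkpoint_iters()` (not part of RLoptimal), that lists the saving iterations under one assumed reading: a checkpoint is written at iteration i whenever i >= save_start_iter and i is a multiple of save_every_iter. The treatment of NULL values as "no intermediate checkpoints" is also an assumption.

```r
# Hypothetical helper (not part of RLoptimal). ASSUMES a checkpoint is saved
# at iteration i when i >= save_start_iter and i %% save_every_iter == 0.
checkpoint_iters <- function(iter, save_start_iter = NULL, save_every_iter = NULL) {
  # ASSUMPTION: if either setting is NULL, no intermediate checkpoints are saved
  if (is.null(save_start_iter) || is.null(save_every_iter)) {
    return(integer(0))
  }
  i <- seq_len(iter)
  i[i >= save_start_iter & i %% save_every_iter == 0]
}

# Start at 500, save every 100 iterations out of 1000
checkpoint_iters(1000, save_start_iter = 500, save_every_iter = 100)

# When save_start_iter is not a multiple of save_every_iter, the first
# checkpoint falls on the next multiple ("or later")
checkpoint_iters(200, save_start_iter = 50, save_every_iter = 30)
```

Under this reading, the second call saves at iterations 60, 90, 120, 150 and 180 rather than at 50.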
cores: A positive integer value. Number of CPU cores used for learning.
gamma: A positive numeric value. Discount factor of the Markov decision process. Default is 1.0 (no discounting).
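The effect of gamma is easiest to see in the discounted return of one episode, G = sum over t of gamma^t * r_t. The sketch below uses a hypothetical helper, `discounted_return()`, and made-up rewards (zero until a terminal reward of 1 at the fifth step). It is not RLoptimal's reward definition. With the default gamma = 1 the final reward counts in full, while gamma = 0.9 shrinks it by a factor of 0.9^4.

```r
# Hypothetical helper: discounted return G = sum_{t=0}^{T-1} gamma^t * r_t
discounted_return <- function(rewards, gamma = 1) {
  sum(gamma^(seq_along(rewards) - 1) * rewards)
}

# Illustrative (made-up) episode: only the last step is rewarded
rewards <- c(0, 0, 0, 0, 1)
discounted_return(rewards, gamma = 1)    # no discounting
discounted_return(rewards, gamma = 0.9)  # terminal reward scaled by 0.9^4
```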
lr: A positive numeric value. Learning rate (default 5e-5). A learning-rate schedule can be supplied instead of a fixed learning rate.
train_batch_size: A positive integer value. Training batch size. Deprecated on the new API stack.
model: A list. Arguments passed to the policy model. See rl_dnn_config for details.
sgd_minibatch_size: A positive integer value. Total SGD minibatch size across all devices. Deprecated on the new API stack.
num_sgd_iter: A positive integer value. Number of SGD iterations (epochs over each training batch) in each outer loop.
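train_batch_size, sgd_minibatch_size and num_sgd_iter together determine how much sampling and how many gradient updates a run performs. The sketch below is a back-of-the-envelope calculation using a hypothetical helper, `ppo_workload()`. It assumes RLlib's old-API-stack PPO semantics: each iteration samples train_batch_size timesteps, then makes num_sgd_iter passes over them in minibatches of sgd_minibatch_size. How RLlib handles a final partial minibatch is not modeled exactly, so treat the numbers as approximate.

```r
# Hypothetical helper. ASSUMES old-API-stack PPO semantics: per iteration,
# sample train_batch_size timesteps, then run num_sgd_iter epochs over them
# in minibatches of sgd_minibatch_size (partial minibatch rounded up).
ppo_workload <- function(iter = 1000L, train_batch_size = 10000L,
                         sgd_minibatch_size = 200L, num_sgd_iter = 20L) {
  minibatches_per_epoch <- ceiling(train_batch_size / sgd_minibatch_size)
  list(
    total_timesteps  = iter * train_batch_size,
    updates_per_iter = num_sgd_iter * minibatches_per_epoch,
    total_updates    = iter * num_sgd_iter * minibatches_per_epoch
  )
}

# Defaults of rl_config_set()
str(ppo_workload())

# Shorter run, as in the example with iter = 200
str(ppo_workload(iter = 200L))
```

With the defaults this gives 10,000,000 sampled timesteps and about 1,000 minibatch updates per iteration (20 epochs x 50 minibatches). Reducing iter scales the total proportionally.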
...: Other settings passed to training(). See the arguments of the training() function in the RLlib source code:
https://github.com/ray-project/ray/blob/master/rllib/algorithms/algorithm_config.py
https://github.com/ray-project/ray/blob/master/rllib/algorithms/ppo/ppo.py
Value
A list of reinforcement learning configuration parameters.
Examples
## Not run:
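library(RLoptimal)

# Candidate dose-response models for 5 doses (illustrative values;
# any DoseFinding::Mods object whose doses match N_ini can be used)
doses <- c(0, 2, 4, 6, 8)
models <- DoseFinding::Mods(
  doses = doses, maxEff = 1.65,
  linear = NULL, emax = 0.79, sigEmax = c(4, 5)
)

allocation_rule <- learn_allocation_rule(
  models,
  N_total = 150, N_ini = rep(10, 5), N_block = 10, Delta = 1.3,
  outcome_type = "continuous", sd_normal = sqrt(4.5),
  seed = 123,
  # Shorten RL training to 200 iterations and use 2 CPU cores
  rl_config = rl_config_set(iter = 200, cores = 2),
  alpha = 0.025
)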
## End(Not run)