simulate_POMDP {pomdp} | R Documentation |
Simulate trajectories through a POMDP. The start state for each trajectory is randomly chosen using the specified belief. The belief is used to choose actions from the epsilon-greedy policy and is then updated using observations.
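The belief update described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the two-state model, the 0.85 observation accuracy, and the helper `bayes_update()` are assumptions written for this example.

```r
# Hypothetical two-state model in the spirit of the Tiger problem
# (hand-coded numbers, not read from the pomdp package).
states <- c("tiger-left", "tiger-right")

# Transition matrix for a "listen" action: the state does not change.
trans_listen <- diag(2)
dimnames(trans_listen) <- list(states, states)

# Observation probabilities for "listen":
# rows are end states, columns are observations.
obs_listen <- matrix(c(0.85, 0.15,
                       0.15, 0.85),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(states, states))

# Bayes update: b'(s') is proportional to P(o | s') * sum_s P(s' | s) * b(s).
# bayes_update() is a hypothetical helper, not a pomdp function.
bayes_update <- function(belief, trans, obs, observation) {
  predicted <- as.vector(belief %*% trans)
  unnormalized <- obs[, observation] * predicted
  unnormalized / sum(unnormalized)
}

set.seed(42)
belief <- c(0.5, 0.5)

# The start state is drawn from the belief.
true_state <- sample(states, 1, prob = belief)

# An observation is drawn given the true state, then the belief is updated.
observation <- sample(states, 1, prob = obs_listen[true_state, ])
new_belief <- bayes_update(belief, trans_listen, obs_listen, observation)
round(new_belief, digits = 7)
```

Repeating the last three steps for each epoch, with the action taken from the policy, gives one trajectory of belief points.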
simulate_POMDP(
model,
n = 100,
belief = NULL,
horizon = NULL,
visited_beliefs = FALSE,
epsilon = NULL,
digits = 7,
verbose = FALSE
)
model |
a POMDP model. |
n |
number of trajectories. |
belief |
probability distribution over the states for choosing the starting states for the trajectories. Defaults to the start belief state specified in the model or "uniform". |
horizon |
number of epochs for the simulation. If |
visited_beliefs |
logical; should all belief points visited on the trajectories be returned? If |
epsilon |
the probability of choosing a random action under an epsilon-greedy policy. The default is 0 for solved models and 1 for unsolved models. |
digits |
number of digits used to round belief points. |
verbose |
report the parameters used. |
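The effect of the epsilon argument can be illustrated with a small base R sketch. The helper `choose_action()` is hypothetical and assumes that random actions are drawn uniformly; the package may select random actions differently.

```r
# Hypothetical helper, not part of the pomdp package.
# With probability epsilon a random action is chosen (assumed uniform);
# otherwise the action prescribed by the policy is used.
choose_action <- function(greedy_action, actions, epsilon) {
  if (runif(1) < epsilon) {
    sample(actions, 1)   # explore: random action
  } else {
    greedy_action        # exploit: follow the policy
  }
}

actions <- c("listen", "open-left", "open-right")
set.seed(1)

# epsilon = 0 (default for solved models): the policy is always followed.
picks_solved <- replicate(1000, choose_action("listen", actions, epsilon = 0))

# epsilon = 1 (default for unsolved models): every action is random.
picks_unsolved <- replicate(1000, choose_action("listen", actions, epsilon = 1))

table(picks_solved)
table(picks_unsolved)
```

Values between 0 and 1 mix the two behaviors, which can be useful for exploring belief points that the greedy policy alone would rarely reach.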
A matrix with belief points as rows (either from the final epoch only or from all epochs). Attributes containing action counts and rewards for each trajectory may be available.
Michael Hahsler
Other POMDP: POMDP(), plot_belief_space(), sample_belief_space(), solve_POMDP(), solve_SARSOP(), transition_matrix(), update_belief(), write_POMDP()
data(Tiger)
# solve the POMDP for 5 epochs and no discounting
sol <- solve_POMDP(Tiger, horizon = 5, discount = 1, method = "enum")
sol
policy(sol)
## Example 1: simulate 100 trajectories, only the final belief state is returned
sim <- simulate_POMDP(sol, n = 100, verbose = TRUE)
head(sim)
# plot the final belief state, look at the average reward and how often different actions were used.
plot_belief_space(sol, sample = sim)
# additional data is available as attributes
names(attributes(sim))
attr(sim, "avg_reward")
colMeans(attr(sim, "action"))
## Example 2: look at all belief states visited on the trajectories, starting from a specified initial belief.
sim <- simulate_POMDP(sol, n = 100, belief = c(.5, .5), visited_beliefs = TRUE)
# plot with added density
plot_belief_space(sol, sample = sim, ylim = c(0,5), jitter = 1)
lines(density(sim[, 1], bw = .02)); axis(2); title(ylab = "Density")
## Example 3: simulate trajectories for an unsolved POMDP, which uses an epsilon of 1
# (i.e., all actions are chosen randomly)
sim <- simulate_POMDP(Tiger, n = 100, horizon = 5, visited_beliefs = TRUE)
plot_belief_space(sol, sample = sim, ylim = c(0,6))
lines(density(sim[, 1], bw = .05)); axis(2); title(ylab = "Density")