prob_sup {ProbBreed} | R Documentation |
Probabilities of superior performance and stability
Description
This function estimates the probabilities of superior performance and stability
across environments (marginal output). It also computes the probabilities
of superior performance within environments (conditional output).
Usage
prob_sup(
data,
trait,
gen,
loc,
reg = NULL,
year = NULL,
mod.output,
int,
increase = TRUE,
save.df = FALSE,
interactive = FALSE,
verbose = FALSE
)
Arguments
data |
A data frame containing the phenotypic data |
trait , gen , loc |
Strings. The names of the columns that correspond to
the trait, genotype and location information, respectively. If
the environment is a combination of other factors (for instance, location-year),
the name of the column that contains this information must be attributed to |
reg |
A string or NULL. If the dataset has information about regions,
|
year |
A string or NULL. If the data set has information about time-related
environmental factors (years, seasons...), |
mod.output |
An object from the |
int |
A number representing the selection intensity (between 0 and 1) |
increase |
Logical. Indicates the direction of the selection.
|
save.df |
Logical. Should the data frames be saved in the work directory?
|
interactive |
Logical. Should ggplots be converted into interactive plots?
If |
verbose |
A logical value. If |
Details
Probabilities provide the risk of recommending a selection candidate for a target
population of environments or for a specific environment. The function prob_sup()
computes the probabilities of superior performance and the probabilities of superior stability:
Probability of superior performance
Let \Omega represent the subset of selected genotypes based on their
performance across environments. A given genotype j will belong to \Omega
if its genotypic marginal value (\hat{g}_j) is high or low enough compared to
its peers. prob_sup() leverages the Monte Carlo discretized sampling
from the posterior distribution to emulate the occurrence of S trials. Then,
the probability of the j^{th} genotype belonging to \Omega is the
ratio of success (\hat{g}_j \in \Omega) events to the total number of sampled events,
as follows:
Pr(\hat{g}_j \in \Omega \vert y) = \frac{1}{S}\sum_{s=1}^S{I(\hat{g}_j^{(s)} \in \Omega \vert y)}
where S is the total number of samples (s = 1, 2, ..., S),
and I(\hat{g}_j^{(s)} \in \Omega \vert y) is an indicator variable that can assume
two values: (1) if \hat{g}_j^{(s)} \in \Omega in the s^{th} sample,
and (0) otherwise. S depends on the number of iterations and chains
previously set in bayes_met().
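The estimator above can be sketched in base R. This is a minimal illustration, not the code used inside prob_sup(): the matrix g_post is simulated here as a stand-in for posterior draws, and taking ceiling(int * J) as the size of \Omega is an assumption made for the example.

```r
# Simulated stand-in for posterior draws of genotypic main effects:
# S rows (samples) by J columns (genotypes). Not real model output.
set.seed(1)
S <- 2000
J <- 10
true_means <- seq(-1, 1, length.out = J)
g_post <- matrix(rnorm(S * J, mean = rep(true_means, each = S), sd = 0.5),
                 nrow = S, ncol = J,
                 dimnames = list(NULL, paste0("G", 1:J)))

int <- 0.2                   # selection intensity
n_sel <- ceiling(int * J)    # assumed rule for the size of Omega
increase <- TRUE             # select for higher values

# Indicator I(g_j^(s) in Omega): TRUE when genotype j ranks among the
# n_sel best genotypes in sample s.
in_omega <- t(apply(g_post, 1, function(g) {
  rank(if (increase) -g else g, ties.method = "first") <= n_sel
}))

# Average the indicator over the S samples to get Pr(g_j in Omega | y).
prob_perfo <- colMeans(in_omega)
sort(prob_perfo, decreasing = TRUE)
```

Because exactly n_sel genotypes are selected in every sample, the probabilities sum to n_sel across genotypes.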
Similarly, the conditional probability of superior performance can be applied to
individual environments. Let \Omega_k represent the subset of superior
genotypes in the k^{th} environment, so that the probability of the
j^{th} genotype belonging to \Omega_k can be calculated as follows:
Pr(\hat{g}_{jk} \in \Omega_k \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_{jk}^{(s)} \in \Omega_k \vert y)
where I(\hat{g}_{jk}^{(s)} \in \Omega_k \vert y) is an indicator variable
mapping success (1) if \hat{g}_{jk}^{(s)} exists in \Omega_k, and
failure (0) otherwise, and \hat{g}_{jk}^{(s)} = \hat{g}_j^{(s)} + \widehat{ge}_{jk}^{(s)}.
Note that when computing conditional probabilities (i.e., conditional to the
k^{th} environment or mega-environment), we are accounting for
the interaction of the j^{th} genotype with the k^{th} environment.
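The conditional version can be sketched the same way, adding the sampled interaction effect to the sampled main effect before ranking within each environment. The objects g_post and ge_post are simulated placeholders, and ceiling(int * J) is again an assumed selection rule rather than a documented one.

```r
# Simulated stand-ins: main effects (S x J) and GEI effects (S x J x K).
set.seed(2)
S <- 1000
J <- 8
K <- 3
g_post <- matrix(rnorm(S * J, mean = rep(seq(-1, 1, length.out = J), each = S),
                       sd = 0.4), nrow = S)
ge_post <- array(rnorm(S * J * K, sd = 0.3), dim = c(S, J, K))

int <- 0.25
n_sel <- ceiling(int * J)    # assumed size of Omega_k

prob_cond <- sapply(seq_len(K), function(k) {
  # g_jk^(s) = g_j^(s) + ge_jk^(s), an S x J matrix for environment k
  g_jk <- g_post + ge_post[, , k]
  # Indicator of belonging to Omega_k in each sample (higher is better)
  in_omega_k <- t(apply(g_jk, 1, function(x) {
    rank(-x, ties.method = "first") <= n_sel
  }))
  colMeans(in_omega_k)
})
dimnames(prob_cond) <- list(paste0("G", 1:J), paste0("E", 1:K))
round(prob_cond, 2)
```

Each column of prob_cond holds the probabilities for one environment, so a genotype can rank well in one column and poorly in another when its interaction effects differ.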
The pairwise probabilities of superior performance can also be calculated across
or within environments. This metric assesses the probability of the j^{th}
genotype being superior to another experimental genotype or a commercial check.
The calculations are as follows, across and within environments, respectively:
Pr(\hat{g}_{j} > \hat{g}_{j^\prime} \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_{j}^{(s)} > \hat{g}_{j^\prime}^{(s)} \vert y)
or
Pr(\hat{g}_{jk} > \hat{g}_{j^\prime k} \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_{jk}^{(s)} > \hat{g}_{j^\prime k}^{(s)} \vert y)
These equations are set for when the selection direction is positive. If
increase = FALSE, > is simply replaced by <.
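A minimal sketch of the pairwise comparison across environments follows. The draws in g_post are simulated for illustration only; the loop simply counts, for each ordered pair, how often one genotype beats the other across samples.

```r
# Simulated stand-in for posterior draws of four genotypes (S x J).
set.seed(3)
S <- 1500
J <- 4
g_post <- matrix(rnorm(S * J, mean = rep(c(0, 0.5, 1, 1.5), each = S), sd = 0.5),
                 nrow = S, dimnames = list(NULL, paste0("G", 1:J)))
increase <- TRUE

# pair_prob[j, jp] estimates Pr(g_j > g_jp | y), or Pr(g_j < g_jp | y)
# when increase = FALSE. The diagonal is left as NA.
pair_prob <- matrix(NA_real_, J, J,
                    dimnames = list(colnames(g_post), colnames(g_post)))
for (j in seq_len(J)) {
  for (jp in seq_len(J)) {
    if (j != jp) {
      pair_prob[j, jp] <- if (increase) {
        mean(g_post[, j] > g_post[, jp])
      } else {
        mean(g_post[, j] < g_post[, jp])
      }
    }
  }
}
round(pair_prob, 2)
```

For continuous draws without ties, pair_prob[j, jp] and pair_prob[jp, j] add up to 1, which is a quick sanity check on the result.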
Probability of superior stability
Probabilities of superior performance highlight experimental genotypes with
high agronomic stability. For ecological stability (invariance), the probability
of superior stability is more adequate. Making a direct analogy with the
method of Shukla (1972), a stable genotype is one that has a low variance
of the GEI (genotype-by-environment interaction) effects [var(\widehat{ge})].
Using the same probability principles previously described, the probability
of superior stability is given as follows:
Pr[var(\widehat{ge}_{jk}) \in \Omega \vert y] = \frac{1}{S} \sum_{s=1}^S I[var(\widehat{ge}_{jk}^{(s)}) \in \Omega \vert y]
where I[var(\widehat{ge}_{jk}^{(s)}) \in \Omega \vert y] indicates if
var(\widehat{ge}_{jk}^{(s)}) exists in \Omega (1) or not (0).
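As a rough sketch of this calculation, the variance of the interaction effects can be taken across environments within each posterior sample, and the genotypes with the lowest variances then form \Omega. The array ge_post is simulated, with deliberately different spreads per genotype, and ceiling(int * J) is an assumed selection rule.

```r
# Simulated stand-in for GEI draws: S samples x J genotypes x K environments.
# Genotype 1 gets the smallest spread, genotype J the largest.
set.seed(4)
S <- 1000
J <- 6
K <- 5
sd_j <- seq(0.1, 0.6, length.out = J)
ge_post <- array(NA_real_, dim = c(S, J, K))
for (j in seq_len(J)) {
  ge_post[, j, ] <- rnorm(S * K, sd = sd_j[j])
}

# var(ge_jk^(s)) over environments k, for every sample and genotype (S x J).
var_ge <- apply(ge_post, c(1, 2), var)

int <- 0.2
n_sel <- ceiling(int * J)    # assumed size of Omega

# Stability always favours the LOWEST variance, whatever the direction of
# selection for the trait itself.
in_omega <- t(apply(var_ge, 1, function(v) {
  rank(v, ties.method = "first") <= n_sel
}))
prob_stabi <- colMeans(in_omega)
names(prob_stabi) <- paste0("G", 1:J)
round(prob_stabi, 2)
```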
Pairwise probabilities of superior stability are also possible in this context:
Pr[var(\widehat{ge}_{jk}) < var(\widehat{ge}_{j^\prime k}) \vert y] = \frac{1}{S} \sum_{s=1}^S I[var(\widehat{ge}_{jk})^{(s)} < var(\widehat{ge}_{j^\prime k})^{(s)} \vert y]
Note that j will be superior to j^\prime if it has a lower
variance of the genotype-by-environment interaction effect. This is true regardless
of whether increase is set to TRUE or FALSE.
The joint probability of independent events is the product of the individual probabilities. The estimated genotypic main effects and the variances of GEI effects are independent by design, thus the joint probability of superior performance and stability is as follows:
Pr[\hat{g}_j \in \Omega \cap var(\widehat{ge}_{jk}) \in \Omega] = Pr(\hat{g}_j \in \Omega) \times Pr[var(\widehat{ge}_{jk}) \in \Omega]
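The product rule is easy to check by hand. The probabilities below are made-up numbers for three hypothetical genotypes, used only to show how the joint probability can change the ranking.

```r
# Hypothetical marginal probabilities (illustrative values, not package output).
prob_perfo <- c(G1 = 0.85, G2 = 0.40, G3 = 0.10)   # superior performance
prob_stabi <- c(G1 = 0.30, G2 = 0.75, G3 = 0.50)   # superior stability

# Joint probability under independence: element-wise product.
joint_prob <- prob_perfo * prob_stabi

# Rank genotypes by their joint probability.
joint_prob[order(joint_prob, decreasing = TRUE)]
```

Here G1 has the highest probability of superior performance, yet G2 has the highest joint probability because it combines moderate performance with high stability.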
The estimation of these probabilities is closely related to some key questions that constantly arise in plant breeding:
- What is the risk of recommending a selection candidate for a target population of environments?
- What is the probability of a given selection candidate having good performance if recommended to a target population of environments? And for a specific environment?
- What is the probability of a given selection candidate having better performance than a cultivar check in the target population of environments? And in specific environments?
- How probable is it that a given selection candidate performs similarly across environments?
- What are the chances that a given selection candidate is more stable than a cultivar check in the target population of environments?
- What is the probability that a given selection candidate has a superior and invariable performance across environments?
More details about the usage of prob_sup, as well as the other functions of
the ProbBreed package, can be found at https://saulo-chaves.github.io/ProbBreed_site/.
Value
The function returns two lists, one with the marginal probabilities, and
another with the conditional probabilities.
The marginal list has:
- df: A list of data frames containing the calculated probabilities:
  - perfo: the probabilities of superior performance.
  - pair_perfo: the pairwise probabilities of superior performance.
  - stabi: the probabilities of superior stability. Can be stabi_gl, stabi_gm (when reg is not NULL) or stabi_gt (when year is not NULL).
  - pair_stabi: the pairwise probabilities of superior stability. Can be pair_stabi_gl, pair_stabi_gm (when reg is not NULL) or pair_stabi_gt (when year is not NULL).
  - joint_prob: the joint probabilities of superior performance and stability.
- plot: A list of ggplots illustrating the outputs:
  - g_hpd: a caterpillar plot representing the marginal genotypic value of each genotype, and their respective highest posterior density interval (95% represented by the thick line, and 97.5% represented by the thin line).
  - perfo: a bar plot illustrating the probabilities of superior performance.
  - pair_perfo: a heatmap representing the pairwise probability of superior performance (the probability of genotypes at the x-axis being superior to those on the y-axis).
  - stabi: a bar plot with the probabilities of superior stability. Different plots are generated for stabi_gl, stabi_gm and stabi_gt if reg and/or year are not NULL.
  - pair_stabi: a heatmap with the pairwise probabilities of superior stability. Different plots are generated for stabi_gl, stabi_gm and stabi_gt if reg and/or year are not NULL. This plot represents the probability of genotypes at the x-axis being superior to those on the y-axis.
  - joint_prob: a plot with the probabilities of superior performance, probabilities of superior stability and the joint probabilities of superior performance and stability.
The conditional list has:
- df: A list with:
  - prob: data frames containing the probabilities of superior performance within environments. Can be prob_loc, prob_reg (if reg is not NULL), and prob_year (if year is not NULL).
  - pwprob: lists with the pairwise probabilities of superior performance within environments. Can be pwprob_loc, pwprob_reg (if reg is not NULL), and pwprob_year (if year is not NULL).
- plot: A list with:
  - prob: heatmaps with the probabilities of superior performance within environments. Can be prob_loc, prob_reg (if reg is not NULL), and prob_year (if year is not NULL).
  - pwprob: a list of heatmaps representing the pairwise probability of superior performance within environments. Can be pwprob_loc, pwprob_reg (if reg is not NULL), and pwprob_year (if year is not NULL). The interpretation is the same as in the pair_perfo in the marginal list: the probability of genotypes at the x-axis being superior to those on the y-axis.
References
Dias, K. O. G., Santos, J. P. R., Krause, M. D., Piepho, H.-P., Guimarães, L. J. M., Pastina, M. M., and Garcia, A. A. F. (2022). Leveraging probability concepts for cultivar recommendation in multi-environment trials. Theoretical and Applied Genetics, 133(2):443-455. doi:10.1007/s00122-022-04041-y
Shukla, G. K. (1972). Some statistical aspects of partitioning genotype-environmental components of variability. Heredity, 29:237-245. doi:10.1038/hdy.1972.87
Examples
library(ProbBreed)

# Fit the Bayesian multi-environment model to the maize data set
mod = bayes_met(data = maize,
                gen = "Hybrid",
                loc = "Location",
                repl = c("Rep", "Block"),
                year = NULL,
                reg = "Region",
                res.het = FALSE,
                trait = "GY",
                iter = 6000, cores = 4, chains = 4)

# Extract the model outputs and check the Stan diagnostics
outs = extr_outs(data = maize, trait = "GY", model = mod,
                 probs = c(0.05, 0.95),
                 check.stan.diag = TRUE,
                 verbose = TRUE)

# Compute the marginal and conditional probabilities,
# selecting the top 20% of hybrids for higher grain yield
results = prob_sup(data = maize,
                   trait = "GY",
                   gen = "Hybrid",
                   loc = "Location",
                   reg = "Region",
                   year = NULL,
                   mod.output = outs,
                   int = .2,
                   increase = TRUE,
                   save.df = FALSE,
                   interactive = FALSE,
                   verbose = FALSE)