prob_sup {ProbBreed}    R Documentation

Probabilities of superior performance and stability

Description

This function estimates the probabilities of superior performance and stability across environments (marginal output). It also computes the probabilities of superior performance within environments (conditional output).

Usage

prob_sup(
  data,
  trait,
  gen,
  loc,
  reg = NULL,
  year = NULL,
  mod.output,
  int,
  increase = TRUE,
  save.df = FALSE,
  interactive = FALSE,
  verbose = FALSE
)

Arguments

data

A data frame containing the phenotypic data.

trait, gen, loc

Strings. The names of the columns that correspond to the trait, genotype, and location information, respectively. If the environment is a combination of other factors (for instance, location-year), the name of the column that contains this information must be passed to loc.

reg

A string or NULL. If the dataset has information about regions, reg will be a string with the name of the column that corresponds to the region information. Otherwise, reg = NULL (default).

year

A string or NULL. If the dataset has information about time-related environmental factors (e.g., years or seasons), year will be a string with the name of the column that corresponds to the time information. Otherwise, year = NULL (default).

mod.output

An object returned by the extr_outs() function.

int

A number between 0 and 1 representing the selection intensity, i.e., the proportion of genotypes to be selected.

increase

Logical. Indicates the direction of the selection. TRUE (default) for increasing the trait value, FALSE otherwise.

save.df

Logical. Should the data frames be saved in the working directory? TRUE for saving, FALSE (default) otherwise.

interactive

Logical. Should ggplots be converted into interactive plots? If TRUE, the function loads the plotly package and uses the plotly::ggplotly() command.

verbose

A logical value. If TRUE, the function will indicate the completed steps. Defaults to FALSE.

Details

Probabilities quantify the risk of recommending a selection candidate for a target population of environments or for a specific environment. The function prob_sup() computes the probabilities of superior performance and the probabilities of superior stability.

Let \Omega represent the subset of selected genotypes based on their performance across environments. A given genotype j belongs to \Omega if its genotypic marginal value (\hat{g}_j) is high (or low, if increase = FALSE) enough compared to its peers. prob_sup() uses Monte Carlo samples drawn from the posterior distribution to emulate the occurrence of S trials. The probability of the j^{th} genotype belonging to \Omega is then the ratio between the number of success events (\hat{g}_j \in \Omega) and the total number of samples:

Pr(\hat{g}_j \in \Omega \vert y) = \frac{1}{S}\sum_{s=1}^S{I(\hat{g}_j^{(s)} \in \Omega \vert y)}

where S is the total number of samples (s = 1, 2, ..., S), and I(\hat{g}_j^{(s)} \in \Omega \vert y) is an indicator variable that takes the value 1 if \hat{g}_j^{(s)} \in \Omega in the s^{th} sample, and 0 otherwise. S depends on the number of iterations and chains previously set in bayes_met().
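A minimal base-R sketch of this estimator, assuming the posterior draws of \hat{g}_j are available as an S × J matrix (rows are samples, columns are genotypes) and that \Omega is the top int proportion of genotypes in each sample. The helper prob_superior() and the simulated matrix g_draws are hypothetical illustrations, not prob_sup() internals.

```r
# Hypothetical helper: marginal probability of superior performance.
# g_draws: S x J matrix of posterior draws of genotypic main effects.
# int: proportion of genotypes forming Omega; increase: selection direction.
prob_superior <- function(g_draws, int, increase = TRUE) {
  n_sel <- ceiling(int * ncol(g_draws))   # size of Omega in each sample
  hits <- t(apply(g_draws, 1, function(g) {
    # rank genotypes within sample s (rank 1 = best in the chosen direction)
    r <- rank(if (increase) -g else g, ties.method = "first")
    r <= n_sel                            # indicator I(g_j^(s) in Omega)
  }))
  colMeans(hits)                          # (1/S) * sum of the indicators
}

set.seed(1)
S <- 4000; J <- 10
# Simulated draws: genotype means from -1 to 1 with unit noise (illustration only)
g_draws <- matrix(rnorm(S * J, mean = rep(seq(-1, 1, length.out = J), each = S)),
                  nrow = S, dimnames = list(NULL, paste0("G", 1:J)))
p <- prob_superior(g_draws, int = 0.2)
round(p, 2)
```

Because exactly ceiling(int × J) genotypes enter \Omega in every sample, the probabilities always sum to that number.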

Similarly, the conditional probability of superior performance can be applied to individual environments. Let \Omega_k represent the subset of superior genotypes in the k^{th} environment, so that the probability of the j^{th} genotype belonging to \Omega_k can be calculated as follows:

Pr(\hat{g}_{jk} \in \Omega_k \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_{jk}^{(s)} \in \Omega_k \vert y)

where I(\hat{g}_{jk}^{(s)} \in \Omega_k \vert y) is an indicator variable mapping success (1) if \hat{g}_{jk}^{(s)} belongs to \Omega_k and failure (0) otherwise, with \hat{g}_{jk}^{(s)} = \hat{g}_j^{(s)} + \widehat{ge}_{jk}^{(s)}. Note that conditional probabilities (i.e., conditional to the k^{th} environment or mega-environment) account for the interaction of the j^{th} genotype with the k^{th} environment.
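A minimal base-R sketch of the conditional estimator, assuming draws of \hat{g}_j are stored as an S × J matrix and draws of \widehat{ge}_{jk} as an S × J × K array, with \Omega_k taken as the top int proportion of genotypes in environment k. All draws are simulated, and the object names are hypothetical rather than those used by prob_sup().

```r
# Hypothetical sketch: conditional probability of superior performance
# in each environment, using g_jk^(s) = g_j^(s) + ge_jk^(s).
set.seed(2)
S <- 2000; J <- 8; K <- 3
int <- 0.25; n_sel <- ceiling(int * J)    # size of Omega_k (2 genotypes)
# Simulated posterior draws (illustration only)
g_draws  <- matrix(rnorm(S * J, mean = rep(seq(-1, 1, length.out = J), each = S)),
                   nrow = S)
ge_draws <- array(rnorm(S * J * K, sd = 0.5), dim = c(S, J, K))

prob_cond <- matrix(NA_real_, nrow = J, ncol = K,
                    dimnames = list(paste0("G", 1:J), paste0("E", 1:K)))
for (k in 1:K) {
  gk <- g_draws + ge_draws[, , k]         # S x J draws of g_jk
  # indicator: is genotype j among the n_sel largest values in sample s?
  hits <- t(apply(gk, 1, function(x) rank(-x, ties.method = "first") <= n_sel))
  prob_cond[, k] <- colMeans(hits)        # (1/S) * sum of the indicators
}
round(prob_cond, 2)
```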

The pairwise probabilities of superior performance can also be calculated across or within environments. This metric assesses the probability of the j^{th} genotype being superior to another experimental genotype or a commercial check. The calculations are as follows, across and within environments, respectively:

Pr(\hat{g}_{j} > \hat{g}_{j^\prime} \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_{j}^{(s)} > \hat{g}_{j^\prime}^{(s)} \vert y)

or

Pr(\hat{g}_{jk} > \hat{g}_{j^\prime k} \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_{jk}^{(s)} > \hat{g}_{j^\prime k}^{(s)} \vert y)

These equations assume a positive selection direction (increase = TRUE). If increase = FALSE, > is replaced by <.
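A minimal base-R sketch of the pairwise estimator across environments, assuming an S × J matrix of posterior draws. The genotype labels Hyb1, Hyb2, and Check and the helper pairwise_prob() are hypothetical, and the draws are simulated.

```r
# Hypothetical sketch: pairwise probabilities Pr(g_j > g_j' | y)
# (or Pr(g_j < g_j' | y) when increase = FALSE) from an S x J matrix of draws.
pairwise_prob <- function(draws, increase = TRUE) {
  J <- ncol(draws)
  P <- matrix(NA_real_, J, J, dimnames = list(colnames(draws), colnames(draws)))
  for (j in 1:J) {
    for (jp in 1:J) {
      if (j == jp) next                   # diagonal left as NA
      better <- if (increase) draws[, j] > draws[, jp] else draws[, j] < draws[, jp]
      P[j, jp] <- mean(better)            # share of samples in which j beats j'
    }
  }
  P
}

set.seed(3)
S <- 4000
draws <- cbind(Hyb1 = rnorm(S, 1.0, 0.3),
               Hyb2 = rnorm(S, 0.8, 0.3),
               Check = rnorm(S, 0.5, 0.3))
P <- pairwise_prob(draws)
round(P, 2)
```

With continuous draws, ties have probability zero, so P[j, j'] + P[j', j] = 1, and switching > for < simply transposes the matrix.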

Probabilities of superior performance highlight experimental genotypes with high agronomic stability. For ecological stability (invariance), the probability of superior stability is more adequate. Making a direct analogy with the method of Shukla (1972), a stable genotype is one that has a low variance of the GEI (genotype-by-environment interaction) effects [var(\widehat{ge})]. Following the same probability principles described above, the probability of superior stability is given by:

Pr[var(\widehat{ge}_{jk}) \in \Omega \vert y] = \frac{1}{S} \sum_{s=1}^S I[var(\widehat{ge}_{jk}^{(s)}) \in \Omega \vert y]

where I[var(\widehat{ge}_{jk}^{(s)}) \in \Omega \vert y] indicates whether var(\widehat{ge}_{jk}^{(s)}) belongs to \Omega (1) or not (0). Pairwise probabilities of superior stability are also possible in this context:
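A minimal base-R sketch of the probability of superior stability, assuming GEI draws are stored as an S × J × K array and that \Omega here holds the int proportion of genotypes with the lowest GEI variance in each sample. The draws are simulated so that G1 is the most stable by construction; names are hypothetical.

```r
# Hypothetical sketch: probability of superior stability.
set.seed(4)
S <- 2000; J <- 6; K <- 5
int <- 0.5; n_sel <- ceiling(int * J)     # size of Omega (3 genotypes)
# Simulated GEI draws (S x J x K); genotype j has GEI sd sds[j],
# so G1 is the most stable by construction (illustration only)
sds <- seq(0.2, 1.2, length.out = J)
ge_draws <- array(rnorm(S * J * K, sd = rep(sds, each = S)), dim = c(S, J, K))

# var(ge_jk^(s)) across the K environments, for every sample s and genotype j
var_draws <- apply(ge_draws, c(1, 2), var)   # S x J matrix
# Indicator: is genotype j among the n_sel LOWEST variances in sample s?
hits <- t(apply(var_draws, 1, function(v) rank(v, ties.method = "first") <= n_sel))
prob_stab <- setNames(colMeans(hits), paste0("G", 1:J))
round(prob_stab, 2)
```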

Pr[var(\widehat{ge}_{jk}) < var(\widehat{ge}_{j^\prime k}) \vert y] = \frac{1}{S} \sum_{s=1}^S I[var(\widehat{ge}_{jk}^{(s)}) < var(\widehat{ge}_{j^\prime k}^{(s)}) \vert y]

Note that j will be superior to j^\prime if it has a lower variance of the genotype-by-environment interaction effects. This holds regardless of whether increase is set to TRUE or FALSE.
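A minimal base-R sketch of the pairwise probability of superior stability for two genotypes, assuming each genotype's GEI draws are an S × K matrix. The labels "cand" and "check" are hypothetical, and the draws are simulated with the candidate less variable than the check.

```r
# Hypothetical sketch: Pr[var(ge_jk) < var(ge_j'k) | y] for two genotypes.
set.seed(5)
S <- 2000; K <- 6
# Simulated GEI draws (S x K) for a candidate and a check (illustration only)
ge_cand  <- matrix(rnorm(S * K, sd = 0.4), nrow = S)
ge_check <- matrix(rnorm(S * K, sd = 0.8), nrow = S)
var_cand  <- apply(ge_cand, 1, var)       # var(ge_jk^(s)) across environments
var_check <- apply(ge_check, 1, var)
# Lower variance means more stable, whatever the value of `increase`
p_more_stable <- mean(var_cand < var_check)
p_more_stable
```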

The joint probability of independent events is the product of their individual probabilities. The estimated genotypic main effects and the variances of GEI effects are independent by design, so the joint probability of superior performance and stability is:

Pr[\hat{g}_j \in \Omega \cap var(\widehat{ge}_{jk}) \in \Omega] = Pr(\hat{g}_j \in \Omega) \times Pr[var(\widehat{ge}_{jk}) \in \Omega]
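A short worked example of the product rule. The probabilities below are made-up illustrative values, not output from prob_sup(); they show how a genotype that is only moderately likely to perform well can still rank first jointly when it is also likely to be stable.

```r
# Hypothetical marginal probabilities for three genotypes (illustrative values only)
prob_perf <- c(G1 = 0.90, G2 = 0.60, G3 = 0.20)   # Pr(g_j in Omega)
prob_stab <- c(G1 = 0.30, G2 = 0.80, G3 = 0.70)   # Pr[var(ge_jk) in Omega]
# Under independence, the joint probability is the element-wise product
prob_joint <- prob_perf * prob_stab
prob_joint   # G1 = 0.27, G2 = 0.48, G3 = 0.14
```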

The estimation of these probabilities is closely related to some key questions that constantly arise in plant breeding.

More details about the usage of prob_sup(), as well as the other functions of the ProbBreed package, can be found at https://saulo-chaves.github.io/ProbBreed_site/.

Value

The function returns two lists: one with the marginal probabilities and another with the conditional probabilities.

The marginal list has:

The conditional list has:

References

Dias, K. O. G., Santos, J. P. R., Krause, M. D., Piepho, H.-P., Guimarães, L. J. M., Pastina, M. M., and Garcia, A. A. F. (2022). Leveraging probability concepts for cultivar recommendation in multi-environment trials. Theoretical and Applied Genetics, 133(2):443-455. doi:10.1007/s00122-022-04041-y

Shukla, G. K. (1972). Some statistical aspects of partitioning genotype-environmental components of variability. Heredity, 29:237-245. doi:10.1038/hdy.1972.87

Examples


library(ProbBreed)

# Fit the Bayesian model to the maize dataset
mod <- bayes_met(data = maize,
                 gen = "Hybrid",
                 loc = "Location",
                 repl = c("Rep", "Block"),
                 year = NULL,
                 reg = "Region",
                 res.het = FALSE,
                 trait = "GY",
                 iter = 6000, cores = 4, chains = 4)

# Extract the posterior outputs required by prob_sup()
outs <- extr_outs(data = maize, trait = "GY", model = mod,
                  probs = c(0.05, 0.95),
                  check.stan.diag = TRUE,
                  verbose = TRUE)

# Probabilities of superior performance and stability,
# selecting the best 20% of hybrids (int = 0.2) for higher GY values
results <- prob_sup(data = maize,
                    trait = "GY",
                    gen = "Hybrid",
                    loc = "Location",
                    reg = "Region",
                    year = NULL,
                    mod.output = outs,
                    int = 0.2,
                    increase = TRUE,
                    save.df = FALSE,
                    interactive = FALSE,
                    verbose = FALSE)



[Package ProbBreed version 1.0.3.2 Index]