get_X_prima_AB_bounds_bootstrap {RVCompare} | R Documentation |
Estimate the confidence intervals for the cumulative distributions of X'_A and X'_B using bootstrap. Much slower than the Dvoretzky–Kiefer–Wolfowitz approach.
get_X_prima_AB_bounds_bootstrap( X_A_observed, X_B_observed, nOfEstimationPoints = 100, alpha = 0.2, EPSILON = 1e-20, nOfBootstrapSamples = 1000, ignoreUniqueValuesCheck = FALSE )
X_A_observed |
array of the observed samples (real values) of X_A. |
X_B_observed |
array of the observed samples (real values) of X_B, it needs to have the same length as X_A. |
nOfEstimationPoints |
(optional, default 100) the number of points in the interval [0,1] in which the cumulative density is estimated. Increases computation time. |
alpha |
(optional, default value 0.2) the error of the confidence interval. If alpha = 0.05 then we have 95 percent confidence interval. |
EPSILON |
(optional, default value 1e-20) minimum difference between two values to be considered different. |
nOfBootstrapSamples |
(optional, default value 1e3) how many bootstrap samples to average. Increases computation time. |
ignoreUniqueValuesCheck |
(optional, default value FALSE) |
Returns a list with the following fields:
- p: values in the interval [0,1] that represent the nOfEstimationPoints points in which the densities are estimated. Useful for plotting.
- X_prima_A_cumulative_estimation: an array with the estimated cumulative diustribution function of X_prima_A from 0 to p[[i]].
- X_prima_A_cumulative_upper: an array with the upper bounds of confidence 1 - alpha of the cumulative density of X_prima_A
- X_prima_A_cumulative_lower: an array with the lower bounds of confidence 1 - alpha of the cumulative density of X_prima_A
- X_prima_B_cumulative_estimation: The same as X_prima_A_cumulative_estimation for X'_B.
- X_prima_B_cumulative_upper: The same as X_prima_A_cumulative_upper for X'_B
- X_prima_B_cumulative_lower: The same as X_prima_A_cumulative_lower for X'_B
- diff_estimation: X_prima_A_cumulative_estimation - X_prima_B_cumulative_estimation
- diff_upper: an array with the upper bounds of confidence 1 - alpha of the difference between the cumulative distributions
- diff_lower: an array with the lower bounds of confidence 1 - alpha of the difference between the cumulative distributions
library(ggplot2) ### Example 1 ### X_A_observed <- c(0.13,0.21,0.13,0.11,2.2,0.12,0.5,0.14,0.21,0.17, 0.11,2.0,0.12,0.50,0.14,0.16,0.2,0.23,0.6,0.11,0.18,0.113,0.1234, 0.316,0.1523,0.1297,0.1123,0.139572,0.1937523) X_B_observed <- c(0.71,0.12,0.19,0.17,1.5,1.0,0.5,0.41,0.11,0.16,0.01, 0.31,0.34,0.64,0.14,0.13,0.09,0.21,0.29,0.36,0.41,0.13,0.142335, 0.12363,0.132451,0.59217,0.157129,0.13528, 0.145) res <- get_X_prima_AB_bounds_bootstrap(X_A_observed, X_B_observed) fig1 = plot_X_prima_AB(res, plotDifference=FALSE)+ ggplot2::ggtitle("Example 1") print(fig1) ### Example 2 ### # Comparing the estimations with the actual distributions for two normal distributions. ################################### ## sample size = 30 ############## ################################### X_A_observed <- rnorm(30,mean = 1, sd = 1) X_B_observed <- rnorm(30,mean = 1.3, sd = 0.5) res <- get_X_prima_AB_bounds_bootstrap(X_A_observed, X_B_observed) X_A_observed_large_sample <- sort(rnorm(1e4, mean = 1, sd = 1)) X_B_observed_large_sample <- sort(rnorm(1e4, mean = 1.3, sd = 0.5)) actualDistributions <- getEmpiricalCumulativeDistributions( X_A_observed_large_sample, X_B_observed_large_sample, nOfEstimationPoints=1e4, EPSILON=1e-20) actualDistributions$X_prima_A_cumulative_estimation <- lm(X_prima_A_cumulative_estimation ~ p + I(p^2) + I(p^3)+ I(p^4)+ I(p^5)+ I(p^6)+I(p^7)+ I(p^8), data = actualDistributions)$fitted.values actualDistributions$X_prima_B_cumulative_estimation <- lm(X_prima_B_cumulative_estimation ~ p + I(p^2) + I(p^3)+ I(p^4)+ I(p^5)+ I(p^6)+I(p^7)+ I(p^8), data = actualDistributions)$fitted.values fig = plot_X_prima_AB(res, plotDifference=FALSE) + geom_line(data=as.data.frame(actualDistributions), aes(x=p, y=X_prima_A_cumulative_estimation, colour = "Actual X'_A", linetype="Actual X'_A")) + geom_line(data=as.data.frame(actualDistributions), aes(x=p, y=X_prima_B_cumulative_estimation, colour = "Actual X'_B", linetype="Actual X'_B")) + scale_colour_manual("", breaks = c("X'_A", "X'_B","Actual X'_A", "Actual X'_B"), values = c("X'_A"="#F8766D", "X'_B"="#00BFC4", "Actual X'_A"="#FF0000", "Actual X'_B"="#0000FF"))+ scale_linetype_manual("", breaks = c("X'_A", "X'_B","Actual X'_A", "Actual X'_B"), values = c("X'_A"="dashed", "X'_B"="solid", "Actual X'_A"="solid", "Actual X'_B"="solid"))+ ggtitle("30 samples used in the estimation") print(fig) ################################### ## sample size = 300 ############## ################################### X_A_observed <- rnorm(300,mean = 1, sd = 1) X_B_observed <- rnorm(300,mean = 1.3, sd = 0.5) res <- get_X_prima_AB_bounds_bootstrap(X_A_observed, X_B_observed) X_A_observed_large_sample <- sort(rnorm(1e4, mean = 1, sd = 1)) X_B_observed_large_sample <- sort(rnorm(1e4, mean = 1.3, sd = 0.5)) actualDistributions <- getEmpiricalCumulativeDistributions( X_A_observed_large_sample, X_B_observed_large_sample, nOfEstimationPoints=1e4, EPSILON=1e-20) actualDistributions$X_prima_A_cumulative_estimation <- lm(X_prima_A_cumulative_estimation ~ p + I(p^2) + I(p^3)+ I(p^4)+ I(p^5)+ I(p^6)+I(p^7)+ I(p^8), data = actualDistributions)$fitted.values actualDistributions$X_prima_B_cumulative_estimation <- lm(X_prima_B_cumulative_estimation ~ p + I(p^2) + I(p^3)+ I(p^4)+ I(p^5)+ I(p^6)+I(p^7)+ I(p^8), data = actualDistributions)$fitted.values fig = plot_X_prima_AB(res, plotDifference=FALSE) + geom_line(data=as.data.frame(actualDistributions), aes(x=p, y=X_prima_A_cumulative_estimation, colour = "Actual X'_A", linetype="Actual X'_A")) + geom_line(data=as.data.frame(actualDistributions), aes(x=p, y=X_prima_B_cumulative_estimation, colour = "Actual X'_B", linetype="Actual X'_B")) + scale_colour_manual("", breaks = c("X'_A", "X'_B","Actual X'_A", "Actual X'_B"), values = c("X'_A"="#F8766D", "X'_B"="#00BFC4", "Actual X'_A"="#FF0000", "Actual X'_B"="#0000FF"))+ scale_linetype_manual("", breaks = c("X'_A", "X'_B","Actual X'_A", "Actual X'_B"), values = c("X'_A"="dashed", "X'_B"="solid", "Actual X'_A"="solid", "Actual X'_B"="solid")) + ggtitle("300 samples used in the estimation") print(fig)