rgeode {RGeode} | R Documentation |
rgeode selects the principal directions of the data and performs inference. GEODE can also handle missing data.
rgeode(Y, d = 6, burn = 1000, its = 2000, tol = 0.01, atau = 1/20, asigma = 1/2, bsigma = 1/2, starttime = NULL, stoptime = NULL, fast = TRUE, c0 = -1, c1 = -0.005)
Y | array_like |
d | int, optional |
burn | int, optional |
its | int, optional |
tol | double, optional |
atau | double, optional |
asigma | double, optional |
bsigma | double, optional |
starttime | int, optional |
stoptime | int, optional |
fast | bool, optional |
c0 | double, optional |
c1 | double, optional |
GEOmetric Density Estimation (rgeode) is a fast algorithm for performing inference on normally distributed data. It consists of two main steps:
Selection of the principal axes of the data.
An adaptive Gibbs sampler that draws a set of samples from the full conditional posteriors of the parameters of interest; these samples are then used to perform inference.
rgeode takes several inputs: a rectangular (N, D) matrix Y, on which a fast rank-d SVD is run; a conservative upper bound d on the true dimension of the data; and a set of tuning parameters. The upper bound d must satisfy d > p, where p is the true dimension, and d << D.
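The rank-d SVD step described above can be illustrated with base R. This is only a sketch: it uses the standard svd function rather than the package's fast SVD, the simulated matrix and the names Yc, W and scores are hypothetical, and centering the columns is an assumption of this example rather than something the documentation states.

```r
# Sketch of a rank-d SVD on an (N, D) matrix with base R.
# Assumptions: simulated data, column centering, base svd() instead of
# the fast SVD used inside rgeode.
set.seed(1)
N <- 100   # number of observations
D <- 20    # ambient dimension
d <- 4     # conservative upper bound on the true dimension (d << D)

Y  <- matrix(rnorm(N * D), N, D)
Yc <- scale(Y, center = TRUE, scale = FALSE)   # center each column

# Ask only for the first d right singular vectors (nu = 0 skips U)
sv <- svd(Yc, nu = 0, nv = d)
W  <- sv$v           # D x d matrix with orthonormal columns

# Coordinates of the data along the d selected directions
scores <- Yc %*% W   # N x d
```

The columns of W span the d directions with the largest singular values, so choosing d slightly above the true dimension keeps all informative directions while staying far below D.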
rgeode returns a list containing the following components:
InD | array_like |
u | matrix |
tau | matrix |
sigmaS | array_like |
W | matrix |
Miss | list |
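Once a chain such as sigmaS is available, a posterior summary can be computed after discarding the burn-in draws. The sketch below uses a synthetic stand-in vector instead of a real rgeode result, and it assumes that sigmaS holds one draw per iteration (burn + its in total), as the commented plot in the examples suggests; neither assumption is confirmed by this page.

```r
# Sketch: summarise a chain after dropping burn-in.
# 'sigmaS' here is a synthetic stand-in, NOT output from rgeode.
set.seed(1)
burn <- 1000
its  <- 2000

# Hypothetical chain: a decaying transient followed by draws around 0.5
sigmaS <- c(seq(2, 0.5, length.out = burn),
            0.5 + rnorm(its, sd = 0.02))

keep <- (burn + 1):(burn + its)   # indices of post-burn-in draws
post <- sigmaS[keep]

# Posterior mean and 95% equal-tailed interval
summary_sigma <- c(mean = mean(post), quantile(post, c(0.025, 0.975)))
print(summary_sigma)
```

With a real fit, the stand-in would be replaced by the sigmaS component of the returned list.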
The components related to missing data are filled only when Y contains missing values.
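Before fitting, it can help to check which observations actually contain missing features. The following base R sketch does this on a small hypothetical matrix; the variable names are illustrative and nothing here is part of the RGeode API.

```r
# Sketch: locate missing entries in a data matrix with base R.
# The matrix and the positions of the NaN values are hypothetical.
set.seed(1)
Y <- matrix(rnorm(50), 10, 5)
Y[2, 3]       <- NaN
Y[7, c(1, 4)] <- NaN

miss <- is.na(Y)   # is.na() is TRUE for both NA and NaN

rows_with_missing <- which(rowSums(miss) > 0)         # affected rows
n_missing_per_row <- rowSums(miss)[rows_with_missing] # missing count per row
```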
L. Rimella, lorenzo.rimella@hotmail.it
[1] Y. Wang, A. Canale, D. Dunson. "Scalable Geometric Density Estimation" (2016).
library(MASS)
library(RGeode)

####################################################################
# WITHOUT MISSING DATA
####################################################################

# Define the dataset
D= 200
n= 500
d= 10
d_true= 3

set.seed(321)

mu_true= runif(d_true, -3, 10)

Sigma_true= matrix(0,d_true,d_true)
diag(Sigma_true)= c(runif(d_true, 10, 100))

W_true = svd(matrix(rnorm(D*d_true, 0, 1), d_true, D))$v

sigma_true = abs(runif(1,0,1))

mu= W_true%*%mu_true
C= W_true %*% Sigma_true %*% t(W_true)+ sigma_true* diag(D)

y= mvrnorm(n, mu, C)

################################
# GEODE: Without missing data
################################

start.time <- Sys.time()
GEODE= rgeode(Y= y, d)
Sys.time()- start.time

# SIGMAS
#plot(seq(110,3000,by=1),GEODE$sigmaS[110:3000],ty='l',col=2,
#     xlab= 'Iteration', ylab= 'sigma^2', main= 'Simulation of sigma^2')
#abline(v=800,lwd= 2, col= 'blue')
#legend('bottomright',c('Posterior of sigma^2', 'Stopping time'),
#       lwd=c(1,2),col=c(2,4),cex=0.55, border='black', box.lwd=3)

####################################################################
# WITH MISSING DATA
####################################################################

###########################
#Insert NaN
n_m = 5 #number of data vectors containing missing features
d_m = 1 #number of missing features

data_miss= sample(seq(1,n),n_m)

features= sample(seq(1,D), d_m)
for(i in 2:n_m)
{
  features= rbind(features, sample(seq(1,D), d_m))
}

for(i in 1:length(data_miss))
{
  if(i==length(data_miss))
  {
    y[data_miss[i],features[i,][-1]]= NaN
  }
  else
  {
    y[data_miss[i],features[i,]]= NaN
  }
}

################################
# GEODE: With missing data
################################

set.seed(321)
start.time <- Sys.time()
GEODE= rgeode(Y= y, d)
Sys.time()- start.time

# SIGMAS
#plot(seq(110,3000,by=1),GEODE$sigmaS[110:3000],ty='l',col=2,
#     xlab= 'Iteration', ylab= 'sigma^2', main= 'Simulation of sigma^2')
#abline(v=800,lwd= 2, col= 'blue')
#legend('bottomright',c('Posterior of sigma^2', 'Stopping time'),
#       lwd=c(1,2),col=c(2,4),cex=0.55, border='black', box.lwd=3)

####################################################################
####################################################################