tclustregIC {fsdaR} | R Documentation |
Computes tclustreg for different numbers of groups k and restriction factors c.
Description
tclustregIC (the last two letters stand for 'Information Criterion') computes
the values of BIC (MIXMIX), ICL (MIXCLA) or CLA (CLACLA) for different values
of k (number of groups) and different values of c (restriction factor for the
variances of the residuals), for a prespecified level of trimming. To minimize
randomness, for a given k the same subsets are used for each value of c.
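The reuse of subsets across values of c can be sketched in base R. This is only an illustration of the idea, not the fsdaR implementation: no fsdaR function is called, and the names kk, cc, nsubsets and subsets_by_k, as well as all numeric values, are assumptions made for the example.

```r
# Illustrative sketch in base R; no fsdaR function is called.
set.seed(123)        # arbitrary seed, for reproducibility only
n  <- 50             # number of observations (made up)
p  <- 2              # number of columns of x (made up)
kk <- 2:3            # candidate numbers of groups (hypothetical name)
cc <- c(1, 4, 16)    # candidate restriction factors (hypothetical name)
nsubsets <- 5        # subsets drawn per value of k (made up)

# Draw the subsets once per k: each row holds k * p distinct unit indexes.
subsets_by_k <- lapply(kk, function(k) {
  t(replicate(nsubsets, sample(n, k * p)))
})
names(subsets_by_k) <- paste0("k=", kk)

# Every (k, c) pair would use the subsets drawn for that k, so results
# for different c differ only because of c, not because of resampling.
grid <- expand.grid(k = kk, c = cc)
grid$subset_cols <- grid$k * p
print(grid)
```

Each element of subsets_by_k is a matrix with nsubsets rows and k * p columns, which is the shape described for a matrix-valued nsamp.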
Usage
tclustregIC(
y,
x,
alphaLik,
alphaX,
intercept = TRUE,
plot = FALSE,
nsamp,
refsteps = 10,
reftol = 1e-13,
equalweights = FALSE,
wtrim = 0,
we,
msg = TRUE,
RandNumbForNini,
trace = FALSE,
...
)
Arguments
y |
Response variable. A vector with n elements that
contains the response variable.
|
x |
An n x p data matrix (n observations and p variables).
Rows of x represent observations, and columns represent variables.
Missing values (NA's) and infinite values (Inf's) are allowed,
since observations (rows) with missing or infinite values will
automatically be excluded from the computations.
|
alphaLik |
Trimming level, a scalar between 0 and 0.5 or an
integer specifying the number of observations which have to be trimmed.
If alphaLik=0, there is no trimming. More in detail, if 0 < alphaLik < 1,
clustering is based on h = floor(n * (1 - alphaLik)) observations.
If alphaLik is an integer greater than 1, clustering is
based on h = n - floor(alphaLik) observations. In both cases, likelihood
contributions are sorted and the units associated with the smallest n - h
contributions are trimmed.
|
alphaX |
Second-level trimming or constrained weighted model for x.
|
intercept |
Whether to use a constant term (default is intercept=TRUE).
|
plot |
If plot=FALSE (default) or plot=0 no plot is produced.
If plot=TRUE a plot with the final allocation is shown (using the spmplot function).
If x is 2-dimensional, the lines associated with the groups are shown too.
|
nsamp |
If a scalar, it contains the number of subsamples which will be extracted.
If nsamp = 0, all subsets will be extracted. Remark - if the number of all possible
subsets is less than 300, the default is to extract all subsets, otherwise just 300.
If nsamp is a matrix, it contains in the rows the indexes of the subsets which
have to be extracted. nsamp in this case can be conveniently generated by
function subsets(). nsamp must have k * p columns. The first p
columns are used to estimate the regression coefficients of group 1, ..., the last p
columns are used to estimate the regression coefficients of group k.
|
refsteps |
Number of refining iterations in each subsample. Default is refsteps=10.
refsteps=0 means "raw-subsampling" without iterations.
|
reftol |
Tolerance of the refining steps. The default value is 1e-13.
|
equalweights |
A logical specifying whether cluster weights in the concentration
and assignment steps shall be considered. If equalweights=TRUE we are (ideally)
assuming equally sized groups; if equalweights=FALSE (default) we allow for
different group weights. Please check in the given references which functions
are maximized in both cases.
|
wtrim |
How to apply the weights on the observations - a flag taking values in c(0, 1, 2, 3, 4).
If wtrim==0 (no weights), the algorithm reduces to the standard tclustreg algorithm.
If wtrim==1, trimming is done by weighting the observations using values specified in vector
we. In this case, vector we must be supplied by the user.
If wtrim==2, trimming is again done by weighting the observations
using values specified in vector we. In this case, vector we
is computed from the data as a function of the density estimate pdfe.
Specifically, the weight of each observation is the probability of retaining
the observation, computed as
pretain_{ig} = 1 - pdfe_{ig} / max_{ig}(pdfe_{ig}).
If wtrim==3, trimming is again done by weighting the observations using
values specified in vector we. In this case, each element we_i of vector
we is a Bernoulli random variable with probability of success pdfe_{ig}.
In the clustering framework this is done under the constraint that no group is empty.
If wtrim==4, trimming is done with the tandem approach of Cerioli and Perrotta (2014).
|
we |
Weights. A vector of size n-by-1 containing application-specific weights.
Default is a vector of ones.
|
msg |
Controls whether messages are displayed on the screen. If msg==TRUE (default),
messages are displayed on the screen. If msg=2, detailed messages are displayed,
for example the information at iteration level.
|
RandNumbForNini |
Pre-extracted random numbers to initialize proportions.
Matrix of size k-by-nrow(nsamp) containing the random numbers which
are used to initialize the proportions of the groups. This option is effective only if
nsamp is a matrix which contains pre-extracted subsamples. The purpose of this
option is to enable the user to replicate the results when the function tclustreg()
is called using a parfor instruction (as it happens for example in routine IC, where
tclustreg() is called through a parfor for different values of the restriction factor).
The default is that RandNumbForNini is empty - then uniform random numbers are used.
|
trace |
Whether to print intermediate results. Default is trace=FALSE.
|
... |
Potential further arguments passed to lower-level functions.
|
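The rules stated above for alphaLik and for wtrim==2 can be checked with a short base R sketch. It does not call fsdaR: the helper trimmed_size is hypothetical, the density values in pdfe are made up, and treating alphaLik == 1 as a count is an assumption, since the text only covers 0 < alphaLik < 1 and integers greater than 1.

```r
# Hypothetical helper (not part of fsdaR): number of observations h on which
# clustering is based, following the rule stated for alphaLik.
trimmed_size <- function(n, alphaLik) {
  if (alphaLik < 1) {
    floor(n * (1 - alphaLik))   # alphaLik is a proportion; 0 gives h = n
  } else {
    n - floor(alphaLik)         # alphaLik is a number of units to trim
                                # (alphaLik == 1 handled here by assumption)
  }
}

h_prop  <- trimmed_size(100, 0.25)   # proportion: keeps 75 of 100 units
h_count <- trimmed_size(100, 10)     # count: keeps 90 of 100 units

# wtrim == 2: retention probabilities computed from density estimates.
pdfe    <- c(0.2, 0.5, 1.0, 0.1)     # made-up density estimates
pretain <- 1 - pdfe / max(pdfe)      # unit with the largest density gets 0

print(c(h_prop = h_prop, h_count = h_count))
print(pretain)
```

Under this formula the unit with the largest estimated density always has retention probability 0, and units in sparse regions have probabilities close to 1.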
Value
An S3 object of class tclustreg.object
Author(s)
FSDA team, valentin.todorov@chello.at
References
Torti F., Perrotta D., Riani, M. and Cerioli A. (2019). Assessing Robust Methodologies for Clustering Linear Regression Data,
Advances in Data Analysis and Classification, Vol. 13, pp 227-257.
[Package fsdaR version 0.9-0 Index]