validation {Qval}    R Documentation

Perform Q-matrix validation methods

Description

This function uses generalized Q-matrix validation methods to validate the Q-matrix, including commonly used methods such as GDI (de la Torre & Chiu, 2016; Najera, Sorrel, & Abad, 2019; Najera et al., 2020), Wald (Ma & de la Torre, 2020), Hull (Najera et al., 2021), and MLR-B (Tu et al., 2022). It supports different iteration methods (test level or item level; Najera et al., 2020; Najera et al., 2021; Tu et al., 2022) and can apply various attribute search methods (ESA, SSA, PAA; de la Torre, 2008; Terzi & de la Torre, 2018). See details for more information.

Usage

validation(
  Y,
  Q,
  CDM.obj = NULL,
  par.method = "EM",
  mono.constraint = TRUE,
  model = "GDINA",
  method = "GDI",
  search.method = "PAA",
  maxitr = 1,
  iter.level = "test",
  eps = 0.95,
  alpha.level = 0.05,
  criter = "PVAF",
  verbose = TRUE
)

Arguments

Y

A required N × I matrix or data.frame consisting of the responses of N individuals to I items. Missing values need to be coded as NA.

Q

A required binary I × K matrix indicating which attributes are required to master each item. The ith row of the matrix is a binary indicator vector in which 0 codes an attribute that is not required and 1 codes an attribute that is required to master item i.

CDM.obj

An object of class CDM.obj. When it is not NULL, it enables rapid verification of the Q-matrix without the need for parameter estimation. @seealso CDM.

par.method

Type of method used to estimate the CDM's parameters; one of "EM" or "BM". Default = "EM". Note that "BM" is only available when model = "GDINA".

mono.constraint

Logical indicating whether monotonicity constraints should be fulfilled in estimation. Default = TRUE.

model

Type of model to fit; can be "GDINA", "LCDM", "DINA", "DINO" , "ACDM", "LLM", or "rRUM". Default = "GDINA". @seealso CDM.

method

The method used to validate the Q-matrix; can be "GDI", "Wald", "Hull", or "MLR-B". The model must be "GDINA" when method = "Wald". Default = "GDI". See details.

search.method

Character string specifying the search method to use during validation.

"SSA"

for sequential search algorithm (see de la Torre, 2008; Terzi & de la Torre, 2018). This option can be used when the method is "GDI", "Hull" or "MLR-B".

"ESA"

for exhaustive search algorithm. This option can be used when the method is any of "GDI", "Hull", or "MLR-B".

"PAA"

for priority attribute algorithm. This is the default option and can be used when the method is any of "GDI", "Wald", "Hull", or "MLR-B".

"stepwise"

Only available for the "Wald" method.

"forward"

Only available for the "Wald" method.

maxitr

Maximum number of iterations. Default = 1.

iter.level

Can be "item" level or "test" level. Default = "test". Only "test" is available When method = "Wald" or "MLR-B". See details.

eps

Cut-off point for the PVAF; only used when the method is "GDI" or "Wald". Default = 0.95. See details.

alpha.level

Alpha level for the Wald test. Default = 0.05.

criter

The kind of fit index used; can be "R2" for R_{McFadden}^2 (@seealso get.R2) or "PVAF" for the proportion of variance accounted for (@seealso get.PVAF). Only used when method = "Hull". Default = "PVAF". See details.

verbose

Logical indicating whether to print iterative information. Default = TRUE.

Value

An object of class validation is a list containing the following components:

Q.orig

The original Q-matrix, which may contain some mis-specifications and needs to be validated.

Q.sug

The Q-matrix suggested by the chosen validation method.

priority

An I × K matrix containing the priority of every attribute for each item. This value is only available when search.method = "PAA". See details.

Hull.fit

A list containing all the information needed to plot the Hull plot, which is available only when method = "Hull".

iter

The number of iterations.

time.cost

The CPU time used to complete the function.

The GDI method

The GDI method (de la Torre & Chiu, 2016), as the first Q-matrix validation method applicable to saturated models, serves as an important foundation for various mainstream Q-matrix validation methods.

The method calculates the proportion of variance accounted for (PVAF; @seealso get.PVAF) for all possible q-vectors for each item, selects the q-vector with a PVAF just greater than the cut-off point (or epsilon, EPS) as the correction result, and the variance \zeta^2 is the generalized discriminating index (GDI; de la Torre & Chiu, 2016). Therefore, the GDI method is also considered a generalized extension of the delta method (de la Torre, 2008), which likewise takes maximizing discrimination as its basic idea. In the GDI method, \zeta^2 is defined as the weighted variance of the correct response probabilities across all mastery patterns, that is:

\zeta^2 = \sum_{l=1}^{2^K} \pi_{l} {(P(X_{pi}=1|\mathbf{\alpha}_{l}) - P_{i}^{mean})}^2

where \pi_{l} represents the prior probability of mastery pattern l; P_{i}^{mean}=\sum_{l=1}^{2^K}\pi_{l}P(X_{pi}=1|\mathbf{\alpha}_{l}) is the weighted average of the correct response probabilities across all attribute mastery patterns. When the q-vector is correctly specified, the calculated \zeta^2 should be maximized, indicating the maximum discrimination of the item. However, in reality, \zeta^2 continues to increase when the q-vector is over-specified, and the more attributes that are over-specified, the larger \zeta^2 becomes. The q-vector with all attributes set to 1 (i.e., \mathbf{q}_{1:K}) has the largest \zeta^2 (de la Torre, 2016). This is because an increase in attributes in the q-vector leads to an increase in item parameters, resulting in greater differences in correct response probabilities across attribute patterns and, consequently, increased variance. However, this increase in variance is spurious. Therefore, de la Torre et al. calculated PVAF = \frac{\zeta^2}{\zeta_{1:K}^2} to describe the degree to which the discrimination of the current q-vector explains the maximum discrimination. They selected an appropriate PVAF cut-off point to achieve a balance between q-vector fit and parsimony. According to previous studies, the PVAF cut-off point is typically set at 0.95 (Ma & de la Torre, 2020; Najera et al., 2021).
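
For illustration, the \zeta^2 and PVAF calculations can be reproduced directly in R. The sketch below uses toy pattern priors and correct-response probabilities for a single item; the object names are illustrative and are not objects produced by the package.

pi.l   <- c(0.3, 0.2, 0.2, 0.3)      # prior probabilities of the 2^K mastery patterns (K = 2)
P.q    <- c(0.2, 0.2, 0.8, 0.8)      # P(X = 1 | alpha_l) under a candidate q-vector (toy values)
P.full <- c(0.1, 0.3, 0.7, 0.9)      # P(X = 1 | alpha_l) under the full q-vector q_{1:K}

zeta2 <- function(pi.l, P) {
  P.mean <- sum(pi.l * P)            # weighted average correct-response probability
  sum(pi.l * (P - P.mean)^2)         # weighted variance = GDI
}
PVAF <- zeta2(pi.l, P.q) / zeta2(pi.l, P.full)
PVAF                                 # compare with the cut-off point (e.g., 0.95)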

The Wald method

The Wald method (Ma & de la Torre, 2020) combines the Wald test with PVAF to correct the Q-matrix at the item level. Its basic logic is as follows: when correcting item i, the single attribute that maximizes the PVAF value is added to a vector with all attributes set to \mathbf{0} (i.e., \mathbf{q} = (0, 0, \ldots, 0)) as a starting point. In subsequent iterations, attributes in this vector are continuously added or removed through the Wald test. The correction process ends when the PVAF exceeds the cut-off point or when no further attribute changes occur. The Wald statistic follows an asymptotic \chi^{2} distribution with 2^{K^\ast} - 1 degrees of freedom.

The calculation method is as follows:

Wald = (\mathbf{R} \times P_{i}(\mathbf{\alpha}))^{'} (\mathbf{R} \times \mathbf{V}_{i} \times \mathbf{R}^{'})^{-1} (\mathbf{R} \times P_{i}(\mathbf{\alpha}))

\mathbf{R} represents the restriction matrix; P_{i}(\mathbf{\alpha}) denotes the vector of correct response probabilities for item i; \mathbf{V}_i is the variance-covariance matrix of the correct response probabilities for item i, which can be obtained by multiplying the \mathbf{M}_i matrix (de la Torre, 2011) with the variance-covariance matrix of item parameters \mathbf{\Sigma}_i, i.e., \mathbf{V}_i = \mathbf{M}_i \times \mathbf{\Sigma}_i. The \mathbf{\Sigma}_i can be derived by inverting the information matrix; the empirical cross-product information matrix (de la Torre, 2011) is used to calculate \mathbf{\Sigma}_i.

\mathbf{M}_i is a 2^{K^\ast} × 2^{K^\ast} matrix that represents the relationship between the parameters of item i and the attribute mastery patterns. The rows represent different mastery patterns, while the columns represent different item parameters.
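
As an illustration of the formula above, the Wald statistic can be assembled with basic matrix operations. The restriction matrix, probability vector, and variance-covariance matrix below are toy values for an item measuring K* = 2 attributes; they are not quantities produced by the package.

P.alpha <- c(0.2, 0.5, 0.6, 0.9)              # P_i(alpha): correct-response probabilities (toy values)
R <- rbind(c(1, -1,  0,  0),                  # restriction matrix with 2^{K*} - 1 = 3 rows
           c(0,  1, -1,  0),
           c(0,  0,  1, -1))
V <- diag(0.01, 4)                            # toy variance-covariance matrix V_i
RP   <- R %*% P.alpha
Wald <- t(RP) %*% solve(R %*% V %*% t(R)) %*% RP
pchisq(as.numeric(Wald), df = nrow(R), lower.tail = FALSE)   # p-value with 2^{K*} - 1 df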

The Hull method

The Hull method (Najera et al., 2021) addresses the issue of the cut-off point in the GDI method and demonstrates good performance in simulation studies. Najera et al. applied the Hull method for determining the number of factors to retain in exploratory factor analysis (Lorenzo-Seva et al., 2011) to the retention of attributes in the q-vector, specifically for Q-matrix validation. The Hull method aligns with the GDI approach in its philosophy of seeking a balance between fit and parsimony. While GDI relies on a preset, arbitrary cut-off point to determine this balance, the Hull method uses the most pronounced elbow in the Hull plot to make this judgment. The most pronounced elbow is determined using the following formula:

st = \frac{(f_k - f_{k-1}) / (np_k - np_{k-1})}{(f_{k+1} - f_k) / (np_{k+1} - np_k)}

where f_k represents the fit-index value (can be PVAF @seealso get.PVAF or R2 @seealso get.R2) when the q-vector contains k attributes, similarly, f_{k-1} and f_{k+1} represent the fit-index value when the q-vector contains k-1 and k+1 attributes, respectively. {np}_k denotes the number of parameters when the q-vector has k attributes, which is 2^k for a saturated model. Likewise, {np}_{k-1} and {np}_{k+1} represent the number of parameters when the q-vector has k-1 and k+1 attributes, respectively. The Hull method calculates the st index for all possible q-vectors and retains the q-vector with the maximum st index as the corrected result. Najera et al. (2021) removed any concave points from the Hull plot, and when only the first and last points remained in the plot, the saturated q-vector was selected.
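
The st index can be computed as follows. The fit-index values below are toy numbers for K = 4 attributes, and the removal of concave points described by Najera et al. (2021) is omitted for brevity.

f  <- c(0.00, 0.62, 0.88, 0.93, 0.95)   # toy fit-index values for k = 0, 1, ..., 4 attributes
np <- 2^(0:4)                           # number of parameters: 2^k under a saturated model
st <- sapply(1:3, function(k) {         # st is defined for k = 1, ..., K - 1
  ((f[k + 1] - f[k]) / (np[k + 1] - np[k])) /
    ((f[k + 2] - f[k + 1]) / (np[k + 2] - np[k + 1]))
})
which.max(st)                           # number of attributes at the most pronounced elbow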

The MLR-B method

The MLR-B method proposed by Tu et al. (2022) differs from the GDI, Wald, and Hull methods in that it does not employ PVAF. Instead, it directly uses the marginal probabilities of attribute mastery for subjects to perform multivariate logistic regression on their observed scores. This approach considers all possible q-vectors and fits 2^K - 1 regression models. Among the regression equations whose regression coefficients are all significant, it selects the q-vector corresponding to the equation with the minimum AIC as the validation result. The performance of this method in both the LCDM and GDM models even surpasses that of the Hull method, making it an efficient and reliable approach for Q-matrix correction.
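
The core of the MLR-B idea for a single item can be sketched with glm(). The marginal mastery probabilities and item responses below are simulated purely for illustration, and the screening of equations with non-significant coefficients is omitted for brevity.

set.seed(1)
N <- 500; K <- 3
Lambda <- matrix(runif(N * K), N, K,
                 dimnames = list(NULL, paste0("A", 1:K)))   # toy marginal mastery probabilities
y <- rbinom(N, 1, plogis(-2 + 3 * Lambda[, 1] + 3 * Lambda[, 3]))   # toy responses to one item

q.all <- as.matrix(expand.grid(rep(list(0:1), K)))[-1, ]    # all 2^K - 1 candidate q-vectors
aic <- apply(q.all, 1, function(q) {
  dat <- data.frame(y = y, Lambda[, q == 1, drop = FALSE])
  AIC(glm(y ~ ., family = binomial, data = dat))            # multivariate logistic regression
})
q.all[which.min(aic), ]                                     # suggested q-vector for this item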

Iterative procedure

The iterative procedure in which one item is modified at a time is the item-level iteration ("item"; Najera et al., 2020, 2021), while the procedure in which the entire Q-matrix is modified at each iteration is the test-level iteration ("test"; Najera et al., 2020; Tu et al., 2022).

The steps of the item level iterative procedure algorithm are as follows:

Step1

Fit the CDM according to the item responses and the provisional Q-matrix (\mathbf{Q}^0).

Step2

Validate the provisional Q-matrix and gain a suggested Q-matrix (\mathbf{Q}^1).

Step3

For each item, calculate PVAF_{0i}, the PVAF of the provisional q-vector specified in \mathbf{Q}^0, and PVAF_{1i}, the PVAF of the suggested q-vector in \mathbf{Q}^1.

Step4

Calculate all items' \delta PVAF_{i}, defined as \delta PVAF_{i} = |PVAF_{1i} - PVAF_{0i}|

Step5

Define the hit item as the item with the highest \delta PVAF_{i}.

Step6

Update \mathbf{Q}^0 by replacing the provisional q-vector of the hit item with its suggested q-vector.

Step7

Iterate over Steps 1 to 6 until \sum_{i=1}^{I} \delta PVAF_{i} = 0.
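
Steps 4 and 5 amount to the following selection rule; the PVAF vectors below are toy values for I = 5 items, used only to illustrate the rule.

PVAF.0 <- c(0.97, 0.80, 0.99, 0.91, 0.60)   # toy PVAFs of the provisional q-vectors (Q^0)
PVAF.1 <- c(0.97, 0.96, 0.99, 0.95, 0.93)   # toy PVAFs of the suggested q-vectors (Q^1)
delta.PVAF <- abs(PVAF.1 - PVAF.0)          # Step 4
hit <- which.max(delta.PVAF)                # Step 5: the hit item
hit                                         # only this item's q-vector is updated in Q^0 (Step 6)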

The steps of the test level iterative procedure algorithm are as follows:

Step1

Fit the CDM according to the item responses and the provisional Q-matrix (\mathbf{Q}^0).

Step2

Validate the provisional Q-matrix and gain a suggested Q-matrix (\mathbf{Q}^1).

Step3

Check whether \mathbf{Q}^1 = \mathbf{Q}^0. If TRUE, terminate the iterative algorithm. If FALSE, update \mathbf{Q}^0 to \mathbf{Q}^1.

Step4

Iterate over Steps 1 to 3 until one of the following conditions is satisfied: 1. \mathbf{Q}^1 = \mathbf{Q}^0; 2. the maximum number of iterations (maxitr) is reached; 3. \mathbf{Q}^1 no longer satisfies the condition that every attribute is measured by at least one item.
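
The test-level loop can be written out schematically with the package's own CDM() and validation() functions. The sketch below assumes example.data and example.MQ as generated in the Examples section; it is equivalent in spirit to calling validation() with iter.level = "test" and maxitr = 10, not a reproduction of the internal implementation.

Q.prov <- example.MQ
for (iter in 1:10) {
  CDM.prov <- CDM(example.data$dat, Q.prov)              # Step 1: fit the CDM
  val <- validation(example.data$dat, Q.prov, CDM.prov,
                    method = "GDI", maxitr = 1)          # Step 2: one validation pass
  if (all(val$Q.sug == Q.prov)) break                    # Step 3: stop if Q^1 = Q^0
  Q.prov <- val$Q.sug                                    # otherwise update Q^0 and iterate
}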

Search algorithm

Three search algorithms are available: Exhaustive Search Algorithm (ESA), Sequential Search Algorithm (SSA), and Priority Attribute Algorithm (PAA). ESA is a brute-force algorithm. When validating the q-vector of a particular item, it traverses all possible q-vectors and selects the most appropriate one based on the chosen Q-matrix validation method. Since there are 2^K - 1 possible q-vectors with K attributes, ESA requires 2^K - 1 searches.
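
The ESA candidate space is easy to enumerate; the short sketch below simply lists all 2^K - 1 possible q-vectors for K = 4 attributes.

K <- 4
q.candidates <- as.matrix(expand.grid(rep(list(0:1), K)))[-1, ]   # drop the all-zero vector
nrow(q.candidates)                                                # 2^K - 1 = 15 searches per item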

SSA reduces the number of searches by adding one attribute at a time to the q-vector in a stepwise manner. Therefore, in the worst-case scenario, SSA requires K(K+1)/2 searches. The detailed steps are as follows:

Step 1

Define an empty q-vector \mathbf{q}^0=[00...0] of length K, where all elements are 0.

Step 2

Examine all single-attribute q-vectors, which are those formed by changing one of the 0s in \mathbf{q}^0 to 1. According to the criteria of the chosen Q-matrix validation method, select the optimal single-attribute q-vector, denoted as \mathbf{q}^1.

Step 3

Examine all two-attribute q-vectors, which are those formed by changing one of the 0s in \mathbf{q}^1 to 1. According to the criteria of the chosen Q-matrix validation method, select the optimal two-attribute q-vector, denoted as \mathbf{q}^2.

Step 4

Repeat this process until \mathbf{q}^K is found, or the stopping criterion of the chosen Q-matrix validation method is met.
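
The steps above can be summarised in a short generic loop. Here score.q() and stop.rule() are hypothetical placeholders standing in for the criterion and stopping rule of the chosen validation method (e.g., the PVAF and its cut-off point); the sketch is not the package's internal implementation.

ssa <- function(K, score.q, stop.rule) {
  q <- rep(0, K)                                     # Step 1: empty q-vector
  while (any(q == 0)) {
    candidates <- lapply(which(q == 0),              # Steps 2-3: flip one remaining 0 to 1
                         function(k) { qk <- q; qk[k] <- 1; qk })
    scores <- vapply(candidates, score.q, numeric(1))
    q <- candidates[[which.max(scores)]]             # keep the best candidate
    if (stop.rule(q)) break                          # Step 4: stop when the criterion is met
  }
  q
}
## toy usage: favour attributes 1 and 3, stop once both are included
ssa(4, score.q   = function(q) sum(q * c(2, 0.1, 2, 0.1)),
       stop.rule = function(q) all(q[c(1, 3)] == 1))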

PAA is a highly efficient and concise algorithm that evaluates whether each attribute needs to be included in the q-vector based on the priority of the attributes. @seealso get.priority. Therefore, even in the worst-case scenario, PAA only requires K searches. The detailed process is as follows:

Step 1

Use the applicable CDM (e.g., the G-DINA model) to estimate the model parameters and obtain the marginal attribute mastery probability matrix \mathbf{\Lambda}.

Step 2

Use LASSO regression to calculate the priority of each attribute in the q-vector for item i.

Step 3

Check whether each attribute should be included in the q-vector, proceeding through the attribute priorities from high to low, and output the final suggested q-vector according to the criteria of the chosen Q-matrix validation method.
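
The PAA loop can be sketched in the same generic fashion. In practice the priorities would come from the LASSO-based attribute priorities (@seealso get.priority); accept.attribute() and stop.rule() are hypothetical placeholders for the criterion and stopping rule of the chosen validation method.

paa <- function(priority, accept.attribute, stop.rule) {
  q <- rep(0, length(priority))
  for (k in order(priority, decreasing = TRUE)) {    # Step 3: from highest to lowest priority
    q.try <- q; q.try[k] <- 1
    if (accept.attribute(q.try)) q <- q.try          # keep attribute k only if warranted
    if (stop.rule(q)) break                          # e.g., PVAF >= 0.95
  }
  q
}
## toy usage: priorities for K = 4 attributes, accept attributes 2 and 3 only
paa(priority = c(0.1, 0.9, 0.7, 0.2),
    accept.attribute = function(q) all(which(q == 1) %in% c(2, 3)),
    stop.rule = function(q) sum(q) == 2)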

It should be noted that the Wald method proposed by Ma & de la Torre (2020) uses a "stepwise" search approach. This approach involves incrementally adding a 1 to, or removing a 1 from, the q-vector and evaluating the significance of the change using the Wald test: 1. if removing a 1 results in non-significance (indicating that the 1 is unnecessary), the 1 is removed from the q-vector; otherwise, the q-vector remains unchanged; 2. if adding a 1 results in significance (indicating that the 1 is necessary), the 1 is added to the q-vector; otherwise, the q-vector remains unchanged. The process stops when the q-vector no longer changes or when the PVAF reaches the preset cut-off point (i.e., 0.95).

The "forward" search approach is another search method available for the Wald method, and its logic is simple because it merely keeps turning the 0s in the q vector into 1s, stopping when no more 0s can be turned into 1s or the PVAF reaches the cut-off point.

The stepwise and forward approaches are search methods unique to the Wald method, and users should be aware of this. Since the stepwise search is inefficient compared with the extremely high efficiency of PAA, Qval also provides PAA for q-vector search in the Wald method. When applying the PAA version of the Wald method, the search still examines whether each attribute is necessary (by checking whether the Wald test reaches significance after adding the attribute) according to attribute priority. The search stops when no further necessary attributes are found or when the PVAF reaches the preset cut-off point (i.e., 0.95).

Author(s)

Haijiang Qin <Haijiang133@outlook.com>

References

de la Torre, J., & Chiu, C. Y. (2016). A General Method of Empirical Q-matrix Validation. Psychometrika, 81(2), 253-273. DOI: 10.1007/s11336-015-9467-8.

de la Torre, J. (2008). An Empirically Based Method of Q-Matrix Validation for the DINA Model: Development and Applications. Journal of Educational Measurement, 45(4), 343-362. DOI: 10.1111/j.1745-3984.2008.00069.x.

Lorenzo-Seva, U., Timmerman, M. E., & Kiers, H. A. (2011). The Hull method for selecting the number of common factors. Multivariate Behavioral Research, 46, 340–364. DOI: 10.1080/00273171.2011.564527.

Ma, W., & de la Torre, J. (2020). An empirical Q-matrix validation method for the sequential generalized DINA model. British Journal of Mathematical and Statistical Psychology, 73(1), 142-163. DOI: 10.1111/bmsp.12156.

McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in economics (pp. 105–142). New York, NY: Academic Press.

Najera, P., Sorrel, M. A., & Abad, F. J. (2019). Reconsidering Cutoff Points in the General Method of Empirical Q-Matrix Validation. Educational and Psychological Measurement, 79(4), 727-753. DOI: 10.1177/0013164418822700.

Najera, P., Sorrel, M. A., de la Torre, J., & Abad, F. J. (2020). Improving Robustness in Q-Matrix Validation Using an Iterative and Dynamic Procedure. Applied Psychological Measurement, 44(6), 431-446. DOI: 10.1177/0146621620909904.

Najera, P., Sorrel, M. A., de la Torre, J., & Abad, F. J. (2021). Balancing fit and parsimony to improve Q-matrix validation. British Journal of Mathematical and Statistical Psychology, 74 Suppl 1, 110-130. DOI: 10.1111/bmsp.12228.

Terzi, R., & de la Torre, J. (2018). An Iterative Method for Empirically-Based Q-Matrix Validation. International Journal of Assessment Tools in Education, 248-262. DOI: 10.21449/ijate.40719.

Tu, D., Chiu, J., Ma, W., Wang, D., Cai, Y., & Ouyang, X. (2022). A multiple logistic regression-based (MLR-B) Q-matrix validation method for cognitive diagnosis models: A confirmatory approach. Behavior Research Methods. DOI: 10.3758/s13428-022-01880-x.

Examples

################################################################
#                           Example 1                          #
#             The GDI method to validate Q-matrix              #
################################################################
set.seed(123)

library(Qval)

## generate Q-matrix and data
K <- 4
I <- 20
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ,
                         model = "GDINA", distribute = "horder")

## simulate random mis-specifications
example.MQ <- sim.MQ(example.Q, 0.1)


## using MMLE/EM to fit CDM model first
example.CDM.obj <- CDM(example.data$dat, example.MQ)

## using the fitted CDM.obj to avoid extra parameter estimation.
Q.GDI.obj <- validation(example.data$dat, example.MQ, example.CDM.obj, method = "GDI")


## also can validate the Q-matrix directly
Q.GDI.obj <- validation(example.data$dat, example.MQ)

## item level iteration
Q.GDI.obj <- validation(example.data$dat, example.MQ, method = "GDI",
                        iter.level = "item", maxitr = 150)

## search method
Q.GDI.obj <- validation(example.data$dat, example.MQ, method = "GDI",
                        search.method = "ESA")

## cut-off point
Q.GDI.obj <- validation(example.data$dat, example.MQ, method = "GDI",
                        eps = 0.90)

## check QRR
print(zQRR(example.Q, Q.GDI.obj$Q.sug))




################################################################
#                           Example 2                          #
#             The Wald method to validate Q-matrix             #
################################################################
set.seed(123)

library(Qval)

## generate Q-matrix and data
K <- 4
I <- 20
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ, model = "GDINA",
                         distribute = "horder")

## simulate random mis-specifications
example.MQ <- sim.MQ(example.Q, 0.1)


## using MMLE/EM to fit CDM first
example.CDM.obj <- CDM(example.data$dat, example.MQ)

## using the fitted CDM.obj to avoid extra parameter estimation.
Q.Wald.obj <- validation(example.data$dat, example.MQ, example.CDM.obj, method = "Wald")


## also can validate the Q-matrix directly
Q.Wald.obj <- validation(example.data$dat, example.MQ, method = "Wald")

## check QRR
print(zQRR(example.Q, Q.Wald.obj$Q.sug))




################################################################
#                           Example 3                          #
#             The Hull method to validate Q-matrix             #
################################################################
set.seed(123)

library(Qval)

## generate Q-matrix and data
K <- 4
I <- 20
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ, model = "GDINA",
                         distribute = "horder")

## simulate random mis-specifications
example.MQ <- sim.MQ(example.Q, 0.1)


## using MMLE/EM to fit CDM first
example.CDM.obj <- CDM(example.data$dat, example.MQ)

## using the fitted CDM.obj to avoid extra parameter estimation.
Q.Hull.obj <- validation(example.data$dat, example.MQ, example.CDM.obj, method = "Hull")


## also can validate the Q-matrix directly
Q.Hull.obj <- validation(example.data$dat, example.MQ, method = "Hull")

## change PVAF to R2 as fit-index
Q.Hull.obj <- validation(example.data$dat, example.MQ, method = "Hull", criter = "R2")

## check QRR
print(zQRR(example.Q, Q.Hull.obj$Q.sug))




################################################################
#                           Example 4                          #
#             The MLR-B method to validate Q-matrix            #
################################################################
set.seed(123)

library(Qval)

## generate Q-matrix and data
K <- 4
I <- 20
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ, model = "GDINA",
                         distribute = "horder")

## simulate random mis-specifications
example.MQ <- sim.MQ(example.Q, 0.1)


## using MMLE/EM to fit CDM first
example.CDM.obj <- CDM(example.data$dat, example.MQ)

## using the fitted CDM.obj to avoid extra parameter estimation.
Q.MLR.obj <- validation(example.data$dat, example.MQ, example.CDM.obj, method = "MLR-B")


## also can validate the Q-matrix directly
Q.MLR.obj <- validation(example.data$dat, example.MQ, method  = "MLR-B")

## check QRR
print(zQRR(example.Q, Q.MLR.obj$Q.sug))



[Package Qval version 1.0.0 Index]