Department of Industrial Psychology, Stellenbosch University
Published
January 21, 2025
Introduction
In psychometrics the development of a scale that measures a single latent variable is a very desirable outcome, because it allows for unambiguous interpretation of a total score across the items of the scale. Such a scale is referred to as a unidimensional or homogeneous scale.
In this tutorial I demonstrate how we can use factor analysis to examine whether a scale measures a single latent variable. As an example I examine the fit of a single factor model to the five-item Neuroticism Scale of the Big Five Inventory. The five items are supposedly indicators of a single latent variable. From a factor analysis perspective, the five items are correlated, because they share the same underlying latent variable. Indeed, a factor analyst might argue that the five items are correlated “because” they are influenced by the same latent variable (and only that latent variable). This line of reasoning implies that if the influence of the latent variable (or factor) is removed, the items will no longer be correlated.
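This reasoning can be illustrated with a small base-R simulation (the loadings 0.8 and 0.7 are made up for illustration): two items that correlate only because they share one latent variable become uncorrelated once the influence of that variable is partialled out.

```r
# Hypothetical illustration: two items driven by a single shared factor
set.seed(1)
n <- 10000
factor_scores <- rnorm(n)                 # the shared latent variable
item1 <- 0.8 * factor_scores + rnorm(n)   # loading 0.8 plus unique variance
item2 <- 0.7 * factor_scores + rnorm(n)   # loading 0.7 plus unique variance

cor(item1, item2)                         # clearly non-zero

# Remove the influence of the factor by regression; the residuals
# of the two items are (close to) uncorrelated
res1 <- residuals(lm(item1 ~ factor_scores))
res2 <- residuals(lm(item2 ~ factor_scores))
cor(res1, res2)                           # close to zero
```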
The items
The psychTools package of Bill Revelle includes the bfi data set, which contains the responses of 2800 people to the items of the Big Five Inventory. Here we focus on the five Neuroticism items, which are located in columns 16 to 20 of the bfi data frame. People respond to the items on a six-point scale.
N1: Get angry easily.
N2: Get irritated easily.
N3: Have frequent mood swings.
N4: Often feel blue.
N5: Panic easily.
Do these items measure a single latent variable (which we might label Neuroticism)? Does a single latent variable or factor account for the covariances/correlations of these items?
Observed correlations
We start by examining the correlations of the five items. If they all share the same underlying latent variable, they should all be correlated. We observe that the item correlations range from 0.35 (items N2 and N5) to 0.71 (items N1 and N2). These correlations suggest that the five items have something in common.
data(bfi)
mycor <- cor(bfi[16:20], use = "complete.obs")
mycor
corPlot(bfi[16:20], upper = FALSE, scale = TRUE, cex = .8,
        main = "Correlation plot of the Neuroticism items")
Is the correlation matrix an identity matrix?
Bartlett (1951) introduced a test of the null hypothesis that a matrix is an identity matrix. An identity matrix is one where the diagonal contains 1’s, and the off-diagonal elements are zeroes. When applied to correlation matrices, Bartlett’s test is used to test the hypothesis that all the correlations are equal to zero (within error).
Bartlett’s test statistic follows a chi-square distribution under the null hypothesis that the population correlation matrix is an identity matrix. Our results show that \(\chi^2\)(10) = 4904.381, p &lt; .001, which means that we can reject the null hypothesis that the observed correlation matrix is an identity matrix.
Note that Bartlett’s test is most often used in exploratory factor analysis contexts. Provided that the researcher has good substantive reasons to believe that the manifest variables share one or more latent variables, the test will fail to reject the null hypothesis only in very rare situations. Such a failure almost certainly indicates that the data are of very poor quality or that the sample size is very small.
cortest.bartlett(mycor, n = 2800)
$chisq
[1] 4904.381
$p.value
[1] 0
$df
[1] 10
Is the correlation matrix factorable?
Kaiser (1970) introduced a measure of sampling adequacy (MSA) as an index of whether a correlation matrix is factorable. The index is referred to in the literature as the Kaiser-Meyer-Olkin (KMO) index. Values closer to 1 indicate better factorability. The KMO index for the correlations of the Neuroticism items was 0.80.
The KMO index is mostly used in exploratory factor analysis contexts. However, it does no harm to examine it, and it can alert us to potentially poor quality data.
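The index can be obtained with the KMO() function of the psych package, applied to the correlation matrix computed earlier:

```r
# Kaiser-Meyer-Olkin measure of sampling adequacy for the Neuroticism items;
# assumes mycor was computed from bfi[16:20] as above
KMO(mycor)
```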
We specify the single factor model using lavaan syntax. We start by giving the model a name. The model specification itself is enclosed in single quotation marks. We give the latent variable or factor a name (here we chose the label “Neuroticism”). We use the “=~” operator to specify that the items that follow are the indicators of the factor. Note that the names of these items must correspond to their names in the data frame. By default, lavaan fixes the correlations of the unique variances of the manifest variables to zero.
model1 <- 'Neuroticism =~ N1 + N2 + N3 + N4 + N5'
Fitting the single factor model to the data
We use the cfa() function of the lavaan package to fit our confirmatory factor analysis model to the observed data. To identify the model and to set the metric of the latent variable, we have to introduce a constraint in our model. We can choose from two very common constraints: (a) fix one of the unstandardized factor loadings to unity, or (b) fix the variance of the latent variable to unity. By default, lavaan will fix the first unstandardized factor loading to unity (in our example this will be the factor loading of item N1).
If we would rather fix the variance of the latent variable to unity, we have to add the argument std.lv = TRUE, as I have done here.
The parameters of the model are estimated with maximum likelihood, which assumes multivariate normality of the data. This assumption is almost always violated when we work with psychological test items. The consequences are that the standard errors of the parameters are underestimated, which leads to p-values that are too low, and that the model is made to look as if it fits more poorly than it really does. Against this background we employ the robust maximum likelihood estimator (MLR), which yields robust standard errors and a scaled test statistic.
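A sketch of the call (the object name fit.model1 matches the code used later in this tutorial; passing the five item columns directly is one way to supply the data):

```r
# Fit the single factor model: fix the factor variance to 1 (std.lv = TRUE)
# and request the robust maximum likelihood estimator (MLR)
fit.model1 <- cfa(model1, data = bfi[16:20], std.lv = TRUE, estimator = "MLR")

# Inspect fit statistics and standardized parameter estimates
summary(fit.model1, fit.measures = TRUE, standardized = TRUE)
```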
We start by inspecting the fit statistics of the model. First, our results show that the null hypothesis of perfect fit has to be rejected: \(\chi^2\)(5) = 360.932, p &lt; .001. The TLI suggests unsatisfactory fit (TLI = 0.849), whereas the CFI suggests marginally acceptable fit (CFI = 0.925). The RMSEA suggests poor fit (RMSEA = 0.163). Finally, the SRMR suggests marginally acceptable fit (SRMR = 0.056).
The parameters of the model
The estimated factor loadings reflect the strength of the relations between the latent variable and the manifest variables. These factor loadings are regression coefficients: they show by how many units the scores on the items change if the latent variable were to increase by one unit. Note that there are three sets of factor loadings: (a) the unstandardized factor loadings, (b) factor loadings where the latent variable is standardized but the items are unstandardized, and (c) factor loadings where both the latent variable and the items are standardized. If the model is identified by fixing the variance of the latent variable to unity, (a) and (b) will be the same. The fully standardized factor loadings, which can be found in the column labeled “Std.all”, are typically interpreted.
lavaan 0.6-19 ended normally after 19 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 10
Used Total
Number of observations 2694 2800
Model Test User Model:
Standard Scaled
Test Statistic 360.932 313.521
Degrees of freedom 5 5
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.151
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 4724.621 3492.154
Degrees of freedom 10 10
P-value 0.000 0.000
Scaling correction factor 1.353
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.925 0.911
Tucker-Lewis Index (TLI) 0.849 0.823
Robust Comparative Fit Index (CFI) 0.925
Robust Tucker-Lewis Index (TLI) 0.849
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -23078.504 -23078.504
Scaling correction factor 1.007
for the MLR correction
Loglikelihood unrestricted model (H1) -22898.038 -22898.038
Scaling correction factor 1.055
for the MLR correction
Akaike (AIC) 46177.007 46177.007
Bayesian (BIC) 46235.995 46235.995
Sample-size adjusted Bayesian (SABIC) 46204.222 46204.222
Root Mean Square Error of Approximation:
RMSEA 0.163 0.151
90 Percent confidence interval - lower 0.149 0.138
90 Percent confidence interval - upper 0.177 0.165
P-value H_0: RMSEA <= 0.050 0.000 0.000
P-value H_0: RMSEA >= 0.080 1.000 1.000
Robust RMSEA 0.162
90 Percent confidence interval - lower 0.147
90 Percent confidence interval - upper 0.178
P-value H_0: Robust RMSEA <= 0.050 0.000
P-value H_0: Robust RMSEA >= 0.080 1.000
Standardized Root Mean Square Residual:
SRMR 0.056 0.056
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Neuroticism =~
N1 1.286 0.025 50.920 0.000 1.286 0.818
N2 1.225 0.025 49.597 0.000 1.225 0.803
N3 1.147 0.028 41.182 0.000 1.147 0.717
N4 0.872 0.033 26.438 0.000 0.872 0.554
N5 0.813 0.034 24.170 0.000 0.813 0.502
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.N1 0.819 0.048 16.954 0.000 0.819 0.331
.N2 0.828 0.046 18.124 0.000 0.828 0.356
.N3 1.245 0.052 23.807 0.000 1.245 0.486
.N4 1.714 0.055 31.118 0.000 1.714 0.693
.N5 1.968 0.057 34.381 0.000 1.968 0.748
Neuroticism 1.000 1.000 1.000
Correlation residuals
Recall that a single factor model specifies that a single latent variable accounts for the correlations/covariances of the items, which implies that if the influence of the factor is removed the items will be uncorrelated.
The obtained factor loadings are used to find the model implied or reproduced correlations of the manifest variables. This is done via matrix algebra by multiplying the matrix of factor loadings by its transpose. The differences between the observed correlations and the model implied correlations (i.e., the correlations produced by the factor) are known as the residuals. If the factor indeed accounts for the observed correlations of the manifest variables, these residuals should be close to zero. Our results show that there are some residuals that clearly deviate from zero.
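As an illustration, the implied correlations can be reproduced in base R from the fully standardized loadings reported above (values copied by hand, so small rounding differences from the residuals() output are to be expected):

```r
# Fully standardized loadings from the single factor model output
lambda <- c(N1 = 0.818, N2 = 0.803, N3 = 0.717, N4 = 0.554, N5 = 0.502)

# Model-implied correlation matrix: lambda %*% t(lambda), with unit diagonal
implied <- outer(lambda, lambda)
diag(implied) <- 1
round(implied, 3)

# Residual for N1 and N2: observed correlation 0.71 (reported earlier)
# minus the implied correlation 0.818 * 0.803
0.71 - implied["N1", "N2"]
```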
myresiduals <- residuals(fit.model1, type = "cor")
myresiduals
Modification indices
Modification indices are used to identify model constraints that contribute to model misfit. In the case of a single factor model the most basic constraint is that the unique variances of the manifest items are uncorrelated. The unique variance of a manifest variable reflects the part of its variance that is not explained by the common factor. The unique variances should be uncorrelated, because the hypothesis of a single factor model is that a single latent variable accounts for all the covariances/correlations of the manifest items. Correlated unique variances are a sign that there is some shared variance (or covariance) that is not explained by the single factor model.
In our example the modification indices indicate that the constraint of uncorrelated unique variances for items N1 and N2 contributes heavily to the misfit of the model. The modification index (mi = 389.209) is an estimate of how much the chi-square statistic will decrease if the unique variances of these two items are allowed to correlate.
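The modification indices can be requested from the fitted model with lavaan's modindices() function, for example:

```r
# List the modification indices, largest first
modindices(fit.model1, sort. = TRUE)
```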
Our results show that the null hypothesis of perfect fit still has to be rejected: \(\chi^2\)(4) = 31.505, p = 0.0000024. The TLI = 0.985 suggests excellent fit, as does the CFI = 0.994. Both the RMSEA = 0.051 and the SRMR = 0.017 suggest very good fit.
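These fit statistics come from refitting the model with the correlated unique variances added. A sketch of the specification (the object name fit.model2 matches the code used later; note that the output below reflects the default ML estimator rather than MLR):

```r
# Modified model: allow the unique variances of N1 and N2 to correlate
model2 <- 'Neuroticism =~ N1 + N2 + N3 + N4 + N5
           N1 ~~ N2'
fit.model2 <- cfa(model2, data = bfi[16:20], std.lv = TRUE)
summary(fit.model2, fit.measures = TRUE, standardized = TRUE)
```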
The parameters of the modified model
The fully standardized factor loadings of items N1 and N2 in the modified model are noticeably lower than their corresponding loadings in the single factor model. This is because, in the single factor model, the variance that these two items share above and beyond the factor of interest was ‘shunted’ into their factor loadings (after all, the shared variance has to go somewhere, and that was the only place it could go). In the modified model this shared variance is captured by the correlated unique variances instead.
lavaan 0.6-19 ended normally after 23 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 11
Used Total
Number of observations 2694 2800
Model Test User Model:
Test statistic 31.505
Degrees of freedom 4
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 4724.621
Degrees of freedom 10
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.994
Tucker-Lewis Index (TLI) 0.985
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -22913.790
Loglikelihood unrestricted model (H1) -22898.038
Akaike (AIC) 45849.580
Bayesian (BIC) 45914.467
Sample-size adjusted Bayesian (SABIC) 45879.517
Root Mean Square Error of Approximation:
RMSEA 0.051
90 Percent confidence interval - lower 0.035
90 Percent confidence interval - upper 0.068
P-value H_0: RMSEA <= 0.050 0.443
P-value H_0: RMSEA >= 0.080 0.002
Standardized Root Mean Square Residual:
SRMR 0.017
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Neuroticism =~
N1 1.054 0.031 34.507 0.000 1.054 0.670
N2 0.996 0.030 33.360 0.000 0.996 0.653
N3 1.307 0.030 43.741 0.000 1.307 0.817
N4 0.997 0.030 32.767 0.000 0.997 0.634
N5 0.893 0.032 27.781 0.000 0.893 0.551
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.N1 ~~
.N2 0.644 0.039 16.308 0.000 0.644 0.477
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.N1 1.363 0.049 28.063 0.000 1.363 0.551
.N2 1.337 0.047 28.728 0.000 1.337 0.574
.N3 0.853 0.048 17.744 0.000 0.853 0.333
.N4 1.480 0.049 30.292 0.000 1.480 0.598
.N5 1.832 0.056 32.674 0.000 1.832 0.697
Neuroticism 1.000 1.000 1.000
Correlation residuals
The matrix of correlation residuals shows that the modified single factor model gives a very good account of the observed covariances/correlations of the five items.
myresiduals2 <- residuals(fit.model2, type = "cor")
myresiduals2
In general it is not a good idea to allow the unique variances of manifest variables to correlate. It can easily become a mindless escape from a poorly fitting model. If, however, there are good substantive grounds for allowing such correlations, then they might help to obtain better estimates of the parameters of interest. In our example, the factor loadings of items N1 and N2 in the modified model reflect their relation with the latent variable after we have controlled for the unwanted variance that these two items share.
Inspection of the content of items N1 and N2 shows that both reflect aspects of being “short-tempered”. Although our results show that responses to these items are driven by the Neuroticism factor, it is also conceivable that being “short-tempered” is driven by an “Aggression” factor that is independent of the Neuroticism factor. We don’t have enough items to properly serve as indicators of such an Aggression factor, and our only recourse is to allow the unique variances of these items to correlate. Another recourse would be to delete one of the two offending items. This might be the best course of action in the test development phase, or if a test consists of many items. In our example, which has only five items, deleting an item would be a drastic step.
Allowing the unique variances of items N1 and N2 to correlate gave us insight into the functioning of the 5-item Neuroticism scale and has given us clues about how it might be improved in future.
References
Bartlett, M. S. (1951). The effect of standardization on a chi-square approximation in factor analysis. Biometrika, 38, 337-344.
Kaiser, H. F. (1970). A second generation little jiffy. Psychometrika, 35(4), 401-415.
Revelle, W. (2024a). psych: Procedures for psychological, psychometric, and personality research. Northwestern University, Evanston, Illinois. R package version 2.4.6, https://CRAN.R-project.org/package=psych.
Revelle, W. (2024b). psychTools: Tools to accompany the ‘psych’ package for psychological research. Northwestern University, Evanston, Illinois. R package version 2.4.3, https://CRAN.R-project.org/package=psychTools.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1-36. https://doi.org/10.18637/jss.v048.i02