Parallel analysis with SPSS to determine number of factors (Part 1)

In a previous post, I discussed various options for determining the number of common factors that may account for the intercorrelations among a set of measured variables (or indicators). One of the best-performing methods for identifying the number of factors during exploratory factor analysis (EFA) is parallel analysis. This simulation-based method begins by randomly generating a large number of correlation matrices under the assumption that the variates are independent, computing the set of *eigenvalues from each of those matrices, and then finding the mean or the 95th percentile of the eigenvalues associated with each component or factor. Next, the mean (or 95th percentile) of the random eigenvalues for each component/factor is compared against the corresponding eigenvalue computed from your data. The number of factors to retain equals the number of leading components/factors whose eigenvalues from your data exceed the randomly generated eigenvalues.
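The simulation step described above can be sketched in Python with NumPy. This is only a minimal illustration under stated assumptions, not the code behind any particular program: the function name `random_eigenvalues` and its parameters are hypothetical, the random variates are assumed to be standard normal, and the seed will not reproduce numbers produced by other software.

```python
import numpy as np


def random_eigenvalues(n_obs, n_vars, n_reps=1000, seed=1000):
    """Hypothetical helper: eigenvalues of correlation matrices from random data.

    Returns two arrays of length n_vars: the mean and the 95th percentile of
    each ordered eigenvalue (largest first) across n_reps simulated data sets.
    """
    rng = np.random.default_rng(seed)
    eigs = np.empty((n_reps, n_vars))
    for r in range(n_reps):
        # Independent (uncorrelated) normal variates, n_obs rows by n_vars columns
        data = rng.standard_normal((n_obs, n_vars))
        # Unreduced correlation matrix (ones on the diagonal)
        corr = np.corrcoef(data, rowvar=False)
        # eigvalsh returns ascending order; reverse so the largest comes first
        eigs[r] = np.linalg.eigvalsh(corr)[::-1]
    return eigs.mean(axis=0), np.percentile(eigs, 95, axis=0)
```

For example, `random_eigenvalues(n_obs=2951, n_vars=12)` returns two arrays of 12 values each. Because every simulated correlation matrix has ones on its diagonal, the mean eigenvalues sum to 12, with the first somewhat above 1 and the last somewhat below 1.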

As I mentioned previously, there is debate as to whether one should use the eigenvalues from an unreduced correlation matrix or from a reduced correlation matrix, and the issue appears unresolved at this time. In this post (and the next), I will show you ways you can perform parallel analysis in SPSS using either an unreduced or reduced correlation matrix. In the example below, I will show you how you can use a web utility (found at https://analytics.gonzaga.edu/parallelengine/) for generating random eigenvalues associated with a PCA solution and comparing them against those from your own data. Unfortunately, this utility is not particularly useful for SPSS users when the extraction approach is principal axis factoring (PAF). In my next post, I will show you how to use syntax by Brian O'Connor (which I have slightly modified to improve ease of use) that will allow you to perform parallel analysis when using either PCA or PAF.
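To make the unreduced/reduced distinction concrete, here is a small Python sketch. It assumes the reduced matrix is formed by replacing the ones on the diagonal with squared multiple correlations (SMCs), a common choice of initial communality estimates; iterated PAF would go on to update those estimates. The function name and the toy 3 x 3 matrix are hypothetical and are not taken from the example data.

```python
import numpy as np


def reduced_correlation_matrix(corr):
    """Hypothetical helper: put squared multiple correlations on the diagonal."""
    corr = np.asarray(corr, dtype=float)
    # SMC for variable i is 1 - 1 / (i-th diagonal element of the inverse)
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
    reduced = corr.copy()
    np.fill_diagonal(reduced, smc)
    return reduced


# Toy correlation matrix: three variables that all correlate .50 with each other
toy = np.array([[1.0, 0.5, 0.5],
                [0.5, 1.0, 0.5],
                [0.5, 0.5, 1.0]])

unreduced_eigs = np.linalg.eigvalsh(toy)[::-1]
reduced_eigs = np.linalg.eigvalsh(reduced_correlation_matrix(toy))[::-1]
print(unreduced_eigs)
print(reduced_eigs)
```

In this toy case the unreduced eigenvalues are 2, .5, and .5 (summing to the number of variables), whereas the reduced eigenvalues are approximately 1.333, -.167, and -.167 (summing to the total of the SMCs). Eigenvalues from a reduced matrix are smaller and can be negative, which is one reason the two approaches can point to different numbers of factors.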

*Note: You can think of eigenvalues as representing variance in a set of measured variables. Each successive component or factor is extracted such that it is orthogonal to all previous components/factors and explains proportionately less variation than the preceding components/factors. See Brown (2006).

-----------------------------------------------------------------------------------------------------------------------------

To help you better visualize what occurs with parallel analysis, I have created two scree plots in RStudio (using the psych package) that include both observed and randomly generated eigenvalues.

The plot below was created using the unreduced correlation matrix for a set of 12 measured variables, where the eigenvalues were derived using principal components (PC) extraction. The blue line connects the eigenvalues calculated from the raw data (by way of the unreduced correlation matrix). The red line connects the randomly generated eigenvalues. As you can see, the red line appears almost horizontal, with only a slight slope; it represents the expected trend in the eigenvalues if the measured variables were uncorrelated. The number of factors to retain is based on the number of components (see Component number on the X-axis) with eigenvalues (see Y-axis) that exceed those that were randomly generated. The plot suggests a three-factor solution, given that the eigenvalues for the first three components exceed the randomly generated eigenvalues for those same components.



The next plot contains eigenvalues computed from a reduced correlation matrix, with the extraction method being principal axis factoring. In this analysis, we see the number of recommended factors is four. Perhaps the difference in results reflects conditions discussed by Crawford et al. (2010).


Unfortunately, we are unable to generate plots like this directly in SPSS without a lot of undue difficulty (even with O'Connor's original syntax). Thus, the strategy I describe below (and in my next post) will simply involve direct comparisons of eigenvalues without the visualization aids you see above. Maybe some day (I'm not holding my breath!), SPSS will add these kinds of features.

---------------------------------------------------------------------------------------------------------------------------

SPSS Example

*A video walk-through is also provided here: Youtube link

For this example, we will use the SPSS data file I describe in this blog post. The data comprise n = 2951 (simulated) observations on 12 items from a scale. [As described by Finch and West (1997), three factors are assumed to account for the interrelations among the items. Of course, often when performing EFA you may not have a clear idea of how many factors to expect!] Before we perform our analysis, let's look at (a subset of) the data as it appears in SPSS...


Our first step is to generate the eigenvalues from our data. Using the drop-down menus...


Since we will be comparing eigenvalues using a PCA solution, we can leave the defaults below 'as is'.



Be sure to click 'Univariate descriptives' under the Descriptives menu. You will need the total effective sample size for your analysis. (In the screenshot below, I have selected several additional options I typically select for diagnostic purposes.)

Note: If you have missing data, the program defaults to listwise deletion. [Under Options (not shown), you can instead choose pairwise deletion or mean replacement for missing values. Pairwise deletion introduces ambiguity regarding sample size, since the n may vary across correlations due to differences in missing data on each pair of variables.] We are sticking with the listwise default.
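The sample-size ambiguity under pairwise deletion is easy to see with a small Python sketch. The data here are a hypothetical toy set of five respondents and three items (not the example file), with `None` marking a missing response.

```python
from itertools import combinations

# Hypothetical toy data: 5 respondents by 3 items; None marks a missing response
rows = [
    (4, 3, 5),
    (2, None, 4),
    (5, 4, None),
    (3, 3, 2),
    (None, 2, None),
]

# Listwise deletion: keep only respondents with complete data on every item
listwise_n = sum(all(v is not None for v in row) for row in rows)

# Pairwise deletion: each correlation uses respondents observed on both items
pairwise_n = {
    (i, j): sum(row[i] is not None and row[j] is not None for row in rows)
    for i, j in combinations(range(3), 2)
}

print(listwise_n)
print(pairwise_n)
```

Here listwise deletion leaves a single n of 2, whereas pairwise deletion yields an n of 3 for two of the item pairs and 2 for the third, so there is no one sample size to enter when generating random eigenvalues.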



Output

The first table we see here contains our descriptives, including the total effective sample size. The n's for the items in this survey are all the same.  


The Total column (under Initial eigenvalues) contains all the eigenvalues from the unreduced correlation matrix. As will always be the case, the first component will have the largest eigenvalue (and account for the largest percentage of the total variation); the second component will have the next largest eigenvalue (and account for the next largest percentage of variation); and so on. We will compare the randomly generated eigenvalues against these eigenvalues.
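The link between an eigenvalue and its percentage of variance can be worked through in a few lines of Python. With an unreduced correlation matrix, each of the 12 standardized items contributes one unit of variance, so the total is 12 and each percentage is simply the eigenvalue divided by 12. The sketch below uses only the first four observed eigenvalues reported later in this post; the variable names are my own.

```python
from itertools import accumulate

n_vars = 12  # total variance in an unreduced correlation matrix of 12 items
# First four observed eigenvalues reported in this post (remaining eight omitted)
eigenvalues = [3.059, 1.985, 1.200, 0.862]

# Percentage of total variance for each component, and the running total
pct_variance = [100 * ev / n_vars for ev in eigenvalues]
cumulative_pct = list(accumulate(pct_variance))

for k, (pct, cum) in enumerate(zip(pct_variance, cumulative_pct), start=1):
    print(f"Component {k}: {pct:.2f}% of variance, {cum:.2f}% cumulative")
```

The first component accounts for about 25.49% of the variance, and the first three together account for about 52.03%.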

Generating random eigenvalues

Next, let's generate the random eigenvalues for comparison against those generated from our data. Go to https://analytics.gonzaga.edu/parallelengine/. Type in the number of measured variables (which will be items if doing an item-based factor analysis), the sample size (which we obtained from the Descriptives portion of our SPSS output), and the number of random correlation matrices to generate (I have increased it from 100 to 1000). Leave the Type of Analysis setting at Principal components. By default, as you type in or change values on the left, the web utility generates the mean and 95th percentile of the randomly generated eigenvalues, and the Seed is set to 1000 unless you change it.


Comparison and judgment

For this demonstration, I will use the means of the randomly generated eigenvalues for determining the number of factors. [I could just as easily have chosen the 95th percentile of the eigenvalues, which would make the retention decision somewhat more conservative.]

The eigenvalues from our data (right) exceed the means [and indeed even the 95th percentile] of the randomly generated eigenvalues for the first three components. That is, 3.059 > 1.1039 (c1), 1.985 > 1.0779 (c2), 1.200 > 1.057 (c3). The randomly generated eigenvalue for component four (1.039) exceeds the eigenvalue for component four from our data (.862). These results suggest a three-factor model is the best representation of the data.
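The comparison rule can be written out as a short Python sketch using the four pairs of values quoted above. The function name `n_factors_to_retain` is hypothetical; the rule it encodes is simply to count components from the first onward until an observed eigenvalue fails to exceed its random counterpart.

```python
# Observed eigenvalues and mean random eigenvalues for components 1-4,
# as reported in this post
observed = [3.059, 1.985, 1.200, 0.862]
random_means = [1.1039, 1.0779, 1.057, 1.039]


def n_factors_to_retain(observed, reference):
    """Hypothetical helper: count leading components where observed > reference."""
    count = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break  # stop at the first component that fails the comparison
        count += 1
    return count


print(n_factors_to_retain(observed, random_means))
```

With these values the function returns 3, matching the three-factor conclusion. Passing the 95th percentile values in place of the means applies the more conservative version of the same rule.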


Note that the eigenvalues from the random data (left) are essentially those connected by the red line in our previous graph (generated using the psych package in RStudio; see also below). The eigenvalues from our data (right) are those connected by the blue line in that graph.


 

As I have discussed here, it is wise to consider multiple sources of evidence when deciding upon the number of factors that will best represent the data. Indeed, as we saw in our earlier plots, where random eigenvalues were generated from an unreduced and a reduced correlation matrix, there was a difference in the number of factors suggested. This is where consideration of those multiple sources of evidence (including the interpretability of rotated factors under different factor models) becomes important.

My next post covers parallel analysis involving a reduced correlation matrix.

References

Brown, T. A. (2006). Confirmatory factor analysis for applied research. The Guilford Press.

Crawford, A. V., Green, S. B., Levy, R., Lo, W. J., Scott, L., Svetina, D., & Thompson, M. S. (2010). Evaluation of parallel analysis methods for determining the number of factors. Educational and Psychological Measurement, 70, 885–901.

Revelle, W. (2023). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University, Evanston, Illinois. R package version 2.3.3. https://CRAN.R-project.org/package=psych




