Parallel analysis with SPSS to determine number of factors (Part 2)

In my previous post on parallel analysis (PA) using SPSS, I demonstrated how to use SPSS with an online parallel analysis engine here to estimate the number of factors to extract during exploratory factor analysis (EFA). This engine works well when you are performing PA using principal components analysis (PCA), in which eigenvalues derived from unreduced (real and simulated) correlation matrices are compared (see Horn, 1965). In this case, SPSS prints the observed eigenvalues for all principal components by default under the Initial Eigenvalues column. An alternative approach to PA compares eigenvalues from reduced (real and simulated) correlation matrices (see link). The factoring method in this case is generally principal axis factoring (PAF). Although SPSS and the PA engine both include an option to use PAF, SPSS does not produce a complete set of eigenvalues of the reduced correlation matrix from one's data for comparison against those randomly generated by the PA engine.
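To make the unreduced/reduced distinction concrete, here is a minimal Python sketch (not part of O'Connor's SPSS syntax). It assumes the common convention of building the reduced matrix by replacing the 1s on the diagonal of the correlation matrix with squared multiple correlations (SMCs) as initial communality estimates. The function name and the random demo data are hypothetical and serve only as an illustration.

```python
import numpy as np

def unreduced_and_reduced_eigenvalues(data):
    """Return eigenvalues (largest first) of the full correlation matrix and of
    a reduced matrix whose diagonal holds squared multiple correlations (SMCs)."""
    r = np.corrcoef(data, rowvar=False)        # p x p correlation matrix
    unreduced = np.linalg.eigvalsh(r)[::-1]    # PCA-style: 1s on the diagonal
    # SMC of each variable with all the others: 1 - 1 / diag(R^-1)
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(r))
    reduced_r = r.copy()
    np.fill_diagonal(reduced_r, smc)           # PAF-style: communality estimates on the diagonal
    reduced = np.linalg.eigvalsh(reduced_r)[::-1]
    return unreduced, reduced

# Hypothetical demo data: 200 cases, 6 variables
rng = np.random.default_rng(0)
data = rng.standard_normal((200, 6))
unreduced, reduced = unreduced_and_reduced_eigenvalues(data)
print(unreduced.round(3))
print(reduced.round(3))
```

The unreduced eigenvalues always sum to the number of variables. The reduced eigenvalues sum to the total of the SMCs and can be negative, which is why a negative raw-data eigenvalue can appear in PAF-based output.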

An alternative approach is to use a syntax file written by Brian O'Connor for performing the analysis (O'Connor, 2000). I have a fairly recent video demonstrating how to use this file. A copy of the SPSS data used in that video can be obtained here.

To simplify the syntax and reduce the likelihood of warning/error messages, I have modified the code a bit. You can download my adapted syntax file here.

My adaptation removes the options for typing in a correlation matrix (based on observed correlations among a set of variables) and for reading data from a file saved on your computer. The only option available is to perform the analysis on the dataset that is currently open (active) in memory, which is most likely to be the case when you are performing PA. A second change is the removal of the code for generating a plot of the observed and random eigenvalues. To generate a plot with the original code, you had to specify a location on your computer where the results would be saved for plotting (save results /outfile= 'screedata.sav' / var=root rawdata means percntyl .) and then load that file again (GET file= 'screedata.sav'.). This required editing the original code to add the correct paths to the relevant folders and files. Since many people are not familiar with specifying file paths on their computers, I removed this option to avoid unnecessary frustration.

To demonstrate how to use the adapted syntax file, we will continue with the PA demonstration that was begun here.

Step 1: Open the dataset containing your raw data.

Step 2: Find the syntax file on your computer and open it up.



Step 3: Scroll down the open syntax file to line #61. After VAR = on that line, type the names of the variables you are factor analyzing, exactly as they appear in the dataset. Set the number of simulated datasets (line #64) to at least 1000. Leave the desired percentile (line #67) at 95. On line #72, set kind = 2. [If you set it to 1, the parallel analysis will be for a PCA.] My suggestion is to leave line #77 at its default (i.e., randtype=1). DO NOT CHANGE ANYTHING ELSE IN THE SYNTAX FILE!
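For readers curious about what these settings control, the Python sketch below mimics the simulation side of PA. It is an illustration, not a translation of O'Connor's SPSS code. The parameters `n_datasets`, `percentile`, and `kind` loosely mirror the number of simulated datasets, the desired percentile, and kind (1 = PCA, 2 = PAF with SMCs on the diagonal). The sketch assumes normally distributed random data, which I take to be what randtype=1 selects; please treat that mapping as an assumption. The function name is hypothetical, and `np.percentile`'s default interpolation may differ slightly from how the SPSS syntax picks its percentile.

```python
import numpy as np

def random_eigenvalues(n_cases, n_vars, n_datasets=1000, percentile=95, kind=2, seed=None):
    """Simulate eigenvalues from normally distributed random data (hypothetical helper).

    kind=1 -> unreduced correlation matrix (PCA-style PA)
    kind=2 -> reduced matrix with squared multiple correlations on the diagonal (PAF-style PA)
    Returns (means, percentiles), one value per root, largest root first.
    """
    rng = np.random.default_rng(seed)
    evals = np.empty((n_datasets, n_vars))
    for i in range(n_datasets):
        x = rng.standard_normal((n_cases, n_vars))      # random data with the same shape as yours
        r = np.corrcoef(x, rowvar=False)
        if kind == 2:
            # SMCs are computed from the full matrix before the diagonal is overwritten
            np.fill_diagonal(r, 1.0 - 1.0 / np.diag(np.linalg.inv(r)))
        evals[i] = np.linalg.eigvalsh(r)[::-1]          # sort roots largest first
    return evals.mean(axis=0), np.percentile(evals, percentile, axis=0)

# Hypothetical example: 300 cases, 8 variables, 1000 simulated datasets, PAF-style
means, pct95 = random_eigenvalues(n_cases=300, n_vars=8, n_datasets=1000, kind=2, seed=1)
print(means.round(4))
print(pct95.round(4))
```

Raising the number of simulated datasets mainly stabilizes the means and percentiles from run to run, which is why at least 1000 is suggested.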


Step 4: Go to Edit and click Select All.


Step 5: Once everything is highlighted in blue, click the big GREEN arrow to run the syntax.


Step 6: Interpretation of output.


The first column (Root) refers to the factor number. The values in the second column (Raw Data) are the eigenvalues generated from your raw data. The third column contains the means of the eigenvalues from the randomly generated matrices. The fourth column contains the 95th percentiles of the eigenvalues from the randomly generated matrices. Compare either the third or the fourth column against the Raw Data column. Retain only those factors whose eigenvalues in the Raw Data column exceed the corresponding randomly generated eigenvalues (in whichever of the last two columns you chose).

Let's say we wish to use the means of the randomly generated eigenvalues. Comparing the values in columns 2 and 3, we see that the raw data eigenvalues for the first four factors (i.e., 2.3679, 1.2443, .43498, .10534) exceed the corresponding randomly generated eigenvalues (.1086, .0822, .06996, .02468). For the fifth factor, however, the randomly generated eigenvalue (.02556) exceeds the corresponding eigenvalue from the raw data (-.0254). This suggests retaining four factors.
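The comparison above can be checked with a few lines of Python. This is a sketch of the usual convention, which counts roots from the first onward and stops at the first root where the raw-data eigenvalue no longer exceeds the random one. The function name is hypothetical, and the lists hold only the first five rows quoted in the previous paragraph.

```python
def n_factors_to_retain(raw, random):
    """Count leading roots whose raw-data eigenvalue exceeds the random one,
    stopping at the first root where it does not (hypothetical helper)."""
    count = 0
    for observed, simulated in zip(raw, random):
        if observed <= simulated:
            break
        count += 1
    return count

# First five roots from the output discussed above (Raw Data vs. Means columns)
raw_data = [2.3679, 1.2443, .43498, .10534, -.0254]
random_means = [.1086, .0822, .06996, .02468, .02556]
print(n_factors_to_retain(raw_data, random_means))  # 4
```

Passing the 95th-percentile column in place of `random_means` applies the stricter comparison described next.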


If we use the 95th percentile of randomly generated eigenvalues for the comparison, we again would see suggested retention of four factors.


Unlike the PCA approach (which suggested a three-factor model), the current analysis suggests one additional factor. As mentioned here, there is some indication that PA using reduced correlation matrices can lead to overfactoring. A warning to this effect (written by O'Connor) appears in the original output generated using PAF, as well as in the output from my adapted syntax file.


Shortcut...

If the variables you are submitting to factor analysis are all adjacent to each other, you can use the following modification of the syntax:




References

Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185.

O'Connor, B. P. (2000). SPSS and SAS programs for determining the number of components using parallel analysis and Velicer's MAP test. Behavior Research Methods, Instruments, & Computers, 32(3), 396–402.
