Process model 7 using Stata: Testing first-stage moderated mediation involving continuous X, W, M, and Y variables

The demonstrations on this page are based on Hayes' (2018) Process Model 7, one of many models programmed into his Process macro (link) and described in his book, Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-based Approach (link to details). Process model 7 is a model of first-stage moderated mediation, where the first path in the mediation sequence is moderated. The most basic version of this model can be represented both conceptually and statistically (Hayes, 2018). As we see in the conceptual model below, the indirect (or mediated) effect of the focal antecedent / independent variable (X) on the consequent / dependent variable (Y) via the mediator (M, the process variable) is assumed to be contingent on the level of a moderating variable (W).



The statistical diagram shown below operationalizes the conceptual model in the form of two equations. The first equation involves estimating regression parameters from a model in which the proposed mediator is regressed onto the independent variable (X), the proposed moderator (W), and the product of the independent and moderator variables (XW). In effect, the first model is a standard moderated multiple regression. The second equation requires estimating parameters from a model in which the dependent variable is regressed onto the proposed mediator and the independent variable.



STATA Example

For our example, we will be testing the equivalent of Hayes' Process Model 7 using a FREE DO-FILE (download here: link) that I have created. At this time, I know of no other way to easily run this type of model in Stata [and the only approaches I have seen online for running this model involve programming on your part; see Link]. The do-file was developed to be easily modifiable so that you can test Process Model 7 with single continuous X, M, Y, and W variables. However, make sure to only make changes where you are prompted to (otherwise, you run the risk of inaccurate output).

We will be testing the effect of student subject matter interest (X) on achievement (Y) via the mediator, engagement (M), using this simulated data (download here: link). We will test whether the mediation of the effect of interest on achievement is itself moderated by self-efficacy (W). We will also control for one covariate, socioeconomic status (SES), in the model. The diagram displayed below is a conceptual diagram of the relationships among our variables. The single-headed arrow from self-efficacy to the path emanating from interest and directed toward engagement represents the proposed moderation of that path by efficacy.



The diagram shown below is a statistical representation of the relationships among the variables in our model. The statistical diagram captures the equations that comprise our model.




The regression parameters associated with Equation 1 are estimated from an OLS regression, where the proposed mediator is regressed onto the independent variable, moderator, product or interaction term, and the covariate. This is a standard moderated multiple regression (that also includes a covariate). If the regression coefficient a3 is statistically significant, then this is considered evidence that the effect of the independent variable on the mediator is conditioned on the moderator. In the context of moderated multiple regression, this conditional effect is plotted to visualize the interaction and/or probed (using simple slopes or the Johnson-Neyman technique) in order to provide more description concerning the nature of the moderation. 
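Written out, Equation 1 can be sketched as follows. The labels a1 and a3 follow the path labels used on this page; i_M (intercept), a2 (moderator slope), a4 (covariate slope), C (the covariate, SES), and e_M (residual) are labels assumed here for illustration and may differ from those in the original diagram. The second line simply regroups the terms to show why the effect of X on M depends on W.

```latex
\begin{aligned}
M &= i_M + a_1 X + a_2 W + a_3 XW + a_4 C + e_M \\
  &= i_M + (a_1 + a_3 W)\,X + a_2 W + a_4 C + e_M
\end{aligned}
```

The quantity in parentheses, a1 + a3W, is the simple slope of X at a chosen value of W.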





The regression parameters associated with Equation 2 are also estimated from an OLS regression. In this case, the dependent variable or consequent is regressed onto the independent variable, the proposed mediator, and the covariate.
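Equation 2 can be sketched in the same way. The labels b1 (path b) and c' (the direct effect, c-prime) follow the usage on this page; i_Y, b2, C, and e_Y are assumed labels for the intercept, the covariate slope, the covariate (SES), and the residual. The covariate term is shown because the second regression table on this page reports a slope for SES.

```latex
Y = i_Y + c' X + b_1 M + b_2 C + e_Y
```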


Now, let's use the do-file (download here: link) to generate our results and interpret them. 

In the section below, you enter the variables you are analyzing inside the quotation marks following 'global varlist ='. Next, you enter any covariates for your model inside the quotation marks following 'global covs = '. If you have no covariates to include, then leave a space between the quotation marks.



The next section allows you to request mean centering of your X and W variables and to choose your approach to conditioning values (for generating and testing simple slopes). Here, I have typed 0 for mean centering, which means that X and W will NOT be mean centered prior to forming the product term. [I will discuss output based on mean-centering later.] I have also typed 1 to request the traditional approach to selecting conditioning values (i.e., mean-1sd, mean, mean+1sd).



The remaining portion of the do-file to complete involves options for the number of bootstrap samples, setting a global seed value if desired (which allows replication of your bootstrap results), and requesting a plot of the simple slopes. Below, I have requested 1000 bootstrap replications. I have typed 0 so that a random seed number is generated (consistent with the default in Hayes' Process macro). I have also typed 1 to request a plot of simple slopes (based on the conditioning values indicated above).



Once you have completed this section, click the Execute (do) button.



Making sense of the results

These results are from the first regression model where the mediator (engagement) is regressed onto the focal independent variable (interest), moderator (efficacy), the product term (i.e., interest X efficacy), and the covariate (SES). 




The slope associated with the interaction term [i.e., interest X efficacy] is significantly different from 0 (b=.0288, s.e.=.0099, p=.004), which is consistent with the assumption that the effect of student interest on engagement is moderated by self-efficacy. [That is, one should not treat the slope for interest as invariant across levels of the moderator. Rather, the slope is conditional on the level of the moderating variable.] Note, however, that interactions are symmetric, meaning that the results given above are also consistent with the possibility that the effect of self-efficacy on engagement is moderated by interest. Ultimately, which variable you treat as the focal independent variable and which as the moderator should follow from the roles you give the two variables in your conceptual/theoretical framework (Darlington & Hayes, 2017; Hayes, 2018).

The slope for each of the variables forming the product term quantifies the predictive relationship between that variable and the mediator (engage) for those cases with a value of 0 on the other predictor (Hayes, 2018). The slope (-.0924) for interest quantifies the predictive relationship between interest and engagement for cases with a value of 0 on efficacy. The slope (-.0599) for efficacy represents the relationship between efficacy and engagement for cases with a value of 0 on interest.

When 0 falls outside of the observed range of data (as is the case in this example) for the predictors forming the product term, there will be no substantive interpretation for the slopes (Darlington & Hayes, 2017). As such, there is little value in spending time attempting to interpret these slopes in terms of meaning or significance. However, when 0 does fall within the observed range of the data, then you can meaningfully interpret the slopes and tests. [Later, we will re-run the analysis using mean-centering, which will allow for substantive interpretation of these slopes.] 

One other thing to note: Researchers often interpret the effects of the predictor variables that comprise the product term as "main effects" in the context of moderated multiple regression. You should avoid using that term since these regression slopes actually represent the predicted relationship at one specific value of the other variable - i.e., 0. 

Finally, we see that SES is a positive and significant predictor (b=.4198, s.e.=.0443, p<.001) of engagement holding constant the remaining predictors in the model.

The results above provide evidence consistent with moderation of the path from interest to engagement by self-efficacy. Typically, one will follow up this type of finding by probing the interaction. 

The output below contains simple slopes for the effect of interest on engagement at three levels of self-efficacy: mean-1sd (10.95186) represents cases relatively low on efficacy, the mean (15.24785) represents average efficacy, and mean+1sd (19.54384) represents cases relatively high on efficacy.

The dy/dx column contains the simple slopes at the aforementioned levels of the moderator. Significance tests of the slopes are also provided in the form of t-tests. We see here that all three slopes are positive and statistically significant (with p's = or < .001). Notably, the slopes become increasingly positive as we move from lower levels of self-efficacy (the moderator) to higher levels of efficacy. At 1sd below the mean (i.e., 10.952) of efficacy, the slope is .223. At the mean (i.e., 15.248) of efficacy, the slope is .347. At 1sd above the mean (i.e., 19.544), the slope is roughly .471. As you can see, .223 < .347 < .471.
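The simple slopes above follow directly from the first regression model: each one is a1 + a3*W evaluated at a conditioning value. As a rough check on the arithmetic (this is not part of the do-file, and the function name is made up for illustration), the short Python sketch below plugs in the rounded coefficients reported on this page. Because the inputs are rounded, the results should agree with the dy/dx column only to about three decimals.

```python
# Rounded path estimates from the first regression table (uncentered variables).
a1 = -0.0924   # slope for interest when efficacy = 0
a3 = 0.0288    # slope for the interest x efficacy product term


def simple_slope(w):
    """Effect of interest on engagement at moderator value w (hypothetical helper)."""
    return a1 + a3 * w


# Conditioning values reported in the output: mean-1sd, mean, mean+1sd of efficacy.
conditioning_values = {"low": 10.95186, "mean": 15.24785, "high": 19.54384}

slopes = {label: simple_slope(w) for label, w in conditioning_values.items()}

for label, slope in slopes.items():
    print(f"{label:>4}: W = {conditioning_values[label]:.3f}, slope = {slope:.3f}")
```

Small discrepancies in the last digit reflect rounding of a1 and a3, not a different model.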



Below is the plot of simple slopes we requested using the conditioning values of mean-1sd, mean, and mean+1sd. 




The green line is the slope for hypothetical cases falling at 1sd above the mean on efficacy, whereas the blue line is the slope for cases falling at 1sd below the mean. The red line represents the slope for cases falling at the mean of efficacy. The slopes for these lines correspond to the values shown in the dy/dx column in the previous table.

The next set of results is based on the second regression model, where achievement (the dependent variable) was regressed onto interest (the independent variable), engagement (the mediator), and SES (the covariate). We see that the effect of the mediator (engagement) on achievement - i.e., path b - is positive and significantly different from 0 (b=.1964, s.e.=.0352, p<.001). We also see that the direct effect (path c-prime) of interest on achievement is positive and significant (b=.4195, s.e.=.054, p<.001), as is the path from SES to achievement (b=.2553, s.e.=.0524, p<.001).



If we substitute the values of the unstandardized regression coefficients from our two regression models into the statistical diagram shown before, we get a more comprehensive view of the parameter estimates for the model:




If we had not included a moderator in our model, then our statistical diagram would have looked like the diagram below. The coefficients for paths a and b could then be multiplied to quantify the indirect effect of X on Y through M - i.e., indirect effect = ab.



In the model we are testing, we are specifying that the indirect effect is moderated, meaning that the indirect effect is conditional on W. As such, we cannot quantify the indirect effect using a single number. Rather, the indirect effect is expressed as follows (using the path labels from the diagram below): conditional indirect effect = (a1+a3W)b1 = a1b1 + a3b1W. In other words, the conditional indirect effect is a function of W: f(W) = a1b1 + a3b1W. As you can see, if we substitute 0 into the function, then the conditional indirect effect is a1b1. If we substitute 2 into the function, then the conditional indirect effect is a1b1 + 2a3b1, and so forth.


Using the path values from our diagram above, the function for the conditional indirect effect is: f(W) = (-.0924+.0288W)*.1964. If we substitute the mean-1sd of self-efficacy (i.e., 10.952) into the function for W, then the conditional indirect effect is: f(10.952) = (-.0924+.0288*10.952)*.1964 = .0438. If we substitute the mean of self-efficacy (i.e., 15.248) into the function, then the conditional indirect effect is: f(15.248) = (-.0924+.0288*15.248)*.1964 = .0681. If we substitute the mean+1sd of self-efficacy (i.e., 19.544) into the function, the conditional indirect effect is: f(19.544) = (-.0924+.0288*19.544)*.1964 = .0924. These are roughly the same values (small differences due to rounding error) found in the final portion of our output (below). The observed coefficient for CIElow is the conditional indirect effect at a point on self-efficacy representing a relatively low value. The coefficient for CIEmean is the conditional indirect effect computed at the mean on efficacy, a point designating a moderate level of efficacy. The coefficient for CIEhigh is the conditional indirect effect at a point on efficacy designating a relatively high level of efficacy.
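The three substitutions above can be reproduced in a few lines. The following Python sketch is only an arithmetic check using the rounded coefficients quoted in this section; it is not how the do-file computes the estimates, and the function name is hypothetical.

```python
# Rounded path estimates quoted in the text.
a1 = -0.0924   # interest -> engagement when efficacy = 0
a3 = 0.0288    # interest x efficacy product term
b1 = 0.1964    # engagement -> achievement (path b)


def conditional_indirect_effect(w):
    """f(W) = (a1 + a3*W) * b1, the indirect effect of X on Y through M at W = w."""
    return (a1 + a3 * w) * b1


# Labels mirror the rows of the output table; W values are mean-1sd, mean, mean+1sd.
points = (("CIElow", 10.952), ("CIEmean", 15.248), ("CIEhigh", 19.544))
cie = {label: conditional_indirect_effect(w) for label, w in points}

for label, value in cie.items():
    print(f"{label}: {value:.4f}")
```

Running it gives values that round to .0438, .0681, and .0924, matching the hand calculations in the paragraph above.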


These conditional indirect effects can be tested using the 95% percentile bootstrap confidence intervals associated with each estimate. If 0 (the null) does not fall between the lower and upper bound of a given interval, then we infer a difference between a conditional indirect effect and 0. If 0 falls between the lower and upper bound, then we infer the conditional indirect effect is not different from 0. 

We see that at 1sd below the mean on efficacy, the conditional indirect effect (of roughly .0439) is different from 0, since 0 falls outside of the confidence interval: 95%CI = (.0169,.0755). At the mean on efficacy, the conditional indirect effect (of roughly .0682) is different from 0, since 0 falls outside of the confidence interval: 95%CI = (.0386,.0999). At the mean+1sd on efficacy, the conditional indirect effect (of roughly .0925) is different from 0, since 0 falls outside of the confidence interval: 95%CI = (.0533,.1357). 
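The decision rule used here is simple enough to state as code. The Python sketch below applies it to the three intervals reported above; the helper name is made up for illustration, and the intervals are copied from this page (your own bootstrap bounds will differ slightly from run to run unless you set a seed).

```python
def excludes_zero(lower, upper):
    """True when 0 lies outside the interval (lower, upper)."""
    return lower > 0 or upper < 0


# 95% percentile bootstrap intervals reported in the output above.
intervals = {
    "CIElow": (0.0169, 0.0755),
    "CIEmean": (0.0386, 0.0999),
    "CIEhigh": (0.0533, 0.1357),
}

decisions = {label: excludes_zero(lo, hi) for label, (lo, hi) in intervals.items()}

for label, different_from_zero in decisions.items():
    verdict = "different from 0" if different_from_zero else "not different from 0"
    print(f"{label}: {verdict}")
```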

IMPORTANT! The conditional indirect effects we have just examined say nothing about whether the indirect effect of interest (X) on achievement (Y) through engagement (M) is moderated by efficacy (W). To make this determination, we will rely on the Index of Moderated Mediation (IMM), which quantifies the degree to which the conditional indirect effect is a linear function of the moderator (Hayes, 2015). If we go back to our function notation [f(W) = a1b1 + a3b1W] expressing the conditional indirect effect as a function of W, we see that a3b1 is the portion of the function that indicates the change in the conditional indirect effect per unit increase on the moderating variable. Thus, the IMM for this model is a3b1. Based on the path coefficients in our figure, the IMM is computed as: .0288*.1964 = .00566. This is reflected in our final output:

        


If the value of the IMM is 0, this means that the indirect effect of the focal independent variable (interest) on the dependent variable (achieve) is not linearly dependent on the proposed moderator. As with the conditional indirect effects in this table, we can also test whether the IMM is different from 0 using the percentile bootstrap confidence interval. If 0 (the null) does not fall between the lower and upper bound of the interval, then we reject the null and infer that the indirect effect is conditioned on the moderator. The percentile bootstrap confidence interval ranges from .00169 to .010287. Because 0 falls outside this interval, we consider this evidence of moderated mediation. The positive value of the IMM indicates that as we move from lower efficacy to higher efficacy, the indirect effect becomes increasingly positive.
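To see why a3b1 is the right quantity to test, note that it is the slope of f(W): raising W by one unit changes the conditional indirect effect by exactly a3b1, wherever you start. The Python sketch below illustrates this with the rounded coefficients from this page; it is an arithmetic illustration under those rounded values, not output from the do-file.

```python
# Rounded path estimates quoted in the text.
a1, a3, b1 = -0.0924, 0.0288, 0.1964

# Index of moderated mediation for this model.
imm = a3 * b1


def conditional_indirect_effect(w):
    """f(W) = a1*b1 + a3*b1*W."""
    return a1 * b1 + a3 * b1 * w


# A one-unit increase in W shifts the conditional indirect effect by the IMM,
# regardless of the starting value of W.
step_near_low = conditional_indirect_effect(11.0) - conditional_indirect_effect(10.0)
step_near_high = conditional_indirect_effect(20.0) - conditional_indirect_effect(19.0)

print(f"IMM            = {imm:.5f}")
print(f"f(11) - f(10)  = {step_near_low:.5f}")
print(f"f(20) - f(19)  = {step_near_high:.5f}")
```

All three printed values agree (about .00566), which is why a single test of the IMM addresses whether the indirect effect depends linearly on W.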

From an analytic standpoint, one generally starts by testing the IMM to address the question of whether there is evidence of moderated mediation. If the IMM is significant, one then proceeds to probing the conditional indirect effects (shown in the table). If the IMM is not significant, then one lacks evidence that the mediation is moderated. In that case, one might consider re-specifying the model (see e.g., Hayes, 2018) without the moderated mediation (i.e., as a conventional mediation model).

EXAMINING OUTPUT AFTER REQUESTING MEAN-CENTERING WHEN USING DO-FILE

As noted earlier, the regression slopes for the variables used to form the product term are interpreted as the predicted relationship between a given variable and the consequent (in this case, the mediator) when the other variable is 0. If the value of 0 does not fall within the observed range of the data for your variable(s), then interpretation of the regression slopes for these variables is rendered meaningless (Darlington & Hayes, 2017). It IS possible to generate meaningful interpretations of the regression coefficients for these variables through the use of mean-centering. When a variable is mean centered, that original variable is converted [X-mean(X); W-mean(W)] into a new one where the values are deviations from the mean. The mean of the newly centered variables is 0. The standard deviations of the two variables will remain the same as with the original variables. 
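The two properties just mentioned (a mean of 0 and an unchanged standard deviation) are easy to verify. The Python sketch below uses a handful of made-up efficacy scores purely for illustration; they are not the simulated data used on this page.

```python
from statistics import mean, stdev

# Hypothetical efficacy scores, used only to illustrate the transformation.
efficacy = [9.0, 12.5, 14.0, 15.5, 17.0, 21.0]

# Mean centering: subtract the variable's mean from every score.
efficacy_mean = mean(efficacy)
efficacy_mc = [w - efficacy_mean for w in efficacy]

print(f"original: mean = {mean(efficacy):.4f}, sd = {stdev(efficacy):.4f}")
print(f"centered: mean = {mean(efficacy_mc):.4f}, sd = {stdev(efficacy_mc):.4f}")
```

Centering only shifts the scale, so the spread of the scores (and therefore the sd) stays the same.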

The do-file contains an option to request mean centering. Simply type 1 following the equals sign.



The resulting output for the first regression model is:




Notice that the regression slope, the standard error, and the t- and p-values for the interaction term (as well as for SES, the covariate) are all the same as before. The only difference between this output and the previous output is that the slopes for interest (now interest_mc, where mc indicates "mean-centered") and efficacy (now efficacy_mc) represent the predictive relationship between each of these variables and engagement at the mean of the other.

The simple slopes and tests shown next are exactly the same as those we generated earlier with the original uncentered X and W variables. Notice in the legend that the point at 1sd above the mean on efficacy is 4.295987 (which equals the sd for both efficacy and the mean-centered efficacy variable). The point at 1sd below the mean is computed simply as -1*4.295987 = -4.295987. The value of 4.72e-08 is effectively 0 (the mean of the centered efficacy variable).



Since we mean centered the efficacy (W) variable, we see that the values in the table at the mean of efficacy are exactly the same as those found in the main regression output for interest.
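Why does this happen? Substituting W = W_mc + mean(W) into the first regression equation shows that the slope for interest in the centered model equals a1 + a3*mean(W), which is precisely the simple slope at the mean of efficacy. The Python sketch below checks this with the rounded estimates from this page; the helper names are hypothetical, and the sd value is the one shown in the plot legend.

```python
# Rounded estimates from the uncentered model and summary statistics for efficacy.
a1, a3 = -0.0924, 0.0288
w_mean = 15.24785
w_sd = 4.295987

# Slope for interest after centering W: the simple slope at the mean of efficacy.
a1_centered = a1 + a3 * w_mean


def slope_uncentered(w):
    """Simple slope of interest at a raw efficacy value."""
    return a1 + a3 * w


def slope_centered(w_mc):
    """Simple slope of interest at a mean-centered efficacy value."""
    return a1_centered + a3 * w_mc


for label, w_mc in (("mean-1sd", -w_sd), ("mean", 0.0), ("mean+1sd", w_sd)):
    raw = slope_uncentered(w_mean + w_mc)
    centered = slope_centered(w_mc)
    print(f"{label:>8}: uncentered = {raw:.4f}, centered = {centered:.4f}")
```

The two columns match at every conditioning value, which is why the simple slopes table is unchanged by centering while the coefficient for interest in the main regression output is not.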



The plot of simple slopes is the same as before except that the X axis and conditioning values on W pertain to the centered variables. 


 
The regression slopes and significance tests for the second regression model are all the same as before. [The only difference in the output is the value of the intercept.]



Finally, the Index of Moderated Mediation (IMM) and conditional indirect effects found in the Observed Coefficient column below are all exactly the same as before. [Any differences in the values for the bootstrap standard errors and upper and lower bounds of the percentile confidence intervals are due to randomly generated seed values, given the setting we used with the do-file. To obtain the same values every time you generate bootstrap results with the same data, you can set your own seed value.]



CHANGING CONDITIONING VALUES IN DO-FILE


The conditioning values for X and W that we have used so far (i.e., mean-1sd, mean, mean+1sd) for generating and testing simple slopes are a very standard way of selecting points that are relatively low, medium, and high on those variables. One limitation of using these points, however, is that when your X and/or W variable is highly skewed, they can fall outside the observed range of that variable. Hayes (2018) recommends instead using the 16th, 50th, and 84th percentiles of the empirical distributions of the X and W variables. This option is available through Hayes' Process macro, and I have included that option in the do-file. Simply type 0 following the equals sign as in the case below.
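The problem with skewed variables is easy to demonstrate. The Python sketch below uses a small, deliberately right-skewed set of made-up scores (not the data from this page) and a simple nearest-rank percentile rule. Percentile definitions vary across packages, so the exact values Stata or the Process macro would return for real data may differ slightly; the point is only that percentiles always land on observed values while mean-1sd need not.

```python
from math import ceil
from statistics import mean, stdev

# Hypothetical, strongly right-skewed moderator scores (illustration only).
w = [1, 1, 1, 1, 2, 2, 2, 3, 3, 20]


def nearest_rank_percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of cases at or below it."""
    ordered = sorted(values)
    rank = max(1, ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Traditional conditioning values: mean-1sd, mean, mean+1sd.
traditional = [mean(w) - stdev(w), mean(w), mean(w) + stdev(w)]

# Percentile-based conditioning values: 16th, 50th, 84th percentiles.
percentile_based = [nearest_rank_percentile(w, p) for p in (16, 50, 84)]

print("observed range:", min(w), "to", max(w))
print("mean-1sd, mean, mean+1sd:", [round(v, 2) for v in traditional])
print("16th, 50th, 84th percentiles:", percentile_based)
```

In this toy example, mean-1sd is negative even though the lowest observed score is 1, so a simple slope estimated there would describe cases that do not exist in the data.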



Changing this option will not change the results in the two regression tables using centering or no centering. Where this change will show up is in the table containing the simple slopes tests and the plot of simple slopes. The conditioning value at the 16th percentile of the mean-centered efficacy variable is -4.34443. The conditioning value at the 50th percentile is -.0346882. The conditioning value at the 84th percentile is 4.489562.



Additionally, you will find that the conditional indirect effects will be based on the conditioning values using percentiles. 



References and suggested readings

Darlington, R.B., & Hayes, A.F. (2017). Regression analysis and linear models: Concepts, applications, and implementation. New York: The Guilford Press.


Hayes, A.F. (2015). An index and test of linear moderated mediation. Multivariate Behavioral Research, 50, 1-22. 

 

Hayes, A.F. (2018). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach (2nd edition). New York: The Guilford Press. 


Hayes, A.F., & Rockwood, N.J. (2020). Conditional process analysis: Concepts, computation, and advances in the modeling of the contingencies of mechanisms. American Behavioral Scientist, 64, 19-54. [Download from Sage site here: link]






















