Simple and parallel mediation using Process macro in RStudio

What is mediation analysis?

According to Hayes (2022), ‘Mediation analysis is a statistical method used to evaluate evidence from studies designed to test hypotheses about how some causal antecedent variable X transmits its effect on a consequent variable Y’ (p. 80). The proposed mediator(s) [M(s), or intervening variable(s)] in a model reflect a researcher’s conceptualization of ‘how’ X and Y are related, with one or more mediators serving as potential process(es) by which X exerts its effect on Y. Specification of mediation models should be made (and defended) on theoretical grounds, as potential alternatives may fit the data as well as or better than one’s primary hypothesized model. Ultimately, mediation analysis allows one to test whether one’s theoretical model is a plausible explanation for the relationship between X and Y. If the model exhibits poor fit to the data, this may lead the researcher to ‘rule out’ that model as a plausible explanation. Finding that a model exhibits adequate or good fit to the data typically leads to non-rejection of the model. However, any non-rejected model should be regarded as provisional (see ‘A word of caution’ below).

Simple mediation model

The conceptual model below represents the specified effects within a simple mediation model. In this model, X is considered the antecedent variable and Y is the consequent. Variable M is the proposed mediator.

X is specified as causally antecedent to M (path a) and Y (path c’), while M is specified as causally antecedent to Y (path b) only. Paths a, b, and c-prime (c’) are referred to as direct effects within the model.
The total effect of X on Y can be broken down into two separate effects: a direct effect (path c’) and an indirect effect. The indirect effect of X on Y is quantified as the product of paths a and b – i.e., ab.
To estimate the a, b, and c’ paths [with a and b going into computing the indirect effect (ab)], we can use OLS regression. In fact, this is the estimation approach utilized in the Process macro. Path a is estimated using simple linear regression, where M is regressed onto X. Paths b and c’ are estimated using a multiple regression, where X and M serve as predictors of Y. Below, we have the two components of the statistical model.
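The two components can be written as follows. This is a sketch in the general regression notation used by Hayes (2022). The intercepts $i_{M}$ and $i_{Y}$, the error terms $e_{M}$ and $e_{Y}$, and the symbol $c$ for the total effect do not appear elsewhere in this post and are labels assumed here.

```latex
\begin{aligned}
M &= i_{M} + aX + e_{M} \\
Y &= i_{Y} + c'X + bM + e_{Y} \\[4pt]
% substituting the first equation into the second:
Y &= (i_{Y} + b\,i_{M}) + (c' + ab)X + (e_{Y} + b\,e_{M})
\end{aligned}
```

The last line shows why the indirect effect is the product $ab$: the coefficient on X after substitution is the total effect, $c = c' + ab$.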

Parallel mediation model

The previous conceptual model can be extended with the inclusion of additional mediators – in parallel. The model above is an example of a model where the effect of X on Y is proposed to be mediated through two intervening variables: M1 and M2. The direct effect of X on M1 is quantified with the coefficient $a_{1}$, while the direct effect of M1 on Y is quantified as $b_{1}$. The direct effect of X on M2 is quantified with the coefficient $a_{2}$, while the direct effect of M2 on Y is quantified as $b_{2}$. Computing the product of $a_{1}$ and $b_{1}$ results in the quantification of a specific indirect effect of X on Y [occurring through the tracing involving paths $a_{1}$ and $b_{1}$]. The product of paths $a_{2}$ and $b_{2}$ quantifies a second specific indirect effect [occurring through the tracing involving paths $a_{2}$ and $b_{2}$] in the model. The sum of the two specific indirect effects produces the total indirect effect of X on Y: i.e., $a_{1} b_{1} + a_{2} b_{2}$. The remaining direct effect of X on Y is quantified with path c’. Summing all direct and indirect effects gives us the total effect of X on Y in the model.
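For reference, the parallel model can be sketched as three regression equations plus the decomposition of the total effect. The intercepts ($i$), error terms ($e$), and the symbol $c$ for the total effect are notation assumed here, not taken from the post.

```latex
\begin{aligned}
M_{1} &= i_{M_{1}} + a_{1}X + e_{M_{1}} \\
M_{2} &= i_{M_{2}} + a_{2}X + e_{M_{2}} \\
Y &= i_{Y} + c'X + b_{1}M_{1} + b_{2}M_{2} + e_{Y} \\[4pt]
c &= c' + a_{1}b_{1} + a_{2}b_{2}
\end{aligned}
```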

A word of caution

As noted earlier, a mediation model represents one’s conceptualization of the causal relationships among a set of variables. Keep in mind that if you find your model exhibits reasonable fit to the data in your study, this DOES NOT PROVE that your causal formulation is correct in any absolute sense. A good-fitting model is simply an indication that your model is consistent with your data, thereby allowing the adoption of that model as a provisional explanation of the relations among your variables. In other words, model fit allows one to say that the model being tested is a reasonable candidate for explaining associations among a set of variables. It DOES NOT rule out other possible model specifications (alternative explanations) that may fit the data as well or better than yours. This is why it is so important to ground your model in substantive and well-reasoned theory.

Example 1: Simple mediation model

For our example, we will test a model where the effect of task orientation on intention to continue sport is proposed to be mediated by autonomous motivation. This model represents a portion of the model tested by Keshtidar and Behzadnia (2017) with these data. With this model, it was expected that individuals scoring higher on task orientation would exhibit greater levels of autonomous motivation and demonstrate a greater intention to continue on with their sport. It was also expected that individuals scoring higher on autonomous motivation would report greater intention to continue on with their sport.

OLS regressions used to estimate parameters in full model…

Running the example 1 model and generating the output

First, import your data into RStudio. You can download a copy of the data for the first example (and all subsequent examples in this blog) by going to https://drive.google.com/file/d/1NM6VkNCb1sQ8F8llT39gFdz9gTKnAWvX/view.

library(haven)
sport <- read_sav("sport.sav")

Next, you MUST activate the Process macro before you can call it. When it has been activated, you will see the following message:

## 
## ********************* PROCESS for R Version 4.3.1 ********************* 
##  
##            Written by Andrew F. Hayes, Ph.D.  www.afhayes.com              
##    Documentation available in Hayes (2022). www.guilford.com/p/hayes3   
##  
## *********************************************************************** 
##  
## PROCESS is now ready for use.
## Copyright 2020-2023 by Andrew F. Hayes ALL RIGHTS RESERVED
## Workshop schedule at http://haskayne.ucalgary.ca/CCRAM
##
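The banner above appears once the PROCESS script has been run in the current R session. A minimal sketch of one way to do that from code is shown here. It assumes the script file distributed with PROCESS for R is named process.R and sits in your working directory; the path is hypothetical, so adjust it to wherever you saved the file.

```r
# Hypothetical location of the PROCESS script; change this to your own path.
process_path <- "process.R"

if (file.exists(process_path)) {
  # Running the script defines the process() function for this session.
  source(process_path)
} else {
  message("PROCESS script not found at: ", process_path)
}
```

Opening the script in RStudio and running it in full should have the same effect.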

Finally, use the process() function and its associated arguments to specify the model and request various outputs. Here is a description of the arguments in the code below:

data: Name of the dataframe containing your data
y: The consequent (outcome) for your model. Make sure it is in quotation marks.
x: The antecedent for your model. Make sure it is in quotation marks.
m: The name of the mediator in the model. Make sure it is in quotes. If you have multiple mediators, use the c() function [shown later in Example 3 on parallel mediators.]
model: Specify the model number for simple mediation (or parallel mediation, later) as 4.
seed: This is a number that - if you were to run the same analysis again in the future on the same data - would produce the same bootstrap confidence intervals.
stand: Set this equal to 1 to obtain standardized regression slopes in the output, as well as standardized indirect effects. If you do not want standardized effects in your output, set this equal to 0 or leave the argument out altogether. Note: Only mediation models will produce standardized estimates if requested.
total: Set this equal to 1 to obtain the total effects model. If you do not want the total effects model in the output, set this equal to 0 or leave it out altogether.
progress: You can set this equal to 0 if you do not want the bootstrapping progress to show up in your output. If you want it to show up, then set this equal to 1 or leave the argument out altogether.

process(data=sport, y="contin", x="task", m="auton", model=4, seed=12345, 
        stand=1, total=1, progress=0)
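For intuition about what the call above estimates, the same three regressions can be sketched with base R's lm(). The data below are simulated purely for illustration (they are NOT the sport data), and the variable names x, m, and y are placeholders.

```r
set.seed(12345)

# Simulated data for illustration only (NOT the sport data).
n <- 300
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.2 * x + 0.4 * m + rnorm(n)
sim <- data.frame(x = x, m = m, y = y)

# Path a: regress the mediator on the antecedent.
fit_m <- lm(m ~ x, data = sim)
# Paths b and c': regress the consequent on the antecedent and the mediator.
fit_y <- lm(y ~ x + m, data = sim)
# Total effect: regress the consequent on the antecedent alone.
fit_total <- lm(y ~ x, data = sim)

a <- unname(coef(fit_m)["x"])
b <- unname(coef(fit_y)["m"])
c_prime <- unname(coef(fit_y)["x"])
c_total <- unname(coef(fit_total)["x"])

# Indirect effect as the product of paths a and b.
indirect <- a * b

print(c(a = a, b = b, c_prime = c_prime, indirect = indirect, c_total = c_total))

# With OLS fitted to the same cases, total effect = direct + indirect.
print(all.equal(c_total, c_prime + indirect))
```

This sketch gives point estimates only; the bootstrap confidence intervals that Process reports for the indirect effect are not reproduced by these three lm() calls.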

Interpreting the output

The first part of the output contains a synopsis of the Process model that was run, sample size, the variables included in the analysis, and the random seed.

The second part of the output contains the results based on the regression of the mediator, autonomous motivation, onto the antecedent, task orientation. Both the unstandardized and standardized regression parameters are provided.
The unstandardized direct effect of task orientation on autonomous motivation was positive and statistically significant (b=1.5144, s.e.=.2026, p<.001). The regression slope indicates that two cases differing by one unit on task orientation are expected to differ by 1.5144 on autonomous motivation, with the case higher on task orientation having greater autonomous motivation. The standardized path coefficient is .4160. This value indicates that two cases differing by one standard deviation on task orientation are expected to differ by .4160 standard deviations on autonomous motivation, with the case higher on task orientation having greater autonomous motivation.

The next part of the regression output contains the regression of intention to continue sport onto the mediator (autonomous motivation) and the antecedent (task orientation). As a set, task orientation and autonomous motivation jointly predicted intention to continue sport, $R^{2}$ =.1408, F(2,266)=21.7955, p<.001. The unstandardized direct effect of autonomous motivation on intention to continue was positive and significant (b=.1335, s.e.=.0243, p<.001). The regression slope for autonomous motivation indicates that two cases differing by one unit on autonomous motivation, but having the same value on task orientation, are predicted to differ by .1335 on intention to continue sport.
The direct effect of task orientation on intention to continue was positive but not significant (b=.0943, s.e.=.0886, p=.2881). The unstandardized slope indicates two cases having the same score on autonomous motivation but differing by one unit on task orientation are predicted to differ by .0943.
The standardized path coefficients for task orientation and autonomous motivation were .0665 and .3426, respectively. The standardized coefficient for autonomous motivation indicates that two cases with the same score on task orientation but differing by one standard deviation on autonomous motivation are expected to differ by .3426 standard deviations on intention to continue. The standardized slope for task orientation indicates two cases with the same score on autonomous motivation but differing by one standard deviation on task orientation are expected to differ by .0665 standard deviations on intention to continue.

The output for the total effect model shown below is based on the regression of Y (intention to continue) onto X (task orientation) without controlling for the mediator (M=autonomous motivation).

According to Hayes (2022), researchers using Baron and Kenny’s (1986) causal steps approach when testing mediation typically start with this model [total effects model] to determine first whether there is an effect of X on Y to be mediated. Their assumption is that if X is unrelated to Y, then there is no need to proceed further in testing for mediation.
Hayes (2022) provides a thorough discussion [and refutation] of the causal steps strategy on pages 119-126 (which I won’t go over here). Included in this discussion (p. 123) is what he describes as the ‘flawed logic’ of assuming X must be significantly related to Y for mediation to exist. In my opinion, the reasoning is sound. Thus, although we requested the output shown here, it is not particularly germane to the testing of mediation.
Interpretation: Task orientation was a significant and positive predictor (b=.2965, s.e.=.0849, p=.0006) of intention to continue.

The next part of our output contains the estimates and tests for direct, indirect, and total effects. The indirect effects are of focal interest when we are testing for mediation.
First, I draw your attention to the unstandardized indirect effect and percentile bootstrap confidence interval. This is found in the section, Indirect effect(s) of X on Y. The indirect effect of X on Y is computed as the product of the unstandardized path coefficients: 1.5144 * .1335 = .2021. The unstandardized indirect effect here can be interpreted as follows: Each raw score unit increase on task orientation is associated with a .2021 raw score increase on intention to continue sport, as mediated by autonomous motivation.
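The arithmetic can be checked by hand in R using the coefficients reported in this post. Because these inputs are already rounded to four decimals, the hand-computed products may differ from the Process output in the last decimal place; Process works from the unrounded estimates.

```r
# Values copied from the Example 1 results reported in this post.
a <- 1.5144        # task orientation -> autonomous motivation
b <- 0.1335        # autonomous motivation -> intention, controlling for task orientation
c_prime <- 0.0943  # direct effect of task orientation on intention
c_total <- 0.2965  # total effect of task orientation on intention

# Indirect effect as the product of paths a and b.
indirect <- a * b
print(indirect)

# Direct + indirect should reproduce the total effect, up to rounding.
print(c_prime + indirect)
print(c_total)
```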

The statistical significance of the unstandardized indirect effect is tested using a bootstrap confidence interval [here, set at 95%]. If 0 (the null) falls between the lower and upper bound of the confidence interval, then we maintain the null and conclude the indirect effect is not significant. If 0 falls outside the lower and upper bound of the interval, then we conclude the indirect effect is significant [making it consistent with the possibility of mediation].
The bootstrap 95% confidence interval from our analysis ranges from .1179 to .2986. Since 0 does not fall between the lower and upper bound, we conclude the indirect effect is significantly different from 0.
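To make the logic of a percentile bootstrap interval concrete, here is a minimal base-R sketch. It uses simulated data (NOT the sport data) and a hypothetical helper, indirect_effect(). It illustrates the general technique only and is not Process's own implementation, so its numbers will not match the output discussed here.

```r
set.seed(12345)

# Simulated data for illustration only (NOT the sport data).
n <- 300
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.2 * x + 0.4 * m + rnorm(n)
sim <- data.frame(x = x, m = m, y = y)

# Hypothetical helper: indirect effect (a * b) for a given data frame.
indirect_effect <- function(d) {
  a <- coef(lm(m ~ x, data = d))["x"]
  b <- coef(lm(y ~ x + m, data = d))["m"]
  unname(a * b)
}

# Number of bootstrap samples; kept small here for speed.
n_boot <- 1000

boot_ab <- replicate(n_boot, {
  rows <- sample(nrow(sim), replace = TRUE)  # resample cases with replacement
  indirect_effect(sim[rows, ])
})

# Percentile interval: 2.5th and 97.5th percentiles of the bootstrap estimates.
ci <- quantile(boot_ab, probs = c(0.025, 0.975))

print(indirect_effect(sim))
print(ci)

# TRUE when the interval excludes 0.
print(unname(ci[1] > 0 | ci[2] < 0))
```

Each bootstrap sample re-estimates both regressions, so the interval reflects the sampling variability of the product ab without assuming it is normally distributed.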

The completely standardized indirect effect is found in the section Completely standardized indirect effect(s) of X on Y. The value in our output (.1426) is computed as the product of the standardized path coefficients for the direct effect of task orientation on autonomous motivation (.4160) and the direct effect of autonomous motivation on intention to continue (.3426): .4160 * .3426 ≈ .1426 (the product of the rounded coefficients can differ from the output in the last decimal place). The bootstrap 95% confidence interval for the standardized indirect effect ranges from .0866 to .2063. Since 0 does not fall between the lower and upper bounds, the standardized indirect effect is deemed statistically significant. The standardized indirect effect is interpreted here as follows: Each standard deviation increase on task orientation is associated with an increase of .1426 standard deviations on intention to continue sport, as mediated by autonomous motivation.
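The link between the unstandardized and completely standardized indirect effects in a simple mediation model can be sketched as follows. The symbols $SD_{X}$, $SD_{M}$, $SD_{Y}$ (sample standard deviations) and the subscript $cs$ are notation assumed here.

```latex
\begin{aligned}
a_{cs} &= a \cdot \frac{SD_{X}}{SD_{M}}, \qquad
b_{cs} = b \cdot \frac{SD_{M}}{SD_{Y}} \\[4pt]
ab_{cs} &= a_{cs}\, b_{cs} = ab \cdot \frac{SD_{X}}{SD_{Y}}
\end{aligned}
```

The standard deviation of the mediator cancels, so the completely standardized indirect effect is the unstandardized indirect effect rescaled by the ratio of the standard deviations of X and Y.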

The last part of the output (see below) contains additional details and errors from the model.

Example 2: Simple mediation with control variables included

Now we will re-run our analysis after adding control variables into our model. [The model below was NOT part of the original article and was simply created to demonstrate the inclusion of covariates in a mediation model using Process.]

Running the example 2 model and generating our output

Let’s briefly go over the arguments associated with the code provided below for running the analysis:

data: Name of the dataframe containing your data
y: The consequent (outcome) for your model. Make sure it is in quotation marks.
x: The antecedent for your model. Make sure it is in quotation marks.
cov: The names of any covariates in your model. If you have multiple covariates (as we do here in this analysis), use the c() function [see code snippet below]. Make sure all names are in quotation marks.
m: The name of the mediator in the model. Make sure it is in quotes. If you have multiple mediators, use the c() function [shown later in Example 3 on parallel mediators.]
model: Specify the model number for simple mediation (or parallel mediation, later) as 4.
seed: This is a number that - if you were to run the same analysis again in the future on the same data - would produce the same bootstrap confidence intervals.
stand: Set this equal to 1 to obtain standardized regression slopes in the output, as well as standardized indirect effects. If you do not want standardized effects in your output, set this equal to 0 or leave the argument out altogether. Note: Only mediation models will produce standardized estimates if requested.
total: Set this equal to 1 to obtain the total effects model. If you do not want the total effects model in the output, set this equal to 0 or leave it out altogether.
progress: You can set this equal to 0 if you do not want the bootstrapping progress to show up in your output. If you want it to show up, then set this equal to 1 or leave the argument out altogether.

process(data=sport, y="contin", x="task", m="auton", cov=c("ego","control"),
        model=4, seed=12345, stand=1, total=1, progress=0)

Interpreting the output

Notice in the first block of information about the model, the names of the covariates are included.

The first regression output is based on the regression of autonomous motivation onto task orientation (X) and the two covariates, ego orientation and controlled motivation. As a set, task orientation, ego orientation, and controlled motivation jointly accounted for statistically significant variation in autonomous motivation, $R^{2}$ =.2010, F(3,265)=22.2236, p<.001. The unstandardized direct effect of task orientation on autonomous motivation was positive and significant (b=1.3697, s.e.=.2133, p<.001). Neither the direct effect of ego orientation nor the direct effect of controlled motivation on autonomous motivation was statistically significant at the conventional .05 level.

The second set of regression output is based on the regression of intention to continue onto the mediator (autonomous motivation), the antecedent (task orientation) and the two covariates in the model. Altogether the predictors accounted for statistically significant variation in intention to continue, $R^{2}$ =.1587, F(4,264)=12.4498, p<.001. The unstandardized direct effect of autonomous motivation on intention to continue was positive and statistically significant (b=.1236, s.e.=.0246, p<.001). Of the remaining predictors, only the control variable ego orientation was a significant predictor (b=.1575, s.e.=.0781, p=.0449).

The total effect model in this case includes the covariates that were added to the full mediation model.

The remaining output contains the unstandardized direct and indirect effects and percentile bootstrap confidence intervals.

The unstandardized indirect effect of task orientation on intention to continue sport was computed as the product of the a=1.3697 and b=.1236 paths [the path coefficients are from the unstandardized regression results]: i.e., 1.3697 * .1236 = .1693.
The bootstrap 95% confidence interval ranged from .0896 to .2630. Since 0 (the null) does not fall between the lower and upper bound, we conclude the unstandardized indirect effect is statistically significant.
The standardized indirect effect of task orientation on intention to continue sport was computed as the product of the a=.3763 and b=.3173 paths [the path coefficients are from the standardized regression results]: i.e., .3763 * .3173 = .1194. The percentile bootstrap confidence interval ranged from .0644 to .1851. Since 0 (the null) does not fall between the lower and upper bound, we conclude the standardized indirect effect is statistically significant.

Example 3: Parallel mediation

For this example, we will test a parallel mediation model where the effect of task orientation on intention to continue sport is proposed to be mediated by both autonomous motivation and controlled motivation. This model represents a portion of the model tested by Keshtidar and Behzadnia (2017) with these data. With this model, it was expected that individuals scoring higher on task orientation would exhibit greater levels of autonomous motivation, lower levels of controlled motivation, and greater intention to continue on with their sport. It was also expected that individuals scoring higher on autonomous motivation would report greater intention to continue on with their sport. Individuals scoring higher on controlled motivation were expected to exhibit lower intention to continue with their sport.

Running the example 3 model and generating the output

Without going back through all the previous arguments, I will note here that we are using the m argument along with the c() function to list the mediators included in our model. The mediator names must be in quotation marks.

process(data=sport, y="contin", x="task", m=c("auton","control"),
        model=4, seed=12345, stand=1, total=1, progress=0)

## 
## ********************* PROCESS for R Version 4.3.1 ********************* 
##  
##            Written by Andrew F. Hayes, Ph.D.  www.afhayes.com              
##    Documentation available in Hayes (2022). www.guilford.com/p/hayes3   
##  
## *********************************************************************** 
##                
## Model : 4      
##     Y : contin 
##     X : task   
##    M1 : auton  
##    M2 : control
## 
## Sample size: 269
## 
## Custom seed: 12345
## 
## 
## *********************************************************************** 
## Outcome Variable: auton
## 
## Model Summary: 
##           R      R-sq       MSE         F       df1       df2         p
##      0.4160    0.1731    4.4861   55.8895    1.0000  267.0000    0.0000
## 
## Model: 
##              coeff        se         t         p      LLCI      ULCI
## constant   12.9811    0.8266   15.7050    0.0000   11.3537   14.6086
## task        1.5144    0.2026    7.4759    0.0000    1.1156    1.9132
## 
## Standardized coefficients:
##          coeff
## task    0.4160
## 
## *********************************************************************** 
## Outcome Variable: control
## 
## Model Summary: 
##           R      R-sq       MSE         F       df1       df2         p
##      0.0306    0.0009   20.4579    0.2508    1.0000  267.0000    0.6169
## 
## Model: 
##              coeff        se         t         p      LLCI      ULCI
## constant   10.8629    1.7651    6.1542    0.0000    7.3876   14.3382
## task        0.2166    0.4326    0.5008    0.6169   -0.6351    1.0684
## 
## Standardized coefficients:
##          coeff
## task    0.0306
## 
## *********************************************************************** 
## Outcome Variable: contin
## 
## Model Summary: 
##           R      R-sq       MSE         F       df1       df2         p
##      0.3818    0.1458    0.7085   15.0713    3.0000  265.0000    0.0000
## 
## Model: 
##              coeff        se         t         p      LLCI      ULCI
## constant    1.7045    0.4621    3.6888    0.0003    0.7947    2.6143
## task        0.0982    0.0886    1.1085    0.2687   -0.0762    0.2726
## auton       0.1289    0.0246    5.2387    0.0000    0.0804    0.1773
## control     0.0143    0.0115    1.2391    0.2164   -0.0084    0.0370
## 
## Standardized coefficients:
##             coeff
## task       0.0693
## auton      0.3309
## control    0.0713
## 
## ************************ TOTAL EFFECT MODEL *************************** 
## Outcome Variable: contin
## 
## Model Summary: 
##           R      R-sq       MSE         F       df1       df2         p
##      0.2091    0.0437    0.7872   12.2060    1.0000  267.0000    0.0006
## 
## Model: 
##              coeff        se         t         p      LLCI      ULCI
## constant    3.5326    0.3462   10.2026    0.0000    2.8509    4.2143
## task        0.2965    0.0849    3.4937    0.0006    0.1294    0.4635
## 
## Standardized coefficients:
##          coeff
## task    0.2091
## 
## *********************************************************************** 
## Bootstrapping in progress. Please wait.
## 
## ************ TOTAL, DIRECT, AND INDIRECT EFFECTS OF X ON Y ************
## 
## Total effect of X on Y:
##      effect        se         t         p      LLCI      ULCI      c_cs
##      0.2965    0.0849    3.4937    0.0006    0.1294    0.4635    0.2091
## 
## Direct effect of X on Y:
##      effect        se         t         p      LLCI      ULCI     c'_cs
##      0.0982    0.0886    1.1085    0.2687   -0.0762    0.2726    0.0693
## 
## Indirect effect(s) of X on Y:
##            Effect    BootSE  BootLLCI  BootULCI
## TOTAL      0.1983    0.0469    0.1129    0.2964
## auton      0.1952    0.0454    0.1118    0.2892
## control    0.0031    0.0084   -0.0111    0.0240
## 
## Completely standardized indirect effect(s) of X on Y:
##            Effect    BootSE  BootLLCI  BootULCI
## TOTAL      0.1398    0.0313    0.0820    0.2058
## auton      0.1377    0.0306    0.0820    0.2019
## control    0.0022    0.0057   -0.0078    0.0164
## 
## ******************** ANALYSIS NOTES AND ERRORS ************************ 
## 
## Level of confidence for all confidence intervals in output: 95
## 
## Number of bootstraps for percentile bootstrap confidence intervals: 5000

Interpreting the output

The first part of the output contains the model description. Notice both mediators are referenced.

The first regression output is based on the regression of autonomous motivation (“M1”) onto task orientation. The direct effect of task orientation on autonomous motivation was positive and significant (b=1.5144, s.e.=.2026, p<.001), with task orientation accounting for 17.31% of the variation in autonomous motivation.

The next regression output contains the regression of controlled motivation (“M2”) onto task orientation. The direct effect of task orientation on controlled motivation was positive but non-significant (b=.2166, s.e.=.4326, p=.6169), with task orientation accounting for .09% of the variation in controlled motivation.

The final regression output is the regression of intention to continue sport onto task orientation (the antecedent) and autonomous and controlled motivation (the proposed mediators). As a set, task orientation, controlled motivation, and autonomous motivation accounted for significant variation in intention to continue sport, $R^{2}$ =.1458, F(3,265)=15.0713, p<.001. The direct effect of autonomous motivation was positive and significant (b=.1289, s.e.=.0246, p<.001). The direct effects for task orientation and controlled motivation were both positive, but non-significant.

Total effect model…

The last part of our output contains the tests of direct, indirect, and total effects. Since more than a single mediator is proposed, the indirect effects in this part of the output are specific indirect effects. The unstandardized specific indirect effect of task orientation on intention to continue via autonomous motivation is computed as 1.5144 * .1289 = .1952. The bootstrap 95% confidence interval ranges from .1118 to .2892. Since 0 does not fall between the lower and upper bound of the interval, this specific indirect effect is significant.
The specific indirect effect of task orientation on intention to continue via controlled motivation is computed as .2166 * .0143 = .0031. The bootstrap 95% confidence interval ranges from -.0111 to .0240. Since 0 DOES fall between the lower and upper bound of the interval, this specific indirect effect is non-significant.

The total effect of X on Y is equal to the sum of all direct and indirect effects of task orientation on intention to continue sport:
Direct effect = .0982
First specific indirect effect = .1952
Second specific indirect effect = .0031
Total effect = .0982 + .1952 + .0031 = .2965
This effect is the same as what you would get if you simply regressed Y onto X.
Completely standardized specific indirect effects and confidence intervals:
Task-orientation->Autonomous Motivation->Intention: .4160 * .3309 = .1377, bootstrap 95% CI [.0820, .2019]
Task-orientation->Controlled Motivation->Intention: .0306 * .0713 = .0022, bootstrap 95% CI [-.0078, .0164]
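As a quick check, the decomposition for the parallel model can be reproduced in R from the coefficients printed in the Process output above. The inputs are the rounded values from that output, so agreement is to about four decimal places.

```r
# Unstandardized coefficients copied from the Example 3 Process output.
a1 <- 1.5144       # task -> auton
b1 <- 0.1289       # auton -> contin
a2 <- 0.2166       # task -> control
b2 <- 0.0143       # control -> contin
c_prime <- 0.0982  # direct effect of task on contin
c_total <- 0.2965  # total effect of task on contin

# Specific indirect effects: one product per mediator.
specific <- c(auton = a1 * b1, control = a2 * b2)

# Total indirect effect: sum of the specific indirect effects.
total_indirect <- sum(specific)

print(round(specific, 4))
print(round(total_indirect, 4))

# Direct + total indirect should reproduce the total effect, up to rounding.
print(round(c_prime + total_indirect, 4))
```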

References

Hayes, A. F. (2022). Introduction to mediation, moderation, and conditional process analysis (3rd Ed.). New York: The Guilford Press.
Keshtidar, M., & Behzadnia, B. (2017). Prediction of intention to continue sport in athlete students: A self-determination theory approach. PLoS ONE, 12(2): e0171673. doi:10.1371/journal.pone.0171673


Mike's Quant Hub