Multilevel path analysis in lavaan using RStudio
Overview
This presentation is designed to demonstrate how you can perform a multilevel path analysis using lavaan. The example is adapted from one presented by Heck and Thomas (2015) in their text, An Introduction to Multilevel Modeling Techniques: MLM and SEM Approaches Using MPLUS. On pages 122-124, the authors modeled Level 1 and Level 2 predictors of employee satisfaction with benefits and work conditions (both of which were measured as outcomes at Level 1). The data come from 12,445 employees clustered within 160 organizations.
The variables in the model and dataset, as described by Heck and Thomas (2015) include:
- benefit (satisfaction with benefits, measured at the employee level - i.e., level 1)
- cond (satisfaction with work conditions, measured at the employee level - i.e., level 1)
- female (coded 0=male, 1=female; Level 1 indicator variable)
- white (coded 0=nonwhite, 1=white; Level 1 indicator variable)
- zproduct (organizational productivity, measured at Level 2)
- org1 (organizational context variable, measured at Level 2)
- org2 (second organizational context variable, measured at Level 2)
Note: No description was provided in their discussion as to what exactly it means for an organization to score higher or lower on the context variables. Also, my apologies for the archaic labeling of the variables (female & white).
Download a copy of the data here. It is saved in a Google Drive as ch4mv.csv. You will need to import the data into RStudio before running the analyses below.
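If you are unsure how to import the file, a minimal sketch is shown below; it assumes the file has been saved to your working directory under the name ch4mv.csv (adjust the path if you saved it elsewhere).
# Import the data into a dataframe named ch4mv (the path is an assumption)
ch4mv <- read.csv("ch4mv.csv")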
A YouTube video walk-through can be found here: https://youtu.be/jaZtRgBJ_Tk
Let’s start by taking a brief look at the data structure and its layout.
str(ch4mv)
## 'data.frame': 12445 obs. of 13 variables:
## $ orgid : int 1 1 1 1 1 1 1 1 1 1 ...
## $ female : int 1 0 1 0 0 0 0 1 1 1 ...
## $ white : int 0 1 1 0 1 1 0 0 1 1 ...
## $ satpay : int 11 10 11 12 13 14 16 6 12 9 ...
## $ morale : int 19 34 30 39 38 38 35 11 35 30 ...
## $ org1 : int 1 1 1 1 1 1 1 1 1 1 ...
## $ org2 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ benefit : num 3.8 7.6 7.6 7.6 7.6 5.7 3.8 1.9 7.6 5.7 ...
## $ cond : num 3.6 7.2 5.4 7.2 9 5.4 5.4 1.8 7.2 5.4 ...
## $ zresourc: num -1.32 -1.32 -1.32 -1.32 -1.32 -1.32 -1.32 -1.32 -1.32 -1.32 ...
## $ zproduct: num 1.82 1.82 1.82 1.82 1.82 1.82 1.82 1.82 1.82 1.82 ...
## $ lev1wt : num 1.2 1.46 1.2 1 1.46 1.46 1 1.2 2.07 1.2 ...
## $ lev2wt : num 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
head(ch4mv)
## orgid female white satpay morale org1 org2 benefit cond zresourc zproduct
## 1 1 1 0 11 19 1 0 3.8 3.6 -1.32 1.82
## 2 1 0 1 10 34 1 0 7.6 7.2 -1.32 1.82
## 3 1 1 1 11 30 1 0 7.6 5.4 -1.32 1.82
## 4 1 0 0 12 39 1 0 7.6 7.2 -1.32 1.82
## 5 1 0 1 13 38 1 0 7.6 9.0 -1.32 1.82
## 6 1 0 1 14 38 1 0 5.7 5.4 -1.32 1.82
## lev1wt lev2wt
## 1 1.20 1.1
## 2 1.46 1.1
## 3 1.20 1.1
## 4 1.00 1.1
## 5 1.46 1.1
## 6 1.46 1.1
Running the analysis in lavaan
Step 1: Load lavaan using the library() function.
library(lavaan)
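If the library() call fails because lavaan has not yet been installed on your machine, you can install it once first (shown here commented out so it is not rerun every session):
# One-time installation, if lavaan is not already installed:
# install.packages("lavaan")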
Step 2: Create an object that contains your model specification. The model specification MUST begin and end with single quotation marks. Everything in between the single quotes below pertains to the model.
Unlike a single-level model in lavaan, our model specification is broken out into two levels. You MUST type the level headers exactly as level: 1 and level: 2. [If you type level 1: and level 2:, your analysis WILL NOT RUN!]
Use the same syntax you would use with a single-level model in lavaan to specify the paths in your model at each level. For additional details, see https://www.jstatsoft.org/index.php/jss/article/view/v048i02/594.
model1 <- '
level: 1
#In this portion, we are specifying the Level 1
#predictors of the outcomes (benefit and cond).
benefit ~ white + female
cond ~ white + female
#In Heck and Thomas's (2015) model, the authors
#specified the endogenous variables to covary
#at Level 1. This is reflected below using
#the ~~ operator.
benefit ~~ cond
level: 2
#Here, we are specifying the Level 2
#predictors of the outcomes. In this case,
#we are using group-level predictors of
#variation in random intercepts associated with
#our outcome variables. Notice that below
#I have typed in the names of the Level 1
#outcomes (benefit and cond). Lavaan
#will automatically treat these variables as having
#randomly varying intercepts across Level 2 units.
benefit ~ org1 + org2 + zproduct
cond ~ org1 + org2 + zproduct
zproduct ~ org1 + org2
#Heck and Thomas also specified the Level 2
#intercepts for the endogenous variables to covary,
#hence the addition of the following code.
benefit ~~ cond
'
Step 3: Use the sem() function in lavaan to fit the model to the data. When you run your analysis, make sure to store your results in an object (here, I am naming it fit1).
The first argument for the function is the model (saved as an object in the previous step). The second argument is the data argument, which references the dataframe containing our data. The third argument references the clustering variable, which in our case is orgid (the organizational identifier in the dataframe). Make sure this is in quotation marks.
fit1 <- sem(model = model1, data = ch4mv, cluster = "orgid")
Step 4: Use summary() function to call up your results. My preference is to request fit statistics, r-square values, and standardized coefficients by including the fit.measures, rsquare, and standardized arguments and setting them equal to TRUE.
summary(fit1, fit.measures=T, rsquare=T, standardized = T)
## lavaan 0.6.13 ended normally after 124 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 22
##
## Number of observations 12445
## Number of clusters [orgid] 160
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Model Test Baseline Model:
##
## Test statistic 9501.224
## Degrees of freedom 14
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -40521.827
## Loglikelihood unrestricted model (H1) -40521.828
##
## Akaike (AIC) 81087.655
## Bayesian (BIC) 81251.095
## Sample-size adjusted Bayesian (SABIC) 81181.181
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: RMSEA <= 0.050 NA
## P-value H_0: RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual (corr metric):
##
## SRMR (within covariance matrix) 0.000
## SRMR (between covariance matrix) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
##
## Level 1 [within]:
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## benefit ~
## white 0.224 0.027 8.312 0.000 0.224 0.076
## female 0.697 0.026 27.137 0.000 0.697 0.237
## cond ~
## white 0.248 0.026 9.417 0.000 0.248 0.085
## female 0.707 0.025 28.097 0.000 0.707 0.245
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit ~~
## .cond 1.344 0.022 62.048 0.000 1.344 0.676
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit 0.000 0.000 0.000
## .cond 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit 2.029 0.026 78.339 0.000 2.029 0.938
## .cond 1.948 0.025 78.363 0.000 1.948 0.933
##
## R-Square:
## Estimate
## benefit 0.062
## cond 0.067
##
##
## Level 2 [orgid]:
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## benefit ~
## org1 0.287 0.080 3.612 0.000 0.287 0.241
## org2 0.036 0.085 0.418 0.676 0.036 0.028
## zproduct 0.311 0.031 9.898 0.000 0.311 0.651
## cond ~
## org1 0.315 0.069 4.583 0.000 0.315 0.296
## org2 0.025 0.073 0.346 0.729 0.025 0.022
## zproduct 0.291 0.027 10.692 0.000 0.291 0.679
## zproduct ~
## org1 -0.112 0.201 -0.558 0.577 -0.112 -0.045
## org2 -0.483 0.217 -2.226 0.026 -0.483 -0.180
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit ~~
## .cond 0.095 0.015 6.365 0.000 0.095 0.859
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit 4.412 0.046 96.474 0.000 4.412 8.789
## .cond 4.677 0.040 116.123 0.000 4.677 10.404
## .zproduct 0.119 0.107 1.109 0.268 0.119 0.113
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit 0.133 0.019 6.979 0.000 0.133 0.526
## .cond 0.093 0.014 6.607 0.000 0.093 0.458
## .zproduct 1.068 0.119 8.946 0.000 1.068 0.970
##
## R-Square:
## Estimate
## benefit 0.474
## cond 0.542
## zproduct 0.030
The chi-square statistic and degrees of freedom for the model are both 0, indicating that the model is saturated. As such, we can only really evaluate the model by examining the path coefficients and their associated tests. At Level 1, employee gender identity (‘female’) and ethnic/racial identity (‘white’) emerged as positive and significant predictors of satisfaction with benefits and working conditions. These predictors accounted for roughly 6.2% and 6.7% of the within-organization variation in benefit and cond, respectively. At Level 2, organizational context 1 was a positive and significant predictor of organization-level satisfaction with benefits and work conditions, but it did not significantly predict organizational productivity. Organizational context 2 was a negative and significant predictor of organizational productivity, but did not predict organization-level satisfaction with benefits or conditions. Organizational productivity emerged as a positive and significant predictor of organization-level satisfaction with both benefits and conditions. At Level 2, approximately 47.4% and 54.2% of the variation in satisfaction with benefits and conditions, respectively, was accounted for by the organizational context and productivity variables, and approximately 3% of the variation in organizational productivity was accounted for by the organizational context variables.
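If you would like the standardized estimates as a standalone table (for example, for reporting purposes), the standardizedSolution() function returns them as a data frame. This is simply an optional convenience and is not part of Heck and Thomas's analysis.
# Optional: extract the standardized estimates for fit1 as a data frame
std1 <- standardizedSolution(fit1)
head(std1)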
# If you would like to obtain the intraclass
# correlations for your level 1 variables, you
# can use the lavInspect function. The first argument
# refers to the fit object and the second requests
# the icc's. Make sure icc is in quotes!
lavInspect(fit1, "icc")
## benefit cond white female
## 0.104 0.088 0.000 0.000
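As a quick check on what these values represent, the ICC for a variable is the proportion of its total variance that lies between organizations. A minimal sketch of that calculation for benefit is shown below; it assumes an intercept-only (null) two-level model and that lavaan lists the Level 1 variance before the Level 2 variance in its parameter table.
# Intercept-only (null) two-level model for benefit
nullmodel <- '
level: 1
benefit ~~ benefit
level: 2
benefit ~~ benefit
'
fit0 <- sem(model = nullmodel, data = ch4mv, cluster = "orgid")
# ICC = between-organization variance / (within + between variance).
# Assumes the Level 1 variance is listed before the Level 2 variance.
vars <- subset(parameterEstimates(fit0),
               lhs == "benefit" & op == "~~" & rhs == "benefit")$est
vars[2] / sum(vars)   # should be close to the 0.104 reported above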
Testing indirect effects at Level 2
Computing and testing indirect effects at Level 1 or Level 2 in lavaan relies on the same method you would use when testing a single-level model.
- Assign labels to the paths involved in tracing from a predictor variable to an outcome variable, in the form <path label>*<predictor name>. The labels I assign below are: a1, a2, b1, and b2.
- Construct estimands for the indirect effects using the := operator [a colon followed immediately by an equals sign], in the form <name of estimand> := <expression for computing estimand>. In the code below, I am assigning the names IE1 and IE2 to two separate estimands. The first estimand (IE1) is computed as the product of paths a1 and b1 - i.e., org1 -> zproduct -> benefit. The second estimand (IE2) is computed as the product of paths a2 and b2 - i.e., org2 -> zproduct -> cond. And so on…
model1v2 <- '
level: 1
benefit ~ white + female
cond ~ white + female
benefit ~~ cond
level: 2
benefit ~ org1 + org2 + b1*zproduct
cond ~ org1 + org2 + b2*zproduct
zproduct ~ a1*org1 + a2*org2
benefit ~~ cond
# Level 2 indirect effects
IE1 := a1*b1 #org1 -> zproduct ->benefit
IE2 := a2*b2 #org2 -> zproduct ->cond
IE3 := a1*b2 #org1 -> zproduct -> cond
IE4 := a2*b1 #org2 -> zproduct ->benefit
'
fit1v2 <- sem(model = model1v2, data = ch4mv, cluster = "orgid")
summary(fit1v2, fit.measures=T, rsquare=T, standardized = T)
## lavaan 0.6.13 ended normally after 124 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 22
##
## Number of observations 12445
## Number of clusters [orgid] 160
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Model Test Baseline Model:
##
## Test statistic 9501.224
## Degrees of freedom 14
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -40521.827
## Loglikelihood unrestricted model (H1) -40521.828
##
## Akaike (AIC) 81087.655
## Bayesian (BIC) 81251.095
## Sample-size adjusted Bayesian (SABIC) 81181.181
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: RMSEA <= 0.050 NA
## P-value H_0: RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual (corr metric):
##
## SRMR (within covariance matrix) 0.000
## SRMR (between covariance matrix) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
##
## Level 1 [within]:
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## benefit ~
## white 0.224 0.027 8.312 0.000 0.224 0.076
## female 0.697 0.026 27.137 0.000 0.697 0.237
## cond ~
## white 0.248 0.026 9.417 0.000 0.248 0.085
## female 0.707 0.025 28.097 0.000 0.707 0.245
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit ~~
## .cond 1.344 0.022 62.048 0.000 1.344 0.676
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit 0.000 0.000 0.000
## .cond 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit 2.029 0.026 78.339 0.000 2.029 0.938
## .cond 1.948 0.025 78.363 0.000 1.948 0.933
##
## R-Square:
## Estimate
## benefit 0.062
## cond 0.067
##
##
## Level 2 [orgid]:
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## benefit ~
## org1 0.287 0.080 3.612 0.000 0.287 0.241
## org2 0.036 0.085 0.418 0.676 0.036 0.028
## zproduct (b1) 0.311 0.031 9.898 0.000 0.311 0.651
## cond ~
## org1 0.315 0.069 4.583 0.000 0.315 0.296
## org2 0.025 0.073 0.346 0.729 0.025 0.022
## zproduct (b2) 0.291 0.027 10.692 0.000 0.291 0.679
## zproduct ~
## org1 (a1) -0.112 0.201 -0.558 0.577 -0.112 -0.045
## org2 (a2) -0.483 0.217 -2.226 0.026 -0.483 -0.180
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit ~~
## .cond 0.095 0.015 6.365 0.000 0.095 0.859
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit 4.412 0.046 96.474 0.000 4.412 8.789
## .cond 4.677 0.040 116.123 0.000 4.677 10.404
## .zproduct 0.119 0.107 1.109 0.268 0.119 0.113
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit 0.133 0.019 6.979 0.000 0.133 0.526
## .cond 0.093 0.014 6.607 0.000 0.093 0.458
## .zproduct 1.068 0.119 8.946 0.000 1.068 0.970
##
## R-Square:
## Estimate
## benefit 0.474
## cond 0.542
## zproduct 0.030
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## IE1 -0.035 0.063 -0.557 0.578 -0.035 -0.029
## IE2 -0.141 0.065 -2.179 0.029 -0.141 -0.122
## IE3 -0.033 0.059 -0.557 0.578 -0.033 -0.031
## IE4 -0.150 0.069 -2.172 0.030 -0.150 -0.117
The computed indirect effects and their associated tests are found at the end of the output under the heading Defined Parameters. It turns out that IE2 (org2 -> zproduct -> cond) and IE4 (org2 -> zproduct -> benefit) were the only statistically significant indirect effects in our model.
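If you would also like confidence intervals around these indirect effects, one option (a sketch) is to pull just the defined parameters out of parameterEstimates(); note that these are delta-method (Wald) intervals rather than bootstrapped ones.
# Defined (:=) parameters with their 95% confidence intervals
subset(parameterEstimates(fit1v2), op == ":=")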
Re-fitting our model using MLR
Technically, Heck and Thomas (2015) used the MLR estimator [maximum likelihood with robust standard errors and a scaled chi-square statistic] when carrying out their analysis in MPLUS (pp. 122-124). We can replicate their results exactly (or very nearly so) by explicitly adding the estimator argument to the sem() function and setting it equal to “MLR” (make sure to INCLUDE THE QUOTES).
Note: The MLR estimator is described at https://lavaan.ugent.be/tutorial/est.html.
fit1v3 <- sem(model = model1v2, data = ch4mv, cluster = "orgid", estimator = "MLR")
summary(fit1v3, fit.measures=T, rsquare=T, standardized = T)
## lavaan 0.6.13 ended normally after 124 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 22
##
## Number of observations 12445
## Number of clusters [orgid] 160
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 0.000 0.000
## Degrees of freedom 0 0
##
## Model Test Baseline Model:
##
## Test statistic 9501.224 9262.437
## Degrees of freedom 14 14
## P-value 0.000 0.000
## Scaling correction factor 1.026
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000 1.000
## Tucker-Lewis Index (TLI) 1.000 1.000
##
## Robust Comparative Fit Index (CFI) NA
## Robust Tucker-Lewis Index (TLI) NA
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -40521.827 -40521.827
## Loglikelihood unrestricted model (H1) -40521.828 -40521.828
##
## Akaike (AIC) 81087.655 81087.655
## Bayesian (BIC) 81251.095 81251.095
## Sample-size adjusted Bayesian (SABIC) 81181.181 81181.181
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000 NA
## 90 Percent confidence interval - lower 0.000 NA
## 90 Percent confidence interval - upper 0.000 NA
## P-value H_0: RMSEA <= 0.050 NA NA
## P-value H_0: RMSEA >= 0.080 NA NA
##
## Robust RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: Robust RMSEA <= 0.050 NA
## P-value H_0: Robust RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual (corr metric):
##
## SRMR (within covariance matrix) 0.000 0.000
## SRMR (between covariance matrix) 0.000 0.000
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
##
## Level 1 [within]:
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## benefit ~
## white 0.224 0.029 7.763 0.000 0.224 0.076
## female 0.697 0.028 24.947 0.000 0.697 0.237
## cond ~
## white 0.248 0.030 8.385 0.000 0.248 0.085
## female 0.707 0.031 22.734 0.000 0.707 0.245
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit ~~
## .cond 1.344 0.035 38.290 0.000 1.344 0.676
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit 0.000 0.000 0.000
## .cond 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit 2.029 0.039 52.560 0.000 2.029 0.938
## .cond 1.948 0.039 49.580 0.000 1.948 0.933
##
## R-Square:
## Estimate
## benefit 0.062
## cond 0.067
##
##
## Level 2 [orgid]:
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## benefit ~
## org1 0.287 0.079 3.637 0.000 0.287 0.241
## org2 0.036 0.083 0.427 0.669 0.036 0.028
## zproduct (b1) 0.311 0.033 9.578 0.000 0.311 0.651
## cond ~
## org1 0.315 0.063 5.011 0.000 0.315 0.296
## org2 0.025 0.076 0.335 0.738 0.025 0.022
## zproduct (b2) 0.291 0.029 9.939 0.000 0.291 0.679
## zproduct ~
## org1 (a1) -0.112 0.220 -0.509 0.611 -0.112 -0.045
## org2 (a2) -0.483 0.200 -2.415 0.016 -0.483 -0.180
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit ~~
## .cond 0.095 0.016 6.083 0.000 0.095 0.859
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit 4.412 0.053 82.662 0.000 4.412 8.789
## .cond 4.677 0.049 95.290 0.000 4.677 10.404
## .zproduct 0.119 0.104 1.145 0.252 0.119 0.113
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .benefit 0.133 0.020 6.642 0.000 0.133 0.526
## .cond 0.093 0.015 6.248 0.000 0.093 0.458
## .zproduct 1.068 0.108 9.858 0.000 1.068 0.970
##
## R-Square:
## Estimate
## benefit 0.474
## cond 0.542
## zproduct 0.030
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## IE1 -0.035 0.069 -0.505 0.614 -0.035 -0.029
## IE2 -0.141 0.059 -2.377 0.017 -0.141 -0.122
## IE3 -0.033 0.065 -0.506 0.613 -0.033 -0.031
## IE4 -0.150 0.063 -2.379 0.017 -0.150 -0.117
In our output, we not only obtain the ‘Standard’ results we had before; robust versions of the fit statistics are also provided, most of which appear in the ‘Scaled’ column. The standard errors are now computed using a robust sandwich estimator as well, which is why the z-values and p-values differ slightly from those in the earlier fit.
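If you want to see exactly how much the robust (sandwich) standard errors differ from the ones obtained earlier, one option (a sketch) is to line the two sets of results up side by side; the point estimates are identical, so only the standard errors change.
# Compare ML and MLR standard errors parameter by parameter
pe_ml  <- parameterEstimates(fit1v2)   # default ML fit
pe_mlr <- parameterEstimates(fit1v3)   # MLR fit
data.frame(parameter = paste(pe_ml$lhs, pe_ml$op, pe_ml$rhs),
           se_ML = pe_ml$se, se_MLR = pe_mlr$se)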
References
Heck, R. H., & Thomas, S. L. (2015). An introduction to multilevel modeling techniques: MLM and SEM approaches using MPLUS. New York: Routledge.