This R package implements the dynamic panel data modeling framework described by Allison, Williams, and Moral-Benito (2017). This approach allows fitting models with fixed effects that do not assume strict exogeneity of predictors. That means you can simultaneously get the robustness to confounding offered by fixed effects models and account for reciprocal causation between the predictors and the outcome variable. The estimating approach from Allison et al. provides better finite sample performance in terms of both bias and efficiency than other popular methods (e.g., the Arellano-Bond estimator).

These models are fit using structural equation models, using maximum likelihood estimation and offering the missing data handling and flexibility afforded by SEM. This package will reshape your data, specify the model properly, and fit it with lavaan.

Note: This software is in development. It is best to cross-reference all results with xtdpdml for Stata. Go to https://www3.nd.edu/~rwilliam/dynamic/ to learn about xtdpdml and the underlying method. You may also be interested in the article by Paul Allison, Richard Williams, and Enrique Moral-Benito in Socius, accessible here.

Installation

You will need to the devtools package installed to install this package from Github as well as its companion package, panelr.

Also note this package’s other dependencies: lavaan, stringr, rlang, and crayon, in addition to the dependencies of panelr and the aforementioned packages.

Usage

This package assumes your data are in long format, with each row representing a single observation of a single participant. Contrast this with wide format in which each row contains all observations of a single participant. Better compatibility with wide data will be implemented in a future release, but in the meantime you may use the long_panel function implemented in panelr to reshape to long format in a relatively pain-free way.

First we load the package and the WageData from panelr.

This next line of code converts the data to class panel_data, which is a class specific to the panelr that helps to simplify the treatment of the long-form panel data. You don’t have to do this, but it saves you from providing id and wave arguments to the model fitting function each time you use it.

wages <- panel_data(WageData, id = id, wave = t)

Basic formula syntax

The formula syntax used in this package is meant to be as similar to a typical regression model as possible.

The most basic model can be specified like any other: y ~ x, where y is the dependent variable and x is a time-varying predictor. If you would like to include time-invariant predictors, you will make the formula consist of two parts, separated with a bar (|) like so: y ~ x | z where z is a time invariant predictor, like ethnicity.

One of the innovations of the method, however, is the notion of predetermined, or sequentially exogenous, predictors. To specify a model with a predetermined variable, put the variable within a pre function, y ~ pre(x1) + x2 | z. This tells the function that x1 is predetermined while x2 is strictly exogenous by assumption. You could have multiple predetermined predictors as well (e.g., y ~ pre(x1) + pre(x2) | z).

As implied by the “cross-lagged” terminology, you may also fit models with lagged predictors. Simply apply the lag function to the lagged predictors in the formula: y ~ pre(lag(x1)) + lag(x2) | z. To specify more than 1 lag, just provide it as an argument. For instance, y ~ pre(lag(x1, 2)) + lag(x2) | z will use 2 lags of the x1 variable.

Socius article example

This will replicate the analysis of the wages data in the Socius article that describes these models.

Note that to get matching standard errors, set information = "observed" to override lavaan’s default, information = "expected".

fit <- dpm(wks ~ pre(lag(union)) + lag(lwage) | ed, data = wages,
           err.inv = TRUE, information = "observed")
summary(fit)
MODEL INFO:
Dependent variable: wks 
Total observations: 595 
Complete observations: 595 
Time periods: 2 - 7 

MODEL FIT:
𝛘²(76) = 138.476
RMSEA = 0.037, 90% CI [0.027, 0.047]
p(RMSEA < .05) = 0.986
SRMR = 0.025 

                Est.  S.E. z val.     p    
union (t - 1) -1.206 0.522 -2.309 0.021   *
lwage (t - 1)  0.588 0.488  1.204 0.229    
ed            -0.107 0.056 -1.893 0.058   .
wks (t - 1)    0.188 0.020  9.586 0.000 ***

Model converged after 613 iterations

Any arguments supplied other than those that are documented within the dpm function are passed on to sem from the lavaan package.

Model specification options

The following arguments allow you to make changes to the default model specification:

  • y.lag: By default the lag 1 value of the DV is included as a predictor (this is why they are dynamic models). You may choose a different value or multiple values instead, though you may not choose 0 or NULL (i.e., you must have some lagged DV).
  • fixed.effects: By default, the model is specified as a fixed effects model. If you set this to FALSE, you get a random effects specification instead.
  • error.inv: This constrains error variances to be equal in each wave. It is FALSE by default.
  • const.inv: This constrains the constants to be equal in each wave. It is FALSE by default, but if TRUE it eliminates cross-sectional dependence.
  • y.free: This allows the regression coefficent of the lagged DV to vary across time. It is FALSE by default and you can either set it to TRUE or to the specific lag number(s).
  • alpha.free: If TRUE, relaxes the constraint that the fixed effects are equal across time. Default is FALSE to be consistent with how fixed effects models normally work.

Summary options

You have most of the options available to you via lavaan’s summary method.

You can choose to omit any of: the z statistics (zstat = FALSE), the standard errors (se = FALSE), or the p values (pvalue = FALSE). You may also add confidence intervals (ci = TRUE) at any specified level (ci.level = .95). If you used bootstrapping for uncertainty intervals, you can also specify the method (boot.ci.type = "perc").

The number of digits to print can be set via digits or with the option dpm-digits. You may also standardize coefficients via lavaan’s method using standardize = TRUE.

Other features

Lavaan syntax only

If you just want the lavaan model specification and don’t want this package to fit the model for you, you can set print.only = TRUE. To reduce the amount of output, I’m condensing wages to 4 waves here.

dpm(wks ~ pre(lag(union)) + lag(lwage) | ed, data = wages[wages$t < 5,],
    print.only = TRUE)
## Main regressions

wks_2 ~ en1 * union_1 + ex1 * lwage_1 + c1 * ed + p1 * wks_1
wks_3 ~ en1 * union_2 + ex1 * lwage_2 + c1 * ed + p1 * wks_2
wks_4 ~ en1 * union_3 + ex1 * lwage_3 + c1 * ed + p1 * wks_3

## Alpha latent variable (random intercept)

alpha =~ 1 * wks_2 + 1 * wks_3 + 1 * wks_4

## Alpha free to covary with observed variables (fixed effects)

alpha ~~  union_1 +  union_2 +  union_3 +  lwage_1 +  lwage_2 +  lwage_3 +  wks_1

## Correlating DV errors with future values of predetermined predictors

wks_2 ~~ union_3

## Predetermined predictors covariances

union_1 ~~ ed + lwage_1 + lwage_2 + lwage_3 + wks_1
union_2 ~~ ed + lwage_1 + lwage_2 + lwage_3 + union_1 + wks_1
union_3 ~~ ed + lwage_1 + lwage_2 + lwage_3 + union_1 + union_2 + wks_1

## Exogenous (time varying and invariant) predictors covariances

lwage_1 ~~ ed + wks_1
lwage_2 ~~ ed + lwage_1 + wks_1
lwage_3 ~~ ed + lwage_1 + lwage_2 + wks_1

ed ~~ wks_1

## DV error variance free to vary across waves

wks_2 ~~ wks_2
wks_3 ~~ wks_3
wks_4 ~~ wks_4

## Let DV variance vary across waves

wks_2 ~ 1
wks_3 ~ 1
wks_4 ~ 1

Extract components

Alternately, you can extract the lavaan model syntax and wide-formatted data from the fitted model object to do your own fitting and tweaking.

The model is a special type of lavaan object. This means most methods implemented for lavaan objects will work on these. You can also convert the fitted model into a typical lavaan object:

Get full lavaan summary

While you could convert the model to lavaan model and apply any of lavaan’s functions to it (and you should!), as a convenience you can use lav_summary() to get lavaan’s summary of the model.

Missing data

Take advantage of lavaan’s missing data handling by using the missing = "fiml" argument as well as any other arguments accepted by lavaan::sem().

Missing features/problems

  • CFI/TLI fit measures are much different than Stata’s and consistently more optimistic. For now, they are not printed with the summary because they are probably misleading.
  • You cannot use multiple lags of the same predictor (e.g., y ~ x + lag(x)). (Fixed in 1.0.0)
  • The function does not yet support input data that is already in wide format.
  • You cannot apply arbitrary functions to variables in the formula like you can with regression models. For instance, a specification like y ~ scale(x) will cause an error.

The following xtdpdml (Stata) options are not implemented:

  • xfree
  • yfree (added as y.free argument in 1.0.0)
  • re (added as fixed.effects argument in 1.0.0)
  • ylag (added as y.lag argument in 1.0.0)
  • std (but standardize argument of summary may suffice)

Roadmap

  • Get proper CFI/TLI statistics
  • Allow full use of formula syntax, e.g. y ~ scale(x)
  • Create a predict method and perhaps some ability to plot predictions
  • Add broom methods (tidy, glance)

Reference

Allison, P. D., Williams, R., & Moral-Benito, E. (2017). Maximum likelihood for cross-lagged panel models with fixed effects. Socius, 3, 1-17.