Unlocking the Power of purrr: How to Create Multiple Lags Like a Pro in R

A quick guide to creating multiple lags for time series analysis in R with purrr

An ethereal representation of multiple Time Series

Are you tired of creating lag variables one by one? Are you ready to level up your time series analysis game? Forget everything you know about creating lag variables. There’s a better way, and it’s been right in front of you all along.

This is a good one. We’ll use purrr’s lesser-known partial() function to build a handy wrapper around lag(). Let’s get straight to the point.
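If partial() is new to you, here is a minimal sketch of what it does on its own. It returns a copy of a function with some arguments already filled in. Here we fix n = 2 in dplyr::lag(); the name lag2 is only for illustration.

```r
library(dplyr)
library(purrr)

# partial() returns a copy of dplyr::lag() with n already set to 2
lag2 <- partial(dplyr::lag, n = 2)

lag2(c(10, 20, 30, 40))
## [1] NA NA 10 20
```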

First, inside calculate_lags we build map_lag. Despite the name, it is not a single function but a list of functions: we map over the requested lags and, for each one, use partial() to pre-fill the n argument of dplyr’s lag(). Each element of the list is a lag function of a different length, and across() then applies all of them to the desired variable.
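To make the "list of functions" idea concrete, here is a small sketch outside of any data frame. The vector x is made-up toy data. The first map() builds one partially applied lag per lag length; the second map() calls each of them on the same vector, so the first element should be x lagged by 1, the second by 2, and the third by 3.

```r
library(dplyr)
library(purrr)

lags <- 1:3

# One partially applied dplyr::lag() per lag length
map_lag <- lags %>% map(~ partial(dplyr::lag, n = .x))

# Toy vector (made up for illustration)
x <- c(5, 6, 7, 8)

# Apply every function in the list to the same vector
lagged <- map(map_lag, ~ .x(x))
lagged
```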

And just like that, voilà! We have multiple lag variables without breaking a sweat. To make things even better, the .names argument of across() lets us name the new lag variables on the fly, so they come out with meaningful names such as close_lag1, close_lag2 and so on.
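A possible alternative for the naming step, sketched under our own assumptions rather than taken from the original function: give the list of lag functions names ("lag1", "lag2", ...) and let across() build each column name from its {.fn} placeholder. The names then come from the list itself instead of being recycled from the lags vector, which should also let you lag several columns in one call. calculate_lags_named and the toy tibble are hypothetical.

```r
library(dplyr)
library(purrr)

# Hypothetical variant of calculate_lags(): the list of lag functions is named,
# so across() fills {.fn} with "lag1", "lag2", ...
calculate_lags_named <- function(df, var, lags) {
  map_lag <- lags %>%
    set_names(paste0("lag", lags)) %>%
    map(~ partial(dplyr::lag, n = .x))
  df %>% mutate(across(.cols = {{ var }}, .fns = map_lag, .names = "{.col}_{.fn}"))
}

# Toy data (made up for illustration): two columns lagged at once
toy <- tibble(close = c(1, 2, 3, 4), open = c(10, 20, 30, 40))
res <- toy %>% calculate_lags_named(c(close, open), 1:2)
res
```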

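library(dplyr)
library(purrr)
calculate_lags <- function(df, var, lags){
  # one dplyr::lag() per lag length, with n pre-filled by partial()
  map_lag <- lags %>% map(~partial(dplyr::lag, n = .x))
  # apply every lag function to the column; results are named <column>_lag<n>
  return(df %>% mutate(across(.cols = {{var}}, .fns = map_lag, .names = "{.col}_lag{lags}")))
}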

Let’s look at a quick example. We’ll use the daily closing prices of Tesla (TSLA) stock to showcase it. Our data frame looks like this:

tsla %>% head(4)
## # A tibble: 4 × 6
##   date        open  high   low close    volume
##   <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>
## 1 2022-01-03  383.  400.  379.  400. 104686047
## 2 2022-01-04  397.  403.  374.  383. 100248258
## 3 2022-01-05  382.  390.  360.  363.  80119797
## 4 2022-01-06  359   363.  340.  355.  90336474

We simply pass the function the column to lag and the desired lags. Note that we use tidy evaluation (the {{ }} "curly-curly" operator inside calculate_lags) so the column can be referenced without quotes. This way we keep the tidyverse vibe intact.
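For readers new to {{ }}, a stripped-down sketch of the same mechanism: the helper below receives a bare column name and forwards it to dplyr, just as calculate_lags does with var. add_prev and the toy tibble are hypothetical and only serve as an illustration.

```r
library(dplyr)

# Hypothetical helper: {{ var }} forwards the unquoted column name to dplyr
add_prev <- function(df, var) {
  df %>% mutate(prev = dplyr::lag({{ var }}))
}

# Toy values, made up for illustration
toy <- tibble(close = c(400, 383, 363))
add_prev(toy, close)
```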

tsla %>% calculate_lags(close, 1:3) %>% head()
## # A tibble: 6 × 9
##   date        open  high   low close    volume close_lag1 close_lag2 close_lag3
##   <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>      <dbl>      <dbl>      <dbl>
## 1 2022-01-03  383.  400.  379.  400. 104686047        NA         NA         NA 
## 2 2022-01-04  397.  403.  374.  383. 100248258       400.        NA         NA 
## 3 2022-01-05  382.  390.  360.  363.  80119797       383.       400.        NA 
## 4 2022-01-06  359   363.  340.  355.  90336474       363.       383.       400.
## 5 2022-01-07  360.  360.  337.  342.  84164748       355.       363.       383.
## 6 2022-01-10  333.  353.  327.  353.  91814877       342.       355.       363.

It’s time to create your own lags like a pro. Embrace the power of purrr and partial and take your time series analysis to the next level. You will impress your colleagues with your advanced R skills and will have more time to focus on the real analysis.

Short and sweet!


Pablo Cánovas
Senior Data Scientist at Spotahome

Data Scientist, formerly physicist | Tidyverse believer, piping life | Hanging out at TypeThePipe
