Tidylog

Do you find Tidyverse pipelines useful, but wish you had some kind of logging inside those fancy pipes? Here’s tidylog, which logs your pipelines.

Some time ago I made one of my best Tidyverse discoveries ever: a tool called tidylog. This package is built on top of dplyr and tidyr and gives us feedback on the results of each operation. This kind of feedback was actually already a feature of the Stata software.

When performing one operation at a time, it is easy to track the changes made to a table. However, things get increasingly obscure when chaining multiple functions or working with big data frames.

We all love piping operations. I often ‘play’ at performing the whole transformation without leaving the pipe flow. The downside is that you lose sight of the intermediate states: you can make big mistakes and remain unaware of them until it’s too late, and then you may have to undo some work or rethink your analysis.

In this context, some additional info is always welcome. I think this feature is especially convenient for beginners, but not only for them! I myself have wasted several hours debugging long pipelines, trying to understand where the problems came from.

Let’s see a tiny bit of its behaviour with a simple example:
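
Here is a minimal sketch, assuming the dplyr and tidylog packages are installed; it uses R’s built-in mtcars dataset, and the filter condition and the new `kpl` column (fuel efficiency in km/L) are illustrative choices, not part of tidylog itself:

```r
# Sketch: assumes dplyr and tidylog are installed.
library(dplyr, warn.conflicts = FALSE)
# Attaching tidylog after dplyr masks verbs such as filter(), mutate()
# and select() with wrappers that print a short summary of each step.
library(tidylog, warn.conflicts = FALSE)

small_cars <- mtcars %>%
  filter(cyl == 4) %>%                 # reports how many rows were removed
  mutate(kpl = mpg * 0.425144) %>%     # reports the new variable 'kpl'
  select(mpg, kpl, hp)                 # reports which columns were dropped
```

Each step prints a one-line message to the console, so the intermediate states of the pipe are no longer invisible, while the result is exactly what plain dplyr would return.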

Pretty neat! It is especially useful with joins: they are a common source of duplicated or missing rows, and tidylog reports plenty of details about them.
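
As a sketch of that situation, assume two small hypothetical tables, `orders` and `customers`, where one customer ID appears twice in `customers` and one order has no matching customer at all. Plain dplyr would silently return one more row than `orders` has; with tidylog attached, the `left_join()` call should report how many rows matched and how many found no match:

```r
# Sketch: assumes dplyr and tidylog are installed; the tables are hypothetical.
library(dplyr, warn.conflicts = FALSE)
library(tidylog, warn.conflicts = FALSE)

orders <- tibble(customer_id = c(1, 2, 4), amount = c(10, 20, 30))
# customer_id 2 is duplicated here, and customer_id 4 is missing
customers <- tibble(customer_id = c(1, 2, 2, 3),
                    name = c("Ana", "Ben", "Bea", "Cai"))

# The duplicated key turns 3 orders into 4 rows, and order 4 gets an NA name
joined <- left_join(orders, customers, by = "customer_id")
```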

I decided to write this little post now to celebrate the recent release of tidylog v1.0.0! Check out the official repo for more examples, or show some love to @elbersb on Twitter!

All in all, I think this package was a missing piece of the Tidyverse ecosystem: it is incredibly useful, and taking advantage of it is as simple as writing library(tidylog). Integrating this package into our daily R work is a no-brainer!
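
If you prefer not to mask the dplyr verbs globally, tidylog’s wrappers can also be called explicitly with the `::` operator, so only the steps you choose are logged. A minimal sketch, assuming dplyr and tidylog are installed (the `hp > 100` filter and the `mean_mpg` summary are just illustrations):

```r
# Sketch: tidylog must be installed, but it is not attached here.
library(dplyr, warn.conflicts = FALSE)

# Only these three calls go through tidylog's logging wrappers;
# any other dplyr verb in the session stays unmodified.
powerful_by_cyl <- mtcars %>%
  tidylog::filter(hp > 100) %>%
  tidylog::group_by(cyl) %>%
  tidylog::summarise(mean_mpg = mean(mpg))
```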

Pablo Cánovas
Senior Data Scientist at Spotahome

Data Scientist, formerly physicist | Tidyverse believer, piping life | Hanging out at TypeThePipe
