Filtering a data frame by condition on multiple columns

You could write the condition for each column, but I would like to see you deal with 100+ features

Sometimes you need to filter a data frame by applying the same condition to multiple columns. Obviously you could write the condition explicitly for every column, but that’s not very handy.

For those situations, it is much better to use filter_at in combination with all_vars.

Imagine we have the famous iris dataset with some attributes missing and want to get rid of those observations with any missing value.
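
The post does not say how the missing values were introduced, so here is one minimal way to set the scene. The name iris_na and the particular cells blanked out are assumptions made purely for illustration:

```r
# Assumed setup (not from the original post): copy iris and blank out a few cells
iris_na <- iris
iris_na[c(1, 5), "Sepal.Length"] <- NA  # two gaps in one column
iris_na[c(5, 10), "Petal.Width"] <- NA  # row 5 now has two gaps
iris_na[20, 1:4] <- NA                  # row 20 loses all four measurements

# Count the missing values per column
colSums(is.na(iris_na))
```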

We could write the condition on every column, but that would be cumbersome:
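
Spelled out by hand, it might look like the sketch below. The iris_na data frame is a hypothetical stand-in for "iris with some attributes missing", built here only so the example runs on its own:

```r
library(dplyr)

# Assumed setup: iris with a few cells blanked out
iris_na <- iris
iris_na[c(1, 5), "Sepal.Length"] <- NA
iris_na[c(5, 10), "Petal.Width"] <- NA
iris_na[20, 1:4] <- NA

# One explicit condition per column: fine for 5 columns, painful for 100+
complete_rows <- iris_na %>%
  filter(
    !is.na(Sepal.Length),
    !is.na(Sepal.Width),
    !is.na(Petal.Length),
    !is.na(Petal.Width),
    !is.na(Species)
  )

nrow(complete_rows)
```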

Instead, we just have to select the columns we will filter on and apply the condition:
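
A minimal sketch of that idea with filter_at, again on a hypothetical iris_na built just for the example. The column range Sepal.Length:Species is one possible selection; any selection accepted by vars() should work the same way:

```r
library(dplyr)

# Assumed setup: iris with a few cells blanked out
iris_na <- iris
iris_na[c(1, 5), "Sepal.Length"] <- NA
iris_na[c(5, 10), "Petal.Width"] <- NA
iris_na[20, 1:4] <- NA

# vars() picks the columns; all_vars() requires the predicate to hold in each of them
complete_rows <- iris_na %>%
  filter_at(vars(Sepal.Length:Species), all_vars(!is.na(.)))

# Same rows as the shortcut na.omit() would keep
nrow(complete_rows) == nrow(na.omit(iris_na))
```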

Here we have used the function all_vars in the predicate to make explicit that every feature must satisfy the condition. To be honest, for that purpose it would have been easier to simply use iris %>% na.omit().

But what if we wanted the opposite? Keeping only the rows with all the selected features missing is as easy as changing the predicate part:
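
As a sketch, flipping !is.na(.) to is.na(.) does the job. The hypothetical iris_na below has exactly one row (row 20) in which all four measurements are missing, and the selection is limited to those four columns because Species is never missing in this setup:

```r
library(dplyr)

# Assumed setup: iris with a few cells blanked out
iris_na <- iris
iris_na[c(1, 5), "Sepal.Length"] <- NA
iris_na[c(5, 10), "Petal.Width"] <- NA
iris_na[20, 1:4] <- NA  # the only row with every measurement missing

# Keep rows where ALL selected columns are NA
all_missing <- iris_na %>%
  filter_at(vars(Sepal.Length:Petal.Width), all_vars(is.na(.)))

nrow(all_missing)
```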

Another option is to apply the condition on any feature. That’s where any_vars comes in handy. Here we keep only the observations with at least one missing feature:
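
A possible version with any_vars, using the same kind of hypothetical iris_na as a stand-in for the incomplete dataset:

```r
library(dplyr)

# Assumed setup: iris with a few cells blanked out (rows 1, 5, 10 and 20 are affected)
iris_na <- iris
iris_na[c(1, 5), "Sepal.Length"] <- NA
iris_na[c(5, 10), "Petal.Width"] <- NA
iris_na[20, 1:4] <- NA

# Keep rows where AT LEAST ONE selected column is NA
some_missing <- iris_na %>%
  filter_at(vars(Sepal.Length:Species), any_vars(is.na(.)))

nrow(some_missing)
```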

Also, there are some other fancy ways to manipulate data frames with the filter family. One trick is using contains() or starts_with() to select the variables:
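
For instance, the selection helpers can go straight inside vars(). This is only a sketch on a hypothetical iris_na, and the patterns "Sepal" and "Width" are chosen simply because they match iris column names:

```r
library(dplyr)

# Assumed setup: iris with a few cells blanked out
iris_na <- iris
iris_na[c(1, 5), "Sepal.Length"] <- NA
iris_na[c(5, 10), "Petal.Width"] <- NA
iris_na[20, 1:4] <- NA

# Rows where both Sepal.* columns are present
sepal_complete <- iris_na %>%
  filter_at(vars(starts_with("Sepal")), all_vars(!is.na(.)))

# Rows where at least one *.Width column is missing
width_missing <- iris_na %>%
  filter_at(vars(contains("Width")), any_vars(is.na(.)))
```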

Another example is applying the condition on columns that satisfy a certain condition with filter_if (notice the rowid feature here):
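
Here is one way this could look. The rowid column is added with mutate(rowid = row_number()), which is an assumption about how the original example was built, and the comments reflect one reading of why rowid deserves attention: it is numeric too, so filter_if(is.numeric, ...) picks it up together with the measurements.

```r
library(dplyr)

# Assumed setup: iris with a few cells blanked out, plus an integer row id
iris_na <- iris
iris_na[c(1, 5), "Sepal.Length"] <- NA
iris_na[c(5, 10), "Petal.Width"] <- NA
iris_na[20, 1:4] <- NA
with_id <- iris_na %>% mutate(rowid = row_number())

# rowid is an integer column, so is.numeric selects it along with the measurements
any_missing <- with_id %>% filter_if(is.numeric, any_vars(is.na(.)))

# rowid is never NA, so requiring ALL numeric columns to be missing matches nothing
none <- with_id %>% filter_if(is.numeric, all_vars(is.na(.)))

# is.double skips the integer rowid and checks only the four measurements
all_missing <- with_id %>% filter_if(is.double, all_vars(is.na(.)))
```

With any_vars the extra column is harmless, but with all_vars it silently changes the result, so it is worth checking which columns the predicate function actually selects.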

Pablo Cánovas
Data Scientist at Repsol

Data Scientist, formerly physicist | Tidyverse believer, piping life | Hanging out at TypeThePipe