Select datetime columns matching time zones with Python Polars selectors. Calculate offset with the new dst_offset.
Unlocking the Power of Python Polars: A Comprehensive Guide to Selecting and Manipulating Datetime Columns with Selectors and dst_offset.
Working with Polars datetime columns
Following our Python Polars series, today we are delving into the exciting world of datetime data wrangling. The journey has been captivating so far, as Polars’ syntax and powerful capabilities have truly revolutionized data wrangling in Python. Polars’ datetime handling capabilities have proven to be a game-changer for data wrangling in Python.
Polars’ datetime handling goes beyond the basics, offering seamless filtering, sorting, and grouping based on datetime columns. This allows you to unlock valuable insights from your time-series data effortlessly. Say goodbye to the days of wrestling with complex datetime operations manually, and join the Polars bandwagon to unleash the true potential of datetime handling in Python!
Now, let’s explore one of the latest additions to Polars (version 0.18.5): the .dt.dst_offset
function. This function allows you to retrieve the daylight savings offset at each point in your datetime data. This becomes particularly useful when working with data from multiple countries, as daylight savings may not be observed simultaneously across the globe. With .dt.dst_offset
, you can effortlessly account for these variations and ensure accurate analysis. It’s a powerful tool that adds an extra dimension to your datetime manipulations, allowing you to navigate the intricacies of global time changes with ease.
To put it all into practice, let’s consider a scenario where we have a dataset containing datetime information from various countries. Using Polars’ datetime functions, such as .dt.dst_offset
, we can calculate the daylight savings offset for each timestamp, enabling us to gain deeper insights into how time changes affect our data across different regions. This newfound capability empowers us to make more informed decisions and derive valuable conclusions from our time-series analysis.
Datetime Selectors
We start our exploration of datetime columns in Polars by directly selecting them using the Polars selectors: select(cs.datetime(time_zone=“*“)). This enables us to effortlessly extract and manipulate datetime information from our dataset.
What are Polars selectors? Selectors in Polars provide an intuitive way to select columns from DataFrame or LazyFrame objects based on various criteria like column name, data type, or other properties. They enhance and unify the functionality offered by the col() expression, making column selection a breeze. What’s more, selectors even allow you to broadcast expressions over the selected columns, further expanding their versatility. With selectors, you have a powerful tool at your disposal to streamline column selection and manipulation in Polars.
To make use of this feature, ensure that you have Polars version 0.18.1 or above. If you’re using Poetry or Pipenv for dependency management, upgrading your dependencies to the latest versions will ensure you have access to all the new functionalities and enhancements.
Polars pl.date_range
function
Let’s delve into the powerful date_range function provided by Polars. This function allows you to generate a sequence of dates within a specified range with incredible ease. By specifying the start and end dates, along with the desired frequency (such as daily, weekly, or monthly), you can quickly create a DataFrame that spans the entire range of dates.
The date_range function also provides flexibility in handling time zones. You can specify the time zone for the generated dates, ensuring consistency and accuracy in your time-related analyses across different region for a wide range of use cases, such as creating synthetic time-series data, filling gaps in existing data, or generating test datasets for time-related analyses.
Note that starting from previous versions of Polars, the date_range function now requires the eager=True
parameter to be specified in order to expand the range over rows. This ensures that the date range is not consolidated into a single row if that is not the desired behavior.
Polars convert_time_zone()
function
Datetime columns when any row contains time zone information, converts the column in TZ aware and ‘UTC’ as default, converting the existing timezones to UTC. If you want to maintain a custom tz for the column, you can use te schema / schema_override constructor arguments
Example: Select datetime columns matching their time zone and compute their offset with Polars Selectors
But let’s take it a step further and introduce an exciting new feature: generating row offsets based on datetime columns! First of all you would need a Polars version > 0.18.1. You should upgrade your poetry/pipenv dependencies.
And here you can find and example of how to select datetime columns and generate their row offset!
from datetime import date, datetime
import polars as pl
import polars.selectors as cs
import zoneinfo
import pytz
start = pl.datetime(2023, 9, 1)
stop = pl.datetime(2023, 12, 1)
df = pl.DataFrame(
{"dt": pl.date_range(start=start, end=stop, interval="1mo", time_zone="UTC", eager=True)},
schema={"dt": pl.Datetime("ns", "UTC")},
).with_columns(
[
pl.col("dt").dt.convert_time_zone("Europe/Berlin").suffix("_Berlin"),
pl.col("dt").dt.convert_time_zone("Australia/Sydney").suffix("_Sydney"),
]
)
Select all datetime columns with timezone
dt | dt_Berlin | dt_Sydney |
---|---|---|
datetime[ns, UTC] | datetime[ns, Europe/Berlin] | datetime[ns, Australia/Sydney] |
2023-09-01 00:00:00 UTC | 2023-09-01 02:00:00 CEST | 2023-09-01 10:00:00 AEST |
2023-10-01 00:00:00 UTC | 2023-10-01 02:00:00 CEST | 2023-10-01 11:00:00 AEDT |
2023-11-01 00:00:00 UTC | 2023-11-01 01:00:00 CET | 2023-11-01 11:00:00 AEDT |
2023-12-01 00:00:00 UTC | 2023-12-01 01:00:00 CET | 2023-12-01 11:00:00 AEDT |
Select the offsets that may differ by column but also by row as the daylight savings are not the same around the world.
dt | dt_Berlin | dt_Sydney |
---|---|---|
duration[ms] | duration[ms] | duration[ms] |
0ms | 1h | 0ms |
0ms | 1h | 1h |
0ms | 0ms | 1h |
0ms | 0ms | 1h |
Note how if any column metadata is provided, the timezone is infered from the first record, and the following rows are transported to that time zone.
Get ready for our next post, where we’ll unveil other datetime operations with Polars! From rolling windows to rounding dates, combining datetime values, and converting string columns seamlessly – we’ve got it all covered. Stay in the loop and discover the boundless possibilities of data manipulation with Polars in Python!