Data Wrangling with {lubridate}

Date and time data are often messy and inconsistent, which makes them hard to analyze. The {lubridate} package, part of the {tidyverse}, provides functions that simplify working with dates and times in R. It lets you parse dates from various formats, extract components such as year, month, day, hour, minute, and second, and perform calculations with dates and times. It also provides functions for handling time zones and daylight saving time. This makes it particularly useful for data wrangling tasks such as cleaning and transforming date/time columns.

{lubridate} Function Reference

The {lubridate} package provides a variety of functions for working with dates and times. Here are some of the most commonly used functions, categorized by their purpose:

Category	Functions
Parsing	`ymd()`, `mdy()`, `dmy()`, `parse_date_time()`
Extract Parts	`year()`, `month()`, `day()`, `hour()`, `minute()`, `second()`
Manipulation	`make_date()`, `make_datetime()`, `floor_date()`, `ceiling_date()`
Timezones	`with_tz()`, `force_tz()`
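As a quick illustration of the parsing and extraction rows in the table, the sketch below parses the same calendar day written in three different orders. The input strings and variable names are made up for this example and are not part of the tutorial's dataset.

```r
library(lubridate)

# Each parser is named after the order of components it expects.
iso <- ymd("2025-04-10")   # year, month, day
us  <- mdy("04/10/2025")   # month, day, year
eu  <- dmy("10/04/2025")   # day, month, year

# All three strings resolve to the same Date value.
print(c(iso, us, eu))

# Accessor functions return the individual components as numbers.
print(c(year = year(iso), month = month(iso), day = day(iso)))

# make_date() goes the other way and builds a Date from its parts.
rebuilt <- make_date(year = 2025, month = 4, day = 10)
print(rebuilt == iso)
```

Choosing the parser that matches the known component order is usually safer than letting a general-purpose parser guess.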

Check and Install Required Packages

Code

packages <- c(
          'tidyverse'
          )

# Install missing packages
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

Verify Installation

Code

# Verify installation
cat("Installed packages:\n")

Installed packages:

Code

print(sapply(packages, requireNamespace, quietly = TRUE))

tidyverse 
     TRUE

Load Packages

Code

# Load packages with suppressed messages
invisible(lapply(packages, function(pkg) {
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
}))

Check Loaded Packages

Code

# Check loaded packages
cat("Successfully loaded packages:\n")

Successfully loaded packages:

Code

print(search()[grepl("package:", search())])

 [1] "package:lubridate" "package:forcats"   "package:stringr"  
 [4] "package:dplyr"     "package:purrr"     "package:readr"    
 [7] "package:tidyr"     "package:tibble"    "package:ggplot2"  
[10] "package:tidyverse" "package:stats"     "package:graphics" 
[13] "package:grDevices" "package:utils"     "package:datasets" 
[16] "package:methods"   "package:base"

Data

Let’s simulate a small dataset with messy date formats.

Code

set.seed(123)
df <- tibble(
  id = 1:10,
  name = sample(c("Alice", "Bob", "Carol"), 10, replace = TRUE),
  raw_date = sample(c("2025-04-10", "10/04/2025", "April 10, 2025"), 10, replace = TRUE),
  timestamp = sample(seq(
    as.POSIXct("2025-04-10 08:00"),
    as.POSIXct("2025-04-10 18:00"),
    by = "1 hour"
  ), 10, replace = TRUE)
)

print(df)

# A tibble: 10 × 4
      id name  raw_date       timestamp          
   <int> <chr> <chr>          <dttm>             
 1     1 Carol 10/04/2025     2025-04-10 16:00:00
 2     2 Carol 10/04/2025     2025-04-10 10:00:00
 3     3 Carol 2025-04-10     2025-04-10 15:00:00
 4     4 Bob   10/04/2025     2025-04-10 17:00:00
 5     5 Carol April 10, 2025 2025-04-10 14:00:00
 6     6 Bob   2025-04-10     2025-04-10 17:00:00
 7     7 Bob   April 10, 2025 2025-04-10 16:00:00
 8     8 Bob   April 10, 2025 2025-04-10 10:00:00
 9     9 Carol 2025-04-10     2025-04-10 11:00:00
10    10 Alice 2025-04-10     2025-04-10 08:00:00

Parse Dates and Times

We’ll clean the inconsistent raw_date column and extract useful features.

Code

df_clean <- df %>%
  mutate(
    parsed_date = lubridate::parse_date_time(raw_date, orders = c("ymd", "dmy", "B d, Y")),
    year = lubridate::year(parsed_date),
    month = lubridate::month(parsed_date, label = TRUE),
    day = lubridate::day(parsed_date),
    weekday = lubridate::wday(parsed_date, label = TRUE),
    hour = lubridate::hour(timestamp),
    minute = lubridate::minute(timestamp)
  )

glimpse(df_clean)

Rows: 10
Columns: 11
$ id          <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$ name        <chr> "Carol", "Carol", "Carol", "Bob", "Carol", "Bob", "Bob", "…
$ raw_date    <chr> "10/04/2025", "10/04/2025", "2025-04-10", "10/04/2025", "A…
$ timestamp   <dttm> 2025-04-10 16:00:00, 2025-04-10 10:00:00, 2025-04-10 15:00…
$ parsed_date <dttm> 2025-04-10, 2025-04-10, 2025-04-10, 2025-04-10, 2025-04-1…
$ year        <dbl> 2025, 2025, 2025, 2025, 2025, 2025, 2025, 2025, 2025, 2025
$ month       <ord> Apr, Apr, Apr, Apr, Apr, Apr, Apr, Apr, Apr, Apr
$ day         <int> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10
$ weekday     <ord> Thu, Thu, Thu, Thu, Thu, Thu, Thu, Thu, Thu, Thu
$ hour        <int> 16, 10, 15, 17, 14, 17, 16, 10, 11, 8
$ minute      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
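A string such as "10/04/2025" is ambiguous: it reads as 10 April under a day-month-year order and as 4 October under a month-day-year order. The parse above lists "dmy" in `orders`, so it is treated as 10 April. The sketch below, using a made-up standalone value, shows how the chosen order changes the result, and how `as_date()` can drop the midnight time component that `parse_date_time()` attaches.

```r
library(lubridate)

ambiguous <- "10/04/2025"

# The same text parsed under two different component orders.
as_dmy <- parse_date_time(ambiguous, orders = "dmy")  # 10 April 2025
as_mdy <- parse_date_time(ambiguous, orders = "mdy")  # 4 October 2025

print(as_dmy)
print(as_mdy)

# parse_date_time() returns a date-time (POSIXct), which is why
# parsed_date shows up as <dttm>. as_date() converts it to a plain Date.
date_only <- as_date(as_dmy)
print(class(date_only))
```

When the source of the data is known, it is worth stating the expected order explicitly rather than relying on the parser to pick between candidates.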

Common Wrangling Tasks with `lubridate`

Reformat Dates

Code

df_clean <- df_clean %>%
  mutate(date_reformatted = format(parsed_date, "%d-%b-%Y"))
head(df_clean)

# A tibble: 6 × 12
     id name  raw_date timestamp           parsed_date          year month   day
  <int> <chr> <chr>    <dttm>              <dttm>              <dbl> <ord> <int>
1     1 Carol 10/04/2… 2025-04-10 16:00:00 2025-04-10 00:00:00  2025 Apr      10
2     2 Carol 10/04/2… 2025-04-10 10:00:00 2025-04-10 00:00:00  2025 Apr      10
3     3 Carol 2025-04… 2025-04-10 15:00:00 2025-04-10 00:00:00  2025 Apr      10
4     4 Bob   10/04/2… 2025-04-10 17:00:00 2025-04-10 00:00:00  2025 Apr      10
5     5 Carol April 1… 2025-04-10 14:00:00 2025-04-10 00:00:00  2025 Apr      10
6     6 Bob   2025-04… 2025-04-10 17:00:00 2025-04-10 00:00:00  2025 Apr      10
# ℹ 4 more variables: weekday <ord>, hour <int>, minute <int>,
#   date_reformatted <chr>

Filter Data for Specific Times

Code

# Keep rows whose timestamp hour is greater than 12, i.e. 13:00 or later
# (use hour >= 12 to include the noon hour as well)
df_clean %>% filter(hour > 12)

# A tibble: 6 × 12
     id name  raw_date timestamp           parsed_date          year month   day
  <int> <chr> <chr>    <dttm>              <dttm>              <dbl> <ord> <int>
1     1 Carol 10/04/2… 2025-04-10 16:00:00 2025-04-10 00:00:00  2025 Apr      10
2     3 Carol 2025-04… 2025-04-10 15:00:00 2025-04-10 00:00:00  2025 Apr      10
3     4 Bob   10/04/2… 2025-04-10 17:00:00 2025-04-10 00:00:00  2025 Apr      10
4     5 Carol April 1… 2025-04-10 14:00:00 2025-04-10 00:00:00  2025 Apr      10
5     6 Bob   2025-04… 2025-04-10 17:00:00 2025-04-10 00:00:00  2025 Apr      10
6     7 Bob   April 1… 2025-04-10 16:00:00 2025-04-10 00:00:00  2025 Apr      10
# ℹ 4 more variables: weekday <ord>, hour <int>, minute <int>,
#   date_reformatted <chr>
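Because `hour()` returns only the hour component, the comparison operator decides whether times between 12:00 and 12:59 are kept. The following sketch uses a small made-up vector of timestamps (not the tutorial's data) to show the difference.

```r
library(lubridate)

# Four illustrative timestamps around midday.
times <- ymd_hm(c(
  "2025-04-10 11:30",
  "2025-04-10 12:00",
  "2025-04-10 12:45",
  "2025-04-10 13:00"
))

# hour > 12 drops everything in the 12:00 hour and keeps only 13:00.
strictly_after_12 <- times[hour(times) > 12]

# hour >= 12 keeps 12:00, 12:45, and 13:00.
noon_or_later <- times[hour(times) >= 12]

print(strictly_after_12)
print(noon_or_later)
```

The same conditions can be used inside `filter()` on a data frame column.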

Grouping by Weekday

Code

df_clean %>%
  group_by(weekday) %>%
  summarise(entries = n())

# A tibble: 1 × 2
  weekday entries
  <ord>     <int>
1 Thu          10
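Grouping is not limited to calendar labels such as the weekday. The rounding functions listed in the reference table, `floor_date()` and `ceiling_date()`, can turn timestamps into buckets that are then counted or summarised. The sketch below assumes three made-up timestamps and uses base R's `table()` for the count to stay self-contained.

```r
library(lubridate)

# Illustrative timestamps: two in the 08:00 hour, one in the 16:00 hour.
stamps <- ymd_hms(c(
  "2025-04-10 08:05:00",
  "2025-04-10 08:50:00",
  "2025-04-10 16:47:12"
), tz = "UTC")

# Round each timestamp down to the start of its hour.
hour_bucket <- floor_date(stamps, unit = "hour")

# Count how many timestamps fall into each hourly bucket.
counts <- table(format(hour_bucket, "%H:%M"))
print(counts)

# ceiling_date() rounds up to the next boundary instead.
next_hour <- ceiling_date(stamps[3], unit = "hour")
print(next_hour)
```

In a dplyr pipeline, the bucket column could be created with `mutate()` and then passed to `group_by()` in the same way as `weekday`.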

Construct New Datetime from Parts

Code

df_clean <- df_clean %>%
  mutate(full_datetime = make_datetime(year, month(parsed_date), day, hour = hour, min = minute))

print(df_clean$full_datetime)

 [1] "2025-04-10 16:00:00 UTC" "2025-04-10 10:00:00 UTC"
 [3] "2025-04-10 15:00:00 UTC" "2025-04-10 17:00:00 UTC"
 [5] "2025-04-10 14:00:00 UTC" "2025-04-10 17:00:00 UTC"
 [7] "2025-04-10 16:00:00 UTC" "2025-04-10 10:00:00 UTC"
 [9] "2025-04-10 11:00:00 UTC" "2025-04-10 08:00:00 UTC"
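The printed values carry the UTC label because `make_datetime()` uses UTC unless a `tz` argument is given. The sketch below, with literal values chosen only for illustration, shows `make_date()`, the default UTC behaviour, and an explicit time zone.

```r
library(lubridate)

# A Date built from year, month, and day.
d <- make_date(year = 2025, month = 4, day = 10)

# A date-time built from parts; the time zone defaults to UTC.
dt_utc <- make_datetime(2025, 4, 10, hour = 16, min = 0)

# The same clock time interpreted in Tokyo, which is a different instant.
dt_tokyo <- make_datetime(2025, 4, 10, hour = 16, min = 0, tz = "Asia/Tokyo")

print(tz(dt_utc))
print(tz(dt_tokyo))

# 16:00 in Tokyo occurs 9 hours before 16:00 UTC.
gap_hours <- as.numeric(difftime(dt_utc, dt_tokyo, units = "hours"))
print(gap_hours)
```

If the parts were originally extracted from a timestamp recorded in a local time zone, passing that zone through `tz` keeps the rebuilt value aligned with the original instant.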

Change Timezones

Code

df_clean <- df_clean %>%
  mutate(
    full_datetime_utc = with_tz(full_datetime, tzone = "UTC"),
    full_datetime_tokyo = with_tz(full_datetime, tzone = "Asia/Tokyo")
  )
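Since `full_datetime` is already in UTC, `full_datetime_utc` is identical to it, while `full_datetime_tokyo` shows the same instants on Tokyo's clock. The reference table also lists `force_tz()`, which behaves differently: it keeps the clock time and changes the instant. A minimal sketch of the contrast, using a single made-up value:

```r
library(lubridate)

utc_time <- ymd_hms("2025-04-10 16:00:00", tz = "UTC")

# with_tz(): same instant, displayed on a different clock.
tokyo_view <- with_tz(utc_time, tzone = "Asia/Tokyo")

# force_tz(): same clock reading, relabelled as Tokyo time,
# which makes it a different instant.
tokyo_forced <- force_tz(utc_time, tzone = "Asia/Tokyo")

print(tokyo_view)    # 01:00 on 11 April in Tokyo
print(tokyo_forced)  # 16:00 on 10 April in Tokyo

# The with_tz() result still equals the original instant.
print(utc_time == tokyo_view)

# The force_tz() result is 9 hours earlier than the original instant.
shift_hours <- as.numeric(difftime(utc_time, tokyo_forced, units = "hours"))
print(shift_hours)
```

As a rule of thumb, `with_tz()` suits converting correctly recorded timestamps for display, and `force_tz()` suits repairing timestamps that were stored with the wrong zone label.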

Summary and Conclusions

In this tutorial, we covered the basics of using the {lubridate} package for date and time manipulation in R. We learned how to parse inconsistent date formats, extract components like year, month, and weekday, filter and group data by date and time, construct new datetime values, and handle timezones. By the end of this tutorial, you should be able to:

Parse inconsistent date formats
Extract year, month, weekday, hour, minute, etc.
Filter/group by date and time
Construct and manipulate datetime values
Handle timezones and rounding

