Data Wrangling with {lubridate}

Date and time data are often messy and inconsistent, which makes them hard to analyze. The {lubridate} package, part of the {tidyverse}, provides functions that simplify working with dates and times in R. It lets you parse dates from various formats, extract components such as year, month, day, hour, minute, and second, and perform calculations with dates and times. It also provides functions for handling time zones and daylight saving time. These features make it particularly useful for data wrangling tasks such as cleaning and transforming date/time data.

{lubridate} Function Reference

The {lubridate} package provides a variety of functions for working with dates and times. Here are some of the most commonly used functions, categorized by their purpose:

Category        Functions
Parsing         ymd(), mdy(), dmy(), parse_date_time()
Extract Parts   year(), month(), day(), hour(), minute(), second()
Manipulation    make_date(), make_datetime(), floor_date(), ceiling_date()
Timezones       with_tz(), force_tz()
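
As a quick illustration of the parsing helpers listed above, the sketch below shows that ymd(), mdy(), and dmy() differ only in the order of components they expect. The input strings are made up for this example.

```r
library(lubridate)

# Each helper is named after the order of components it expects.
d1 <- ymd("2025-04-10")   # year, month, day
d2 <- mdy("04/10/2025")   # month, day, year
d3 <- dmy("10/04/2025")   # day, month, year

# All three calls return the same Date value.
print(c(d1, d2, d3))
```

Choosing the helper that matches the known layout of a column is usually simpler than a general parser when every value follows one format.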

Check and Install Required Packages

Code
#| warning: false
#| error: false

# Packages required for this tutorial ({lubridate} is attached with the tidyverse)
packages <- c(
  'tidyverse'
)

# Install missing packages
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

Verify Installation

Code
# Verify installation
cat("Installed packages:\n")
Installed packages:
Code
print(sapply(packages, requireNamespace, quietly = TRUE))
tidyverse 
     TRUE 

Load Packages

Code
# Load packages with suppressed messages
invisible(lapply(packages, function(pkg) {
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
}))

Check Loaded Packages

Code
# Check loaded packages
cat("Successfully loaded packages:\n")
Successfully loaded packages:
Code
print(search()[grepl("package:", search())])
 [1] "package:lubridate" "package:forcats"   "package:stringr"  
 [4] "package:dplyr"     "package:purrr"     "package:readr"    
 [7] "package:tidyr"     "package:tibble"    "package:ggplot2"  
[10] "package:tidyverse" "package:stats"     "package:graphics" 
[13] "package:grDevices" "package:utils"     "package:datasets" 
[16] "package:methods"   "package:base"     

Data

Let’s simulate a small dataset with messy date formats.

Code
set.seed(123)
df <- tibble(
  id = 1:10,
  name = sample(c("Alice", "Bob", "Carol"), 10, replace = TRUE),
  raw_date = sample(c("2025-04-10", "10/04/2025", "April 10, 2025"), 10, replace = TRUE),
  timestamp = sample(seq(
    as.POSIXct("2025-04-10 08:00"),
    as.POSIXct("2025-04-10 18:00"),
    by = "1 hour"
  ), 10, replace = TRUE)
)

print(df)
# A tibble: 10 × 4
      id name  raw_date       timestamp          
   <int> <chr> <chr>          <dttm>             
 1     1 Carol 10/04/2025     2025-04-10 16:00:00
 2     2 Carol 10/04/2025     2025-04-10 10:00:00
 3     3 Carol 2025-04-10     2025-04-10 15:00:00
 4     4 Bob   10/04/2025     2025-04-10 17:00:00
 5     5 Carol April 10, 2025 2025-04-10 14:00:00
 6     6 Bob   2025-04-10     2025-04-10 17:00:00
 7     7 Bob   April 10, 2025 2025-04-10 16:00:00
 8     8 Bob   April 10, 2025 2025-04-10 10:00:00
 9     9 Carol 2025-04-10     2025-04-10 11:00:00
10    10 Alice 2025-04-10     2025-04-10 08:00:00

Parse Dates and Times

We’ll clean the inconsistent raw_date column and extract useful features.

Code
df_clean <- df %>%
  mutate(
    parsed_date = lubridate::parse_date_time(raw_date, orders = c("ymd", "dmy", "B d, Y")),
    year = lubridate::year(parsed_date),
    month = lubridate::month(parsed_date, label = TRUE),
    day = lubridate::day(parsed_date),
    weekday = lubridate::wday(parsed_date, label = TRUE),
    hour = lubridate::hour(timestamp),
    minute = lubridate::minute(timestamp)
  )

glimpse(df_clean)
Rows: 10
Columns: 11
$ id          <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$ name        <chr> "Carol", "Carol", "Carol", "Bob", "Carol", "Bob", "Bob", "…
$ raw_date    <chr> "10/04/2025", "10/04/2025", "2025-04-10", "10/04/2025", "A…
$ timestamp   <dttm> 2025-04-10 16:00:00, 2025-04-10 10:00:00, 2025-04-10 15:00…
$ parsed_date <dttm> 2025-04-10, 2025-04-10, 2025-04-10, 2025-04-10, 2025-04-1…
$ year        <dbl> 2025, 2025, 2025, 2025, 2025, 2025, 2025, 2025, 2025, 2025
$ month       <ord> Apr, Apr, Apr, Apr, Apr, Apr, Apr, Apr, Apr, Apr
$ day         <int> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10
$ weekday     <ord> Thu, Thu, Thu, Thu, Thu, Thu, Thu, Thu, Thu, Thu
$ hour        <int> 16, 10, 15, 17, 14, 17, 16, 10, 11, 8
$ minute      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
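
The order passed to parse_date_time() decides how an ambiguous string is read, and the result is a date-time rather than a plain date (hence the <dttm> column type above). The following is a minimal sketch of both points; the variable names are illustrative and not part of the tutorial's data.

```r
library(lubridate)

ambiguous <- "10/04/2025"

# Read as day/month/year: 10 April 2025.
as_dmy <- parse_date_time(ambiguous, orders = "dmy")

# Read as month/day/year: 4 October 2025.
as_mdy <- parse_date_time(ambiguous, orders = "mdy")

# parse_date_time() returns a POSIXct date-time (UTC by default);
# as_date() drops the time component when only the calendar date matters.
date_only <- as_date(as_dmy)

print(as_dmy)
print(as_mdy)
print(class(date_only))
```

For slash-separated dates it is therefore worth confirming the convention used by the data source before picking the order.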

Common Wrangling Tasks with lubridate

Reformat Dates

Code
df_clean <- df_clean %>%
  mutate(date_reformatted = format(parsed_date, "%d-%b-%Y"))
head(df_clean)
# A tibble: 6 × 12
     id name  raw_date timestamp           parsed_date          year month   day
  <int> <chr> <chr>    <dttm>              <dttm>              <dbl> <ord> <int>
1     1 Carol 10/04/2… 2025-04-10 16:00:00 2025-04-10 00:00:00  2025 Apr      10
2     2 Carol 10/04/2… 2025-04-10 10:00:00 2025-04-10 00:00:00  2025 Apr      10
3     3 Carol 2025-04… 2025-04-10 15:00:00 2025-04-10 00:00:00  2025 Apr      10
4     4 Bob   10/04/2… 2025-04-10 17:00:00 2025-04-10 00:00:00  2025 Apr      10
5     5 Carol April 1… 2025-04-10 14:00:00 2025-04-10 00:00:00  2025 Apr      10
6     6 Bob   2025-04… 2025-04-10 17:00:00 2025-04-10 00:00:00  2025 Apr      10
# ℹ 4 more variables: weekday <ord>, hour <int>, minute <int>,
#   date_reformatted <chr>
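
Note that format() returns a character string, not a date, so the reformatted column is for display only. The sketch below, using an illustrative value, shows the round trip back to a Date. It uses a numeric-only format because month abbreviations produced by %b depend on the session locale.

```r
library(lubridate)

d <- as.Date("2025-04-10")

# format() produces text; date arithmetic no longer works on it.
txt <- format(d, "%d/%m/%Y")
print(class(txt))

# Parse the text again to recover a Date when calculations are needed.
d_again <- dmy(txt)
print(d_again == d)
```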

Filter Data for Specific Times

Code
# Keep rows whose timestamp falls at 13:00 or later (hour > 12 excludes the 12:00 hour)
df_clean %>% filter(hour > 12)
# A tibble: 6 × 12
     id name  raw_date timestamp           parsed_date          year month   day
  <int> <chr> <chr>    <dttm>              <dttm>              <dbl> <ord> <int>
1     1 Carol 10/04/2… 2025-04-10 16:00:00 2025-04-10 00:00:00  2025 Apr      10
2     3 Carol 2025-04… 2025-04-10 15:00:00 2025-04-10 00:00:00  2025 Apr      10
3     4 Bob   10/04/2… 2025-04-10 17:00:00 2025-04-10 00:00:00  2025 Apr      10
4     5 Carol April 1… 2025-04-10 14:00:00 2025-04-10 00:00:00  2025 Apr      10
5     6 Bob   2025-04… 2025-04-10 17:00:00 2025-04-10 00:00:00  2025 Apr      10
6     7 Bob   April 1… 2025-04-10 16:00:00 2025-04-10 00:00:00  2025 Apr      10
# ℹ 4 more variables: weekday <ord>, hour <int>, minute <int>,
#   date_reformatted <chr>

Grouping by Weekday

Code
df_clean %>%
  group_by(weekday) %>%
  summarise(entries = n())
# A tibble: 1 × 2
  weekday entries
  <ord>     <int>
1 Thu          10
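
Grouping is not limited to extracted components: a timestamp can also be rounded to a common unit with floor_date() or ceiling_date() and then used as the grouping key. Here is a minimal sketch with a made-up timestamp.

```r
library(lubridate)

x <- ymd_hms("2025-04-10 16:45:30", tz = "UTC")

# Round down and up to the nearest hour boundary.
hour_start <- floor_date(x, unit = "hour")
next_hour  <- ceiling_date(x, unit = "hour")

# Round down to the first instant of the month.
month_start <- floor_date(x, unit = "month")

print(hour_start)
print(next_hour)
print(month_start)
```

Applied inside mutate(), a rounded column of this kind could be passed to group_by() to count entries per hour or per month.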

Construct New Datetime from Parts

Code
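df_clean <- df_clean %>%
  mutate(
    # month(parsed_date) supplies the numeric month; the `month` column is a labelled factor.
    # make_datetime() returns UTC unless `tz` is supplied.
    full_datetime = make_datetime(year, month(parsed_date), day, hour = hour, min = minute)
  )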

print(df_clean$full_datetime)
 [1] "2025-04-10 16:00:00 UTC" "2025-04-10 10:00:00 UTC"
 [3] "2025-04-10 15:00:00 UTC" "2025-04-10 17:00:00 UTC"
 [5] "2025-04-10 14:00:00 UTC" "2025-04-10 17:00:00 UTC"
 [7] "2025-04-10 16:00:00 UTC" "2025-04-10 10:00:00 UTC"
 [9] "2025-04-10 11:00:00 UTC" "2025-04-10 08:00:00 UTC"

Change Timezones

Code
df_clean <- df_clean %>%
  mutate(
    full_datetime_utc = with_tz(full_datetime, tzone = "UTC"),
    full_datetime_tokyo = with_tz(full_datetime, tzone = "Asia/Tokyo")
  )
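
The two timezone functions from the reference table behave differently: with_tz() keeps the instant and changes how it is displayed, while force_tz() keeps the clock reading and changes the instant. The sketch below contrasts them on a single made-up value.

```r
library(lubridate)

utc_time <- ymd_hms("2025-04-10 16:00:00", tz = "UTC")

# with_tz(): same instant, shown on a Tokyo clock (UTC+9).
shifted <- with_tz(utc_time, tzone = "Asia/Tokyo")

# force_tz(): same clock reading, reinterpreted as Tokyo time,
# so the underlying instant moves 9 hours earlier.
relabelled <- force_tz(utc_time, tzone = "Asia/Tokyo")

print(shifted)
print(relabelled)
print(difftime(utc_time, relabelled, units = "hours"))
```

Use with_tz() when the stored instant is correct and only the display zone should change. force_tz() fits the case where the clock time is right but was recorded with the wrong zone.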

Summary and Conclusions

In this tutorial, we covered the basics of using the {lubridate} package for date and time manipulation in R. You should now be able to:

  • Parse inconsistent date formats
  • Extract year, month, weekday, hour, minute, etc.
  • Filter/group by date and time
  • Construct and manipulate datetime values
  • Handle timezones and rounding

References