packages <- c(
  'tidyverse'
)
Date and time data are often messy and inconsistent, which makes them challenging to analyze. The {lubridate} package, part of the {tidyverse}, provides functions that simplify working with dates and times in R. It lets you parse dates from various formats, extract components such as year, month, day, hour, minute, and second, and perform calculations with dates and times. It also provides functions for handling time zones and daylight saving time. The package is particularly useful for data wrangling tasks such as cleaning and transforming date/time data.
The {lubridate} package provides a variety of functions for working with dates and times. Here are some of the most commonly used functions, categorized by their purpose:
| Category | Functions |
|---|---|
| Parsing | ymd(), mdy(), dmy(), parse_date_time() |
| Extract Parts | year(), month(), day(), hour(), minute(), second() |
| Manipulation | make_date(), make_datetime(), floor_date(), ceiling_date() |
| Timezones | with_tz(), force_tz() |
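To make the table concrete, here is a minimal sketch that exercises one or two functions from the parsing, extraction, and manipulation rows. The input strings and the variable names (`d1`, `d2`, `d3`, `dt`) are illustrative choices, not part of the dataset used later in this tutorial.

```r
library(lubridate)

# The same calendar day written in three component orders;
# the parser name states the order of year (y), month (m), and day (d)
d1 <- ymd("2025-04-10")   # year-month-day
d2 <- mdy("04/10/2025")   # month-day-year
d3 <- dmy("10/04/2025")   # day-month-year

# All three results are Date objects for 10 April 2025
all(c(d1, d2, d3) == as.Date("2025-04-10"))

# Extract individual components
year(d1)
month(d1)
day(d1)

# Build a date-time from its parts (make_datetime() uses UTC by default)
dt <- make_datetime(2025, 4, 10, 16, 45, 30)

# Round down and up to the nearest hour
floor_date(dt, unit = "hour")    # 2025-04-10 16:00:00 UTC
ceiling_date(dt, unit = "hour")  # 2025-04-10 17:00:00 UTC
```

Note that "10/04/2025" and "04/10/2025" are both valid under either `dmy()` or `mdy()`, so the choice of parser has to come from knowledge of the data source rather than from the string itself.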
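#| warning: false
#| error: false
# Install missing packages
new_packages <- packages[!(packages %in% installed.packages()[, "Package"])]
if (length(new_packages)) install.packages(new_packages)

# Load the packages so tibble(), mutate(), and the lubridate functions are available
invisible(lapply(packages, library, character.only = TRUE))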
Successfully loaded packages:
[1] "package:lubridate" "package:forcats" "package:stringr"
[4] "package:dplyr" "package:purrr" "package:readr"
[7] "package:tidyr" "package:tibble" "package:ggplot2"
[10] "package:tidyverse" "package:stats" "package:graphics"
[13] "package:grDevices" "package:utils" "package:datasets"
[16] "package:methods" "package:base"
Let’s simulate a small dataset with messy date formats.
set.seed(123)
df <- tibble(
  id = 1:10,
  name = sample(c("Alice", "Bob", "Carol"), 10, replace = TRUE),
  raw_date = sample(c("2025-04-10", "10/04/2025", "April 10, 2025"), 10, replace = TRUE),
  timestamp = sample(seq(
    as.POSIXct("2025-04-10 08:00"),
    as.POSIXct("2025-04-10 18:00"),
    by = "1 hour"
  ), 10, replace = TRUE)
)
print(df)
# A tibble: 10 × 4
id name raw_date timestamp
<int> <chr> <chr> <dttm>
1 1 Carol 10/04/2025 2025-04-10 16:00:00
2 2 Carol 10/04/2025 2025-04-10 10:00:00
3 3 Carol 2025-04-10 2025-04-10 15:00:00
4 4 Bob 10/04/2025 2025-04-10 17:00:00
5 5 Carol April 10, 2025 2025-04-10 14:00:00
6 6 Bob 2025-04-10 2025-04-10 17:00:00
7 7 Bob April 10, 2025 2025-04-10 16:00:00
8 8 Bob April 10, 2025 2025-04-10 10:00:00
9 9 Carol 2025-04-10 2025-04-10 11:00:00
10 10 Alice 2025-04-10 2025-04-10 08:00:00
We’ll clean the inconsistent raw_date column and extract useful features.
df_clean <- df %>%
  mutate(
    parsed_date = lubridate::parse_date_time(raw_date, orders = c("ymd", "dmy", "B d, Y")),
    year = lubridate::year(parsed_date),
    month = lubridate::month(parsed_date, label = TRUE),
    day = lubridate::day(parsed_date),
    weekday = lubridate::wday(parsed_date, label = TRUE),
    hour = lubridate::hour(timestamp),
    minute = lubridate::minute(timestamp)
  )
glimpse(df_clean)
Rows: 10
Columns: 11
$ id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$ name <chr> "Carol", "Carol", "Carol", "Bob", "Carol", "Bob", "Bob", "…
$ raw_date <chr> "10/04/2025", "10/04/2025", "2025-04-10", "10/04/2025", "A…
$ timestamp <dttm> 2025-04-10 16:00:00, 2025-04-10 10:00:00, 2025-04-10 15:00…
$ parsed_date <dttm> 2025-04-10, 2025-04-10, 2025-04-10, 2025-04-10, 2025-04-1…
$ year <dbl> 2025, 2025, 2025, 2025, 2025, 2025, 2025, 2025, 2025, 2025
$ month <ord> Apr, Apr, Apr, Apr, Apr, Apr, Apr, Apr, Apr, Apr
$ day <int> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10
$ weekday <ord> Thu, Thu, Thu, Thu, Thu, Thu, Thu, Thu, Thu, Thu
$ hour <int> 16, 10, 15, 17, 14, 17, 16, 10, 11, 8
$ minute <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
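The glimpse output lists parsed_date as `<dttm>`: `parse_date_time()` returns a date-time (POSIXct) even when the input carries no time of day. If only the calendar date is needed, one option is to convert with `as_date()`. The sketch below uses a small standalone vector (`raw`, an illustrative name) rather than the tutorial's data frame.

```r
library(lubridate)

# Two of the formats that appear in raw_date
raw <- c("2025-04-10", "10/04/2025")

# parse_date_time() tries each order and returns POSIXct values (UTC by default)
parsed <- parse_date_time(raw, orders = c("ymd", "dmy"))
class(parsed)

# Drop the time-of-day part to obtain plain Date values
parsed_as_date <- as_date(parsed)
class(parsed_as_date)
```

Keeping the column as a date-time is harmless for extracting year, month, or weekday, but a plain Date avoids a spurious midnight time when the values are printed or joined against other Date columns.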
lubridate
# A tibble: 6 × 12
id name raw_date timestamp parsed_date year month day
<int> <chr> <chr> <dttm> <dttm> <dbl> <ord> <int>
1 1 Carol 10/04/2… 2025-04-10 16:00:00 2025-04-10 00:00:00 2025 Apr 10
2 2 Carol 10/04/2… 2025-04-10 10:00:00 2025-04-10 00:00:00 2025 Apr 10
3 3 Carol 2025-04… 2025-04-10 15:00:00 2025-04-10 00:00:00 2025 Apr 10
4 4 Bob 10/04/2… 2025-04-10 17:00:00 2025-04-10 00:00:00 2025 Apr 10
5 5 Carol April 1… 2025-04-10 14:00:00 2025-04-10 00:00:00 2025 Apr 10
6 6 Bob 2025-04… 2025-04-10 17:00:00 2025-04-10 00:00:00 2025 Apr 10
# ℹ 4 more variables: weekday <ord>, hour <int>, minute <int>,
# date_reformatted <chr>
# A tibble: 6 × 12
id name raw_date timestamp parsed_date year month day
<int> <chr> <chr> <dttm> <dttm> <dbl> <ord> <int>
1 1 Carol 10/04/2… 2025-04-10 16:00:00 2025-04-10 00:00:00 2025 Apr 10
2 3 Carol 2025-04… 2025-04-10 15:00:00 2025-04-10 00:00:00 2025 Apr 10
3 4 Bob 10/04/2… 2025-04-10 17:00:00 2025-04-10 00:00:00 2025 Apr 10
4 5 Carol April 1… 2025-04-10 14:00:00 2025-04-10 00:00:00 2025 Apr 10
5 6 Bob 2025-04… 2025-04-10 17:00:00 2025-04-10 00:00:00 2025 Apr 10
6 7 Bob April 1… 2025-04-10 16:00:00 2025-04-10 00:00:00 2025 Apr 10
# ℹ 4 more variables: weekday <ord>, hour <int>, minute <int>,
# date_reformatted <chr>
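The code that produced the two tables above is not shown here. As a rough sketch of the same kind of operation, the example below filters and counts rows by an extracted time component and adds a reformatted label column. The `events` tibble, the noon cutoff, and the `label` format string are all assumptions made for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical mini-dataset; values chosen for illustration only
events <- tibble(
  id = 1:4,
  timestamp = ymd_hms(c(
    "2025-04-10 08:00:00", "2025-04-10 10:00:00",
    "2025-04-10 14:00:00", "2025-04-10 17:00:00"
  ))
)

# Keep only rows whose timestamp falls at noon or later
afternoon <- events %>%
  filter(hour(timestamp) >= 12)

# Count rows per hour of the day
per_hour <- events %>%
  mutate(hour = hour(timestamp)) %>%
  count(hour)

# Add a display-friendly character version of the timestamp
labelled <- events %>%
  mutate(label = format(timestamp, "%d/%m/%Y %H:%M"))
```

Because `hour()` and the other accessors return plain numbers, they can be used directly inside `filter()`, `group_by()`, or `count()` without creating intermediate columns first.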
[1] "2025-04-10 16:00:00 UTC" "2025-04-10 10:00:00 UTC"
[3] "2025-04-10 15:00:00 UTC" "2025-04-10 17:00:00 UTC"
[5] "2025-04-10 14:00:00 UTC" "2025-04-10 17:00:00 UTC"
[7] "2025-04-10 16:00:00 UTC" "2025-04-10 10:00:00 UTC"
[9] "2025-04-10 11:00:00 UTC" "2025-04-10 08:00:00 UTC"
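The two timezone functions from the table are easy to confuse, so a short comparison may help. This is a minimal sketch on a single illustrative value (`x`), with "America/New_York" picked arbitrarily as the target zone: `with_tz()` keeps the instant and changes the clock it is displayed on, while `force_tz()` keeps the clock reading and changes the instant.

```r
library(lubridate)

x <- ymd_hms("2025-04-10 16:00:00", tz = "UTC")

# Same moment in time, shown on a New York clock (12:00 EDT)
ny_view <- with_tz(x, tzone = "America/New_York")

# Same clock reading (16:00), reinterpreted as New York time
ny_forced <- force_tz(x, tzone = "America/New_York")

x == ny_view                              # TRUE: identical instants
difftime(ny_forced, x, units = "hours")   # Time difference of 4 hours
```

As a rule of thumb, `with_tz()` fits data that was recorded correctly but should be displayed elsewhere, and `force_tz()` fits data that was labelled with the wrong zone when it was read in.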
In this tutorial, we covered the basics of using the {lubridate} package for date and time manipulation in R. We learned how to parse inconsistent date formats, extract components like year, month, and weekday, filter and group data by date and time, construct new datetime values, and handle timezones. By the end of this tutorial, you should be able to: