This lab will cover select functions in the lubridate package that are useful in working with dates and times:

# install.packages("lubridate")
library(lubridate)

Preamble

Dates

Suppose we wanted to calculate the number of days that have elapsed between Dec 12, 2019 and today. We could start by retrieving today’s date with the function Sys.Date():

# Please don't get me started on the inconsistent capitalization
today <- Sys.Date()
today
## [1] "2023-10-06"

At first glance, this appears to be a character string. But if we try to subtract another character string from it, we run into an error:

today - "2019-12-12"
## Error in unclass(as.Date(e1)) - e2: non-numeric argument to binary operator

As it turns out, dates (by which we mean specifically the combination of month, day, and year) have a special Date class in R. Though a Date has the appearance of a string (and can be manipulated with stringr functions), its underlying representation is numeric: more precisely, the number of days since January 1, 1970. If we want to subtract dates, we must first convert the character string with the as.Date() constructor, which takes a character vector as its argument.

# It is a Date class
class(today)
## [1] "Date"
# But underlying is a numeric (double)
typeof(today)
## [1] "double"
# We can explicitly cast it as numeric, revealing its underlying form
as.numeric(today)
## [1] 19636
# Subtract dates
today - as.Date("2019-12-12")
## Time difference of 1394 days
# Matches the underlying numeric
as.numeric(today) - as.numeric(as.Date("2019-12-12"))
## [1] 1394
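
To make the days-since-1970 representation concrete, the short sketch below takes a Date apart and rebuilds it from its underlying number. It uses only base R; the fixed date (in place of Sys.Date()) is our own choice so that the results do not change from day to day.

```r
# A fixed date, so results are reproducible
d <- as.Date("2023-10-06")

# Strip the class to see the stored number of days since 1970-01-01
days_since_epoch <- unclass(d)

# Rebuild the Date from that number; origin states where counting starts
rebuilt <- as.Date(days_since_epoch, origin = "1970-01-01")

# The epoch itself is stored as zero
epoch_value <- as.numeric(as.Date("1970-01-01"))

# Date subtraction returns a difftime; as.numeric() extracts the day count
elapsed <- as.numeric(d - as.Date("2019-12-12"))
```

Here days_since_epoch is 19636 and elapsed is 1394, matching the output shown above.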

Times

Distinct from Date is another representation of time, known as POSIX (Portable Operating System Interface), that also includes hours, minutes, and seconds. Because hours are specific to a particular timezone, the timezone is typically included as well:

# Current time and date
now <- Sys.time()

# Includes date, time, timezone
now
## [1] "2023-10-06 12:52:35 CDT"
# Class POSIXct
class(now)
## [1] "POSIXct" "POSIXt"
# Also a numeric, gives number of seconds since Jan 1, 1970
as.numeric(now)
## [1] 1696614755
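
The number of seconds identifies a single instant; the timezone only affects how that instant is displayed. A minimal base R sketch, reusing the number printed above and assuming the "America/Chicago" timezone that produced the CDT label:

```r
# Seconds since 1970-01-01 00:00:00 UTC, taken from the output above
secs <- 1696614755

# Rebuild the time; tz only controls how the instant is displayed
t_chicago <- as.POSIXct(secs, origin = "1970-01-01", tz = "America/Chicago")
t_utc     <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

format(t_chicago, "%Y-%m-%d %H:%M:%S")  # "2023-10-06 12:52:35"
format(t_utc,     "%Y-%m-%d %H:%M:%S")  # "2023-10-06 17:52:35"

# Same instant, so the underlying numbers agree
same_instant <- as.numeric(t_chicago) == as.numeric(t_utc)  # TRUE
```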

Inconveniently, times and dates do not play nicely with one another:

# Why give a warning when this should clearly be an error?
today - now
## Warning: Incompatible methods ("-.Date", "-.POSIXt") for "-"
## [1] "-4643150-05-02"
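
One way around the mismatch is to convert both values to the same class before subtracting. The sketch below uses base R only; the fixed timestamp stands in for Sys.time() and is our own choice, as are the variable names.

```r
# Fixed stand-ins for `now` and `today`
now_fixed   <- as.POSIXct("2023-10-06 12:52:35", tz = "America/Chicago")
today_fixed <- as.Date("2023-10-06")

# Option 1: drop the time of day, then compare Date with Date
# (pass tz so the calendar day is taken in the intended timezone)
days_elapsed <- as.Date(now_fixed, tz = "America/Chicago") - as.Date("2019-12-12")

# Option 2: promote the Date to a time (midnight UTC), then use difftime()
midnight_utc  <- as.POSIXct(format(today_fixed), tz = "UTC")
hours_elapsed <- difftime(now_fixed, midnight_utc, units = "hours")
```

Which option is appropriate depends on whether the time of day matters for the question being asked.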

Even worse, there is no nice way to create a POSIX object. Because there are so many different ways to represent time, R demands that you tell it explicitly how your character string is formatted if it’s not already exactly what it expects.

# This is apparently fine
as.POSIXct("2017-05-24 08:45")
## [1] "2017-05-24 08:45:00 CDT"
# This is apparently not fine
as.POSIXct("05/24/2017 08:45")
## Error in as.POSIXlt.character(x, tz, ...): character string is not in a standard unambiguous format
# How cumbersome is this?
as.POSIXct("05/24/2017 08:45", format = "%m/%d/%Y %H:%M", tz = "America/Chicago") 
## [1] "2017-05-24 08:45:00 CDT"
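
The format string is built from conversion codes such as %m (month), %d (day), %Y (four-digit year), %y (two-digit year), %H (hour), and %M (minute). The same codes work in the other direction with format(). A brief base R sketch, with example strings of our own:

```r
# Parse a date written as day.month.year
d <- as.Date("24.05.2017", format = "%d.%m.%Y")

# Parse a time written with a two-digit year and no colon in the time
tm <- as.POSIXct("05/24/17 0845", format = "%m/%d/%y %H%M", tz = "America/Chicago")

# format() uses the same codes to turn the objects back into strings
format(d, "%m/%d/%Y")         # "05/24/2017"
format(tm, "%Y-%m-%d %H:%M")  # "2017-05-24 08:45"

# With an explicit format, a string that does not match yields NA, not an error
mismatch <- as.Date("2017-05-24", format = "%m/%d/%Y")
is.na(mismatch)               # TRUE
```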

lubridate

There are two things specifically we hope to have demonstrated in this preamble that will be relevant for the lab that follows:

  1. The Date class in R includes month, day, and year. It looks like a character vector, but under the hood it is a number indicating how many days have passed since Jan 1, 1970
  2. The POSIXct class in R includes everything from Date, as well as hours, minutes, and seconds. It also requires a timezone. Similar to Date, it is stored as a numeric, but now as the number of seconds since Jan 1, 1970

This lab will focus on using the lubridate package, which significantly reduces the burden associated with handling dates and times in R.

Lab

Date Components

As you might expect, lubridate comes with a suite of functions for extracting the constituent parts of a date.

Component   Function
Year        year()
Month       month()
Day         day()
Hour        hour()
Minute      minute()
Second      second()

Here, we consider how these functions operate on a POSIXct variable. They work on Date variables as well, though because a Date does not include an hour, minute, or second, these are returned as zero.

# This is POSIXct
(now <- Sys.time())
## [1] "2023-10-06 12:52:35 CDT"
year(now)
## [1] 2023
day(now)
## [1] 6
month(now)
## [1] 10
hour(now)
## [1] 12
# Returned as zero because today is a Date, which has no minutes
minute(today)
## [1] 0
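
For a reproducible version of the same idea, the sketch below applies every accessor in the table to a fixed timestamp and then to the corresponding Date. The timestamp and variable names are our own.

```r
library(lubridate)

# Fixed timestamp so the results are reproducible
x <- ymd_hms("2023-10-06 12:52:35", tz = "America/Chicago")

# Collect every component in one named vector
parts <- c(year = year(x), month = month(x), day = day(x),
           hour = hour(x), minute = minute(x), second = second(x))
parts

# The same accessors on a Date: the time-of-day components are all zero
d <- as.Date(x, tz = "America/Chicago")
date_parts <- c(hour = hour(d), minute = minute(d), second = second(d))
date_parts
```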

Formatting

As we mentioned previously, one of the biggest challenges in working with dates or times is the multiplicity of formats that are frequently used. For example, the date “September 1, 1939” may get recorded in any of the following ways:

  • September 1st, 1939
  • Sept 1, 1939
  • 1939 Sep 1
  • 9/1/1939
  • 9/1/39

lubridate provides a collection of functions that help standardize these formats into a common representation. The functions themselves are mdy(), ymd(), and dmy(): you indicate the order in which the month, day, and year are stored, and the functions handle the rest.

## Stored in mdy format
mdy("September 1st, 1939")
mdy("9/1/39") # <- We do have to be careful sometimes
mdy("9/1/1939")
## [1] "1939-09-01"
## [1] "2039-09-01"
## [1] "1939-09-01"
# Stored as year month day
ymd("1962 February 7")
ymd("1962/2/7")
## [1] "1962-02-07"
## [1] "1962-02-07"
# You can sometimes get a little crazy
dmy("30th of May 2019")
## [1] "2019-05-30"
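
The two-digit year above was read as 2039 rather than 1939. When the data are known to be historical, one possible repair (our own suggestion, not a built-in lubridate feature) is to shift any date that lands in the future back by a century using the year() setter. The cutoff date below is an arbitrary assumption.

```r
library(lubridate)

d <- mdy(c("9/1/39", "9/1/1939", "3/15/05"))

# Assumption: no record should be later than this cutoff
cutoff <- as.Date("2023-12-31")

# Shift only the dates that landed in the future back by 100 years
too_late <- d > cutoff
year(d[too_late]) <- year(d[too_late]) - 100
d
```

A safer fix, when possible, is to record four-digit years at the source.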

These functions also extend to include hours, minutes, and seconds:

mdy_hm("May 12, 2017 4:45pm", tz = "America/Chicago")
## [1] "2017-05-12 16:45:00 CDT"
mdy_hms("05-12-2017 16:45:00", tz = "America/Chicago")
## [1] "2017-05-12 16:45:00 CDT"
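
The tz argument states which timezone the written time belongs to. To display that same instant in a different timezone afterwards, lubridate offers with_tz(). A short sketch, reusing the example time from above:

```r
library(lubridate)

chicago <- mdy_hm("May 12, 2017 4:45pm", tz = "America/Chicago")

# Same instant shown on a UTC clock (Chicago is 5 hours behind UTC in May)
utc <- with_tz(chicago, tzone = "UTC")

hour(chicago)  # 16
hour(utc)      # 21

# The underlying number of seconds is unchanged
same_instant <- as.numeric(chicago) == as.numeric(utc)  # TRUE
```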

An important thing to keep in mind, however, is that lubridate is not perfect. Often, without rhyme or reason, some inputs will work while others do not:

# Sept 11th silently parses to the wrong date, Oct 1st works, and Sept 1st fails outright
mdy("Sept 11th, 2001")
mdy("Oct 1st, 2001")
mdy("Sept 1st, 2001")
## Warning: All formats failed to parse. No formats found.
## [1] "2001-11-20"
## [1] "2001-10-01"
## [1] NA
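
Given this behavior, it can help to normalize known problem strings before parsing and to check for failures explicitly. The sketch below is one possible approach under our own assumptions: that "Sept" is the only non-standard abbreviation present, and that month names are being parsed in an English locale.

```r
library(lubridate)

raw <- c("Oct 1st, 2001", "Sept 1st, 2001")

# Assumption: "Sept" is the only non-standard abbreviation in the data
cleaned <- sub("^Sept ", "Sep ", raw)

# quiet = TRUE suppresses the parse warning; failures still come back as NA
parsed <- mdy(cleaned, quiet = TRUE)

# Check for failures explicitly instead of trusting the parse
failed <- raw[is.na(parsed)]
if (length(failed) > 0) {
  message("Could not parse: ", paste(failed, collapse = "; "))
}
```

An NA check cannot catch a value that parses to the wrong date, as happened with the first string in the example above, so spot-checking a few results against the raw strings is still worthwhile.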

Question 1: On January 27th, 1967 at 6:31 PM, the Apollo 1 spacecraft, planned to be the first manned mission of the Apollo space program, experienced a cabin fire on the launch pad at Cape Kennedy Air Force Station, Florida during a launch simulation, killing all three crew members on board. Just over 19 years later, on January 28, 1986 at 11:39 AM, the Challenger Shuttle exploded just off the coast of Cape Canaveral, Florida. Rounding each date to the nearest day, determine how many days passed between these two events.

apollo <- "1967 Jan 27th at 6:31:19 PM UTC"
challenger <- "28 January 1986, 1139am"

Common Calculations

The lubridate package also contains a handful of functions to help perform common calculations:

Function        Output
yday()          day of the year (number from 1-366)
wday()          day of the week (number from 1-7, or a factor label when label = TRUE is used)
floor_date()    rounds the date downward
ceiling_date()  rounds the date upward
round_date()    rounds the date upward or downward (whichever is closer)

A few examples demonstrating these functions are given below:

today <- Sys.Date()

## Day of year
yday(today)
## [1] 279
## Day of week
wday(today)
## [1] 6
# label = TRUE creates an ordered factor
wday(today, label = TRUE)
## [1] Fri
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
## Rounding
floor_date(today, unit = "month") # down to nearest month
## [1] "2023-10-01"
ceiling_date(today, unit = "month") # up to nearest month
## [1] "2023-11-01"
round_date(today, unit = "month") # to whichever is closest
## [1] "2023-10-01"
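
Because the results above depend on the day the code is run, the sketch below repeats the calculations on fixed dates of our own choosing, including a case where round_date() rounds upward and a leap-year case for yday().

```r
library(lubridate)

d <- ymd("2023-10-20")

# Past the midpoint of October, so rounding goes up to November
round_date(d, unit = "month")    # "2023-11-01"
floor_date(d, unit = "month")    # "2023-10-01"
ceiling_date(d, unit = "month")  # "2023-11-01"

# Other units work the same way
floor_date(d, unit = "year")     # "2023-01-01"

# In a leap year, the last day of the year is day 366
yday(ymd("2020-12-31"))          # 366

# Numbering starts on Sunday by default, so Friday is 6
wday(d)                          # 6
```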

Question 2: Create a date/time object for 9:15pm in Los Angeles on February 14, 2020. Round this date/time to the nearest day, then determine which day of the week that day was.

Times without dates

Sometimes you’ll encounter data consisting of times without an attached date. These might be times within a day, such as “01:30:00” or 1:30 AM, or a duration of time, such as 1 hour, 30 minutes, and 0 seconds.

The lubridate package provides a simple storage class for times without dates that can be applied using the hms() function.

## Example
(time <- hms("01:10:00"))
## [1] "1H 10M 0S"
# Total minutes: 60 per hour plus the remaining minutes
60*hour(time) + minute(time)
## [1] 70

Because these objects are stored relative to 00:00:00, we can perform arithmetic on them directly:

hms("01:10:00") - hms("01:05:00")
## [1] "5M 0S"

We can also exploit this fact to easily convert results to seconds using pipelines:

# %>% is not exported by lubridate; load it from magrittr (or dplyr)
library(magrittr)
(hms("01:10:00") - hms("01:05:00")) %>% seconds()
## [1] "300S"
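
lubridate also provides period_to_seconds(), which returns a plain number rather than a period object, so the result can go straight into further arithmetic. A sketch with made-up times:

```r
library(lubridate)

# hms() is vectorized, so a whole column of times can be parsed at once
times <- hms(c("00:45:30", "01:10:00", "02:03:15"))

# Plain numeric seconds
secs <- period_to_seconds(times)
secs       # 2730 4200 7395

# Ordinary arithmetic now works, e.g. converting to minutes
secs / 60  # 45.50 70.00 123.25
```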

Practice (required)

Question 3: The 2015 Boston Marathon took place on April 20th, 2015. It was the 119th running of one of the world’s most well-known races. The data below contain information, results, and splits for each finisher of the marathon:

marathon <- read.csv("https://remiller1450.github.io/data/BostonMarathon2015.csv")

Part A: A marathon is approximately 26.2 miles, making the first half 13.1 miles. Calculate the per mile pace (in seconds) for each participant in the first half of the race. Be sure to store your results.

Part B: Now calculate the pace per mile in the second half of the race. Be sure to store your results.

Part C: Now create a scatterplot displaying the relationship between pace per mile in the first half of the race vs. pace per mile in the second half of the race by age and sex. To do this, you should assemble your results from Parts A and B into a data frame, and you should also include the “Age” and “M.F” columns from the original data when you create this data frame. A target graphic is given below. Note: hms::scale_x_time() and hms::scale_y_time() can be used to display your first half and second half paces on a time scale. The graph shown below uses the argument alpha = 0.2 to reduce the impact of over-plotting, and a 45-degree line is added using geom_abline().

Note: If you get an error that hms is not available, you need to install it with install.packages("hms"). However, do not load the entire package with library(hms), as this will cause problems with the lubridate package. Instead, to use a function from a package without loading the entire package, use the double colon :: in the form packagename::function. In this case, we want the function scale_x_time() from the package hms, hence hms::scale_x_time().
