Line Diagrams

Why would I need code for a line diagram?

Line diagrams are easy to draw by hand for a few participants, but they quickly become unwieldy for real datasets. Plotting them in code is useful when you want a quick visual check of

  • the administrative censoring date
  • the presence of late entry
  • the maximum follow-up time
  • etc.
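The same three checks can also be done numerically. Below is a minimal sketch, assuming one row per participant with a time origin, a study-entry time and an exit time, all on the same timescale. followup_checks() and the toy data are hypothetical and are not part of any package.

```r
# Hypothetical helper: numeric counterparts of the visual checks.
# entry  = time of study entry, exit = time of death or censoring,
# origin = time origin (defaults to entry, i.e. no late entry).
followup_checks <- function(entry, exit, origin = entry) {
  list(
    last_exit    = max(exit),             # should equal the administrative censoring date
    n_late_entry = sum(entry > origin),   # participants entering after their time origin
    max_followup = max(exit - origin)     # longest follow-up measured from the origin
  )
}

# Toy data: the second participant enters half a year after their origin
toy <- data.frame(
  origin = c(2012.0, 2013.5, 2015.2),
  entry  = c(2012.0, 2014.0, 2015.2),
  exit   = c(2016.0, 2020.0, 2020.0)
)
chk <- followup_checks(toy$entry, toy$exit, origin = toy$origin)
str(chk)
```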

R code to produce line diagrams

Simple simulated data (used in L1)

First, we generate simple data on 20 participants, as described in L1:

“Say you wish to estimate the 5-year risk of death among people entering HIV care. You have a database of people entering HIV care between 2012 and 2020.”

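### Generate some data -----
library(dplyr)
set.seed(123)
year0 <- runif(20, min = 2012, max = 2020)  # calendar year of entry into HIV care
t <- runif(20, min = 2, max = 15)           # years from entry to death, before censoring
dat <- data.frame(year0, t)

# y = 1 if death occurs before calendar time 2020 and within 5 years, else 0.
# Then censor follow-up at 2020 (administrative censoring) and at 5 years.
# mutate() evaluates in order, so y is computed from the uncensored t.
dat <- dat %>% mutate(y = ifelse(t + year0 > 2020 | t > 5, 0, 1),
                      t = ifelse(t + year0 > 2020, 2020 - year0, t),
                      t = ifelse(t > 5, 5, t),
                      id = row_number())

dat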
##       year0         t y id
## 1  2014.301 5.0000000 0  1
## 2  2018.306 1.6935589 0  2
## 3  2015.272 4.7281846 0  3
## 4  2019.064 0.9358608 0  4
## 5  2019.524 0.4762617 0  5
## 6  2012.364 5.0000000 0  6
## 7  2016.225 3.7751561 0  7
## 8  2019.139 0.8606476 0  8
## 9  2016.411 3.5885199 0  9
## 10 2015.653 3.9124774 1 10
## 11 2019.655 0.3453332 0 11
## 12 2015.627 4.3733268 0 12
## 13 2017.421 2.5794349 0 13
## 14 2016.581 3.4189328 0 14
## 15 2012.823 2.3199779 1 15
## 16 2019.199 0.8014002 0 16
## 17 2013.969 5.0000000 0 17
## 18 2012.336 4.8133032 1 18
## 19 2014.623 5.0000000 0 19
## 20 2019.636 0.3639708 0 20
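The two censoring rules in the mutate() call can be written more compactly with pmin(), which also makes them easy to check. A base-R sketch that draws the same random numbers; t_raw is a hypothetical name for the time to death before censoring.

```r
set.seed(123)
year0 <- runif(20, min = 2012, max = 2020)  # calendar year of entry into care
t_raw <- runif(20, min = 2, max = 15)       # time to death before censoring

# Death is observed only if it happens by calendar time 2020 and within 5 years
y <- as.integer(year0 + t_raw <= 2020 & t_raw <= 5)
# Follow-up stops at death, at calendar time 2020, or at 5 years, whichever comes first
t <- pmin(t_raw, 2020 - year0, 5)

# Sanity checks that mirror what the line diagram should show
stopifnot(
  all(t <= 5),                     # maximum follow-up is 5 years
  all(year0 + t <= 2020 + 1e-8),   # nobody is followed past 2020
  all(t[y == 1] == t_raw[y == 1])  # observed deaths keep their original time
)
table(y)
```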

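Produce the line diagram with ggplot2: each participant is a horizontal segment from entry to the end of follow-up. Censored follow-up ends in an arrow, and deaths are marked with a red point.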

# Calendar time as timescale
library(ggplot2)
library(grid)      # arrow() and unit()
library(ggthemr)   # optional theme package, installed from GitHub
ggthemr('solarized')
line <- ggplot() +
  # censored participants: segment ending in an arrow
  geom_segment(data = dat %>% filter(y == 0), aes(x = year0, y = id, xend = year0 + t, yend = id), arrow = arrow(length = unit(0.1, "cm"))) +
  # participants who died: plain segment
  geom_segment(data = dat %>% filter(y == 1), aes(x = year0, y = id, xend = year0 + t, yend = id)) +
  scale_y_continuous(name = "ID", breaks = c(1, 5, 10, 15, 20), limits = c(0, 20)) +
  scale_x_continuous(name = "Calendar Time", breaks = c(2012, 2014, 2016, 2018, 2020), limits = c(2012, 2021)) +
  # deaths: red point at the end of follow-up
  geom_point(data = dat %>% filter(y == 1), aes(x = year0 + t, y = id), color = "red", size = 0.6) +
  # the "Open Sans" font must be installed and available to the graphics device
  theme(text = element_text(size = 14, family = "Open Sans"))

line

Reorganize the plot to show time since entry into HIV care on the x-axis.

# Time since entry into HIV care as timescale
line2 <- ggplot() +
  # every participant starts at time 0 (entry into care)
  geom_segment(data = dat %>% filter(y == 0), aes(x = 0, y = id, xend = t, yend = id), arrow = arrow(length = unit(0.1, "cm"))) +
  geom_segment(data = dat %>% filter(y == 1), aes(x = 0, y = id, xend = t, yend = id)) +
  scale_y_continuous(name = "ID", breaks = c(1, 5, 10, 15, 20), limits = c(0, 20)) +
  scale_x_continuous(name = "Years since entry into HIV care", breaks = c(0, 1, 2, 3, 4, 5), limits = c(0, 6)) +
  geom_point(data = dat %>% filter(y == 1), aes(x = t, y = id), color = "red", size = 0.6) +
  theme(text = element_text(size = 14, family = "Open Sans"))

line2
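The two plots above differ only in their start and end columns and the axis label, so the plotting code can be wrapped in a function. This is a minimal sketch: line_diagram() and the toy data are hypothetical, and the theme and scale settings used above are omitted for brevity.

```r
library(ggplot2)

# Hypothetical helper: draw a line diagram from named start/end/event columns.
# Censored follow-up (event == 0) ends in an arrow; events get a red point.
line_diagram <- function(data, start, end, event, id = "id", xlab = "Time") {
  df <- data.frame(
    start = data[[start]],
    end   = data[[end]],
    event = data[[event]],
    id    = data[[id]]
  )
  ggplot(df) +
    geom_segment(data = df[df$event == 0, ],
                 aes(x = start, y = id, xend = end, yend = id),
                 arrow = arrow(length = unit(0.1, "cm"))) +
    geom_segment(data = df[df$event == 1, ],
                 aes(x = start, y = id, xend = end, yend = id)) +
    geom_point(data = df[df$event == 1, ],
               aes(x = end, y = id), color = "red", size = 0.6) +
    labs(x = xlab, y = "ID")
}

# Toy data with the same columns as dat
toy_dat <- data.frame(
  id    = 1:4,
  year0 = c(2012.5, 2014.0, 2016.2, 2018.7),
  t     = c(5, 2.3, 3.8, 1.3),
  y     = c(0, 1, 0, 0)
)
toy_dat$year_end <- toy_dat$year0 + toy_dat$t  # calendar date of exit
toy_dat$zero <- 0                              # everyone starts at time 0

p1 <- line_diagram(toy_dat, "year0", "year_end", "y", xlab = "Calendar Time")
p2 <- line_diagram(toy_dat, "zero", "t", "y", xlab = "Years since entry into HIV care")
```

Changing the timescale then only means passing different start and end columns.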

Example data (similar to Cole & Hudgens 2010)

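Next, we read in some sample data similar to that used in Cole & Hudgens 2010. Judging from the plotting code below, year is the calendar date of AIDS diagnosis, w is the time from diagnosis to study entry (late entry), t is the time from diagnosis to death or censoring, d is the death indicator, and cenyear is the administrative censoring date; yearw and yeart are the calendar dates of entry and exit.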

##   int     year        w        t cenyear d late newid yearw    yeart
## 1   1 2010.320 4.679838 5.527338    2020 1    1     1  2015 2015.848
## 2   1 2010.321 4.678720 9.557808    2020 1    1     2  2015 2019.879
## 3   1 2010.427 4.573213 9.573213    2020 0    1     3  2015 2020.000
## 4   1 2010.509 4.490943 9.490943    2020 0    1     4  2015 2020.000
## 5   1 2010.645 4.355168 9.355168    2020 0    1     5  2015 2020.000
## 6   1 2010.775 4.224887 9.224887    2020 0    1     6  2015 2020.000
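The code that reads the example data is not shown above. If you want to run the plots below without it, one option is a simulated stand-in with the same column names. The design here is an assumption inferred from the printed rows (diagnoses from 2010, observation from 2015, administrative censoring in 2020), and the exponential death times are made up. This is not the Cole & Hudgens data.

```r
# Hypothetical simulated stand-in for exdat (NOT the Cole & Hudgens data)
set.seed(2010)
n <- 60
study_open <- 2015   # assumed start of observation
cenyear <- 2020      # administrative censoring date

year <- sort(runif(n, min = 2010, max = 2020))  # calendar date of AIDS diagnosis
t_death <- rexp(n, rate = 0.15)                 # assumed years from diagnosis to death

yearw <- pmax(year, study_open)                 # calendar date of study entry
w <- yearw - year                               # delay from diagnosis to entry
yeart <- pmin(year + t_death, cenyear)          # calendar date of death or censoring
d <- as.integer(year + t_death <= cenyear)      # 1 = death observed

exdat <- data.frame(year, w, t = yeart - year, cenyear, d, yearw, yeart)
# Late entry: people who died before they could enter the study are never seen
exdat <- exdat[exdat$yeart > exdat$yearw, ]
exdat$newid <- seq_len(nrow(exdat))
head(exdat)
```

Dropping everyone who died before entry is what makes the dotted (unobserved) part of each line meaningful in the plots.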

Calendar timescale

### Create second dataset with only events ------
exdat2 <- exdat[exdat$d == 1, ]

### Plot the lines -----
line <- ggplot(data = exdat) +
  # dotted: from AIDS diagnosis to study entry (not under observation)
  geom_segment(aes(x = year, y = newid, xend = yearw, yend = newid), linetype = "dotted") +
  # solid: from study entry to death or censoring
  geom_segment(aes(x = yearw, y = newid, xend = yeart, yend = newid)) +
  ylab("ID") +
  scale_x_continuous(name = "Calendar Time", breaks = c(2010, 2015, 2020)) +
  # deaths: red point at the end of follow-up
  geom_point(data = exdat2, aes(x = yeart, y = newid), color = "red", size = 0.5) +
  theme(text = element_text(size = 16, family = "Open Sans"))

line

Time since AIDS diagnosis

line2 <- ggplot(data = exdat) +
  # dotted: from AIDS diagnosis (time 0) to study entry at time w
  geom_segment(aes(x = 0, y = newid, xend = w, yend = newid), linetype = "dotted") +
  # solid: under observation, from entry w to death or censoring t
  geom_segment(aes(x = w, y = newid, xend = t, yend = newid)) +
  ylab("ID") +
  scale_x_continuous(name = "Time since AIDS diagnosis (years)", breaks = c(0, 2, 4, 6, 8, 10)) +
  geom_point(data = exdat2, aes(x = t, y = newid), color = "red", size = 0.5) +
  theme(text = element_text(size = 14, family = "Open Sans"))

line2
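A line diagram can also be read vertically. At a time s since diagnosis, the people under observation are those whose solid line crosses s. With late entry, person i counts only when w_i <= s < t_i, so late entrants join the count partway along the axis. Below is a small sketch of that count; n_at_risk() and the four toy participants are hypothetical.

```r
# Hypothetical helper: number of people under observation at each time s,
# using the convention that person i is counted when w_i <= s < t_i.
n_at_risk <- function(s, w, t) {
  vapply(s, function(si) sum(w <= si & si < t), integer(1))
}

w <- c(0, 4.7, 2.0, 0)      # years from diagnosis to study entry
t <- c(3.0, 9.5, 6.0, 1.5)  # years from diagnosis to death or censoring

n_at_risk(c(0.5, 3.5, 5, 8), w, t)
```

The count rises at s = 5 even though nobody new was diagnosed, because the second participant only enters at 4.7 years.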