---
title: "Multiple ITS control introduction for slope change 2nd example (two-stage)"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Multiple ITS control introduction for slope change 2nd example (two-stage)}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE, 
  echo = FALSE,
  warning = FALSE,
  message = FALSE,
  comment = "#>"
)
```

```{r setup}
library(multipleITScontrol)
library(dplyr)
library(ggplot2)
library(lubridate)
library(stringi)
library(rlang)
library(purrr)

phei_calendar <- function(df,
                          date_column = NULL,
                          factor_column = NULL,
                          colours = NULL,
                          title = "Placeholder: Please supply title or 'element_blank()' to `title` argument",
                          subtitle = "Placeholder: Please supply subtitle or 'element_blank()' to `subtitle` argument",
                          caption = "PH.Intelligence@hertfordshire.gov.uk",
                          ncol,
                          ...) {


  date_column <- rlang::sym(date_column)
  factor_column <- rlang::sym(factor_column)

  df <- df |> dplyr::mutate(
    mon = lubridate::month(!!date_column, label = TRUE, abbr = FALSE),
    wkdy = weekdays(!!date_column, abbreviate = TRUE) |>
      forcats::fct_relevel("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
    day = lubridate::mday(!!date_column),
    week = stringi::stri_datetime_fields(!!date_column)$WeekOfMonth,
    year = lubridate::year(!!date_column),
    year_mon = zoo::as.yearmon(!!date_column, "%Y %m")
  ) |>
    dplyr::mutate(across(week, ~ dplyr::case_when(wkdy == "Sun" ~ week - 1,
                                           .default = as.numeric(week)
    )))
  
  df %>%
    ggplot2::ggplot(., ggplot2::aes(wkdy, week)) +
    # custom theme stuff below
    # geom_tile and facet_wrap will do all the heavy lifting
    ggplot2::geom_tile(
      alpha = 0.8,
      ggplot2::aes(fill = !!factor_column),
      color = "black", ...
    ) +
    ggplot2::facet_wrap(~year_mon, scales = "free_x", ncol = ncol) +
    ggplot2::geom_text(ggplot2::aes(label = day)) +
    # put your y-axis down, flip it, and reverse it
    ggplot2::scale_y_reverse(breaks = NULL) +
    # manually fill scale colors to something you like...
    ggplot2::scale_fill_manual(
      values = colours,
      na.value = "white",
      na.translate = FALSE
    ) +
    ggpubr::theme_pubclean() +
    ggplot2::theme(legend.position = "bottom") +
    ggplot2::labs(
      fill = "",
      x = "",
      y = "",
      title = element_blank(),
      caption = "PH.Intelligence@hertfordshire.gov.uk"
    )
}
```

## Usage

This example shows how to analyse a two-stage interrupted time series (ITS) with a control group when the hypothesis concerns a change in slope:

**Background**: *Albridge Medical Practice* and *Hollybush Medical Practice* are two medical practices within the same PCN, with similar populations of people, and prevalence of disease.

*Albridge Medical Practice* wants to try a new intervention to improve wellbeing in people diagnosed with depression in their practice.

This example is for scenarios where there is a statistically significant slope change for one intervention, but no level change.

**Intervention 1: Implementing a new Mental Health Support programme**

-   **Objective:** Improve mental wellbeing in patients with low-to-mid level depression.
-   **Start Date:** April 4, 2022
-   **Duration:** 2 months
-   **Description:** The practice introduced weekly Mindfulness Workshops, teaching meditation and breathing techniques to improve self-regulation.
-   **Measurement:** Self-reported wellbeing scores measured at start and end of intervention.

**Intervention 2: Introducing AI-led CBT sessions**

-   **Objective:** Further increase self-reported wellbeing scores.
-   **Start Date:** June 6, 2022 (immediately after the Intervention 1 programme ends)
-   **Duration:** 6 months
-   **Description:** The practice implements cognitive behavioural therapy (CBT) sessions, aimed at changing negative thought patterns and behaviours.
-   **Measurement:** Self-reported wellbeing scores measured at start and end of intervention.

### Controlled Interrupted Time Series Design (2 stage)

**Step 1: Baseline Period**

-   **Duration:** 3 months (Jan 1, 2022 - April 3, 2022)
-   **Data Collection:** Collect self-reported wellbeing scores.

**Step 2: Intervention 1 Period**

-   **Duration:** 2 months (April 4, 2022 - June 5, 2022)
-   **Data Collection:** Continue collecting self-reported wellbeing scores at end of workshops.

**Step 3: Intervention 2 Period**

-   **Duration:** 6 months (June 6, 2022 - Dec 31, 2022)
-   **Data Collection:** Continue collecting self-reported wellbeing scores at end of CBT.

The calendar plot below summarises the timeline of the interventions:

```{r calendar, echo = FALSE, warning = FALSE, message = FALSE, fig.align="center", fig.height=10, fig.width=7, fig.retina=3}

tibble_data_calendar <- its_data_gp |> 
    group_by(group_var) |>
    arrange(group_var, Date) |> 
    tidyr::complete(Date = seq(min(Date), max(Date), by = "day")) |> 
    tidyr::fill(Period, .direction = "down")

plot <- phei_calendar(
  tibble_data_calendar,
  date_column = "Date",
  "Period",
  colours = c("#3b5163", "#80bb77", "#afd0f0"),
  ncol = 3
) +
  theme(strip.text = element_text(size = rel(0.5)),
        axis.text = element_text(size = rel(0.5)),
        plot.caption = element_text(size = rel(0.5)),
        legend.text = element_text(size = rel(0.5)))
  

plot$layers[[2]]$aes_params$size <- 3

plot


```

# Step 1) Loading data

Sample data can be loaded from the package for this scenario through the bundled dataset `its_data_gp`.

<br></br>

```{r step_1_load_data}
DT::datatable(its_data_gp, options = list(dom = 'tip'), rownames = FALSE)
```

<br></br>

This sample dataset demonstrates the format your own data should be in.

In the `Date` column, the dates are equally spaced, and there are two rows for each date, one for `control` and one for `treatment` in the `group_var` column. Each group has three periods: a `Pre-intervention period` covering measurements of the outcome before any intervention; the first intervention, labelled `Intervention 1) Implementing a new Mental Health Support programme`; and the second intervention, labelled `Intervention 2) Introducing CBT session`.
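If your own data is not yet in this shape, the layout can be sketched with base R as follows. This is only an illustration: the weekly spacing, the shortened period labels, and the simulated scores are assumptions, and the column names simply mirror those used with `its_data_gp` in this vignette.

```r
# Toy data in long format: one row per date and group.
# Weekly spacing and simulated scores are assumptions for illustration only.
dates <- seq(as.Date("2022-01-03"), as.Date("2022-12-26"), by = "week")

# Label each date with its study period, using the intervention start dates
# as left-closed breakpoints.
period <- cut(
  as.numeric(dates),
  breaks = c(-Inf, as.numeric(as.Date(c("2022-04-04", "2022-06-06"))), Inf),
  labels = c("Pre-intervention period", "Intervention 1", "Intervention 2"),
  right = FALSE
)

set.seed(42)
toy_data <- data.frame(
  Date = rep(dates, times = 2),
  group_var = rep(c("control", "treatment"), each = length(dates)),
  Period = rep(period, times = 2),
  score = round(rnorm(2 * length(dates), mean = 50, sd = 2), 1)
)

# Sanity checks: equal spacing, and exactly two rows (one per group) per date.
stopifnot(length(unique(as.numeric(diff(dates)))) == 1)
stopifnot(all(table(toy_data$Date) == 2))
```

The two checks at the end are a cheap way to confirm the equal-spacing and two-rows-per-date properties before passing data on to the next step.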

<br></br>

# Step 2) Transforming the data

The data frame should be passed to `multipleITScontrol::transform_data()`, with arguments that name the columns holding the required variables and give the start dates of the interventions.

```{r, echo = TRUE, results='hide'}
intervention_dates <- c(as.Date("2022-04-04"), as.Date("2022-06-06"))
transformed_data <- 
  multipleITScontrol::transform_data(df = its_data_gp,
               time_var = "Date",
               group_var = "group_var",
               outcome_var =  "score",
               intervention_dates = intervention_dates)
```

This returns the initial data frame with a few additional transformed variables needed for the interrupted time series analysis.

```{r}
transformed_data
```
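To give a feel for the kind of variables a segmented regression needs, the sketch below derives a time index and a "time since intervention" counter for each intervention. The column names (`time_since_int1`, `time_since_int2`) are hypothetical and are not claimed to match what `transform_data()` returns; weekly spacing is also an assumption.

```r
# Hypothetical helper columns for a two-intervention segmented regression.
# Names are illustrative only; they are not the package's own column names.
dates <- seq(as.Date("2022-01-03"), as.Date("2022-12-26"), by = "week")
intervention_dates <- as.Date(c("2022-04-04", "2022-06-06"))

seg <- data.frame(Date = dates, time = seq_along(dates))

# Index of the first observation on or after each intervention start date.
start_idx <- vapply(
  seq_along(intervention_dates),
  function(k) which(dates >= intervention_dates[k])[1],
  integer(1)
)

# Time elapsed since each intervention started: 0 up to and including the
# start point, then 1, 2, 3, ... so the fitted lines join at the breakpoint.
seg$time_since_int1 <- pmax(0, seg$time - start_idx[1])
seg$time_since_int2 <- pmax(0, seg$time - start_idx[2])

head(seg[seg$time >= start_idx[1] - 1, ])
```

Starting the counter at 0 on the intervention date is one common convention; starting it at 1 is another, and the choice only shifts where the change in slope is anchored.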

# Step 3) Fitting ITS model

The transformed data is then fitted using `multipleITScontrol::fit_its_model()`. The required arguments are `transformed_data`, which is the unmodified object returned by `multipleITScontrol::transform_data()` in the step above; `impact_model`, which currently accepts `"slope"`, `"level"`, or `"levelslope"`; and `num_interventions`, the number of interventions.

```{r, echo = TRUE, results='hide'}
fitted_ITS_model <-
  multipleITScontrol::fit_its_model(transformed_data = transformed_data,
                                    impact_model = "slope",
                                    num_interventions = 2)

fitted_ITS_model
```

This prints the conventional model output from `nlme::gls()`.

```{r}
fitted_ITS_model
```
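As a point of orientation for the coefficient names used in the next step, a slope-only controlled ITS with two interventions can be written as a segmented regression of the following generic form. This is an assumed textbook-style parameterisation, not a statement of the exact formula that `fit_its_model()` builds; the subscripts follow the coefficient letters used in this vignette, and the group-intercept term $\beta_B$ is an assumption.

$$
\begin{aligned}
y_{g,t} ={}& \beta_A + \beta_B\,G_g + \beta_C\,t + \beta_D\,G_g\,t \\
&+ \beta_E\,s_{1,t} + \beta_F\,G_g\,s_{1,t} + \beta_I\,s_{2,t} + \beta_J\,G_g\,s_{2,t} + \varepsilon_{g,t}
\end{aligned}
$$

Here $G_g$ is 1 for the pilot group and 0 for the control group, $t$ is the time index, and $s_{k,t} = \max(0,\, t - T_k)$ is the time elapsed since intervention $k$ started at time index $T_k$. Under this form the slope in any period is the sum of the slope terms that are active in that period; for example, the pilot slope during the second intervention is $\beta_C + \beta_D + \beta_E + \beta_F + \beta_I + \beta_J$.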

# Step 4) Analysing ITS model

However, the raw coefficient names are not intuitive to a lay reader. We can call the package's `multipleITScontrol::summary_its()` function, which modifies the summary output by renaming the coefficients so that they are easier to interpret in the context of interrupted time series (ITS) analysis.

```{r, echo = TRUE, results='hide'}
my_summary_its_model <- multipleITScontrol::summary_its(fitted_ITS_model)

my_summary_its_model
```

```{r}
my_summary_its_model
```

```{r, echo = TRUE, results='hide'}
summary(my_summary_its_model)
```

```{r}
summary(my_summary_its_model)
```

```{r, echo = TRUE, results='hide'}
sjPlot::tab_model(
  my_summary_its_model,
  dv.labels = "Self-reported Wellbeing Score",
  show.se = TRUE,
  collapse.se = TRUE,
  linebreak = FALSE,
  string.est = "Estimate (std. error)",
  string.ci = "95% CI",
  p.style = "numeric_stars"
)
```

```{r}
sjPlot::tab_model(
  my_summary_its_model,
  dv.labels = "Self-reported Wellbeing Score",
  show.se = TRUE,
  collapse.se = TRUE,
  linebreak = FALSE,
  string.est = "Estimate (std. error)",
  string.ci = "95% CI",
  p.style = "numeric_stars"
)

a <- coef(my_summary_its_model)[[which(names(coef(my_summary_its_model)) == "A) Control y-axis intercept")]] |> round(2)
c <- coef(my_summary_its_model)[[which(names(coef(my_summary_its_model)) == "C) Control pre-intervention slope")]] |> round(2)
d <- coef(my_summary_its_model)[[which(names(coef(my_summary_its_model)) == "D) Pilot pre-intervention slope difference to control")]] |> round(2)
e <- coef(my_summary_its_model)[[which(names(coef(my_summary_its_model)) == "E) Control intervention 1 slope")]] |> round(2)
f <- coef(my_summary_its_model)[[which(names(coef(my_summary_its_model)) == "F) Pilot intervention 1 slope")]] |> round(2)
i <- coef(my_summary_its_model)[[which(names(coef(my_summary_its_model)) == "I) Control intervention 2 slope")]] |> round(2)
j <- coef(my_summary_its_model)[[which(names(coef(my_summary_its_model)) == "J) Pilot intervention 2 slope")]] |> round(2)
```

The predictor coefficients elucidate a few things:

## **Pre-intervention period:**

At the start of the pre-intervention period, ***A)*** ***Control y-axis intercept*** represents the modelled starting score of Hollybush Medical Practice, `r a`.

***C) Control pre-intervention slope*** describes the pre-intervention slope in the control group (`r c`).

***D) Pilot pre-intervention slope difference to control*** describes how the pre-intervention slope in the pilot group differs from that in the control group. This coefficient is additive to ***C) Control pre-intervention slope***, i.e. `r c` (C) + `r d` (D) = `r c+d` is the pre-intervention slope per x-axis unit in the pilot data.

## **First intervention**:

***E) Control intervention 1 slope*** describes the slope change that occurs at the intervention break point in the control group at the start of the first intervention, compared to its pre-intervention period (`r e`).

***F) Pilot intervention 1 slope*** describes the difference in the slope change that occurs at the intervention timepoint in the pilot group for the first intervention compared to the control (`r f`).

These slope changes are relative to the slopes in the pre-intervention period. Thus, we add the coefficient ***E) Control intervention 1 slope*** to ***C) Control pre-intervention slope***: `r e` + `r c` = `r e+c` is the average increase for each x-axis unit during the first intervention for the control data.

To obtain the slope for the pilot data, we add the coefficients ***E) Control intervention 1 slope*** and ***F) Pilot intervention 1 slope*** to the pre-intervention slope of the pilot data: ***C*** (`r c`) + ***D*** (`r d`) + ***E*** (`r e`) + ***F*** (`r f`) = `r e+f+c+d` is the average increase for each x-axis unit during the first intervention for the pilot data.

To assess the statistical significance of the slope difference during the first intervention, we call the package's `multipleITScontrol::slope_difference()` function.

```{r, echo = TRUE, results='hide'}
slope_difference(model = my_summary_its_model, intervention = 1)
```

```{r, echo = FALSE}
slope_difference(model = my_summary_its_model, intervention = 1)

```

This brings up the key coefficients and values needed to compare the slopes of the pilot and control during the first intervention.

The slope difference between the treatment (Albridge Medical Practice) and the control (Hollybush Medical Practice) during the first intervention (Mental Health Support programme) is 0.34 (95% CI: 0.32 to 0.36) per x-axis unit, with a p-value of <0.001, indicating statistical significance.
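A slope difference of this kind is a sum of model coefficients, so its standard error has to account for the covariance between them. The sketch below shows that general technique on simulated data. It uses ordinary least squares via `lm()` as a stand-in (not `nlme::gls()`), and it is not claimed to reproduce the internals of `slope_difference()`; all variable names and simulated values are assumptions.

```r
# Simulated two-group, one-breakpoint series (illustrative values only).
set.seed(123)
n <- 40
toy <- data.frame(
  time = rep(seq_len(n), times = 2),
  pilot = rep(c(0, 1), each = n)
)
toy$post <- pmax(0, toy$time - 20)        # time since the breakpoint
toy$pilot_time <- toy$pilot * toy$time    # pilot pre-slope difference
toy$pilot_post <- toy$pilot * toy$post    # pilot slope-change difference
toy$y <- 50 + 0.1 * toy$time + 0.3 * toy$pilot_post + rnorm(2 * n, sd = 0.5)

# OLS stand-in for the segmented regression.
fit <- lm(y ~ time + pilot + pilot_time + post + pilot_post, data = toy)

# Weight vector selecting the coefficients that make up the post-breakpoint
# slope difference (pilot slope minus control slope).
w <- setNames(numeric(length(coef(fit))), names(coef(fit)))
w[c("pilot_time", "pilot_post")] <- 1

# Estimate, standard error, 95% CI and p-value of the linear combination.
estimate <- sum(w * coef(fit))
std_error <- sqrt(drop(t(w) %*% vcov(fit) %*% w))
df_resid <- df.residual(fit)
conf_int <- estimate + c(-1, 1) * qt(0.975, df_resid) * std_error
p_value <- 2 * pt(-abs(estimate / std_error), df_resid)
```

The key step is `t(w) %*% vcov(fit) %*% w`: simply adding the individual standard errors would ignore the covariance between the two coefficients.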

<!-- Whilst the value of the slope difference is close to the coefficient ***F*** (Pilot intervention 2) in our regression coefficients table, we must still calculate this with the above calculations which `slope_difference()` provides, as the similarity is only due to the slopes of the control and pilot being zero for the pre-intervention period and the control having close to a zero slope for the first intervention. -->

## **Second intervention:**

***I) Control intervention 2 slope*** describes the slope change that occurs at the intervention break point in the control group at the start of the second intervention (`r i`).

Thus, the modelled slope in the second intervention for the control data is ***C) Control pre-intervention slope*** (`r c`) + ***E) Control intervention 1 slope*** (`r e`) + ***I) Control intervention 2 slope*** (`r i`) = `r c+e+i`, the average increase in score for each x-axis unit during the second intervention.

***J) Pilot intervention 2 slope*** describes the difference in the slope change that occurs at the intervention timepoint in the pilot group for the second intervention, compared to the control (`r j`).

These slope changes are relative to the slopes in the pre-intervention and first intervention periods. Thus, for the pilot data we add the coefficients ***C*** (`r c`) + ***D*** (`r d`) + ***E*** (`r e`) + ***F*** (`r f`) + ***I*** (`r i`) + ***J*** (`r j`) = `r c+d+e+f+i+j`, the average increase in score for each x-axis unit during the second intervention.
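The additive rules for all three periods can be collected in one small table. The coefficient values below are placeholders invented for illustration; they are not the estimates from the model fitted in this vignette.

```r
# Placeholder coefficient values, invented for illustration only.
# They are NOT the estimates from the fitted model in this vignette.
coefs <- c(C = 0.01, D = -0.02, E = 0.03, F = 0.30, I = -0.02, J = -0.25)

slopes <- data.frame(
  period = c("Pre-intervention", "Intervention 1", "Intervention 2"),
  # Control slope: C, then C + E, then C + E + I.
  control = c(
    sum(coefs["C"]),
    sum(coefs[c("C", "E")]),
    sum(coefs[c("C", "E", "I")])
  ),
  # Pilot slope: C + D, then C + D + E + F, then all six coefficients.
  pilot = c(
    sum(coefs[c("C", "D")]),
    sum(coefs[c("C", "D", "E", "F")]),
    sum(coefs)
  )
)

# Pilot minus control slope in each period.
slopes$difference <- slopes$pilot - slopes$control
slopes
```

Algebraically, the `difference` column reduces to D, D + F, and D + F + J, because the control terms C, E, and I cancel.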

To assess the statistical significance of the slope difference during the second intervention, we call the package's `multipleITScontrol::slope_difference()` function again, but change the `intervention` argument.

```{r, echo = TRUE, results= 'hide'}
slope_difference(model = my_summary_its_model, intervention = 2)
```

```{r, echo = FALSE}
slope_difference(model = my_summary_its_model, intervention = 2)

```

The slope difference between the treatment (Albridge Medical Practice) and the control (Hollybush Medical Practice) during the second intervention (CBT sessions) is 0 (95% CI: -0.01 to 0.01) per x-axis unit, with a p-value of 0.636, which is not statistically significant. The effect has attenuated compared to the first intervention, and this is evident from the plot in step 6.

# Step 5) Fitting Predictions

Using `multipleITScontrol::generate_predictions()`, we can use the fitted model's coefficients to generate predictions that project the pre-intervention period into the post-intervention period.

```{r, echo = TRUE, results='hide'}
transformed_data_with_predictions <- generate_predictions(transformed_data, fitted_ITS_model)

transformed_data_with_predictions
```

```{r}
DT::datatable(transformed_data_with_predictions, options = list(dom = 'tip', scrollX = TRUE), rownames = FALSE)
```
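The idea of projecting the pre-intervention trend forward can be sketched with a single-group toy model: fit the segmented regression, then predict again with the post-intervention term switched off. This is a simplified illustration using `lm()` and simulated data; it is not claimed to be how `generate_predictions()` is implemented, and all names here are assumptions.

```r
# Simulated single-group series with a slope change after time 15.
set.seed(7)
toy <- data.frame(time = 1:30)
toy$post <- pmax(0, toy$time - 15)
toy$y <- 50 + 0.1 * toy$time + 0.4 * toy$post + rnorm(30, sd = 0.5)

fit <- lm(y ~ time + post, data = toy)

# Fitted values under the model as estimated.
toy$fitted <- unname(predict(fit, newdata = toy))

# Counterfactual: same model, but with the post-intervention slope term
# set to zero, i.e. the pre-intervention trend continued unchanged.
counterfactual_data <- transform(toy, post = 0)
toy$counterfactual <- unname(predict(fit, newdata = counterfactual_data))

# The gap between the two lines is the modelled effect at each time point.
toy$effect <- toy$fitted - toy$counterfactual
tail(toy[, c("time", "fitted", "counterfactual", "effect")])
```

Before the breakpoint the two prediction columns coincide; afterwards they diverge by the estimated slope change multiplied by the time elapsed since the intervention.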

# Step 6) Plotting the results

We can use the predicted values to draw the segmented regression lines, which help show whether each intervention was associated with a statistically significant difference between the pilot and control groups.

```{r, echo = TRUE, fig.align="center", fig.width=7, fig.height=7, fig.retina=3}
its_plot(model = my_summary_its_model,
         data_with_predictions = transformed_data_with_predictions, 
         time_var = "time",
         intervention_dates = intervention_dates, 
         y_axis = "Self-reported Wellbeing Score")
```

In this example, the treatment variable is for *Albridge Medical Practice*, whilst the control is for *Hollybush Medical Practice*. The treatment line shows a statistically significant slope change from the start of the first intervention in April 2022, but no statistically significant slope difference during the second intervention, which began in June 2022.