---
title: "Achilles tables"
output: 
  bookdown::html_document2:
    number_sections: true
    toc: true
    pandoc_args: ["--number-offset=1,0"]
vignette: >
  %\VignetteIndexEntry{Achilles tables}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

```{r, echo=FALSE, message=FALSE, warning=FALSE}
library(dplyr)
library(gt)

# One row per analysis; `Group` lists the groups each analysis belongs to
x <- OmopConstructor:::achillesAnalisisDetails |>
  mutate(Group = paste0(
    if_else(is_minimal, "minimal; ", ""),
    if_else(is_default, "default; ", ""),
    tolower(category)
  )) |>
  select(
    "ID" = "analysis_id",
    "Name" = "analysis_name",
    "1" = "stratum_1_name",
    "2" = "stratum_2_name",
    "3" = "stratum_3_name",
    "4" = "stratum_4_name",
    "5" = "stratum_5_name",
    "Group",
    "category"
  ) |>
  mutate(across(c("1", "2", "3", "4", "5"), \(x) coalesce(x, "-")))

# Order categories by their smallest analysis ID; one row group per category
xt <- x |>
  inner_join(
    x |>
      group_by(category) |>
      summarise(min = min(ID)),
    by = "category"
  ) |>
  arrange(min) |>
  select(!"min") |>
  group_by(category) |>
  gt() |>
  tab_spanner(label = "Analysis", columns = c("ID", "Name")) |>
  tab_spanner(label = "Stratum", columns = c("1", "2", "3", "4", "5")) |>
  tab_style(
    style = cell_text(align = "center", weight = "bold"),
    locations = cells_column_labels()
  ) |>
  tab_style(
    style = cell_text(align = "center", weight = "bold"),
    locations = cells_column_spanners()
  ) |>
  # Shade the Analysis columns
  tab_style(
    style = cell_fill(color = "#4E6D8C", alpha = 0.1),
    locations = cells_body(columns = c("ID", "Name"))
  ) |>
  tab_style(
    style = cell_fill(color = "#4E6D8C", alpha = 0.5),
    locations = cells_column_labels(columns = c("ID", "Name"))
  ) |>
  tab_style(
    style = cell_fill(color = "#4E6D8C", alpha = 0.5),
    locations = cells_column_spanners(spanners = c("Analysis"))
  ) |>
  # Shade the Stratum columns
  tab_style(
    style = cell_fill(color = "#2A9D8F", alpha = 0.1),
    locations = cells_body(columns = c("1", "2", "3", "4", "5"))
  ) |>
  tab_style(
    style = cell_fill(color = "#2A9D8F", alpha = 0.5),
    locations = cells_column_labels(columns = c("1", "2", "3", "4", "5"))
  ) |>
  tab_style(
    style = cell_fill(color = "#2A9D8F", alpha = 0.5),
    locations = cells_column_spanners(spanners = "Stratum")
  ) |>
  # Shade the Group column
  tab_style(
    style = cell_fill(color = "#E9C46A", alpha = 0.1),
    locations = cells_body(columns = c("Group"))
  ) |>
  tab_style(
    style = cell_fill(color = "#E9C46A", alpha = 0.5),
    locations = cells_column_labels(columns = c("Group"))
  ) |>
  # Shade the category row groups
  tab_style(
    style = cell_fill(color = "#D1D5DB", alpha = 0.1),
    locations = cells_row_groups()
  )
```

The [`Achilles`](https://ohdsi.github.io/Achilles/) R package provides descriptive statistics of an [OMOP CDM](https://ohdsi.github.io/CommonDataModel/) database. There are `r nrow(x)` analyses in total, classified into `r length(unique(x$category))` categories: `r paste0("*", unique(OmopConstructor:::achillesAnalisisDetails$category), "*", collapse = ", ")`. The following table lists each analysis with its ID, its name, the names of its strata (up to five; `-` when unused) and the groups it belongs to (`minimal`, `default` and its category).

```{r, echo=FALSE}
xt
```

## Run Achilles analysis

You can create the Achilles tables with `buildAchillesTables()`. The three Achilles tables (`achilles_results`, `achilles_results_dist` and `achilles_analysis`) are created in the write schema of your `cdm` object. Use the `achillesId` argument to choose which analyses to run, either by providing a set of analysis IDs or a group name that selects several IDs at once:

- `'all'` to run all the analyses.
- `'default'` to run the default Achilles analyses.
- `'minimal'` to run a subset of analyses containing the concept counts of each table; packages like [CodelistGenerator](https://darwin-eu.github.io/CodelistGenerator/) use these counts to find concept counts quickly.
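To make the group semantics concrete, here is a minimal base R sketch of how an `achillesId` value could be resolved to analysis IDs. It assumes a details table with `analysis_id`, `is_minimal` and `is_default` columns (the flags behind the `Group` column above). The helper `resolveAchillesIds()` and the toy data are hypothetical and do not describe how `buildAchillesTables()` works internally.

```r
# Hypothetical helper (not OmopConstructor's API): turn an `achillesId` value
# into a sorted vector of analysis IDs using the minimal/default flags.
resolveAchillesIds <- function(achillesId, details) {
  if (is.character(achillesId)) {
    group <- match.arg(achillesId, c("all", "default", "minimal"))
    keep <- switch(group,
      all = rep(TRUE, nrow(details)),
      default = details$is_default,
      minimal = details$is_minimal
    )
    # which() drops FALSE and NA flags
    return(sort(details$analysis_id[which(keep)]))
  }
  # Numeric input: validate the IDs against the details table
  ids <- unique(as.integer(achillesId))
  unknown <- setdiff(ids, details$analysis_id)
  if (length(unknown) > 0) {
    stop("Unknown analysis IDs: ", paste(unknown, collapse = ", "))
  }
  sort(ids)
}

# Toy details table: IDs and flags are made up for illustration
details <- data.frame(
  analysis_id = c(1L, 2L, 3L, 4L),
  is_minimal = c(FALSE, FALSE, TRUE, TRUE),
  is_default = c(TRUE, TRUE, TRUE, FALSE)
)

resolveAchillesIds("minimal", details)
resolveAchillesIds("default", details)
resolveAchillesIds(c(2, 1), details)
```

In this sketch a group name is a shortcut for a filter on the flags, while explicit IDs are checked against the details table so that a typo fails early instead of silently running nothing.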
The example below runs the minimal set of Achilles analyses on the 'GiBleed' synthetic dataset:

```{r}
library(omock)
library(OmopConstructor)

# Create a cdm_reference from the GiBleed synthetic dataset, backed by DuckDB
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
cdm

# Run the 'minimal' analyses; the Achilles tables are added to the cdm
cdm <- buildAchillesTables(cdm = cdm, achillesId = "minimal")
cdm

# Inspect the results table
cdm$achilles_results
```

## Differences with the Achilles R package

`OmopConstructor::buildAchillesTables()` and `Achilles::achilles()` (OHDSI/Achilles) both populate the same three output tables (`achilles_results`, `achilles_results_dist`, `achilles_analysis`) in an OMOP CDM database, but they follow fundamentally different design principles.

### Execution Model

The most fundamental difference is *how* analyses are defined and executed.

**OHDSI/Achilles** is SQL-first. Every analysis is a parameterised SQL template rendered by `SqlRender` and executed via `DatabaseConnector` (JDBC). R acts purely as an orchestrator: no CDM data ever enters R memory. This gives Achilles broad dialect coverage (PostgreSQL, SQL Server, Oracle, BigQuery, Redshift, Spark, DuckDB) and keeps performance independent of R's memory constraints.

**OmopConstructor** is R-first. Analyses are expressed as a small vocabulary of configurable operations (`count`, `distribution`, `proportion`, `coocurrent`, `overlap`, `conceptDistribution`) executed through `dplyr`/`dbplyr` against a `cdm_reference` object; `dbplyr` translates these pipelines into SQL that is executed by the database backend. The backend itself is abstracted by `CDMConnector`/`DBI`, so no Java runtime is required.

### Small Cell Suppression

**OHDSI/Achilles** provides a `smallCellCount` parameter. Any result with a count below the specified threshold is suppressed before being written to `achilles_results`, supporting privacy-preserving characterisation out of the box.

**OmopConstructor** has no equivalent parameter. Suppression is not applied in `buildAchillesTables()` because the results never leave the database. Instead, the packages that retrieve data from the Achilles tables apply their own minimum cell count suppression, which can be customised at every step.
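As a rough illustration of suppression applied at retrieval time, the base R sketch below masks small counts in a toy data frame shaped like `achilles_results`. The helper `suppressCounts()` and its `minCellCount` argument are hypothetical, not part of OmopConstructor or any downstream package, and the convention used (counts greater than zero but below the threshold become `NA`) is an assumption.

```r
# Hypothetical helper (not part of OmopConstructor): mask small counts in a
# table shaped like `achilles_results`. Assumed convention: counts that are
# greater than zero but below `minCellCount` become NA; zeros are kept.
suppressCounts <- function(results, minCellCount = 5) {
  small <- !is.na(results$count_value) &
    results$count_value > 0 &
    results$count_value < minCellCount
  results$count_value[small] <- NA
  results
}

# Toy results with made-up IDs and counts
results <- data.frame(
  analysis_id = c(1L, 1L, 1L),
  stratum_1 = c("1001", "1002", "1003"),
  count_value = c(120L, 3L, 0L)
)

# The threshold is chosen by the caller each time results are retrieved
suppressCounts(results, minCellCount = 5)
suppressCounts(results, minCellCount = 1)
```

With `minCellCount = 5` the count of 3 becomes `NA`; with `minCellCount = 1` nothing is masked. Because the stored table is untouched, each consumer can pick its own threshold.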
### Observation Period Consistency

In **OHDSI/Achilles**, the observation period filter is applied inconsistently across analyses. Some analyses count records or persons *only within* a valid observation period; others count them *regardless* of observation period. This inconsistency has been reported in several open issues.

**OmopConstructor** makes the observation period filter an explicit, uniform operation (`observation start yes/no`) in the analysis configuration. Every analysis that involves an observation period check applies it in the same way, and analyses that do not require it simply omit the operation. This produces consistent behaviour across the full catalogue.
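The base R sketch below, using made-up data, shows why a single explicit switch matters: the same record count changes depending on whether records outside the observation period are kept. The helper `countRecords()` and its `inObservation` argument are hypothetical illustrations of the idea, not OmopConstructor's configuration format.

```r
# Toy OMOP-like tables with made-up data: one observation period per person
observation_period <- data.frame(
  person_id = c(1L, 2L),
  observation_period_start_date = as.Date(c("2010-01-01", "2015-01-01")),
  observation_period_end_date = as.Date(c("2019-12-31", "2020-12-31"))
)
condition_occurrence <- data.frame(
  person_id = c(1L, 1L, 2L, 2L),
  condition_concept_id = c(10L, 10L, 10L, 20L),
  condition_start_date = as.Date(
    c("2012-05-01", "2021-03-01", "2014-06-01", "2016-02-01")
  )
)

# Hypothetical helper: count records per concept, with the observation period
# filter exposed as one explicit switch instead of being decided per analysis
countRecords <- function(records, observationPeriod, inObservation) {
  if (inObservation) {
    # Keep records whose start date falls inside one of the person's
    # observation periods (periods are assumed not to overlap)
    joined <- merge(records, observationPeriod, by = "person_id")
    inside <- joined$condition_start_date >= joined$observation_period_start_date &
      joined$condition_start_date <= joined$observation_period_end_date
    records <- joined[inside, names(records)]
  }
  counts <- table(records$condition_concept_id)
  data.frame(
    condition_concept_id = as.integer(names(counts)),
    count_value = as.integer(counts)
  )
}

countRecords(condition_occurrence, observation_period, inObservation = FALSE)
countRecords(condition_occurrence, observation_period, inObservation = TRUE)
```

With `inObservation = FALSE` all four records are counted (three for concept 10, one for concept 20). With `inObservation = TRUE`, the 2021 and 2014 records fall outside their person's observation period, leaving one record per concept.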