---
title: "Getting Started with TernTables"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with TernTables}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
library(TernTables)
options(tibble.width = Inf) # show all columns in printed tibbles

# Output directory for exported .docx files.
# Override by setting options(TernTables.vignette_outdir = "/your/path") before rendering.
out_dir <- getOption("TernTables.vignette_outdir", default = tempdir())
```

```{css, echo = FALSE}
img {
  border: none !important;
  box-shadow: none !important;
}
```

## Overview

**TernTables** is built for clinical researchers who need to go from raw data to a manuscript-ready Word table — with variable detection, statistical test selection, and formatting all handled automatically. Given a data frame and an optional grouping variable, it:

- Detects each variable's type (continuous, binary, categorical)
- Selects the appropriate statistical test
- Formats *P* values and summary statistics for publication-ready tables
- Exports directly to a styled `.docx` Word file and generates a boilerplate statistical methods paragraph
- Returns a tibble for inspection, Excel export, or further analysis in R

Three table types are supported: **descriptive summaries** (single cohort, no comparisons), **two-group comparisons** (with optional odds ratios), and **comparisons across three or more groups**.

The convenience is in the automation, not in any compromise to statistical rigor. Test selection follows established published criteria throughout. Normality is assessed per group, by default with a four-gate rule that relies on Shapiro-Wilk for small samples. Fisher's exact test is triggered by the Cochran (1954) expected-cell criterion. Odds ratios are reported as unadjusted, with the first factor level of the grouping variable as the reference.
The auto-generated methods paragraph covers the statistical approach used and is suitable as a starting draft for a manuscript methods section.

> **No R required?** TernTables is available as a free point-and-click web
> application at [tern-tables.com](https://tern-tables.com/). Upload a CSV
> or XLSX, configure your table, and download a formatted Word document —
> all without writing a line of code. The web app is powered by this package,
> so the statistical methods, normality routing, and Word output are identical.
> A built-in side panel shows the R commands running in the background, and
> the full script can be downloaded at the end of your session, making every
> analysis fully transparent and reproducible. For scripted or reproducible
> workflows, the R package (documented in this vignette) remains the canonical reference.

## Example Dataset

```{r load-data}
data(tern_colon)
```

`tern_colon` is bundled with TernTables. It is derived from `survival::colon` and contains 929 patients from a landmark colon cancer adjuvant chemotherapy trial (Moertel et al., 1990), filtered to the recurrence endpoint — one row per patient. See `?tern_colon` for full details.
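
The filtering step can be illustrated directly from the source data. This sketch assumes only that the **survival** package is installed. In `survival::colon`, each patient contributes two rows, and `etype == 1` marks the recurrence record (`etype == 2` is death). The renaming and recoding that produce `tern_colon` (for example, turning `rx` into `Treatment_Arm`) are not reproduced here. Treat this as an illustration, not the package's build script.

```r
# Sketch: keep only the recurrence records from survival::colon.
# Not the TernTables build script; column names are the survival originals.
colon_recurrence <- subset(survival::colon, etype == 1)

nrow(colon_recurrence)                # one row per patient
anyDuplicated(colon_recurrence$id)    # 0 means no patient appears twice
table(colon_recurrence$rx)            # three treatment arms
```
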
Key variables used in these examples:

| Column | Description |
|---|---|
| `Age_Years` | Age at registration (years) |
| `Sex` | Female / Male |
| `Colonic_Obstruction` | Colonic obstruction present — n (%) |
| `Bowel_Perforation` | Bowel perforation present — n (%) |
| `Positive_Lymph_Nodes_n` | Number of positive lymph nodes |
| `Over_4_Positive_Nodes` | More than 4 positive lymph nodes — n (%) |
| `Tumor_Adherence` | Tumour adherence to nearby organs — n (%) |
| `Tumor_Differentiation` | Well / Moderate / Poor |
| `Extent_of_Local_Spread` | Depth of tumour penetration (4 levels) |
| `Recurrence` | No Recurrence / Recurrence — **2-group** |
| `Treatment_Arm` | Levamisole + 5FU / Levamisole / Observation — **3-group** |

---

## Preprocessing Raw Data (`ternP`)

If your source is a raw CSV or XLSX file — rather than an already-clean R object — use `ternP()` to standardize it before passing it to `ternG()` or `ternD()`. It handles the messiness most commonly introduced by manual data entry or spreadsheet workflows:

| Transformation | What it fixes |
|---|---|
| String NA conversion | `"NA"`, `"na"`, `"Na"`, `"unk"` → `NA` |
| Whitespace trimming | Leading/trailing spaces in character columns |
| Empty column removal | 100% `NA` columns silently dropped |
| Blank row removal | Rows where every cell is `NA` |
| Case normalization | `"fEMALE"` / `"Female"` unified to title case |

`ternP()` also applies two **hard stops** before any cleaning takes place: it errors immediately if any column name matches a protected health information (PHI) pattern (e.g. `MRN`, `DOB`, `FirstName`), or if any unnamed column contains data.

```{r ternP-run, eval = FALSE}
# Load a messy CSV shipped with the package
path <- system.file("extdata/csv", "tern_colon_messy.csv", package = "TernTables")
raw <- readr::read_csv(path, show_col_types = FALSE)

result <- ternP(raw)
# The print method fires automatically, summarising every transformation applied.
```

The printed summary identifies each transformation and shows the final dimensions of the cleaned data. If the data were already clean, a single "No transformations required" line appears.

Three items are returned in the result object:

```{r ternP-access, eval = FALSE}
result$clean_data   # Cleaned, analysis-ready tibble
result$sparse_rows  # Rows with >50% NA (retained, not removed — review these)
result$feedback     # Named list; NULL elements mean no action was taken
```

To write a Word document recording the cleaning steps, call `write_cleaning_doc()`. It is fully dynamic: only paragraphs for triggered transformations are written, so the document stays concise for already-clean data.

```{r ternP-doc, eval = FALSE}
write_cleaning_doc(result, filename = file.path(out_dir, "cleaning_summary.docx"))
```

Once preprocessing is complete, pass `result$clean_data` directly to `ternD()` or `ternG()`:

```{r ternP-handoff, eval = FALSE}
tbl <- ternG(result$clean_data, exclude_vars = c("ID"), group_var = "Recurrence")
```

---

## Descriptive Table (`ternD`)

Use `ternD()` for a single cohort with no group comparisons — the standard "Table 1" in a cohort description. Pass `output_docx` to write a publication-ready Word file in the same call; pass `output_xlsx` to also save the tibble as an Excel file. Use `category_start` to insert bold section headers grouping related variables; anchors can be either the raw column name or the cleaned display label.
```{r ternD-example, results = "hide"}
tbl_descriptive <- ternD(
  data = tern_colon,
  exclude_vars = c("ID"),
  output_docx = file.path(out_dir, "Tern_descriptive.docx"),
  methods_filename = file.path(out_dir, "TernTables_methods.docx"),
  category_start = c(
    "Patient Demographics"  = "Age (yr)",
    "Surgical Findings"     = "Colonic Obstruction",
    "Tumor Characteristics" = "Positive Lymph Nodes (n)",
    "Outcomes"              = "Recurrence"
  )
)

tbl_descriptive
```

Continuous variables show mean ± SD or median [IQR], chosen by the four-gate ROBUST normality algorithm (n < 3 fail-safe, skewness check, CLT at n ≥ 30, Shapiro-Wilk for small samples). Columns whose values are exactly Y/N, YES/NO, or numeric 0/1 are detected as binary and shown as a single n (%) row (the positive/yes count). All other categorical variables — including two-level variables such as Male/Female — are shown with each level as an indented sub-row.

Variable names are automatically cleaned for display (`smart_rename = TRUE` by default): underscores are replaced with spaces, capitalisation is normalised, and common medical abbreviations are formatted (e.g. `Age_Years` → `Age (yr)`, `Positive_Lymph_Nodes_n` → `Positive Lymph Nodes (n)`). Pass `smart_rename = FALSE` to use column names exactly as they appear in the data.

Descriptive summary table exported to Word:

```{r ternD-figure, echo=FALSE, fig.align="center", out.width="45%"}
knitr::include_graphics("figures/tern_descriptive.png")
```

---

## Two-Group Comparison (`ternG` — 2 levels)

Use `ternG()` to compare variables between two groups. Set `OR_col = TRUE` to add odds ratios with 95% CI for binary variables (Y/N, YES/NO, 0/1) and two-level categorical variables such as Male/Female. For two-level categoricals displayed with sub-rows, the reference level (factor level 1, or the alphabetically first level) shows `1.00 (ref.)`; the non-reference level shows the computed OR with 95% CI. The OR is estimated by either Fisher's exact method or the Wald method, chosen automatically based on expected cell counts.
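
The reference-level convention and the Fisher/Wald switch can be illustrated in base R. This is a hypothetical sketch, and `or_sketch()` is not a TernTables function. It builds a 2 × 2 table of variable level by group, with the first factor level of each as the reference. It uses `fisher.test()` when any expected count is below 5, borrowing the cut-off from the test-selection rule because the exact threshold TernTables applies to ORs is not stated here. It also switches to Fisher when any cell is zero, a choice made for this sketch only. Otherwise it computes a Wald interval on the log-odds scale.

```r
# Hypothetical sketch, not the TernTables implementation.
# x:     two-level variable (first factor level = reference)
# group: two-level grouping variable (first factor level = reference)
or_sketch <- function(x, group, conf_level = 0.95) {
  tab <- table(factor(x), factor(group))
  stopifnot(all(dim(tab) == 2))
  # Expected counts under independence: row total * column total / n
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (any(expected < 5) || any(tab == 0)) {
    # Conditional MLE odds ratio with exact CI
    ft <- fisher.test(tab, conf.level = conf_level)
    return(list(method = "Fisher", or = unname(ft$estimate),
                ci = as.numeric(ft$conf.int)))
  }
  n11 <- as.numeric(tab[1, 1]); n12 <- as.numeric(tab[1, 2])
  n21 <- as.numeric(tab[2, 1]); n22 <- as.numeric(tab[2, 2])
  # Odds of the non-reference group, non-reference vs reference variable level
  log_or <- log(n11) + log(n22) - log(n12) - log(n21)
  se <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
  z <- qnorm(1 - (1 - conf_level) / 2)
  list(method = "Wald", or = exp(log_or),
       ci = exp(log_or + c(-1, 1) * z * se))
}
```

With `tern_colon`, a comparable call would be `or_sketch(tern_colon$Sex, tern_colon$Recurrence)`. This assumes both columns hold two distinct values; the sketch drops `NA`s because `table()` does.
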
Pass `output_docx` to write the Word table directly; `output_xlsx` exports the tibble to Excel.

```{r ternG-2group, results = "hide"}
tbl_2group <- ternG(
  data = tern_colon,
  exclude_vars = c("ID"),
  group_var = "Recurrence",
  output_docx = file.path(out_dir, "Tern_2_group.docx"),
  methods_filename = file.path(out_dir, "TernTables_methods.docx"),
  OR_col = TRUE,
  insert_subheads = TRUE,
  category_start = c(
    "Patient Demographics"  = "Age (yr)",
    "Surgical Findings"     = "Colonic Obstruction",
    "Tumor Characteristics" = "Positive Lymph Nodes (n)",
    "Treatment Details"     = "Treatment Arm"
  )
)

tbl_2group
```

The Word table includes an OR column (odds ratio with 95% CI for binary and two-level categorical variables) and a *P* value column (the test *P* value for each variable).

Two-group comparison table exported to Word, with odds ratios and category section headers:

![](figures/tern_2_group.png){width=100%}

---

## Three or More Groups (`ternG` — 3+ levels)

The same `ternG()` function handles three or more groups automatically, switching from *t*-test/Wilcoxon to Welch ANOVA/Kruskal-Wallis as appropriate. Odds ratios are not available for 3+ group comparisons.

`consider_normality` controls normality routing; the default (`"ROBUST"`) applies the four-gate algorithm (n < 3 fail-safe → skewness → CLT → Shapiro-Wilk). `FALSE` forces parametric tests throughout; `"FORCE"` forces nonparametric tests throughout.

Set `post_hoc = TRUE` to run pairwise post-hoc tests automatically when the omnibus *P* < 0.05. The post-hoc test is matched to the omnibus test used: **Games-Howell** follows Welch ANOVA (parametric path), and **Dunn's test with Holm correction** follows Kruskal-Wallis (nonparametric and ordinal path). Results are appended to each cell as compact letter display (CLD) superscripts; groups sharing a letter are not significantly different after correction. Categorical variables never receive post-hoc testing. When `post_hoc = TRUE` and at least one post-hoc test is run, an explanatory footnote is added automatically to the Word output.
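
The switch from two-group to omnibus tests maps onto base R's **stats** functions. This sketch is hypothetical, and `compare_groups_sketch()` is not part of TernTables. It takes the normality decision as an input rather than computing it. It also stops at the omnibus *P* value, because Games-Howell and Dunn's tests are not available in base R.

```r
# Hypothetical sketch of test routing by number of groups and normality.
# y: numeric outcome; g: grouping variable; normal: logical routing decision
compare_groups_sketch <- function(y, g, normal) {
  g <- droplevels(factor(g))
  k <- nlevels(g)
  if (k < 2) stop("need at least two groups")
  if (k == 2) {
    if (normal) {
      # t.test() uses Welch's unequal-variance form by default (var.equal = FALSE)
      list(test = "Welch t-test", p = t.test(y ~ g)$p.value)
    } else {
      list(test = "Wilcoxon rank-sum", p = wilcox.test(y ~ g)$p.value)
    }
  } else {
    if (normal) {
      # oneway.test() with var.equal = FALSE is Welch's ANOVA
      list(test = "Welch ANOVA", p = oneway.test(y ~ g, var.equal = FALSE)$p.value)
    } else {
      list(test = "Kruskal-Wallis", p = kruskal.test(y ~ g)$p.value)
    }
  }
}
```

A call on the example data would look like `compare_groups_sketch(tern_colon$Age_Years, tern_colon$Treatment_Arm, normal = TRUE)`. In TernTables itself, the parametric/nonparametric choice comes from the `consider_normality` setting rather than a manual flag.
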
```{r ternG-3group, results = "hide"}
tbl_3group <- ternG(
  data = tern_colon,
  exclude_vars = c("ID"),
  group_var = "Treatment_Arm",
  group_order = c("Observation", "Levamisole", "Levamisole + 5FU"),
  output_docx = file.path(out_dir, "Tern_3_group.docx"),
  methods_filename = file.path(out_dir, "TernTables_methods.docx"),
  consider_normality = "ROBUST",
  post_hoc = TRUE,
  category_start = c(
    "Patient Demographics"  = "Age (yr)",
    "Surgical Findings"     = "Colonic Obstruction",
    "Tumor Characteristics" = "Positive Lymph Nodes (n)",
    "Outcomes"              = "Recurrence"
  )
)

tbl_3group
```

Three-group comparison table exported to Word with category section headers:

![](figures/tern_3_group.png){width=100%}

---

## Word Output Formatting

Two optional parameters control text that appears outside the table body in the exported Word document.

**`table_caption`** places a bold 11-point Arial caption above the table, single-spaced with a small gap between the caption and the table:

```{r caption-example, eval = FALSE}
tbl_descriptive <- ternD(
  data = tern_colon,
  exclude_vars = c("ID"),
  output_docx = file.path(out_dir, "Tern_descriptive.docx"),
  table_caption = "Table 1. Baseline patient characteristics."
)
```

**`table_footnote`** adds a merged footer row below the table in 6-point Arial italic, bordered above and below by a double rule. Pass a single string, or a character vector for multiple lines. Lines are joined with a line break inside the same cell, with no extra row spacing:

```{r footnote-example, eval = FALSE}
tbl_2group <- ternG(
  data = tern_colon,
  exclude_vars = c("ID"),
  group_var = "Recurrence",
  OR_col = TRUE,
  output_docx = file.path(out_dir, "Tern_2_group.docx"),
  table_caption = "Table 2. Characteristics by recurrence status.",
  table_footnote = c(
    "Abbreviations: OR, odds ratio; CI, confidence interval.",
    "\u2020 P values from chi-square or Wilcoxon rank-sum test.",
    "\u2021 ORs from unadjusted logistic regression."
  )
)
```

Both parameters are also stored in the table's metadata and reproduced automatically when combining tables with `ternB()`.

---

## Statistical Test Logic

TernTables selects tests automatically based on variable type and normality:

| Variable type | Test (2 groups) | Test (3+ groups) | Post-hoc (3+ groups, `post_hoc = TRUE`, omnibus *P* < 0.05) |
|---|---|---|---|
| Continuous, normal | Welch's *t*-test | Welch ANOVA | Games-Howell |
| Continuous, non-normal | Wilcoxon rank-sum | Kruskal-Wallis | Dunn's + Holm |
| Binary / Categorical | Fisher's exact or Chi-squared\* | Fisher's exact or Chi-squared\* | — |
| Ordinal (`force_ordinal`) | Wilcoxon rank-sum | Kruskal-Wallis | Dunn's + Holm |

\*Fisher's exact test is used when any expected cell count is < 5 (Cochran criterion). If the exact algorithm cannot complete (workspace limit exceeded for large tables), Fisher's exact test with Monte Carlo simulation (B = 10,000; seed fixed via `getOption("TernTables.seed")`, default 42) is used automatically.

Normality routing uses `consider_normality = "ROBUST"` (the default), a four-gate decision applied per group and evaluated in order:

1. Any group n < 3 → non-parametric (conservative fail-safe).
2. Absolute skewness > 2 in any group → non-parametric, regardless of sample size.
3. All groups n ≥ 30 → parametric, via the Central Limit Theorem.
4. Otherwise, Shapiro-Wilk *P* > 0.05 in all groups → parametric; if any group fails, non-parametric.

Set `consider_normality = TRUE` to use Shapiro-Wilk alone (original behaviour).

For 3+ group comparisons, omnibus *P* values are reported. When `post_hoc = TRUE`, pairwise comparisons are performed automatically for continuous and ordinal variables when omnibus *P* < 0.05, using the test paired to the omnibus (Games-Howell or Dunn's + Holm). CLD superscript letters are appended to cell values; groups sharing a letter are not significantly different. Categorical variables never receive post-hoc testing. `post_hoc` defaults to `FALSE`.

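
The routing rules above can be sketched in base R. This is a hypothetical reconstruction, not the TernTables source. The names `skewness_sketch()`, `robust_is_normal_sketch()`, and `categorical_test_sketch()` are illustrative. Skewness uses the simple moment estimator, and TernTables may use a different one. `chisq.test()` keeps its defaults, including Yates' continuity correction on 2 × 2 tables, which the text above does not specify either way. The Monte Carlo fallback reuses the documented `TernTables.seed` option with its default of 42.

```r
# Hypothetical sketch of the routing rules above; not the TernTables source.

# Sample skewness via the moment estimator m3 / m2^1.5 (an assumption).
skewness_sketch <- function(v) {
  m2 <- mean((v - mean(v))^2)
  if (m2 == 0) return(0)  # constant group: treat as unskewed
  mean((v - mean(v))^3) / m2^1.5
}

# Four-gate "ROBUST" decision: TRUE = parametric, FALSE = non-parametric
robust_is_normal_sketch <- function(y, g) {
  keep <- !is.na(y) & !is.na(g)
  groups <- split(y[keep], factor(g[keep]))
  n <- lengths(groups)
  if (any(n < 3)) return(FALSE)                   # Gate 1: tiny group
  skew <- vapply(groups, skewness_sketch, numeric(1))
  if (any(abs(skew) > 2)) return(FALSE)           # Gate 2: heavy skew
  if (all(n >= 30)) return(TRUE)                  # Gate 3: CLT
  all(vapply(groups, function(v) {                # Gate 4: Shapiro-Wilk
    tryCatch(shapiro.test(v)$p.value > 0.05, error = function(e) FALSE)
  }, logical(1)))
}

# Fisher's exact if any expected count < 5, otherwise chi-squared;
# Monte Carlo Fisher if the exact algorithm errors out.
categorical_test_sketch <- function(x, g, B = 10000) {
  tab <- table(x, g)
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (all(expected >= 5)) {
    return(list(test = "Chi-squared", p = chisq.test(tab)$p.value))
  }
  exact <- tryCatch(fisher.test(tab)$p.value, error = function(e) NULL)
  if (!is.null(exact)) return(list(test = "Fisher", p = exact))
  set.seed(getOption("TernTables.seed", 42))  # note: resets the global RNG
  list(test = "Fisher (Monte Carlo)",
       p = fisher.test(tab, simulate.p.value = TRUE, B = B)$p.value)
}
```

The gates return early, so order matters. A group of 30 values with one extreme outlier fails gate 2 before the CLT shortcut in gate 3 is reached.
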

---

## Methods Document

A methods paragraph is written automatically with every `ternD()` and `ternG()` call (`methods_doc = TRUE` by default). It is saved to `"TernTables_methods.docx"` in the working directory unless overridden via `methods_filename`. Set `methods_doc = FALSE` to suppress it. `write_methods_doc()` can also be called directly on any tibble returned by `ternD()` or `ternG()`.

Pass `show_test = TRUE` to `ternG()` to populate the `test` column. When that column is present, the paragraph is tailored to the test types that actually appeared; for example, it omits the *t*-test sentence if all continuous variables were analysed nonparametrically. Without it, standard boilerplate is used.

```{r methods-doc, eval = FALSE}
write_methods_doc(
  tbl = tbl_2group,
  filename = file.path(out_dir, "Tern_methods.docx")
)
```

---

## Web Application

The full TernTables workflow — preprocessing, descriptive tables, two-group and three-group comparisons, Word export, and methods paragraphs — is available as a **free, no-code web application** at [tern-tables.com](https://tern-tables.com/). No R or package installation is required. The web app is powered by the same TernTables R package described in this vignette; all statistical methods and outputs are identical.

The web app is transparent by design. A built-in side panel displays the exact R commands being executed in the background as you work, and the full script can be downloaded at the end of your session. The downloaded script runs as-is in R and produces identical output, making every analysis fully auditable and reproducible. This is suitable for submission to statistical reviewers, inclusion in supplemental materials, or IRB documentation. It also provides a natural learning path for researchers who want to transition to scripted R workflows. This package remains the canonical reference for the underlying implementation.

---

## References

Moertel CG, Fleming TR, Macdonald JS, et al. (1990). Levamisole and fluorouracil for adjuvant therapy of resected colon carcinoma.
*New England Journal of Medicine*, **322**(6), 352–358.