Introduction to Clinical Tables with the {gt} Package
{<a href="https://gt.rstudio.com/index.html" target="_blank" rel="noopener noreferrer">gt</a>} is an R package for displaying tables. Designed to bridge the gap between data analysis and publication-quality output, it is perfect for <a href="https://appsilon.com/reproducible-and-reliable-shiny-apps-for-regulatory-submissions/" target="_blank" rel="noopener">generating clinical tables</a> ready for publication.
{gt} introduces a <strong>comprehensive, intuitive syntax for table creation</strong>, allowing users to craft <strong>detailed</strong>, <strong>aesthetically pleasing tables</strong>. It divides a table into the table header, the stub, the column labels and spanner column labels, the table body, and the table footer. Because each part can be addressed separately, formatting can be applied to one component without touching the others.
<img class="wp-image-23344 size-full" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65e9e6360346b6b157a95bde_06952167_the-parts-of-a-gt-table.webp" alt="The Parts of a gt Table" width="601" height="377" /> <a href="https://gt.rstudio.com/index.html" target="_blank" rel="noopener">The Parts of a gt Table</a>
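As a rough illustration of how these parts map to code, the sketch below builds a tiny table from made-up numbers. The data frame <code>toy</code> and its column names are hypothetical; the functions used (<code>gt()</code>, <code>tab_header()</code>, <code>tab_spanner()</code>, <code>tab_source_note()</code>) are part of {gt}.

```r
library(gt)

# Made-up values, for illustration only
toy <- data.frame(
  stat = c("n", "Mean"),
  group = c("Age (Years)", "Age (Years)"),
  placebo = c(10, 61.2),
  active = c(12, 59.8)
)

parts_table <- toy |>
  # "stat" becomes the stub, "group" becomes the row group label
  gt(rowname_col = "stat", groupname_col = "group") |>
  # table header
  tab_header(title = "Toy table", subtitle = "Illustration only") |>
  # spanner column label above the two treatment columns
  tab_spanner(label = "Treatment", columns = c(placebo, active)) |>
  # table footer
  tab_source_note(source_note = "Values are made up.")
```

Printing <code>parts_table</code> in an interactive session renders the table, with each call above contributing one of the parts shown in the diagram.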
<blockquote>To discover packages used in clinical trial data analysis, check out our blog post: <a href="https://appsilon.com/pharmaceutical-and-clinical-trial-data-analysis-packages/" target="_blank" rel="noopener">R Programming and Pharmaceutical Data Analysis (Packages for Clinical Trial Data)</a>.</blockquote>
<h3>Table of Contents</h3><ul> <li><strong><a href="#exploring-clinical-tables-with-gt">Exploring Clinical Tables with gt</a></strong></li> <li><strong><a href="#demographics-and-baseline-characteristics-table">Demographics and Baseline Characteristics Table</a></strong></li> <li><strong><a href="#adverse-events-table">Adverse Events Table</a></strong></li> <li><strong><a href="#demographics-and-baseline-characteristics-clinical-table-with-gt">Demographics and Baseline Characteristics Clinical Table with {gt}</a></strong></li> <li><strong><a href="#adverse-events-clinical-table-with-gt">Adverse Events Clinical Table with {gt}</a></strong></li> <li><strong><a href="#summing-up-clinical-tables-with-gt">Summing up Clinical Tables with {gt}</a></strong></li></ul>
<hr />
<h2 id="exploring-clinical-tables-with-gt">Exploring Clinical Tables with gt</h2>
The <a href="https://insightsengineering.github.io/chevron/latest-tag/articles/chevron_catalog.html#tables" target="_blank" rel="noopener noreferrer">{chevron} package documentation</a> provides an extensive catalogue of standard clinical tables. While numerous formats exist, we will concentrate on two primary types: the Demographics and Baseline Characteristics Table and the Adverse Events Table.
<h3 id="demographics-and-baseline-characteristics-table">Demographics and Baseline Characteristics Table</h3>
<strong>This table format is crucial for displaying key demographic and baseline characteristics of study participants.</strong> It typically includes age, sex, race, and other relevant baseline information that can influence the outcome of the study.
By presenting this data, researchers can ensure that the study population is well-defined and that findings apply to the intended patient groups. It also aids in identifying any imbalances between treatment groups that could affect the results.
<h3 id="adverse-events-table">Adverse Events Table</h3>
The <a href="https://cran.r-project.org/web/packages/sassy/vignettes/sassy-ae.html" target="_blank" rel="noopener">Adverse Events Table</a> is essential for reporting any negative outcomes experienced by participants during the study. It groups adverse events into categories based on the body system affected. This grouping helps in the systematic presentation of adverse events, making it easier for readers to assess the safety profile of a drug or intervention.
Let's create those clinical tables using the {gt} package!
<h2 id="demographics-and-baseline-characteristics-clinical-table-with-gt">Demographics and Baseline Characteristics Clinical Table with {gt}</h2>
For creating a demographic table, we will use the <code>admiral_adsl</code> example data frame from the {<a href="https://pharmaverse.github.io/admiral/index.html" target="_blank" rel="noopener noreferrer">admiral</a>} package.
<pre><code class="language-r">
library(admiral)
library(dplyr)
library(tidyr)
library(purrr)
library(gt) #Version: 0.10.1
<br>admiral_adsl
#> # A tibble: 306 × 50
#> STUDYID USUBJID SUBJID RFSTDTC RFENDTC RFXSTDTC RFXENDTC RFICDTC RFPENDTC
#>
#> 1 CDISCPILOT… 01-701… 1015 2014-0… 2014-0… 2014-01… 2014-07… 2014-07…
#> 2 CDISCPILOT… 01-701… 1023 2012-0… 2012-0… 2012-08… 2012-09… 2013-02…
#> 3 CDISCPILOT… 01-701… 1028 2013-0… 2014-0… 2013-07… 2014-01… 2014-01…
#> 4 CDISCPILOT… 01-701… 1033 2014-0… 2014-0… 2014-03… 2014-03… 2014-09…
#> 5 CDISCPILOT… 01-701… 1034 2014-0… 2014-1… 2014-07… 2014-12… 2014-12…
#> 6 CDISCPILOT… 01-701… 1047 2013-0… 2013-0… 2013-02… 2013-03… 2013-07…
#> 7 CDISCPILOT… 01-701… 1057 2013-12…
#> 8 CDISCPILOT… 01-701… 1097 2014-0… 2014-0… 2014-01… 2014-07… 2014-07…
#> 9 CDISCPILOT… 01-701… 1111 2012-0… 2012-0… 2012-09… 2012-09… 2013-02…
#> 10 CDISCPILOT… 01-701… 1115 2012-1… 2013-0… 2012-11… 2013-01… 2013-05…
#> # ℹ 296 more rows
#> # ℹ 41 more variables: DTHDTC, DTHFL, SITEID, AGE,
#> # AGEU, SEX, RACE, ETHNIC, ARMCD, ARM,
#> # ACTARMCD, ACTARM, COUNTRY, DMDTC, DMDY,
#> # TRT01P, TRT01A, TRTSDTM, TRTSTMF, TRTEDTM,
#> # TRTETMF, TRTSDT, TRTEDT, TRTDURD, SCRFDT,
#> # EOSDT, EOSSTT, FRVDT, RANDDT, DTHDT, …
</code></pre>
Let’s clean the data by keeping only the safety population, that is, randomized subjects who took at least one dose of study medication (SAFFL, the Safety Population Flag, equals "Y"), and by making the SEX and ETHNIC columns more readable. We will also store the unique treatment arms in a vector for later use.
<pre><code class="language-r">
safety_subjects <- admiral_adsl |>
filter(SAFFL == "Y") |>
mutate(
SEX = case_when(
SEX == "F" ~ "Female",
SEX == "M" ~ "Male"
),
ETHNIC = stringr::str_to_sentence(ETHNIC)
)
<br>
treatments <- unique(safety_subjects$ACTARM)
</code></pre>
The first step of creating any {gt} table is to summarize the data frame. The goal is to create a data frame such that every row represents a row in the final output, and each column helps us to group, format, or merge information.
In a demographic table, each row is a statistic (the <code>rowname_col</code> argument of <code>gt::gt()</code>) that belongs to a demographic group (the <code>groupname_col</code> argument) and is reported across treatments. Each row holds the value of the statistic; some rows also carry helper values, such as percentages and standard deviations, that will be displayed alongside that value.
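To make the target shape concrete, here is a hand-written sketch of what such a data frame could look like for a single treatment arm. Every number is made up, and the column naming (<code>value_</code>, <code>pct_</code>, <code>sd_</code>, <code>min_</code>, <code>max_</code> followed by the arm name) is simply the convention this post uses, not something {gt} requires.

```r
# Hypothetical target shape for one treatment arm; all numbers are made up
target_shape <- data.frame(
  groupname = c("Sex, n (%)", "Sex, n (%)", "Age (Years)", "Age (Years)"),
  rowname = c("Female", "Male", "Mean (SD)", "Min - Max"),
  value_Placebo = c(6, 4, 70.5, NA),  # the statistic itself
  pct_Placebo = c(0.6, 0.4, NA, NA),  # helper: percentage for count rows
  sd_Placebo = c(NA, NA, 8.2, NA),    # helper: SD for the mean row
  min_Placebo = c(NA, NA, NA, 52),    # helper: minimum for the range row
  max_Placebo = c(NA, NA, NA, 88)     # helper: maximum for the range row
)
```

Helper columns are <code>NA</code> wherever they do not apply, which keeps one row per displayed line while leaving the merging step to decide what to show.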
Let's create two functions for summarising the numerical and categorical columns of our data. Notice that in both functions, we create separate columns for the helper values that we will later merge with <code>cols_merge</code>.
<pre><code class="language-r">
# Count subjects per treatment and category, add within-treatment proportions,
# then spread treatments into value_* and pct_* columns
categorical_summary <- function(categorical_column_name, groupname) {
safety_subjects |>
count(ACTARM, .data[[categorical_column_name]], name = "value") |>
group_by(ACTARM) |>
mutate(pct = value / sum(value)) |>
ungroup() |>
pivot_wider(names_from = ACTARM, values_from = c(value, pct)) |>
rename(rowname = all_of(categorical_column_name)) |>
mutate(groupname = paste0(groupname, ", n (%)"))
}
<br>numerical_summary <- function(numerical_column_name, groupname) {
summary_stats <- safety_subjects |>
group_by(ACTARM) |>
summarise(
n = n(),
`Mean (SD)` = mean(.data[[numerical_column_name]]),
Median = median(.data[[numerical_column_name]]),
`Min - Max` = NA
) |>
pivot_longer(n:`Min - Max`, names_to = "rowname", values_to = "value") |>
pivot_wider(names_from = ACTARM, values_from = value, names_prefix = "value_") # nolint
<br> column_min_max <- safety_subjects |>
group_by(ACTARM) |>
summarise(
min = min(.data[[numerical_column_name]]),
max = max(.data[[numerical_column_name]])
) |>
mutate(rowname = "Min - Max") |>
pivot_wider(names_from = ACTARM, values_from = c(min, max))
<br> column_sd <- safety_subjects |>
group_by(ACTARM) |>
summarise(
sd = sd(.data[[numerical_column_name]])
) |>
mutate(rowname = "Mean (SD)") |>
pivot_wider(names_from = ACTARM, values_from = sd, names_prefix = "sd_")
<br> summary_stats |>
left_join(column_sd, by = "rowname") |>
left_join(column_min_max, by = "rowname") |>
mutate(groupname = groupname)
}
</code></pre>
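To see the shape that the categorical summary produces, the sketch below applies the same count, proportion, and pivot sequence to a five-row toy data set. <code>toy_subjects</code> is made up for illustration, and the sketch ungroups explicitly before pivoting.

```r
library(dplyr)
library(tidyr)

# Made-up subjects: three on Placebo, two on Drug
toy_subjects <- tibble(
  ACTARM = c("Placebo", "Placebo", "Placebo", "Drug", "Drug"),
  SEX = c("Female", "Male", "Female", "Female", "Male")
)

toy_summary <- toy_subjects |>
  count(ACTARM, SEX, name = "value") |>   # subjects per arm and category
  group_by(ACTARM) |>
  mutate(pct = value / sum(value)) |>     # proportion within each arm
  ungroup() |>
  pivot_wider(names_from = ACTARM, values_from = c(value, pct))

names(toy_summary)
```

The result has one row per category and one <code>value_</code> and one <code>pct_</code> column per arm, which is the layout the formatting and merging steps rely on.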
Now we can create the initial table with the <code>gt()</code> function.
<pre><code class="language-r">
gt_data <- categorical_summary("SEX", "Sex") |>
bind_rows(
categorical_summary("AGEGR1", "Age Group"),
categorical_summary("RACEGR1", "Race"),
categorical_summary("ETHNIC", "Ethnicity"),
numerical_summary("AGE", "Age (Years)")
)
<br>initial_table <- gt_data |>
gt(
rowname_col = "rowname",
groupname_col = "groupname"
)
initial_table
</code></pre>
<img class="size-full wp-image-23346" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65e9e6376a5fdc930ab309dc_a09c2361_image3.webp" alt="This table summarizes the demographic breakdown of participants in a clinical trial, comparing placebo with high and low doses of Xanomeline, including details on sex, age, race, and ethnicity distributions." width="1600" height="1290" /> Demographic Distribution in Xanomeline Clinical Trial
Before merging any columns, we first format the individual columns/rows with the <code>fmt_*</code> functions.
<pre><code class="language-r">
formatted_table <- initial_table |>
fmt_percent(
columns = starts_with("pct_")
) |>
fmt_integer(
columns = starts_with("min_")
) |>
fmt_integer(
rows = all_of(c("n", "Median"))
) |>
fmt_integer(
columns = starts_with("value_"),
rows = groupname == "Sex, n (%)"
) |>
fmt_number(
rows = "Mean (SD)",
columns = starts_with("value_"),
decimals = 1
) |>
fmt_number(
columns = starts_with("sd_")
)
formatted_table
</code></pre>
<img class="size-full wp-image-23348" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65e9e637447f2063fd1e0aba_75cde0cd_image4.webp" alt="Detailed demographic data of a clinical trial, showing the distribution of participants by sex, age, race, and ethnicity across placebo and two dosage levels of Xanomeline." width="1600" height="1290" /> Demographic Data of Xanomeline Clinical Trial Participants
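To isolate what the <code>columns</code> and <code>rows</code> arguments do in the <code>fmt_*</code> calls, here is a minimal sketch on a two-row table. The data frame <code>toy</code> and its values are made up; the point is only that one column can be formatted differently depending on the row.

```r
library(gt)

# Made-up values: a count row and a mean row sharing one value column
toy <- data.frame(
  rowname = c("Female", "Mean (SD)"),
  value_Placebo = c(6, 70.4567),
  pct_Placebo = c(0.6, NA)
)

fmt_table <- toy |>
  gt(rowname_col = "rowname") |>
  # proportions are displayed as percentages with one decimal
  fmt_percent(columns = starts_with("pct_"), decimals = 1) |>
  # the count row is displayed as an integer
  fmt_integer(columns = starts_with("value_"), rows = "Female") |>
  # the mean row in the same column keeps one decimal
  fmt_number(columns = starts_with("value_"), rows = "Mean (SD)", decimals = 1)
```

Formatting before merging matters because the merge step combines the already formatted text of each cell.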
We can now proceed to merge columns for each treatment. <code>cols_merge_n_pct</code> handles NA values in <code>col_pct</code> automatically by omitting them. To avoid replicating the same code for each treatment, we will use <code>purrr::reduce</code>.
<pre><code class="language-r">
merged_table <- append(list(formatted_table), treatments) |>
reduce(
\(x, treatment) {
x |>
cols_merge_n_pct(
col_n = paste0("value_", treatment),
col_pct = paste0("pct_", treatment),
rows = groupname %in% paste0(c("Sex", "Age Group", "Race", "Ethnicity"), ", n (%)")
)
}
) |>
list() |>
append(treatments) |>
reduce(
\(x, treatment) {
x |>
cols_merge(
rows = "Min - Max",
columns = c(
paste0("value_", treatment), paste0("min_", treatment), paste0("max_", treatment)
),
pattern = "{2} - {3}"
)
}
) |>
list() |>
append(treatments) |>
reduce(
\(x, treatment) {
x |>
cols_merge(
columns = c(paste0("value_", treatment), paste0("sd_", treatment)),
pattern = "{1} ({2})",
rows = "Mean (SD)"
)
}
)
merged_table
</code></pre>
<img class="size-full wp-image-23350" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65e9e638bd925799426a461f_d83a62f7_image2-1.webp" alt="Table detailing the demographic distribution of a clinical trial's participants, showing numbers and percentages for sex, age groups, race, and ethnicity, as well as age-related statistics, for placebo versus high and low drug doses." width="1600" height="1087" /> Clinical Trial Demographic Data Comparison
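If the <code>purrr::reduce</code> pattern is unfamiliar, the sketch below shows the same folding idea on plain strings instead of a gt table. The function <code>add_step</code> is a hypothetical stand-in for a merge step. It also shows that passing the starting value through <code>.init</code> gives the same result as prepending it to the list, which may read a little more directly.

```r
library(purrr)

treatments <- c("Placebo", "Low Dose", "High Dose")

# Stand-in for "apply one merge step for this treatment"
add_step <- \(acc, treatment) paste0(acc, " + ", treatment)

# Starting value supplied through .init
with_init <- reduce(treatments, add_step, .init = "table")

# Starting value prepended to the list
with_append <- append(list("table"), treatments) |> reduce(add_step)

with_init
# "table + Placebo + Low Dose + High Dose"
```

Each step receives the result of the previous one, so a table passed as the starting value accumulates one merge per treatment.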
Now, we have a table that has the expected rows and columns; all we have to do is add the title and some styling. 🎨
<pre><code class="language-r">
column_labels <- safety_subjects |>
count(ACTARM) |>
mutate(
label = paste0(ACTARM, " (N = ", n, ")"),
gt_column_name = paste0("value_", ACTARM)
)
<br>column_labels <- setNames(column_labels$label, column_labels$gt_column_name)
<br>merged_table |>
tab_header(
title = "Demographic Characteristics",
subtitle = "Safety Population",
) |>
cols_label_with(fn = \(x) column_labels[x]) |>
tab_stub_indent(
rows = everything(),
indent = 5
) |>
opt_align_table_header(align = "left") |>
cols_align(
align = "center",
columns = everything()
)
</code></pre>
And voilà! We have our final table!
<img class="size-full wp-image-23352" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65e9e6389f041da2b26468ce_8a208c47_image5.webp" alt="The table provides a breakdown of demographic information and baseline characteristics of participants in a clinical trial, detailing the safety population data for placebo, high dose, and low dose groups of the drug Xanomeline." width="1600" height="1179" /> Clinical Trial Demographics and Baseline Characteristics
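The column labels rely on looking up a named character vector by column name. The base R sketch below reproduces that lookup with made-up counts (<code>arm_counts</code> is hypothetical) and shows what happens for a name that is not in the vector.

```r
# Made-up arm sizes, for illustration only
arm_counts <- data.frame(ACTARM = c("Placebo", "Drug"), n = c(3L, 2L))

# Names are the gt column names, values are the labels to display
labels <- setNames(
  paste0(arm_counts$ACTARM, " (N = ", arm_counts$n, ")"),
  paste0("value_", arm_counts$ACTARM)
)

labels[["value_Placebo"]]
# "Placebo (N = 3)"

# A column name that is not in the vector yields NA
unname(labels["pct_Placebo"])
# NA
```

Since only the <code>value_</code> columns have labels, one option, if the <code>NA</code> results are a concern, is to restrict the relabeling with the <code>columns</code> argument of <code>cols_label_with()</code>, for example <code>columns = starts_with("value_")</code>.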
<blockquote>Discover how Shiny and Quarto are transforming clinical research by exploring our article, "<a href="https://appsilon.com/interactive-clinical-reports-shiny-and-quarto/" target="_blank" rel="noopener">Revolutionizing Clinical Research with Interactive Reports</a>".</blockquote>
<h2 id="adverse-events-clinical-table-with-gt">Adverse Events Clinical Table with {gt}</h2>
For this table, we will use the synthetic <code>adae</code> <a href="https://www.cdisc.org/standards/foundational/adam/adam-data-structure-adverse-event-analysis-v1-0" target="_blank" rel="noopener noreferrer">(ADaM-compliant Adverse Event)</a> data from the {<a href="https://cran.r-project.org/web/packages/chevron/index.html" target="_blank" rel="noopener noreferrer">chevron</a>} package. We will also calculate the number of subjects in each treatment arm from the <a href="https://www.cdisc.org/kb/examples/adam-subject-level-analysis-adsl-dataset-80283806" target="_blank" rel="noopener noreferrer">subject-level adsl data</a>.
<pre><code class="language-r">
library(chevron)
library(dplyr)
library(tidyr)
library(gt)
<br>adsl <- syn_data$adsl
adae <- syn_data$adae
<br>number_of_subjects <- adsl |>
count(ARM, name = "number_of_subjects")
<br>number_of_subjects <- setNames(
number_of_subjects$number_of_subjects,
number_of_subjects$ARM
)
</code></pre>
Again, we first need to create the data frame we will supply to the <code>gt()</code> function. Each row will represent the number of people with an event, grouped by AEBODSYS (Body System or Organ Class). For each group, we will have overview rows displaying "Patients with at least one event" and "Total number of events". Columns will represent the treatments.
Let's first calculate the number of people and percentages grouped by AEBODSYS. This data will serve as a skeleton for the additional overview rows.
<pre><code class="language-r">
adverse_event_table <- adae |>
group_by(ARM, AEBODSYS, AEDECOD) |>
summarise(
n = n_distinct(USUBJID),
) |>
mutate(
pct = n / number_of_subjects[ARM]
) |>
pivot_wider(names_from = ARM, values_from = c(n, pct)) |>
rename(
groupname = AEBODSYS,
rowname = AEDECOD
) |>
ungroup()
</code></pre>
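The percentages above use the number of subjects per arm from the subject-level data as the denominator, not the number of adverse event records. The toy sketch below (<code>toy_adsl</code> and <code>toy_adae</code> are made up) shows the difference between counting records with <code>n()</code> and counting patients with <code>n_distinct()</code>. It converts the arm to character before the lookup, on the assumption that the arm column might be a factor, in which case indexing a named vector would use the integer codes instead of the labels.

```r
library(dplyr)

# Made-up subject-level data: three subjects in arm A, one in arm B
toy_adsl <- tibble(
  USUBJID = c("S1", "S2", "S3", "S4"),
  ARM = c("A", "A", "A", "B")
)

# Made-up event records: subject S1 has two headache records
toy_adae <- tibble(
  USUBJID = c("S1", "S1", "S2", "S4"),
  ARM = c("A", "A", "A", "B"),
  AEDECOD = c("Headache", "Headache", "Headache", "Nausea")
)

# Named vector of denominators, one entry per arm
denominators <- toy_adsl |> count(ARM, name = "N")
denominators <- setNames(denominators$N, denominators$ARM)

toy_rates <- toy_adae |>
  group_by(ARM, AEDECOD) |>
  summarise(
    events = n(),                     # number of records
    patients = n_distinct(USUBJID),   # number of distinct subjects
    .groups = "drop"
  ) |>
  mutate(pct = patients / unname(denominators[as.character(ARM)]))
```

In arm A there are three headache records but only two patients with a headache, so the percentage is two out of the three subjects in that arm.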
For the sake of simplicity, let's write a function that calculates the overview rows either for each AEBODSYS or for the whole data set. We then call it once for each case and save the results.
<pre><code class="language-r">
adae_summary <- function(adae, by_aebodsys = TRUE) {
group_vars <- c("ARM")
if (by_aebodsys) {
group_vars <- append(group_vars, "AEBODSYS")
}
summarised_data <- adae |>
group_by(across(all_of(group_vars))) |>
summarise(
`Total number of events` = n(),
`Patients with at least one event` = n_distinct(USUBJID)
) |>
pivot_longer(
cols = c(`Patients with at least one event`, `Total number of events`),
names_to = "AEDECOD",
values_to = "n"
) |>
mutate(
pct = ifelse(
AEDECOD == "Patients with at least one event",
n / number_of_subjects[ARM],
NA
)
) |>
pivot_wider(names_from = ARM, values_from = c(n, pct))
<br> if (by_aebodsys) {
summarised_data |>
rename(
groupname = AEBODSYS,
rowname = AEDECOD
)
} else {
summarised_data |>
rename(
rowname = AEDECOD
) |>
mutate(
groupname = ""
)
}
}
<br>aebodsys_summary <- adae_summary(adae)
total_summary <- adae_summary(adae, by_aebodsys = FALSE)
</code></pre>
We will bind summary data frames with the skeleton data frame in the order we want it to appear in the final output.
<pre><code class="language-r">
adverse_event_table <- adverse_event_table |>
bind_rows(aebodsys_summary) |>
arrange(groupname, rowname)
<br>final_table <- total_summary |>
bind_rows(adverse_event_table)
</code></pre>
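Sorting by <code>groupname</code> and <code>rowname</code> places the overview rows wherever their labels happen to fall in sort order. As a hedged alternative, the sketch below makes their position explicit by sorting on an extra logical key first. The data frame <code>toy</code> and the vector <code>overview_rows</code> are hypothetical names introduced for this illustration.

```r
library(dplyr)

overview_rows <- c("Patients with at least one event", "Total number of events")

# Made-up rows in arbitrary order
toy <- tibble(
  groupname = c("cl A", "cl A", "cl A", "cl B", "cl B"),
  rowname = c(
    "dcd A.1", "Total number of events", "Patients with at least one event",
    "dcd B.1", "Patients with at least one event"
  )
)

ordered <- toy |>
  # FALSE sorts before TRUE, so overview rows come first within each group
  arrange(groupname, !(rowname %in% overview_rows), rowname)

ordered$rowname
```

Within each body system the overview rows now lead, followed by the individual event terms in alphabetical order.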
Now, just as we did in the demographics table example, we first format the columns, then merge them, and finally style the table.
<pre><code class="language-r">
final_table |>
gt(
rowname_col = "rowname",
groupname_col = "groupname"
) |>
fmt_percent(
columns = starts_with("pct"),
decimals = 1
) |>
fmt_integer(
columns = starts_with("n_"),
) |>
cols_merge_n_pct(
col_n = "n_A: Drug X",
col_pct = "pct_A: Drug X"
) |>
cols_merge_n_pct(
col_n = "n_B: Placebo",
col_pct = "pct_B: Placebo"
) |>
cols_merge_n_pct(
col_n = "n_C: Combination",
col_pct = "pct_C: Combination"
) |>
cols_label_with(
fn = \(x) {
treatment <- stringr::str_remove(x, "n_")
paste0(treatment, " (N=", number_of_subjects[treatment], ")")
}
) |>
tab_stub_indent(
rows = everything(),
indent = 5
) |>
opt_align_table_header(align = "left") |>
cols_align(
align = "center",
columns = everything()
) |>
tab_header(
title = "Adverse Events Table"
)
</code></pre>
<img class="wp-image-23354 size-full" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65e9e639dc84ffc94c0f6691_eece02bb_image6.webp" alt="The table provides a comparative overview of the incidence and number of adverse events recorded in a clinical trial across different treatment groups: Drug X, Placebo, and a Combination of treatments." width="1212" height="1600" /> Adverse Events in Clinical Trial
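The three <code>cols_merge_n_pct()</code> calls differ only in the treatment name, so they could also be generated from a vector of arms. The sketch below shows one way to do that with <code>purrr::reduce</code> on a made-up two-arm table. The names <code>toy</code>, <code>arms</code>, and <code>base_table</code> are hypothetical, and the arm names are shortened to "A" and "B".

```r
library(gt)
library(purrr)

# Made-up counts and proportions for two arms
toy <- data.frame(
  rowname = c("Headache", "Nausea"),
  n_A = c(3L, 1L),
  pct_A = c(0.30, 0.10),
  n_B = c(2L, 0L),
  pct_B = c(0.20, 0.00)
)
arms <- c("A", "B")

base_table <- toy |>
  gt(rowname_col = "rowname") |>
  fmt_percent(columns = starts_with("pct_"), decimals = 1)

# Apply one merge per arm, carrying the table forward each time
merged <- reduce(
  arms,
  \(tbl, arm) {
    cols_merge_n_pct(
      tbl,
      col_n = paste0("n_", arm),
      col_pct = paste0("pct_", arm)
    )
  },
  .init = base_table
)
```

With only three arms the explicit calls are perfectly readable; the loop mainly helps when the number of arms is not known in advance.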
<blockquote>Elevate your regulatory submissions; delve into our guide "<a href="https://appsilon.com/reproducible-and-reliable-shiny-apps-for-regulatory-submissions/" target="_blank" rel="noopener">Reproducible and Reliable Shiny Apps for Regulatory Submissions</a>".</blockquote>
<h2 id="summing-up-clinical-tables-with-gt">Summing up Clinical Tables with {gt}</h2>
The {gt} package is well suited to creating detailed and visually appealing tables for clinical trials, helping to communicate complex data clearly to researchers, clinicians, and stakeholders.
<strong>The workflow for creating clinical tables with gt is:</strong>
<ul> <li>Preparing the skeleton of the final output by cleaning and summarizing the data.</li> <li>Creating the initial table with the <code>gt()</code> function and specifying the <code>rowname_col</code> and <code>groupname_col</code> arguments.</li> <li>Formatting columns with <code>fmt_*</code> functions.</li> <li>Merging columns with <code>cols_merge*</code> functions.</li> <li>Adding final touches by relabeling the columns and styling the table.</li></ul>
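The five steps can be sketched end to end on a tiny table. Everything in <code>summary_df</code> is made up and stands in for the summarised data from step one; the remaining calls are the same {gt} functions used throughout this post.

```r
library(gt)

# Step 1: a hand-written summary with made-up numbers
summary_df <- data.frame(
  groupname = "Sex, n (%)",
  rowname = c("Female", "Male"),
  value_Placebo = c(6L, 4L),
  pct_Placebo = c(0.6, 0.4)
)

workflow_table <- summary_df |>
  # Step 2: create the table, naming the stub and row group columns
  gt(rowname_col = "rowname", groupname_col = "groupname") |>
  # Step 3: format the helper column
  fmt_percent(columns = pct_Placebo, decimals = 1) |>
  # Step 4: merge count and percentage into one displayed column
  cols_merge_n_pct(col_n = value_Placebo, col_pct = pct_Placebo) |>
  # Step 5: relabel and add a header
  cols_label(value_Placebo = "Placebo (N = 10)") |>
  tab_header(title = "Toy demographics table")
```

The same skeleton scales to the full tables above: only the summarised data frame and the number of format and merge calls grow.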
<a href="https://www.r-consortium.org/all-projects/tables-in-clinical-trials-with-r" target="_blank" rel="noopener noreferrer">The R Consortium</a> and <a href="https://gt.rstudio.com/articles/case-study-clinical-tables.html" target="_blank" rel="noopener noreferrer">{gt} documentation</a> also provide valuable examples and information on clinical tables. I encourage you to experiment with them as well to get familiar with <a href="https://appsilon.com/fda-clinical-trial-submissions-with-r-shiny-rhino/" target="_blank" rel="noopener">the topic</a>.