Definition of a gtsummary Object

This vignette is meant for those who wish to contribute to {gtsummary}, or users who wish to understand the inner workings of a {gtsummary} object so they can more easily modify it to suit their own needs. If this does not describe you, please refer to the {gtsummary} website for an introduction to the package’s functions and tutorials on advanced use.

Introduction

All {gtsummary} objects share a few characteristics. Here, we review those characteristics and provide instructions on how to construct a {gtsummary} object.

library(gtsummary)
library(purrr); library(dplyr); library(tibble)

tbl_regression_ex <-
  lm(age ~ grade + marker, trial) %>%
  tbl_regression() %>%
  bold_p(t = 0.5) 

tbl_summary_ex <-
  trial %>%
  select(trt, age, grade, response) %>%
  tbl_summary(by = trt)

Structure of a {gtsummary} object

Every {gtsummary} object is a list comprising, at minimum, these elements:

.$table_body    .$table_header         

table_body

The .$table_body object is the data frame that will ultimately be printed as the output. The table must include columns "label", "row_type", and "variable". The "label" column is printed, and the other two are hidden from the final output.

tbl_summary_ex$table_body
#> # A tibble: 8 x 8
#>   variable var_type   var_class var_label   row_type label     stat_1   stat_2  
#>   <chr>    <chr>      <chr>     <chr>       <chr>    <chr>     <chr>    <chr>   
#> 1 age      continuous numeric   Age         label    Age       46 (37,~ 48 (39,~
#> 2 age      continuous numeric   Age         missing  Unknown   7        4       
#> 3 grade    categoric~ factor    Grade       label    Grade     <NA>     <NA>    
#> 4 grade    categoric~ factor    Grade       level    I         35 (36%) 33 (32%)
#> 5 grade    categoric~ factor    Grade       level    II        32 (33%) 36 (35%)
#> 6 grade    categoric~ factor    Grade       level    III       31 (32%) 33 (32%)
#> 7 response dichotomo~ integer   Tumor Resp~ label    Tumor Re~ 28 (29%) 33 (34%)
#> 8 response dichotomo~ integer   Tumor Resp~ missing  Unknown   3        4
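As a minimal sketch of the requirement above, the following builds a small stand-in for .$table_body by hand and checks that the three required columns are present. The data frame reuses a few values from the printed output; the names required_cols, missing_cols, and has_required are illustrative and not part of the package.

```r
# Hand-built stand-in for .$table_body (base R data frame, not package output)
table_body <- data.frame(
  variable = c("grade", "grade", "grade", "grade"),
  row_type = c("label", "level", "level", "level"),
  label    = c("Grade", "I", "II", "III"),
  stat_1   = c(NA, "35 (36%)", "32 (33%)", "31 (32%)"),
  stringsAsFactors = FALSE
)

# The three columns every table_body must contain
required_cols <- c("label", "row_type", "variable")

# Any required column that is absent would be listed here
missing_cols <- setdiff(required_cols, names(table_body))
has_required <- length(missing_cols) == 0
has_required
```

A check like this can be handy when assembling a table_body from scratch, before any header information is attached.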

table_header

The .$table_header object is a data frame containing information about each of the columns in .$table_body (one row per column in .$table_body). The table header has the following columns:

| Column | Description |
|---|---|
| column | Column name from table_body |
| label | Label that will be displayed (if the column is displayed in the output) |
| hide | Logical indicating whether the column is hidden in the output |
| align | Specifies the alignment/justification of the column, e.g. ‘center’ or ‘left’ |
| missing_emdash | Indicates the rows that display an em-dash in missing cells, e.g. row_ref == TRUE in tbl_regression() |
| indent | String of R code that results in a logical vector specifying the rows to indent, e.g. row_type != 'label' |
| text_interpret | The {gt} function used to interpret the column label |
| bold | For columns that bold rows conditionally, a string of R code that results in a logical vector indicating the rows to bold, e.g. row_type == 'label' |
| italic | For columns that italicize rows conditionally, a string of R code that results in a logical vector indicating the rows to italicize, e.g. row_type == 'label' |
| fmt_fun | If the column needs to be formatted, this list column contains the function that performs the formatting. Note that this is the function object, not the character name of a function. |
| footnote_abbrev | Lists the abbreviation footnotes for a table. All abbreviation footnotes are collated into a single footnote. For example, ‘OR = Odds Ratio’ and ‘CI = Confidence Interval’ appear in a single footnote. |
| footnote | Lists the footnotes that will appear for each column |
| spanning_header | Text printed above columns as spanning headers. See the tbl_merge(...)$table_header output for an example of use. |

NOTE: Columns ‘hide’, ‘align’, ‘missing_emdash’, ‘indent’, ‘bold’, and ‘italic’ MUST follow the tidyverse style guidelines and include spaces around any variable names, e.g. row_type == 'label' (NOT row_type=='label').
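To illustrate how a string of R code can resolve to a logical vector of rows, here is a base R sketch that parses such a string and evaluates it against a small hand-built table_body. How {gtsummary} evaluates these strings internally is not shown in this vignette, so the use of parse() and eval(), and the treatment of NA as "no rows", are assumptions made for illustration.

```r
# Hand-built stand-in for .$table_body
table_body <- data.frame(
  variable = c("grade", "grade", "grade", "grade"),
  row_type = c("label", "level", "level", "level"),
  label    = c("Grade", "I", "II", "III"),
  stringsAsFactors = FALSE
)

# A string of R code, as it might be stored in .$table_header$indent
indent_code <- "row_type != 'label'"

# Parse the string and evaluate it with the columns of table_body in scope
indent_rows <- eval(parse(text = indent_code), envir = table_body)
indent_rows
#> [1] FALSE  TRUE  TRUE  TRUE

# Assumption: an NA entry means no rows are affected, so guard before evaluating
bold_code <- NA_character_
if (is.na(bold_code)) {
  bold_rows <- rep(FALSE, nrow(table_body))
} else {
  bold_rows <- eval(parse(text = bold_code), envir = table_body)
}
```

The resulting logical vector has one element per row of table_body, which is what makes it usable for row-wise styling such as indentation or bolding.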

Example from tbl_regression()

tbl_regression_ex$table_header
#> # A tibble: 20 x 13
#>    column label hide  align missing_emdash indent text_interpret bold  italic
#>    <chr>  <chr> <lgl> <chr> <chr>          <chr>  <chr>          <chr> <chr> 
#>  1 varia~ vari~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#>  2 var_l~ var_~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#>  3 var_t~ var_~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#>  4 refer~ refe~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#>  5 row_t~ row_~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#>  6 label  **Ch~ FALSE left  <NA>           row_t~ gt::md         <NA>  <NA>  
#>  7 N      **N** TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#>  8 term   term  TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#>  9 var_c~ var_~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 10 var_n~ var_~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 11 heade~ head~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 12 contr~ cont~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 13 contr~ cont~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 14 estim~ **Be~ FALSE cent~ reference_row~ <NA>   gt::md         <NA>  <NA>  
#> 15 std.e~ **SE~ TRUE  cent~ reference_row~ <NA>   gt::md         <NA>  <NA>  
#> 16 stati~ **St~ TRUE  cent~ reference_row~ <NA>   gt::md         <NA>  <NA>  
#> 17 conf.~ conf~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 18 conf.~ conf~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 19 ci     **95~ FALSE cent~ reference_row~ <NA>   gt::md         <NA>  <NA>  
#> 20 p.val~ **p-~ FALSE cent~ <NA>           <NA>   gt::md         p.va~ <NA>  
#> # ... with 4 more variables: fmt_fun <list>, footnote_abbrev <chr>,
#> #   footnote <chr>, spanning_header <chr>

Constructing a {gtsummary} object

table_body

When constructing a {gtsummary} object, the author begins with the .$table_body object. Recall that the .$table_body data frame must include the columns "label", "row_type", and "variable". Of these columns, only "label" is printed with the final results. The "row_type" column typically controls whether the label column is indented. The "variable" column is often used by the inline_text() family of functions and when merging {gtsummary} tables with tbl_merge().

tbl_regression_ex %>%
  pluck("table_body") %>%
  select(variable, row_type, label)
#> # A tibble: 5 x 3
#>   variable row_type label               
#>   <chr>    <chr>    <chr>               
#> 1 grade    label    Grade               
#> 2 grade    level    I                   
#> 3 grade    level    II                  
#> 4 grade    level    III                 
#> 5 marker   label    Marker Level (ng/mL)

The other columns in .$table_body are created by the user and are likely printed in the output. Formatting and printing instructions for these columns are stored in .$table_header.

table_header

The .$table_header data frame has one row for every column in .$table_body, containing instructions on how to format each column, the column headers, and more. There are a few internal {gtsummary} functions to assist in constructing and modifying a .$table_header data frame.

First is the table_header_fill_missing() function. This function ensures .$table_header contains a row for every column of .$table_body. If the row for a column does not exist, it is added and populated with appropriate default values.

gtsummary:::table_header_fill_missing(
  table_header = tibble(column = names(tbl_regression_ex$table_body))
) 
#> # A tibble: 20 x 13
#>    column label hide  align missing_emdash indent text_interpret bold  italic
#>    <chr>  <chr> <lgl> <chr> <chr>          <chr>  <chr>          <chr> <chr> 
#>  1 varia~ vari~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#>  2 var_l~ var_~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#>  3 var_t~ var_~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#>  4 refer~ refe~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#>  5 row_t~ row_~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#>  6 label  label TRUE  left  <NA>           row_t~ gt::md         <NA>  <NA>  
#>  7 N      N     TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#>  8 term   term  TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#>  9 var_c~ var_~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 10 var_n~ var_~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 11 heade~ head~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 12 contr~ cont~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 13 contr~ cont~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 14 estim~ esti~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 15 std.e~ std.~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 16 stati~ stat~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 17 conf.~ conf~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 18 conf.~ conf~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 19 ci     ci    TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> 20 p.val~ p.va~ TRUE  cent~ <NA>           <NA>   gt::md         <NA>  <NA>  
#> # ... with 4 more variables: fmt_fun <list>, footnote_abbrev <chr>,
#> #   footnote <chr>, spanning_header <chr>
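The idea of filling in default rows can be sketched in base R as follows. This is not the package's implementation: fill_missing_sketch() is a hypothetical helper, it covers only four of the header columns, and the defaults it uses (label equal to the column name, hide = TRUE, center alignment except for "label") are simply read off the printed output.

```r
# Hypothetical helper, not the package's implementation: add a default row to
# `table_header` for every name in `columns` that does not yet have one.
fill_missing_sketch <- function(table_header, columns) {
  new_cols <- setdiff(columns, table_header$column)
  defaults <- data.frame(
    column = new_cols,
    label  = new_cols,                      # default label is the column name
    hide   = rep(TRUE, length(new_cols)),   # new columns start out hidden
    align  = ifelse(new_cols == "label", "left", "center"),
    stringsAsFactors = FALSE
  )
  # Keep the existing rows, append the defaults, then order rows like `columns`
  out <- rbind(table_header, defaults)
  out[match(columns, out$column), , drop = FALSE]
}

# A partial header that only describes the "label" column (label text is made up)
partial_header <- data.frame(
  column = "label", label = "**Variable**", hide = FALSE, align = "left",
  stringsAsFactors = FALSE
)

filled <- fill_missing_sketch(
  partial_header,
  columns = c("variable", "row_type", "label", "estimate")
)
filled
```

Existing rows are left untouched, so any labels or visibility settings already assigned survive the fill step.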

The modify_header_internal() function is useful for assigning column headers. It accepts a complete {gtsummary} object as its input and returns an updated version in which the column labels have been added to .$table_header. The function also switches .$table_header$hide from its default of TRUE to FALSE for those columns, so that columns with labels are printed.
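The effect described here, assigning a label and un-hiding the column, can be sketched on a bare header data frame. The helper set_header_sketch() is hypothetical and works on the header alone; the real internal function takes a whole {gtsummary} object, and its arguments are not shown in this vignette. The label text is made up.

```r
# Hypothetical helper illustrating the effect only: set labels for the named
# columns and mark those columns as visible.
set_header_sketch <- function(table_header, ...) {
  new_labels <- list(...)
  for (col in names(new_labels)) {
    is_col <- table_header$column == col
    table_header$label[is_col] <- new_labels[[col]]
    table_header$hide[is_col]  <- FALSE   # labelled columns become visible
  }
  table_header
}

# Header with default values: label equals the column name, everything hidden
table_header <- data.frame(
  column = c("variable", "label", "estimate"),
  label  = c("variable", "label", "estimate"),
  hide   = c(TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

updated <- set_header_sketch(
  table_header,
  label = "**Variable**",
  estimate = "**Estimate**"
)
updated
```

Columns that receive no label, such as "variable" here, keep hide = TRUE and stay out of the printed table.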

Printing a {gtsummary} object

All {gtsummary} objects are printed with print.gtsummary(). But before a {gtsummary} object is printed, it is converted to a {gt} object using as_gt(). This function takes the {gtsummary} object as its sole input, and uses the information in .$table_header to construct a list of {gt} calls that will be executed on .$table_body. After the {gtsummary} object is converted to {gt}, it is then printed as any other {gt} object.
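The pattern of building a list of calls from .$table_header and then executing them on .$table_body can be sketched in base R. To keep the example free of dependencies, plain data frame operations stand in for the {gt} calls that the package constructs; the call names keep_shown and set_labels and the label text are illustrative assumptions.

```r
# Hand-built stand-ins for .$table_body and .$table_header
table_body <- data.frame(
  variable = c("grade", "grade"),
  row_type = c("label", "level"),
  label    = c("Grade", "I"),
  stringsAsFactors = FALSE
)
table_header <- data.frame(
  column = c("variable", "row_type", "label"),
  label  = c("variable", "row_type", "**Variable**"),
  hide   = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

shown     <- table_header$column[!table_header$hide]
new_names <- table_header$label[!table_header$hide]

# Build unevaluated calls from table_header; `x` stands for the table so far.
# Base R operations stand in for the {gt} calls the package would construct.
calls <- list(
  keep_shown = bquote(x[.(shown)]),
  set_labels = bquote(stats::setNames(x, .(new_names)))
)

# Execute the calls in order, feeding each result into the next call
x <- table_body
for (cl in calls) {
  x <- eval(cl)
}
x
```

Storing the steps as unevaluated calls keeps them inspectable: the list can be printed, reordered, or extended before anything is executed.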

In some cases, the package defaults to printing with knitr::kable() utilizing the as_kable() function.

While the actual print function is slightly more involved, it is basically this:

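print.gtsummary <- function(x, ...) {
  # getOption() returns NULL when the option is unset, which would make
  # `==` fail inside if(); identical() handles that case safely
  print_engine <- getOption("gtsummary.print_engine")

  if (identical(print_engine, "gt")) {
    # convert to a {gt} object, then print it like any other {gt} table
    return(as_gt(x) %>% print())
  } else if (identical(print_engine, "kable")) {
    # convert with knitr::kable() via as_kable(), then print
    return(as_kable(x) %>% print())
  }
}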