## Regression tables with huxreg

Huxtable includes the function huxreg to build a table of regressions.

You call huxreg with a list of models. These models can be of any class which has a tidy method defined in the broom package. The method should return a list of regression coefficients with names term, estimate, std.error and p.value. That covers most standard regression packages.

Let’s start by running some regressions to predict a diamond’s price.

data(diamonds, package = "ggplot2")
diamonds <- diamonds[1:100,]

lm1 <- lm(price ~ carat + depth, diamonds)
lm2 <- lm(price ~ depth + factor(color, ordered = FALSE), diamonds)
lm3 <- lm(log(price) ~ carat + depth, diamonds)

Now, we use huxreg to display the regression output side by side.

huxreg(lm1, lm2, lm3)
(1)(2)(3)
(Intercept)981.607    900.067 6.269 ***
(720.175)   (2431.815)(0.782)
carat4328.324 ***     3.531 ***
(136.755)        (0.149)
depth-27.785 *  -6.804 -0.019
(11.656)   (39.293)(0.013)
factor(color, ordered = FALSE)E        449.490
(239.388)
factor(color, ordered = FALSE)F        391.705
(290.880)
factor(color, ordered = FALSE)G        583.111
(308.513)
factor(color, ordered = FALSE)H        126.916
(256.367)
factor(color, ordered = FALSE)I        -47.220
(253.092)
factor(color, ordered = FALSE)J        -123.430
(269.157)
N100        100     100
R20.912    0.123 0.854
logLik-675.703    -790.788 6.822
AIC1359.405    1599.576 -5.644
*** p < 0.001; ** p < 0.01; * p < 0.05.

The basic output includes estimates, standard errors and summary statistics.

Some of those variable names are hard to read. We can change them by providing a named vector of variables in the coefs argument.

color_names <- grep("factor", names(coef(lm2)), value = TRUE)
names(color_names) <- gsub(".*)(.)", "Color: \\1", color_names)

huxreg(lm1, lm2, lm3, coefs = c("Carat" = "carat", "Depth" = "depth", color_names))
(1)(2)(3)
Carat4328.324 ***     3.531 ***
(136.755)        (0.149)
Depth-27.785 *  -6.804 -0.019
(11.656)   (39.293)(0.013)
Color: E        449.490
(239.388)
Color: F        391.705
(290.880)
Color: G        583.111
(308.513)
Color: H        126.916
(256.367)
Color: I        -47.220
(253.092)
Color: J        -123.430
(269.157)
N100        100     100
R20.912    0.123 0.854
logLik-675.703    -790.788 6.822
AIC1359.405    1599.576 -5.644
*** p < 0.001; ** p < 0.01; * p < 0.05.

Or, since the output from huxreg is just a huxtable, we could just edit its contents directly.

diamond_regs <- huxreg(lm1, lm2, lm3)
diamond_regs[seq(8, 18, 2), 1] <- paste("Color:", LETTERS[5:10])

# prints the same as above

Of course, we aren’t limited to just changing names. We can also make our table prettier. Let’s put our footnote in italic, add a caption, and highlight the cell background of significant coefficients. All of these are just standard huxtable commands.

suppressPackageStartupMessages(library(dplyr))

diamond_regs %>%
map_background_color(-1, -1, by_regex(
"\\*" = "yellow"
)) %>%
set_italic(final(1), 1) %>%
set_caption("Linear regressions of diamond prices")
Linear regressions of diamond prices
(1)(2)(3)
(Intercept)981.607    900.067 6.269 ***
(720.175)   (2431.815)(0.782)
carat4328.324 ***     3.531 ***
(136.755)        (0.149)
depth-27.785 *  -6.804 -0.019
(11.656)   (39.293)(0.013)
Color: E        449.490
(239.388)
Color: F        391.705
(290.880)
Color: G        583.111
(308.513)
Color: H        126.916
(256.367)
Color: I        -47.220
(253.092)
Color: J        -123.430
(269.157)
N100        100     100
R20.912    0.123 0.854
logLik-675.703    -790.788 6.822
AIC1359.405    1599.576 -5.644
*** p < 0.001; ** p < 0.01; * p < 0.05.

By default, standard errors are shown below coefficient estimates. To display them in a column to the right, use error_pos = "right":

huxreg(lm1, lm3, error_pos = "right")
(1)(2)
(Intercept)981.607    (720.175)6.269 ***(0.782)
carat4328.324 ***(136.755)3.531 ***(0.149)
depth-27.785 *  (11.656)-0.019    (0.013)
N100             100
R20.912         0.854
logLik-675.703         6.822
AIC1359.405         -5.644
*** p < 0.001; ** p < 0.01; * p < 0.05.

This will give column headings a column span of 2.

To display standard errors in the same cell as estimates, use error_pos = "same":

huxreg(lm1, lm3, error_pos = "same")
(1)(2)
(Intercept)981.607 (720.175)6.269 *** (0.782)
carat4328.324 *** (136.755)3.531 *** (0.149)
depth-27.785 * (11.656)-0.019 (0.013)
N100     100
R20.912 0.854
logLik-675.703 6.822
AIC1359.405 -5.644
*** p < 0.001; ** p < 0.01; * p < 0.05.

You can change the default column headings by naming the model arguments:

huxreg("Price" = lm1, "Log price" = lm3)
PriceLog price
(Intercept)981.607    6.269 ***
(720.175)   (0.782)
carat4328.324 ***3.531 ***
(136.755)   (0.149)
depth-27.785 *  -0.019
(11.656)   (0.013)
N100        100
R20.912    0.854
logLik-675.703    6.822
AIC1359.405    -5.644
*** p < 0.001; ** p < 0.01; * p < 0.05.

To display a particular row of summary statistics, use the statistics parameter. This should be a character vector. Valid values are anything returned from your models by broom::glance:

gl <- as_hux(broom::glance(lm1))

gl %>%
restack_down(cols = 3, on_remainder = "fill") %>%
set_bold(odds, everywhere)
0.912   0.91    211
statisticp.valuedf
504       5.65e-522
logLikAICBIC
-676       1.36e+031.37e+03
deviancedf.residualnobs
4.33e+0697100

Another value you can use is "nobs", which returns the number of observations from the regression. If the statistics vector has names, these will be used for row headings:

huxreg(lm1, lm3, statistics = c("N. obs." = "nobs",
"R squared" = "r.squared", "F statistic" = "statistic",
"P value" = "p.value"))
(1)(2)
(Intercept)981.607    6.269 ***
(720.175)   (0.782)
carat4328.324 ***3.531 ***
(136.755)   (0.149)
depth-27.785 *  -0.019
(11.656)   (0.013)
N. obs.100        100
R squared0.912    0.854
F statistic504.082    283.881
P value0.000    0.000
*** p < 0.001; ** p < 0.01; * p < 0.05.

By default, huxreg displays significance stars. You can alter the symbols used and significance levels with the stars parameter, or set stars = NULL to turn off significance stars completely.

huxreg(lm1, lm3, stars = c(* = 0.1, ** = 0.05, *** = 0.01)) # a little boastful?
(1)(2)
(Intercept)981.607    6.269 ***
(720.175)   (0.782)
carat4328.324 ***3.531 ***
(136.755)   (0.149)
depth-27.785 ** -0.019
(11.656)   (0.013)
N100        100
R20.912    0.854
logLik-675.703    6.822
AIC1359.405    -5.644
*** p < 0.01; ** p < 0.05; * p < 0.1.

You aren’t limited to displaying standard errors of the estimates. If you prefer, you can display t statistics or p values, using the error_format option. Any column from tidy can be used by putting it in curly brackets:

# Another useful column: p.value
huxreg(
lm1, lm3,
error_format = "[{statistic}]",
note         = "{stars}. T statistics in brackets."
)
(1)(2)
(Intercept)981.607    6.269 ***
[1.363]   [8.016]
carat4328.324 ***3.531 ***
[31.650]   [23.773]
depth-27.785 *  -0.019
[-2.384]   [-1.499]
N100        100
R20.912    0.854
logLik-675.703    6.822
AIC1359.405    -5.644
*** p < 0.001; ** p < 0.01; * p < 0.05. T statistics in brackets.

Here we also changed the footnote, using note. If note contains the string "{stars}" it will be replaced by a description of the significance stars used. If you don’t want a footnote, just set note = NULL.

Alternatively, you can display confidence intervals. Use ci_level to set the confidence level for the interval, then use {conf.low} and {conf.high} in error_format:

huxreg(lm1, lm3, ci_level = .99, error_format = "({conf.low} -- {conf.high})")
(1)(2)
(Intercept)981.607    6.269 ***
(-910.629 -- 2873.844)   (4.214 -- 8.324)
carat4328.324 ***3.531 ***
(3969.004 -- 4687.643)   (3.140 -- 3.921)
depth-27.785 *  -0.019
(-58.411 -- 2.842)   (-0.052 -- 0.014)
N100        100
R20.912    0.854
logLik-675.703    6.822
AIC1359.405    -5.644
*** p < 0.001; ** p < 0.01; * p < 0.05.

To change number formatting, set the number_format parameter. This works the same as the number_format property for a huxtable - if it is numeric, numbers will be rounded to that many decimal places; if it is character, it will be taken as a format to the base R sprintf function. huxreg tries to be smart and to format summary statistics like nobs as integers.

huxreg(lm1, lm3, number_format = 2)
(1)(2)
(Intercept)981.61    6.27 ***
(720.17)   (0.78)
carat4328.32 ***3.53 ***
(136.75)   (0.15)
depth-27.78 *  -0.02
(11.66)   (0.01)
N100       100
R20.91    0.85
logLik-675.70    6.82
AIC1359.41    -5.64
*** p < 0.001; ** p < 0.01; * p < 0.05.

Lastly, if you want to bold all significant coefficients, set the parameter bold_signif to a maximum significance level:

huxreg(lm1, lm3, bold_signif = 0.05)
(1)(2)
(Intercept)981.607    6.269 ***
(720.175)   (0.782)
carat4328.324 ***3.531 ***
(136.755)   (0.149)
depth-27.785 *  -0.019
(11.656)   (0.013)
N100        100
R20.912    0.854
logLik-675.703    6.822
AIC1359.405    -5.644
*** p < 0.001; ** p < 0.01; * p < 0.05.

## Altering data

Sometimes, you want to report different statistics for a model. For example, you might want to use robust standard errors.

One way to do this is to pass a tidy-able test object into huxreg. The function coeftest in the “lmtest” package has tidy methods defined:

library(lmtest)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
##     as.Date, as.Date.numeric
library(sandwich)
lm_robust <- coeftest(lm1, vcov = vcovHC, save = TRUE)
huxreg("Normal SEs" = lm1, "Robust SEs" = lm_robust)
Normal SEsRobust SEs
(Intercept)981.607    981.607
(720.175)   (1117.654)
carat4328.324 ***4328.324 ***
(136.755)   (293.929)
depth-27.785 *  -27.785
(11.656)   (17.995)
N100        100
R20.912    0.912
logLik-675.703    -675.703
AIC1359.405    1359.405
*** p < 0.001; ** p < 0.01; * p < 0.05.

If that is not possible, you can compute statistics yourself and add them to your model using the tidy_override function:

lm_fixed <- tidy_override(lm1, p.value = c(0.5, 0.2, 0.06))
huxreg("Normal p values" = lm1, "Supplied p values" = lm_fixed)
Normal p valuesSupplied p values
(Intercept)981.607    981.607
(720.175)   (720.175)
carat4328.324 ***4328.324
(136.755)   (136.755)
depth-27.785 *  -27.785
(11.656)   (11.656)
N100        100
R20.912    0.912
logLik-675.703    -675.703
AIC1359.405    1359.405
*** p < 0.001; ** p < 0.01; * p < 0.05.

You can override any statistics returned by tidy or glance.

If you want to completely replace the output of tidy, use the tidy_replace() function. For example, here’s how to print different coefficients for a multinomial model.

mnl <- nnet::multinom(gear ~ mpg, mtcars)
## # weights:  9 (4 variable)
## initial  value 35.155593
## iter  10 value 23.131901
## final  value 23.129234
## converged
tidied <- broom::tidy(mnl)
models <- list()
models[["4 gears"]] <- tidy_replace(mnl, tidied[tidied$y.level == 4, ]) models[["5 gears"]] <- tidy_replace(mnl, tidied[tidied$y.level == 5, ])
huxreg(models, statistics = "AIC")
4 gears5 gears
(Intercept)-9.502 **-7.691 *
(3.262)  (3.232)
mpg0.475 **0.358 *
(0.168)  (0.168)
AIC54.258   54.258
*** p < 0.001; ** p < 0.01; * p < 0.05.