
[Stable] Summarize categorical data and handle output format.


desc_facvar(
  .data,
  vf,
  format = "n_/N_ (pc_%)",
  digits = 0,
  pad_width = 12,
  ncat_max = 10,
  export_raw_values = FALSE
)



.data

A data.frame, where vf are column names of categorical variables.

vf

A character vector. Names of the categorical columns of .data to summarize.

format

A character string, formatting options.

digits

A numeric. Number of digits for the percentage (passed to interval formatting function).

pad_width

A numeric. Minimum character length of value output (passed to stringr::str_pad()).

ncat_max

A numeric. How many levels should be allowed for all variables? See details.

export_raw_values

A logical. Should the raw values be exported?


A data.frame with columns

  • var the variable name

  • level the level of the variable

  • value the formatted value with possible number of cases n_, number of available cases N_, and percentage pc_, depending on format argument.

  • n_avail the number of cases with available data for this variable.


Many other packages provide tools to summarize data. This one is just the package author's favorite. Important format inputs are

  • n_ number of patients with the categorical variable at said level

  • N_ number of patients with an available value for this variable

  • pc_ percentage, computed as n_ / N_

The format argument should contain at least the words "n_", "N_", and optionally "pc_". ncat_max ensures that you did not provide a continuous variable to desc_facvar(). If one of your variables has many levels, set ncat_max to Inf or a high value. The equivalent for continuous data is desc_cont().
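To make the three placeholders concrete, here is a minimal base-R sketch of how n_, N_, and pc_ could be computed and substituted into a format string. It is an illustration only, not the package's actual implementation: the vector x and the helper fill_format() are hypothetical names introduced for this example.

```r
# Illustrative sketch only: NOT how desc_facvar() is implemented.
# `x` and `fill_format()` are hypothetical names used for this example.
x <- c("smoker", "non-smoker", "smoker", NA, "smoker")

N_  <- sum(!is.na(x))                     # cases with an available value
n_  <- table(x)                           # cases per level (NA dropped by default)
pc_ <- round(100 * n_ / N_, digits = 0)   # percentage of n_ / N_

# Replace each placeholder in the format string with its value
fill_format <- function(n, N, pc, fmt) {
  out <- sub("n_", as.character(n), fmt, fixed = TRUE)
  out <- sub("N_", as.character(N), out, fixed = TRUE)
  sub("pc_", as.character(pc), out, fixed = TRUE)
}

value <- mapply(fill_format, n = n_, N = N_, pc = pc_,
                MoreArgs = list(fmt = "n_/N_ (pc_%)"))
value
```

Note that the missing value is excluded from N_, so percentages are relative to available cases rather than to the total number of rows.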

See also


df1 <-
  data.frame(
    smoke_status = c("smoker", "non-smoker",
           "smoker", "smoker",
           "smoker", "smoker",
           "non-smoker"
           ),
    hypertension = c(1, 1, 0, 1, 1, 1, 1),
    age = c(60, 50, 56, 49, 75, 69, 85),
    bmi = c(18, 30, 25, 22, 23, 21, 22)
  )

# Use default formatting
desc_facvar(.data = df1, vf = c("hypertension", "smoke_status"))
#> # A tibble: 4 × 4
#>   var          level      value          n_avail
#>   <chr>        <chr>      <chr>            <int>
#> 1 hypertension 0          " 1/7 (14%)  "       7
#> 2 hypertension 1          " 6/7 (86%)  "       7
#> 3 smoke_status non-smoker " 2/7 (29%)  "       7
#> 4 smoke_status smoker     " 5/7 (71%)  "       7

# Use custom formatting
desc_facvar(.data = df1,
           vf = c("hypertension", "smoke_status"),
           format = "n_ out of N_, pc_%",
           digits = 1)
#> # A tibble: 4 × 4
#>   var          level      value             n_avail
#>   <chr>        <chr>      <chr>               <int>
#> 1 hypertension 0          1 out of 7, 14.3%       7
#> 2 hypertension 1          6 out of 7, 85.7%       7
#> 3 smoke_status non-smoker 2 out of 7, 28.6%       7
#> 4 smoke_status smoker     5 out of 7, 71.4%       7

# You might want to export raw values, to run plotting or
# other formatting functions

desc_facvar(.data = df1,
            vf = c("hypertension", "smoke_status"),
            export_raw_values = TRUE)
#> # A tibble: 4 × 6
#>   var          level      value          n_avail     n    pc
#>   <chr>        <chr>      <chr>            <int> <int> <dbl>
#> 1 hypertension 0          " 1/7 (14%)  "       7     1  14.3
#> 2 hypertension 1          " 6/7 (86%)  "       7     6  85.7
#> 3 smoke_status non-smoker " 2/7 (29%)  "       7     2  28.6
#> 4 smoke_status smoker     " 5/7 (71%)  "       7     5  71.4