Decode Tables

library(dmcognigen)
library(dplyr)

data("dmcognigen_pk_requirements")
requirements <- dmcognigen_pk_requirements %>% 
  select(variable_name, variable_label, format_decode)

data("dmcognigen_cov")
cov <- dmcognigen_cov

data("dmcognigen_pk")
pk <- dmcognigen_pk

Introduction

Decode tables (decode_tbls) are defined based on variable names and their levels and labels. The level is generally a shorthand representation of the label and is ideally numeric. The label is intended to be a more detailed description, or the text to display on outputs like graphs and tables. These decode_tbls can be used to provide specifications to functions like set_decode_factors(), join_decode_labels(), and join_decode_levels().

For example, below is a decode table for the variable RFCAT (Baseline Renal Fx Category). The values to the left of the equal sign are the levels, and the values to the right are the labels.

#> 
#> ── RFCAT ──
#> 
#> 1=Normal Function (>=90 mL/min)
#> 2=Mild Impairment (60-89 mL/min)
#> 3=Moderate Impairment (30-59 mL/min)
#> 4=Severe Impairment (15-29 mL/min)
#> 5=End Stage Disease (<15 mL/min or Dialysis)

Constructing `decode_tbls`

A decode table (decode_tbl) or a collection of decode tables (decode_tbls) can be constructed in three main ways:

1. Based on a character string that describes the levels and labels.
2. Based on variables in data.
3. By manually constructing data frame(s).

While decode_tbls can be defined in many ways, the primary intention is that read_requirements() defines the "decode_tbls" attribute along with other attributes.

Extracting from character strings

This method is intended for cases where variables and their decodes are presented together in a table. It is the method used in read_requirements() to define the "decode_tbls" attribute.

For example, consider the variable descriptions below:

#> # A tibble: 9 × 3
#>   variable_name variable_label              format_decode                       
#>   <chr>         <chr>                       <chr>                               
#> 1 DVID          Observation Type            "0=Dose\n1=Xanomeline Concentration…
#> 2 EVID          Event ID                    "0=PK or PD measure\n1=Dose\n2=Othe…
#> 3 MDV           Missing Dependent Variable  "0=PK or PD measure\n1=Dose or Othe…
#> 4 BLQFN         BLQ Flag                    "0=No\n1=Yes"                       
#> 5 FED           Fed                         "0=Fasted\n1=Fed"                   
#> 6 RACEN         Race                        "1=White/Caucasian\n2=Black/African…
#> 7 SEXF          Sex                         "0=Male\n1=Female"                  
#> 8 RFCAT         Baseline Renal Fx Category  "1=Normal Function (>=90 mL/min)\n2…
#> 9 NCILIV        Baseline NCI Liver Fx Group "0=Normal Group A\n1=Mild Group B1\…

For this method, the decodes are expected to be of type character. Each entry on a new line defines one level-to-label relationship separated by an equal sign. When an entry contains more than one equal sign, the first equal sign is considered the separator.

#> DVID:
#> 0=Dose
#> 1=Xanomeline Concentration (ug/mL)
#> 
#> EVID:
#> 0=PK or PD measure
#> 1=Dose
#> 2=Other
#> 
#> MDV:
#> 0=PK or PD measure
#> 1=Dose or Other
#> 
#> BLQFN:
#> 0=No
#> 1=Yes
#> 
#> FED:
#> 0=Fasted
#> 1=Fed
#> 
#> RACEN:
#> 1=White/Caucasian
#> 2=Black/African American
#> 3=Asian
#> 4=American Indian or Alaska Native
#> 
#> SEXF:
#> 0=Male
#> 1=Female
#> 
#> RFCAT:
#> 1=Normal Function (>=90 mL/min)
#> 2=Mild Impairment (60-89 mL/min)
#> 3=Moderate Impairment (30-59 mL/min)
#> 4=Severe Impairment (15-29 mL/min)
#> 5=End Stage Disease (<15 mL/min or Dialysis)
#> 
#> NCILIV:
#> 0=Normal Group A
#> 1=Mild Group B1
#> 2=Mild Group B2
#> 3=Moderate Group C
#> 4=Severe Group D
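
The parsing rule described above can be sketched in base R. This is only an illustration of the rule (one entry per line, split at the first equal sign), not the package's implementation; parse_decode_string() is a hypothetical helper that does not exist in dmcognigen.

```r
# Hypothetical helper (not part of dmcognigen): split one decode string
# into levels and labels, using the FIRST equal sign on each line.
parse_decode_string <- function(decode) {
  # one entry per line
  entries <- strsplit(decode, "\n", fixed = TRUE)[[1]]
  # position of the first "=" in each entry (-1 when there is none)
  pos <- regexpr("=", entries, fixed = TRUE)
  # keep only entries that actually contain a separator
  entries <- entries[pos > 0]
  pos <- pos[pos > 0]
  data.frame(
    lvl = trimws(substr(entries, 1, pos - 1)),
    lbl = trimws(substr(entries, pos + 1, nchar(entries))),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_decode_string(
  "1=Normal Function (>=90 mL/min)\n2=Mild Impairment (60-89 mL/min)"
)
parsed
```

Note that the label of the first entry keeps its own equal sign (">=90"), because only the first equal sign on the line is treated as the separator. In this sketch the levels are returned as character strings.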

To extract the decode tables from these types of strings:

extract_decode_tbls(
  variable_name = requirements$variable_name,
  decode = requirements$format_decode
)
#> 
#> ── Decode tables ───────────────────────────────────────────────────────────────
#> 
#> ── DVID ──
#> 
#> 0=Dose
#> 1=Xanomeline Concentration (ug/mL)
#> 
#> ── EVID ──
#> 
#> 0=PK or PD measure
#> 1=Dose
#> 2=Other
#> 
#> ── MDV ──
#> 
#> 0=PK or PD measure
#> 1=Dose or Other
#> 
#> ── BLQFN ──
#> 
#> 0=No
#> 1=Yes
#> 
#> ── FED ──
#> 
#> 0=Fasted
#> 1=Fed
#> 
#> ── RACEN ──
#> 
#> 1=White/Caucasian
#> 2=Black/African American
#> 3=Asian
#> 4=American Indian or Alaska Native
#> 
#> ── SEXF ──
#> 
#> 0=Male
#> 1=Female
#> 
#> ── RFCAT ──
#> 
#> 1=Normal Function (>=90 mL/min)
#> 2=Mild Impairment (60-89 mL/min)
#> 3=Moderate Impairment (30-59 mL/min)
#> 4=Severe Impairment (15-29 mL/min)
#> 5=End Stage Disease (<15 mL/min or Dialysis)
#> 
#> ── NCILIV ──
#> 
#> 0=Normal Group A
#> 1=Mild Group B1
#> 2=Mild Group B2
#> 3=Moderate Group C
#> 4=Severe Group D

Extracting from a dataset

This method is intended for cases where a data set already includes both level and label variables, such as RACEN and RACEC below:

cov %>%
  cnt(RACEN, RACEC)
#> # A tibble: 3 × 4
#>   RACEN RACEC                                n n_cumulative
#>   <dbl> <chr>                            <int>        <int>
#> 1     1 White/Caucasian                    230          230
#> 2     2 Black/African American              23          253
#> 3     4 American Indian or Alaska Native     1          254

cov %>%
  extract_decode_tbls_from_data(
    lvl_to_lbl = list(RACEN = "RACEC")
  )
#> 
#> ── Decode tables ───────────────────────────────────────────────────────────────
#> 
#> ── RACEN ──
#> 
#> 1=White/Caucasian
#> 2=Black/African American
#> 4=American Indian or Alaska Native
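
Conceptually, a decode table extracted from data corresponds to the distinct level/label pairs observed in that data. The sketch below shows this with plain dplyr on made-up data; toy_cov is hypothetical, and this is not a description of how extract_decode_tbls_from_data() is implemented.

```r
library(dplyr)

# Toy data, made up for illustration only
toy_cov <- tibble(
  ID = 1:5,
  RACEN = c(1, 1, 2, 4, 2),
  RACEC = c(
    "White/Caucasian", "White/Caucasian", "Black/African American",
    "American Indian or Alaska Native", "Black/African American"
  )
)

# Distinct level/label pairs, sorted by level
race_pairs <- toy_cov %>%
  distinct(lvl = RACEN, lbl = RACEC) %>%
  arrange(lvl)

# Levels that map to more than one label (none expected here)
ambiguous <- race_pairs %>%
  count(lvl) %>%
  filter(n > 1)

race_pairs
```

Only levels that occur in the data appear in the result, which is why level 3 (Asian) is absent from the RACEN table extracted from cov above.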

The lvl_to_lbl argument maps the names of level variables to the names of label variables.

cov %>%
  extract_decode_tbls_from_data(
    lvl_to_lbl = list(
      # map individual variables
      SEXF = "SEXFC",
      RACEN = "RACEC",
      # map other lvl to lbl by removing CD at the end of variable names
      ~ stringr::str_remove(.x, "CD$")
    )
  )
#> 
#> ── Decode tables ───────────────────────────────────────────────────────────────
#> 
#> ── RACEN ──
#> 
#> 1=White/Caucasian
#> 2=Black/African American
#> 4=American Indian or Alaska Native
#> 
#> ── SEXF ──
#> 
#> 0=Male
#> 1=Female
#> 
#> ── ARMCD ──
#> 
#> Pbo=Placebo
#> Xan_Hi=Xanomeline High Dose
#> Xan_Lo=Xanomeline Low Dose
#> 
#> ── ACTARMCD ──
#> 
#> Pbo=Placebo
#> Xan_Hi=Xanomeline High Dose
#> Xan_Lo=Xanomeline Low Dose
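
To see what the unnamed formula in the example above computes, the same stringr call can be applied directly to a few variable names. This sketch only shows the name mapping; how the package decides which variables to pair is not shown here, and remove_cd() is a hypothetical name used for illustration.

```r
# Hypothetical stand-alone version of the mapping used above:
# strip a trailing "CD" from a variable name.
remove_cd <- function(x) stringr::str_remove(x, "CD$")

lbl_names <- remove_cd(c("ARMCD", "ACTARMCD"))
lbl_names

# Names that do not end in "CD" are returned unchanged
unchanged <- remove_cd("RACEN")
```

With this mapping, the level variable ARMCD is paired with the label variable ARM, and ACTARMCD with ACTARM.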

Sometimes a data set contains more than one representation of a variable. For example, this cov data set includes the pair of variables RACEN and RACEC, along with the original source variable RACE.

cov %>%
  extract_decode_tbls_from_data(
    lvl_to_lbl = list(
      RACEN = "RACE"
    )
  )
#> 
#> ── Decode tables ───────────────────────────────────────────────────────────────
#> 
#> ── RACEN ──
#> 
#> 1=WHITE
#> 2=BLACK OR AFRICAN AMERICAN
#> 4=AMERICAN INDIAN OR ALASKA NATIVE

Ideally, the level variable is numeric, but other data types are accepted. One way to review merged content is to use the original source variable as the level and map it to the label variable that was derived from it.

cov %>%
  extract_decode_tbls_from_data(
    lvl_to_lbl = list(
      RACE = "RACEC"
    )
  )
#> 
#> ── Decode tables ───────────────────────────────────────────────────────────────
#> 
#> ── RACE ──
#> 
#> AMERICAN INDIAN OR ALASKA NATIVE=American Indian or Alaska Native
#> BLACK OR AFRICAN AMERICAN=Black/African American
#> WHITE=White/Caucasian

Constructing manually from a data frame

This can be done in many different ways. In general, pass as_decode_tbls() a named list of data frames that contain the variables var, lvl, and lbl. In the example below, decodes for multiple variables are defined in one data frame, and split() then generates the named list.

tibble::tribble(
  ~var, ~lvl, ~lbl,
  "RACEN", 1, "White/Caucasian",
  "RACEN", 2, "Black/African American",
  "RACEN", 3, "Asian",
  "RACEN", 4, "American Indian or Alaska Native",
  
  "SEXF",  0, "Male",
  "SEXF",  1, "Female"
  ) %>% 
  split(~ var) %>% 
  as_decode_tbls()
#> 
#> ── Decode tables ───────────────────────────────────────────────────────────────
#> 
#> ── RACEN ──
#> 
#> 1=White/Caucasian
#> 2=Black/African American
#> 3=Asian
#> 4=American Indian or Alaska Native
#> 
#> ── SEXF ──
#> 
#> 0=Male
#> 1=Female
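
The intermediate object produced by split() can be inspected on its own. The sketch below uses base R and a small made-up data frame to show the shape of the named list (one data frame per variable); it stops before as_decode_tbls() and makes no assumption about what that function does internally.

```r
# Small made-up decode definitions in long format
decodes <- data.frame(
  var = c("RACEN", "RACEN", "SEXF", "SEXF"),
  lvl = c(1, 2, 0, 1),
  lbl = c("White/Caucasian", "Black/African American", "Male", "Female"),
  stringsAsFactors = FALSE
)

# One data frame per variable, named by the values of `var`
by_var <- split(decodes, decodes$var)

names(by_var)
by_var$SEXF
```

Each list element keeps the var, lvl, and lbl columns, which are the variables the paragraph above says as_decode_tbls() expects.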

Incorporating `decode_tbls` as variables in data sets

To demonstrate the automation features of these utilities, consider a data set that contains only numeric variables.

pk_numeric <- pk %>% 
  select(where(is.numeric))

`set_decode_factors()`

set_decode_factors() modifies existing variables or creates new factor variables based on decode_tbls or requirements objects. The factor levels are ordered by the sort order of the levels in the decode definition, which is useful for other functions that take the order of factor levels into account.

pk_numeric %>% 
  set_decode_factors(requirements) %>% 
  cnt(RACEN, SEXF, n_distinct_vars = ID)
#> ✔ Modified variable `BLQFN` as a factor of `BLQFN`.
#> ✔ Modified variable `DVID` as a factor of `DVID`.
#> ✔ Modified variable `EVID` as a factor of `EVID`.
#> ✔ Modified variable `MDV` as a factor of `MDV`.
#> ✔ Modified variable `NCILIV` as a factor of `NCILIV`.
#> ✔ Modified variable `RACEN` as a factor of `RACEN`.
#> ✔ Modified variable `RFCAT` as a factor of `RFCAT`.
#> ✔ Modified variable `SEXF` as a factor of `SEXF`.
#> # A tibble: 5 × 5
#>   RACEN                            SEXF    n_ID     n n_cumulative
#>   <fct>                            <fct>  <int> <int>        <int>
#> 1 White/Caucasian                  Male     104  1456         1456
#> 2 White/Caucasian                  Female   126  1764         3220
#> 3 Black/African American           Male       6    84         3304
#> 4 Black/African American           Female    17   238         3542
#> 5 American Indian or Alaska Native Male       1    14         3556
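
The ordering behavior can be illustrated with base R's factor(). This is a minimal sketch under the assumption that the factor levels follow the sorted decode levels, as described above; it is not the implementation of set_decode_factors(), and the objects racen_decode and racen are made up.

```r
# Made-up decode definition, deliberately listed out of order
racen_decode <- data.frame(
  lvl = c(4, 1, 3, 2),
  lbl = c(
    "American Indian or Alaska Native", "White/Caucasian",
    "Asian", "Black/African American"
  ),
  stringsAsFactors = FALSE
)

# Sort by level so the factor levels follow the decode order
racen_decode <- racen_decode[order(racen_decode$lvl), ]

# Made-up numeric data
racen <- c(2, 1, 4, 1)

# Numeric codes become labelled factor values
racen_fct <- factor(racen, levels = racen_decode$lvl, labels = racen_decode$lbl)

levels(racen_fct)
```

In this sketch, all four labels are factor levels even though "Asian" does not occur in the data, and tabulations list the values in decode order rather than alphabetically.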

Since the resulting variables are factors, they are easy to summarize with across(). This example provides a summary of categorical covariates by ID.

pk_numeric %>% 
  select(all_of(stationary_variables(., ID))) %>% 
  set_decode_factors(requirements) %>% 
  cnt(across(where(is.factor)), n_distinct_vars = ID)
#> ✔ Modified variable `NCILIV` as a factor of `NCILIV`.
#> ✔ Modified variable `RACEN` as a factor of `RACEN`.
#> ✔ Modified variable `RFCAT` as a factor of `RFCAT`.
#> ✔ Modified variable `SEXF` as a factor of `SEXF`.
#> # A tibble: 22 × 7
#>    RACEN           SEXF   RFCAT                  NCILIV  n_ID     n n_cumulative
#>    <fct>           <fct>  <fct>                  <fct>  <int> <int>        <int>
#>  1 White/Caucasian Male   Mild Impairment (60-8… Norma…    26   364          364
#>  2 White/Caucasian Male   Mild Impairment (60-8… Mild …     3    42          406
#>  3 White/Caucasian Male   Mild Impairment (60-8… Mild …     1    14          420
#>  4 White/Caucasian Male   Mild Impairment (60-8… Moder…     1    14          434
#>  5 White/Caucasian Male   Moderate Impairment (… Norma…    65   910         1344
#>  6 White/Caucasian Male   Moderate Impairment (… Mild …     1    14         1358
#>  7 White/Caucasian Male   Moderate Impairment (… Mild …     4    56         1414
#>  8 White/Caucasian Male   NA                     Norma…     3    42         1456
#>  9 White/Caucasian Female Mild Impairment (60-8… Norma…    30   420         1876
#> 10 White/Caucasian Female Mild Impairment (60-8… Mild …     3    42         1918
#> # ℹ 12 more rows

The new_names argument works similarly to the lvl_to_lbl argument of the joining functions. Unlike the joining functions, set_decode_factors() allows an existing variable to be mapped to itself, in which case the variable is modified rather than created.

pk_numeric %>% 
  set_decode_factors(
    decode_tbls = requirements, 
    new_names = list(
      "{var}FCT",
      RACEN = "RACEN", 
      SEXF = "SEXFC" 
    )
  ) %>% 
  cnt(RACEN, SEXFC, RFCATFCT, n_distinct_vars = ID)
#> ✔ Created new variable `BLQFNFCT` as a factor of `BLQFN`.
#> ✔ Created new variable `DVIDFCT` as a factor of `DVID`.
#> ✔ Created new variable `EVIDFCT` as a factor of `EVID`.
#> ✔ Created new variable `MDVFCT` as a factor of `MDV`.
#> ✔ Created new variable `NCILIVFCT` as a factor of `NCILIV`.
#> ✔ Modified variable `RACEN` as a factor of `RACEN`.
#> ✔ Created new variable `RFCATFCT` as a factor of `RFCAT`.
#> ✔ Created new variable `SEXFC` as a factor of `SEXF`.
#> # A tibble: 12 × 6
#>    RACEN                            SEXFC  RFCATFCT      n_ID     n n_cumulative
#>    <fct>                            <fct>  <fct>        <int> <int>        <int>
#>  1 White/Caucasian                  Male   Mild Impair…    31   434          434
#>  2 White/Caucasian                  Male   Moderate Im…    70   980         1414
#>  3 White/Caucasian                  Male   NA               3    42         1456
#>  4 White/Caucasian                  Female Mild Impair…    36   504         1960
#>  5 White/Caucasian                  Female Moderate Im…    84  1176         3136
#>  6 White/Caucasian                  Female NA               6    84         3220
#>  7 Black/African American           Male   Mild Impair…     4    56         3276
#>  8 Black/African American           Male   Moderate Im…     1    14         3290
#>  9 Black/African American           Male   NA               1    14         3304
#> 10 Black/African American           Female Mild Impair…     8   112         3416
#> 11 Black/African American           Female Moderate Im…     9   126         3542
#> 12 American Indian or Alaska Native Male   Mild Impair…     1    14         3556

`join_decode_labels()`

Create new label variables for variables with matching names in data and decode_tbls.

pk_numeric_with_label_vars <- pk_numeric %>% 
  join_decode_labels(requirements)
#> ✔ Joined `BLQFNC` by `BLQFN`.
#> Warning: Missing values for `BLQFNC` where `BLQFN` is: NA
#> 
#> ── BLQFN ──
#> 
#> 0=No
#> 1=Yes
#> NA=NA
#> ✔ Joined `DVIDC` by `DVID`.
#> 
#> ── DVID ──
#> 
#> 0=Dose
#> 1=Xanomeline Concentration (ug/mL)
#> ✔ Joined `EVIDC` by `EVID`.
#> 
#> ── EVID ──
#> 
#> 0=PK or PD measure
#> 1=Dose
#> ✔ Joined `MDVC` by `MDV`.
#> 
#> ── MDV ──
#> 
#> 0=PK or PD measure
#> 1=Dose or Other
#> ✔ Joined `NCILIVC` by `NCILIV`.
#> 
#> ── NCILIV ──
#> 
#> 0=Normal Group A
#> 1=Mild Group B1
#> 2=Mild Group B2
#> 3=Moderate Group C
#> ✔ Joined `RACENC` by `RACEN`.
#> 
#> ── RACEN ──
#> 
#> 1=White/Caucasian
#> 2=Black/African American
#> 4=American Indian or Alaska Native
#> ✔ Joined `RFCATC` by `RFCAT`.
#> Warning: Missing values for `RFCATC` where `RFCAT` is: NA
#> 
#> ── RFCAT ──
#> 
#> 2=Mild Impairment (60-89 mL/min)
#> 3=Moderate Impairment (30-59 mL/min)
#> NA=NA
#> ✔ Joined `SEXFC` by `SEXF`.
#> 
#> ── SEXF ──
#> 
#> 0=Male
#> 1=Female
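
Conceptually, joining labels resembles a left join of the data to a decode table keyed by the level. The sketch below uses plain dplyr on made-up data; it is not the implementation of join_decode_labels(), and toy_pk and rfcat_decode are hypothetical objects.

```r
library(dplyr)

# Made-up data with one missing level
toy_pk <- tibble(
  ID = 1:4,
  RFCAT = c(2, 3, NA, 2)
)

# Made-up decode table: level variable plus the label variable to add
rfcat_decode <- tibble(
  RFCAT = c(1, 2, 3, 4, 5),
  RFCATC = c(
    "Normal Function (>=90 mL/min)",
    "Mild Impairment (60-89 mL/min)",
    "Moderate Impairment (30-59 mL/min)",
    "Severe Impairment (15-29 mL/min)",
    "End Stage Disease (<15 mL/min or Dialysis)"
  )
)

# Rows whose level has no decode entry get a missing label
labelled <- toy_pk %>%
  left_join(rfcat_decode, by = "RFCAT")

labelled
```

The row with a missing RFCAT ends up with a missing RFCATC, which mirrors the situation the warnings above report for BLQFN and RFCAT.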

By default, the label name is the level name with a C appended. Alternatively, use lvl_to_lbl to provide a named list, which can include glue specifications or functions that map level names to label names. One unnamed element can be included in lvl_to_lbl to define the default behavior.

pk_numeric %>% 
  select(ID, RACEN, SEXF) %>% 
  join_decode_labels(requirements, lvl_to_lbl = list(RACEN = "RACEC", "{var}C")) %>% 
  cnt(RACEN, RACEC, SEXF, SEXFC, n_distinct_vars = ID)
#> ✔ Joined `RACEC` by `RACEN`.
#> 
#> ── RACEN ──
#> 
#> 1=White/Caucasian
#> 2=Black/African American
#> 4=American Indian or Alaska Native
#> ✔ Joined `SEXFC` by `SEXF`.
#> 
#> ── SEXF ──
#> 
#> 0=Male
#> 1=Female
#> # A tibble: 5 × 7
#>   RACEN RACEC                             SEXF SEXFC   n_ID     n n_cumulative
#>   <dbl> <chr>                            <dbl> <chr>  <int> <int>        <int>
#> 1     1 White/Caucasian                      0 Male     104  1456         1456
#> 2     1 White/Caucasian                      1 Female   126  1764         3220
#> 3     2 Black/African American               0 Male       6    84         3304
#> 4     2 Black/African American               1 Female    17   238         3542
#> 5     4 American Indian or Alaska Native     0 Male       1    14         3556
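
The naming logic described above (named entries take precedence, and one unnamed glue specification supplies the default) can be sketched as follows. resolve_label_name() is a hypothetical helper written for this illustration; it handles only character specifications, not functions or formulas, and is not how the package resolves names internally.

```r
# Same specification as in the example above
lvl_to_lbl <- list(RACEN = "RACEC", "{var}C")

# Hypothetical helper: find the label name for one level variable
resolve_label_name <- function(var, spec) {
  # an explicitly named entry wins
  if (var %in% names(spec)) {
    return(spec[[var]])
  }
  # otherwise fall back to the first unnamed entry, treated as a glue string
  default <- spec[[which(!nzchar(names(spec)))[1]]]
  as.character(glue::glue(default, var = var))
}

racen_lbl <- resolve_label_name("RACEN", lvl_to_lbl)
sexf_lbl <- resolve_label_name("SEXF", lvl_to_lbl)
c(racen_lbl, sexf_lbl)
```

This matches the result shown above, where RACEN is paired with RACEC through the named entry and SEXF is paired with SEXFC through the "{var}C" default.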

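A label variable is not joined if a variable with that name already exists in the data. Instead, a warning is issued and the join for that variable is skipped.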

pk %>% 
  select(ID, RACEN, RACEC, SEXF) %>% 
  join_decode_labels(requirements, lvl_to_lbl = list(RACEN = "RACEC", "{var}C")) %>% 
  cnt(RACEN, RACEC, SEXF, SEXFC, n_distinct_vars = ID)
#> Warning: `RACEC` already exists in `.data`. Skipping join for `RACEN`.
#> ✔ Joined `SEXFC` by `SEXF`.
#> 
#> ── SEXF ──
#> 
#> 0=Male
#> 1=Female
#> # A tibble: 5 × 7
#>   RACEN RACEC                             SEXF SEXFC   n_ID     n n_cumulative
#>   <dbl> <chr>                            <dbl> <chr>  <int> <int>        <int>
#> 1     1 White/Caucasian                      0 Male     104  1456         1456
#> 2     1 White/Caucasian                      1 Female   126  1764         3220
#> 3     2 Black/African American               0 Male       6    84         3304
#> 4     2 Black/African American               1 Female    17   238         3542
#> 5     4 American Indian or Alaska Native     0 Male       1    14         3556

`join_decode_levels()`

For scenarios where a label variable is already defined in a data set, the corresponding levels can be joined with join_decode_levels().

The lvl_to_lbl structure is identical whether joining labels or levels.

pk %>% 
  select(ID, RACEC, SEXFC) %>% 
  join_decode_levels(requirements, lvl_to_lbl = list(RACEN = "RACEC", "{var}C")) %>% 
  cnt(RACEN, RACEC, SEXF, SEXFC, n_distinct_vars = ID)
#> ✔ Joined `RACEN` by `RACEC`.
#> 
#> ── RACEN ──
#> 
#> 1=White/Caucasian
#> 2=Black/African American
#> 4=American Indian or Alaska Native
#> ✔ Joined `SEXF` by `SEXFC`.
#> 
#> ── SEXF ──
#> 
#> 0=Male
#> 1=Female
#> # A tibble: 5 × 7
#>   RACEN RACEC                             SEXF SEXFC   n_ID     n n_cumulative
#>   <dbl> <chr>                            <dbl> <chr>  <int> <int>        <int>
#> 1     1 White/Caucasian                      0 Male     104  1456         1456
#> 2     1 White/Caucasian                      1 Female   126  1764         3220
#> 3     2 Black/African American               0 Male       6    84         3304
#> 4     2 Black/African American               1 Female    17   238         3542
#> 5     4 American Indian or Alaska Native     0 Male       1    14         3556
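
The reverse direction can be sketched the same way: a left join keyed by the label adds the level. As before, this uses plain dplyr on made-up data and is not the implementation of join_decode_levels(); toy_pk and sexf_decode are hypothetical objects.

```r
library(dplyr)

# Made-up data that contains only the label variable
toy_pk <- tibble(
  ID = 1:3,
  SEXFC = c("Female", "Male", "Female")
)

# Made-up decode table with level and label
sexf_decode <- tibble(
  SEXF = c(0, 1),
  SEXFC = c("Male", "Female")
)

# Join by the label to recover the numeric level
with_levels <- toy_pk %>%
  left_join(sexf_decode, by = "SEXFC") %>%
  relocate(SEXF, .before = SEXFC)

with_levels
```

In this sketch the labels are matched as exact strings, so a label such as "FEMALE" would not match "Female" and would produce a missing level.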
