library(dmcognigen)
library(dplyr)
data("dmcognigen_pk_requirements")
requirements <- dmcognigen_pk_requirements %>% 
  select(variable_name, variable_label, format_decode)
data("dmcognigen_cov")
cov <- dmcognigen_cov
data("dmcognigen_pk")
pk <- dmcognigen_pk
Introduction
Decode tables (decode_tbls) are defined based on
variable names and their levels and labels. The level is generally a
shorthand representation of the label and is ideally numeric. The label
is intended to be a more detailed description, or the text to display on
outputs like graphs and tables. These decode_tbls can be
used to provide specifications to functions like
set_decode_factors(), join_decode_labels(),
and join_decode_levels().
For example, below is a decode table for the variable
RFCAT (Baseline Renal Fx Category). The values on the
left-hand side of the equal sign are the levels, and the values on the
right-hand side are the labels.
#> 
#> ── RFCAT ──
#> 
#> 1=Normal Function (>=90 mL/min)
#> 2=Mild Impairment (60-89 mL/min)
#> 3=Moderate Impairment (30-59 mL/min)
#> 4=Severe Impairment (15-29 mL/min)
#> 5=End Stage Disease (<15 mL/min or Dialysis)
Constructing decode_tbls
A decode table (decode_tbl) or a collection of decode
tables (decode_tbls) can be constructed in 3 main ways:
- Based on a character string that describes the levels and labels.
- Based on variables in data.
- By manually constructing data frame(s).
Although decode_tbls can be defined in many ways, the primary
intention is for read_requirements() to define
this and other attributes.
Extracting from character strings
This method is intended for cases where variables and their decodes
are presented together in a table. It is also what
read_requirements() uses to define the
"decode_tbls" attribute.
Consider the variable descriptions below:
#> # A tibble: 9 × 3
#>   variable_name variable_label              format_decode                       
#>   <chr>         <chr>                       <chr>                               
#> 1 DVID          Observation Type            "0=Dose\n1=Xanomeline Concentration…
#> 2 EVID          Event ID                    "0=PK or PD measure\n1=Dose\n2=Othe…
#> 3 MDV           Missing Dependent Variable  "0=PK or PD measure\n1=Dose or Othe…
#> 4 BLQFN         BLQ Flag                    "0=No\n1=Yes"                       
#> 5 FED           Fed                         "0=Fasted\n1=Fed"                   
#> 6 RACEN         Race                        "1=White/Caucasian\n2=Black/African…
#> 7 SEXF          Sex                         "0=Male\n1=Female"                  
#> 8 RFCAT         Baseline Renal Fx Category  "1=Normal Function (>=90 mL/min)\n2…
#> 9 NCILIV        Baseline NCI Liver Fx Group "0=Normal Group A\n1=Mild Group B1\…
For this method, the decodes are expected to be of type
character. Each entry on a new line defines one
level-to-label relationship separated by an equal sign. When an entry
contains more than one equal sign, the first equal sign is considered
the separator.
#> DVID:
#> 0=Dose
#> 1=Xanomeline Concentration (ug/mL)
#> 
#> EVID:
#> 0=PK or PD measure
#> 1=Dose
#> 2=Other
#> 
#> MDV:
#> 0=PK or PD measure
#> 1=Dose or Other
#> 
#> BLQFN:
#> 0=No
#> 1=Yes
#> 
#> FED:
#> 0=Fasted
#> 1=Fed
#> 
#> RACEN:
#> 1=White/Caucasian
#> 2=Black/African American
#> 3=Asian
#> 4=American Indian or Alaska Native
#> 
#> SEXF:
#> 0=Male
#> 1=Female
#> 
#> RFCAT:
#> 1=Normal Function (>=90 mL/min)
#> 2=Mild Impairment (60-89 mL/min)
#> 3=Moderate Impairment (30-59 mL/min)
#> 4=Severe Impairment (15-29 mL/min)
#> 5=End Stage Disease (<15 mL/min or Dialysis)
#> 
#> NCILIV:
#> 0=Normal Group A
#> 1=Mild Group B1
#> 2=Mild Group B2
#> 3=Moderate Group C
#> 4=Severe Group D
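The splitting rule described above (one entry per line, split at the first equal sign) can be sketched in base R. This is only an illustration of the rule, not the package's implementation: `parse_decode_string()` is a hypothetical helper, and it assumes every entry contains at least one equal sign.

```r
# Hypothetical helper (not part of dmcognigen): parse one decode string
# into a data frame of levels and labels.
parse_decode_string <- function(decode) {
  # One entry per line
  entries <- strsplit(decode, "\n", fixed = TRUE)[[1]]
  # Position of the FIRST equal sign in each entry
  pos <- regexpr("=", entries, fixed = TRUE)
  data.frame(
    lvl = substr(entries, 1, pos - 1),               # text before the first "="
    lbl = substr(entries, pos + 1, nchar(entries)),  # everything after it
    stringsAsFactors = FALSE
  )
}

# The label of the first entry contains a second equal sign (">=90"),
# which stays part of the label.
parse_decode_string(
  "1=Normal Function (>=90 mL/min)\n2=Mild Impairment (60-89 mL/min)"
)
```

In this sketch the levels are kept as character strings; converting them to numeric where possible would be a separate step.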
To extract the decode tables from these types of strings:
extract_decode_tbls(
  variable_name = requirements$variable_name,
  decode = requirements$format_decode
)
#> 
#> ── Decode tables ───────────────────────────────────────────────────────────────
#> 
#> ── DVID ──
#> 
#> 0=Dose
#> 1=Xanomeline Concentration (ug/mL)
#> 
#> ── EVID ──
#> 
#> 0=PK or PD measure
#> 1=Dose
#> 2=Other
#> 
#> ── MDV ──
#> 
#> 0=PK or PD measure
#> 1=Dose or Other
#> 
#> ── BLQFN ──
#> 
#> 0=No
#> 1=Yes
#> 
#> ── FED ──
#> 
#> 0=Fasted
#> 1=Fed
#> 
#> ── RACEN ──
#> 
#> 1=White/Caucasian
#> 2=Black/African American
#> 3=Asian
#> 4=American Indian or Alaska Native
#> 
#> ── SEXF ──
#> 
#> 0=Male
#> 1=Female
#> 
#> ── RFCAT ──
#> 
#> 1=Normal Function (>=90 mL/min)
#> 2=Mild Impairment (60-89 mL/min)
#> 3=Moderate Impairment (30-59 mL/min)
#> 4=Severe Impairment (15-29 mL/min)
#> 5=End Stage Disease (<15 mL/min or Dialysis)
#> 
#> ── NCILIV ──
#> 
#> 0=Normal Group A
#> 1=Mild Group B1
#> 2=Mild Group B2
#> 3=Moderate Group C
#> 4=Severe Group D
Extracting from a dataset
This method is intended for a data set that already includes level and label variables,
such as RACEN and RACEC below:
cov %>%
  cnt(RACEN, RACEC)
#> # A tibble: 3 × 4
#>   RACEN RACEC                                n n_cumulative
#>   <dbl> <chr>                            <int>        <int>
#> 1     1 White/Caucasian                    230          230
#> 2     2 Black/African American              23          253
#> 3     4 American Indian or Alaska Native     1          254
cov %>%
  extract_decode_tbls_from_data(
    lvl_to_lbl = list(RACEN = "RACEC")
  )
#> 
#> ── Decode tables ───────────────────────────────────────────────────────────────
#> 
#> ── RACEN ──
#> 
#> 1=White/Caucasian
#> 2=Black/African American
#> 4=American Indian or Alaska Native
The lvl_to_lbl argument maps the names of level
variables to the names of label variables.
cov %>%
  extract_decode_tbls_from_data(
    lvl_to_lbl = list(
      # map individual variables
      SEXF = "SEXFC",
      RACEN = "RACEC",
      # map other lvl to lbl by removing CD at the end of variable names
      ~ stringr::str_remove(.x, "CD$")
    )
  )
#> 
#> ── Decode tables ───────────────────────────────────────────────────────────────
#> 
#> ── RACEN ──
#> 
#> 1=White/Caucasian
#> 2=Black/African American
#> 4=American Indian or Alaska Native
#> 
#> ── SEXF ──
#> 
#> 0=Male
#> 1=Female
#> 
#> ── ARMCD ──
#> 
#> Pbo=Placebo
#> Xan_Hi=Xanomeline High Dose
#> Xan_Lo=Xanomeline Low Dose
#> 
#> ── ACTARMCD ──
#> 
#> Pbo=Placebo
#> Xan_Hi=Xanomeline High Dose
#> Xan_Lo=Xanomeline Low Dose
Sometimes a data set contains more than one representation of a
variable. For example, this cov data set includes the pair of
variables RACEN and RACEC, along with the
original source variable RACE.
cov %>%
  extract_decode_tbls_from_data(
    lvl_to_lbl = list(
      RACEN = "RACE"
    )
  )
#> 
#> ── Decode tables ───────────────────────────────────────────────────────────────
#> 
#> ── RACEN ──
#> 
#> 1=WHITE
#> 2=BLACK OR AFRICAN AMERICAN
#> 4=AMERICAN INDIAN OR ALASKA NATIVE
Ideally, the level variable is numeric, but other data types are accepted.
One way to review merged content is to map a label variable to the original source variable it was derived from.
cov %>%
  extract_decode_tbls_from_data(
    lvl_to_lbl = list(
      RACE = "RACEC"
    )
  )
#> 
#> ── Decode tables ───────────────────────────────────────────────────────────────
#> 
#> ── RACE ──
#> 
#> AMERICAN INDIAN OR ALASKA NATIVE=American Indian or Alaska Native
#> BLACK OR AFRICAN AMERICAN=Black/African American
#> WHITE=White/Caucasian
Constructing manually from a data frame
This can be done in many different ways. In general, use
as_decode_tbls() with a named list of data frames that
contain variables var, lvl, and
lbl. One example is below, where decodes are defined for
multiple variables, then the named list is generated with
split().
tibble::tribble(
  ~var, ~lvl, ~lbl,
  "RACEN", 1, "White/Caucasian",
  "RACEN", 2, "Black/African American",
  "RACEN", 3, "Asian",
  "RACEN", 4, "American Indian or Alaska Native",
  
  "SEXF",  0, "Male",
  "SEXF",  1, "Female"
  ) %>% 
  split(~ var) %>% 
  as_decode_tbls()
#> 
#> ── Decode tables ───────────────────────────────────────────────────────────────
#> 
#> ── RACEN ──
#> 
#> 1=White/Caucasian
#> 2=Black/African American
#> 3=Asian
#> 4=American Indian or Alaska Native
#> 
#> ── SEXF ──
#> 
#> 0=Male
#> 1=Female
Incorporating decode_tbls as variables in data sets
To demonstrate the automation features of these utilities, consider a data set that contains only numeric variables.
set_decode_factors()
Modify or create new factor variables based on
decode_tbls or requirements objects. The order
of the levels is defined based on the sort order of the level in the
decode definition. This is useful for other functions that consider the
order of factor levels.
pk_numeric %>% 
  set_decode_factors(requirements) %>% 
  cnt(RACEN, SEXF, n_distinct_vars = ID)
#> ✔ Modified variable `BLQFN` as a factor of `BLQFN`.
#> ✔ Modified variable `DVID` as a factor of `DVID`.
#> ✔ Modified variable `EVID` as a factor of `EVID`.
#> ✔ Modified variable `MDV` as a factor of `MDV`.
#> ✔ Modified variable `NCILIV` as a factor of `NCILIV`.
#> ✔ Modified variable `RACEN` as a factor of `RACEN`.
#> ✔ Modified variable `RFCAT` as a factor of `RFCAT`.
#> ✔ Modified variable `SEXF` as a factor of `SEXF`.
#> # A tibble: 5 × 5
#>   RACEN                            SEXF    n_ID     n n_cumulative
#>   <fct>                            <fct>  <int> <int>        <int>
#> 1 White/Caucasian                  Male     104  1456         1456
#> 2 White/Caucasian                  Female   126  1764         3220
#> 3 Black/African American           Male       6    84         3304
#> 4 Black/African American           Female    17   238         3542
#> 5 American Indian or Alaska Native Male       1    14         3556
Since the resulting variables are factors, they are easy to summarize
with across(). This example provides a summary of
categorical covariates by ID.
pk_numeric %>% 
  select(all_of(stationary_variables(., ID))) %>% 
  set_decode_factors(requirements) %>% 
  cnt(across(where(is.factor)), n_distinct_vars = ID)
#> ✔ Modified variable `NCILIV` as a factor of `NCILIV`.
#> ✔ Modified variable `RACEN` as a factor of `RACEN`.
#> ✔ Modified variable `RFCAT` as a factor of `RFCAT`.
#> ✔ Modified variable `SEXF` as a factor of `SEXF`.
#> # A tibble: 22 × 7
#>    RACEN           SEXF   RFCAT                  NCILIV  n_ID     n n_cumulative
#>    <fct>           <fct>  <fct>                  <fct>  <int> <int>        <int>
#>  1 White/Caucasian Male   Mild Impairment (60-8… Norma…    26   364          364
#>  2 White/Caucasian Male   Mild Impairment (60-8… Mild …     3    42          406
#>  3 White/Caucasian Male   Mild Impairment (60-8… Mild …     1    14          420
#>  4 White/Caucasian Male   Mild Impairment (60-8… Moder…     1    14          434
#>  5 White/Caucasian Male   Moderate Impairment (… Norma…    65   910         1344
#>  6 White/Caucasian Male   Moderate Impairment (… Mild …     1    14         1358
#>  7 White/Caucasian Male   Moderate Impairment (… Mild …     4    56         1414
#>  8 White/Caucasian Male   NA                     Norma…     3    42         1456
#>  9 White/Caucasian Female Mild Impairment (60-8… Norma…    30   420         1876
#> 10 White/Caucasian Female Mild Impairment (60-8… Mild …     3    42         1918
#> # ℹ 12 more rows
The new_names argument works similarly to the
lvl_to_lbl argument in joining functions, but mapping an
existing variable to itself is allowed by
set_decode_factors().
pk_numeric %>% 
  set_decode_factors(
    decode_tbls = requirements, 
    new_names = list(
      "{var}FCT",
      RACEN = "RACEN", 
      SEXF = "SEXFC" 
    )
  ) %>% 
  cnt(RACEN, SEXFC, RFCATFCT, n_distinct_vars = ID)
#> ✔ Created new variable `BLQFNFCT` as a factor of `BLQFN`.
#> ✔ Created new variable `DVIDFCT` as a factor of `DVID`.
#> ✔ Created new variable `EVIDFCT` as a factor of `EVID`.
#> ✔ Created new variable `MDVFCT` as a factor of `MDV`.
#> ✔ Created new variable `NCILIVFCT` as a factor of `NCILIV`.
#> ✔ Modified variable `RACEN` as a factor of `RACEN`.
#> ✔ Created new variable `RFCATFCT` as a factor of `RFCAT`.
#> ✔ Created new variable `SEXFC` as a factor of `SEXF`.
#> # A tibble: 12 × 6
#>    RACEN                            SEXFC  RFCATFCT      n_ID     n n_cumulative
#>    <fct>                            <fct>  <fct>        <int> <int>        <int>
#>  1 White/Caucasian                  Male   Mild Impair…    31   434          434
#>  2 White/Caucasian                  Male   Moderate Im…    70   980         1414
#>  3 White/Caucasian                  Male   NA               3    42         1456
#>  4 White/Caucasian                  Female Mild Impair…    36   504         1960
#>  5 White/Caucasian                  Female Moderate Im…    84  1176         3136
#>  6 White/Caucasian                  Female NA               6    84         3220
#>  7 Black/African American           Male   Mild Impair…     4    56         3276
#>  8 Black/African American           Male   Moderate Im…     1    14         3290
#>  9 Black/African American           Male   NA               1    14         3304
#> 10 Black/African American           Female Mild Impair…     8   112         3416
#> 11 Black/African American           Female Moderate Im…     9   126         3542
#> 12 American Indian or Alaska Native Male   Mild Impair…     1    14         3556
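The factor construction described in this section can be approximated in base R with `factor()`. The sketch below is not how set_decode_factors() is implemented; the decode data frame and the `sexf` vector are made-up values, and it only illustrates that the factor levels follow the sort order of the decode levels rather than the order of appearance in the data.

```r
# Hypothetical decode definition for SEXF, written out of order on purpose
decode <- data.frame(
  lvl = c(1, 0),
  lbl = c("Female", "Male"),
  stringsAsFactors = FALSE
)

# Sort by level so the factor levels follow the decode order
decode <- decode[order(decode$lvl), ]

# Made-up numeric values, including a missing value
sexf <- c(1, 0, 0, 1, NA)

# Map numeric levels to labels; values without a decode entry become NA
sexf_fct <- factor(sexf, levels = decode$lvl, labels = decode$lbl)

levels(sexf_fct)
table(sexf_fct, useNA = "ifany")
```

Because the level order is fixed by the decode definition, summaries and plots built from the factor list "Male" before "Female" regardless of which value appears first in the data.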
join_decode_labels()
Create new label variables for variables with matching names in data
and decode_tbls.
pk_numeric_with_label_vars <- pk_numeric %>% 
  join_decode_labels(requirements)
#> ✔ Joined `BLQFNC` by `BLQFN`.
#> Warning: Missing values for `BLQFNC` where `BLQFN` is: NA
#> 
#> ── BLQFN ──
#> 
#> 0=No
#> 1=Yes
#> NA=NA
#> ✔ Joined `DVIDC` by `DVID`.
#> 
#> ── DVID ──
#> 
#> 0=Dose
#> 1=Xanomeline Concentration (ug/mL)
#> ✔ Joined `EVIDC` by `EVID`.
#> 
#> ── EVID ──
#> 
#> 0=PK or PD measure
#> 1=Dose
#> ✔ Joined `MDVC` by `MDV`.
#> 
#> ── MDV ──
#> 
#> 0=PK or PD measure
#> 1=Dose or Other
#> ✔ Joined `NCILIVC` by `NCILIV`.
#> 
#> ── NCILIV ──
#> 
#> 0=Normal Group A
#> 1=Mild Group B1
#> 2=Mild Group B2
#> 3=Moderate Group C
#> ✔ Joined `RACENC` by `RACEN`.
#> 
#> ── RACEN ──
#> 
#> 1=White/Caucasian
#> 2=Black/African American
#> 4=American Indian or Alaska Native
#> ✔ Joined `RFCATC` by `RFCAT`.
#> Warning: Missing values for `RFCATC` where `RFCAT` is: NA
#> 
#> ── RFCAT ──
#> 
#> 2=Mild Impairment (60-89 mL/min)
#> 3=Moderate Impairment (30-59 mL/min)
#> NA=NA
#> ✔ Joined `SEXFC` by `SEXF`.
#> 
#> ── SEXF ──
#> 
#> 0=Male
#> 1=Female
By default, the label name is the level name with a C appended.
Alternatively, use lvl_to_lbl to provide a named list that can include
other glue specifications or functions to map the level names to the
label names. One unnamed element can be included in
lvl_to_lbl to provide default behavior.
pk_numeric %>% 
  select(ID, RACEN, SEXF) %>% 
  join_decode_labels(requirements, lvl_to_lbl = list(RACEN = "RACEC", "{var}C")) %>% 
  cnt(RACEN, RACEC, SEXF, SEXFC, n_distinct_vars = ID)
#> ✔ Joined `RACEC` by `RACEN`.
#> 
#> ── RACEN ──
#> 
#> 1=White/Caucasian
#> 2=Black/African American
#> 4=American Indian or Alaska Native
#> ✔ Joined `SEXFC` by `SEXF`.
#> 
#> ── SEXF ──
#> 
#> 0=Male
#> 1=Female
#> # A tibble: 5 × 7
#>   RACEN RACEC                             SEXF SEXFC   n_ID     n n_cumulative
#>   <dbl> <chr>                            <dbl> <chr>  <int> <int>        <int>
#> 1     1 White/Caucasian                      0 Male     104  1456         1456
#> 2     1 White/Caucasian                      1 Female   126  1764         3220
#> 3     2 Black/African American               0 Male       6    84         3304
#> 4     2 Black/African American               1 Female    17   238         3542
#> 5     4 American Indian or Alaska Native     0 Male       1    14         3556
Variables will not be joined if they already exist.
pk %>% 
  select(ID, RACEN, RACEC, SEXF) %>% 
  join_decode_labels(requirements, lvl_to_lbl = list(RACEN = "RACEC", "{var}C")) %>% 
  cnt(RACEN, RACEC, SEXF, SEXFC, n_distinct_vars = ID)
#> Warning: `RACEC` already exists in `.data`. Skipping join for `RACEN`.
#> ✔ Joined `SEXFC` by `SEXF`.
#> 
#> ── SEXF ──
#> 
#> 0=Male
#> 1=Female
#> # A tibble: 5 × 7
#>   RACEN RACEC                             SEXF SEXFC   n_ID     n n_cumulative
#>   <dbl> <chr>                            <dbl> <chr>  <int> <int>        <int>
#> 1     1 White/Caucasian                      0 Male     104  1456         1456
#> 2     1 White/Caucasian                      1 Female   126  1764         3220
#> 3     2 Black/African American               0 Male       6    84         3304
#> 4     2 Black/African American               1 Female    17   238         3542
#> 5     4 American Indian or Alaska Native     0 Male       1    14         3556
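Conceptually, joining labels is a lookup from level to label, skipped when the target variable already exists. A minimal base R sketch of that idea follows. `add_label()` is a hypothetical helper written for illustration, not a dmcognigen function, and the data are made up.

```r
# Hypothetical helper (not part of dmcognigen): add a label column by lookup
add_label <- function(data, decode, lvl_var, lbl_var) {
  # Do not overwrite an existing variable
  if (lbl_var %in% names(data)) {
    warning("`", lbl_var, "` already exists. Skipping join for `", lvl_var, "`.")
    return(data)
  }
  # match() finds each level's row in the decode table; unmatched levels give NA
  data[[lbl_var]] <- decode$lbl[match(data[[lvl_var]], decode$lvl)]
  data
}

racen_decode <- data.frame(
  lvl = c(1, 2, 3, 4),
  lbl = c("White/Caucasian", "Black/African American", "Asian",
          "American Indian or Alaska Native"),
  stringsAsFactors = FALSE
)

labelled <- data.frame(ID = 1:3, RACEN = c(1, 4, 2))
labelled <- add_label(labelled, racen_decode, "RACEN", "RACEC")
labelled
```

Calling `add_label()` a second time with the same arguments returns the data unchanged and issues a warning, which mirrors the skip behavior shown above.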
join_decode_levels()
For scenarios where a label variable is already defined in a data
set, the corresponding levels can be joined with
join_decode_levels().
The lvl_to_lbl structure is identical whether joining
labels or levels.
pk %>% 
  select(ID, RACEC, SEXFC) %>% 
  join_decode_levels(requirements, lvl_to_lbl = list(RACEN = "RACEC", "{var}C")) %>% 
  cnt(RACEN, RACEC, SEXF, SEXFC, n_distinct_vars = ID)
#> ✔ Joined `RACEN` by `RACEC`.
#> 
#> ── RACEN ──
#> 
#> 1=White/Caucasian
#> 2=Black/African American
#> 4=American Indian or Alaska Native
#> ✔ Joined `SEXF` by `SEXFC`.
#> 
#> ── SEXF ──
#> 
#> 0=Male
#> 1=Female
#> # A tibble: 5 × 7
#>   RACEN RACEC                             SEXF SEXFC   n_ID     n n_cumulative
#>   <dbl> <chr>                            <dbl> <chr>  <int> <int>        <int>
#> 1     1 White/Caucasian                      0 Male     104  1456         1456
#> 2     1 White/Caucasian                      1 Female   126  1764         3220
#> 3     2 Black/African American               0 Male       6    84         3304
#> 4     2 Black/African American               1 Female    17   238         3542
#> 5     4 American Indian or Alaska Native     0 Male       1    14         3556
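Joining levels is the same lookup in the opposite direction, from label to level. A base R sketch with made-up data, offered as an illustration of the idea rather than the package's implementation:

```r
# Hypothetical decode definition for SEXF
sexf_decode <- data.frame(
  lvl = c(0, 1),
  lbl = c("Male", "Female"),
  stringsAsFactors = FALSE
)

# Made-up data that contains only the label variable
with_labels <- data.frame(
  ID = 1:4,
  SEXFC = c("Female", "Male", "Male", "Female"),
  stringsAsFactors = FALSE
)

# Look up each label in the decode table and take the matching level
with_labels$SEXF <- sexf_decode$lvl[match(with_labels$SEXFC, sexf_decode$lbl)]
with_labels
```

This direction relies on the labels in the data matching the decode labels exactly, including case and spacing; any label without a match would produce a missing level.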