library(dmcognigen)
library(dplyr)
data("dmcognigen_pk_requirements")
requirements <- dmcognigen_pk_requirements %>%
  select(variable_name, variable_label, format_decode)
data("dmcognigen_cov")
cov <- dmcognigen_cov
data("dmcognigen_pk")
pk <- dmcognigen_pk
Introduction
Decode tables (decode_tbls) are defined based on variable names and
their levels and labels. The level is generally a shorthand
representation of the label and is ideally numeric. The label is
intended to be a more detailed description, or the text to display on
outputs like graphs and tables. These decode_tbls can be used to provide
specifications to functions like set_decode_factors(),
join_decode_labels(), and join_decode_levels().
For example, below is a decode table for a variable RFCAT (Baseline
Renal Fx Category). The values on the left-hand side of the equal sign
are the levels and the values on the right-hand side are the labels.
#>
#> ── RFCAT ──
#>
#> 1=Normal Function (>=90 mL/min)
#> 2=Mild Impairment (60-89 mL/min)
#> 3=Moderate Impairment (30-59 mL/min)
#> 4=Severe Impairment (15-29 mL/min)
#> 5=End Stage Disease (<15 mL/min or Dialysis)
Constructing decode_tbls
A decode table (decode_tbl) or a collection of decode tables
(decode_tbls) can be constructed in three main ways:
- Based on a character string that describes the levels and labels.
- Based on variables in data.
- By manually constructing data frame(s).
While decode_tbls can be defined in many ways, the primary intention is
to use read_requirements() to define them along with other attributes.
Extracting from character strings
This method is expected to be used when variables and their decodes are
presented together in a table. It is also the method that
read_requirements() uses to define the "decode_tbls" attribute.
For example, consider the variable descriptions below:
#> # A tibble: 9 × 3
#> variable_name variable_label format_decode
#> <chr> <chr> <chr>
#> 1 DVID Observation Type "0=Dose\n1=Xanomeline Concentration…
#> 2 EVID Event ID "0=PK or PD measure\n1=Dose\n2=Othe…
#> 3 MDV Missing Dependent Variable "0=PK or PD measure\n1=Dose or Othe…
#> 4 BLQFN BLQ Flag "0=No\n1=Yes"
#> 5 FED Fed "0=Fasted\n1=Fed"
#> 6 RACEN Race "1=White/Caucasian\n2=Black/African…
#> 7 SEXF Sex "0=Male\n1=Female"
#> 8 RFCAT Baseline Renal Fx Category "1=Normal Function (>=90 mL/min)\n2…
#> 9 NCILIV Baseline NCI Liver Fx Group "0=Normal Group A\n1=Mild Group B1\…
For this method, the decodes are expected to be of type character. Each
entry on a new line defines one level-to-label relationship separated by
an equal sign. When an entry contains more than one equal sign, the
first equal sign is considered the separator.
#> DVID:
#> 0=Dose
#> 1=Xanomeline Concentration (ug/mL)
#>
#> EVID:
#> 0=PK or PD measure
#> 1=Dose
#> 2=Other
#>
#> MDV:
#> 0=PK or PD measure
#> 1=Dose or Other
#>
#> BLQFN:
#> 0=No
#> 1=Yes
#>
#> FED:
#> 0=Fasted
#> 1=Fed
#>
#> RACEN:
#> 1=White/Caucasian
#> 2=Black/African American
#> 3=Asian
#> 4=American Indian or Alaska Native
#>
#> SEXF:
#> 0=Male
#> 1=Female
#>
#> RFCAT:
#> 1=Normal Function (>=90 mL/min)
#> 2=Mild Impairment (60-89 mL/min)
#> 3=Moderate Impairment (30-59 mL/min)
#> 4=Severe Impairment (15-29 mL/min)
#> 5=End Stage Disease (<15 mL/min or Dialysis)
#>
#> NCILIV:
#> 0=Normal Group A
#> 1=Mild Group B1
#> 2=Mild Group B2
#> 3=Moderate Group C
#> 4=Severe Group D
To extract the decode tables from these types of strings:
extract_decode_tbls(
  variable_name = requirements$variable_name,
  decode = requirements$format_decode
)
#>
#> ── Decode tables ───────────────────────────────────────────────────────────────
#>
#> ── DVID ──
#>
#> 0=Dose
#> 1=Xanomeline Concentration (ug/mL)
#>
#> ── EVID ──
#>
#> 0=PK or PD measure
#> 1=Dose
#> 2=Other
#>
#> ── MDV ──
#>
#> 0=PK or PD measure
#> 1=Dose or Other
#>
#> ── BLQFN ──
#>
#> 0=No
#> 1=Yes
#>
#> ── FED ──
#>
#> 0=Fasted
#> 1=Fed
#>
#> ── RACEN ──
#>
#> 1=White/Caucasian
#> 2=Black/African American
#> 3=Asian
#> 4=American Indian or Alaska Native
#>
#> ── SEXF ──
#>
#> 0=Male
#> 1=Female
#>
#> ── RFCAT ──
#>
#> 1=Normal Function (>=90 mL/min)
#> 2=Mild Impairment (60-89 mL/min)
#> 3=Moderate Impairment (30-59 mL/min)
#> 4=Severe Impairment (15-29 mL/min)
#> 5=End Stage Disease (<15 mL/min or Dialysis)
#>
#> ── NCILIV ──
#>
#> 0=Normal Group A
#> 1=Mild Group B1
#> 2=Mild Group B2
#> 3=Moderate Group C
#> 4=Severe Group D
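The parsing rule described in this section can be illustrated with a short base R sketch. This is not how extract_decode_tbls() is implemented; parse_decode_string() is a hypothetical helper, written only to show how splitting on the first equal sign keeps a label such as "Normal Function (>=90 mL/min)" intact.

```r
# Hypothetical helper (not part of dmcognigen): split one decode string
# into a data frame of levels and labels.
parse_decode_string <- function(decode) {
  # Each entry sits on its own line.
  entries <- strsplit(decode, "\n", fixed = TRUE)[[1]]
  entries <- entries[nzchar(trimws(entries))]
  # Position of the FIRST equal sign in each entry.
  sep <- regexpr("=", entries, fixed = TRUE)
  data.frame(
    lvl = trimws(substr(entries, 1, sep - 1)),
    lbl = trimws(substr(entries, sep + 1, nchar(entries))),
    stringsAsFactors = FALSE
  )
}

rfcat <- parse_decode_string(
  "1=Normal Function (>=90 mL/min)\n2=Mild Impairment (60-89 mL/min)"
)
rfcat
```

The levels come back as character values here; converting them to numeric where possible is left out of the sketch.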
Extracting from a dataset
This method is expected to be used when a data set already includes
level and label variables, like RACEN and RACEC below:
cov %>%
  cnt(RACEN, RACEC)
#> # A tibble: 3 × 4
#> RACEN RACEC n n_cumulative
#> <dbl> <chr> <int> <int>
#> 1 1 White/Caucasian 230 230
#> 2 2 Black/African American 23 253
#> 3 4 American Indian or Alaska Native 1 254
cov %>%
  extract_decode_tbls_from_data(
    lvl_to_lbl = list(RACEN = "RACEC")
  )
#>
#> ── Decode tables ───────────────────────────────────────────────────────────────
#>
#> ── RACEN ──
#>
#> 1=White/Caucasian
#> 2=Black/African American
#> 4=American Indian or Alaska Native
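The lvl_to_lbl argument maps the names of level variables to the names
of label variables. Individual pairs are given as named elements. An
unnamed function, as in the example below, derives the label variable
name from the level variable name for other variables.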
cov %>%
  extract_decode_tbls_from_data(
    lvl_to_lbl = list(
      # map individual variables
      SEXF = "SEXFC",
      RACEN = "RACEC",
      # map other lvl to lbl by removing CD at the end of variable names
      ~ stringr::str_remove(.x, "CD$")
    )
  )
#>
#> ── Decode tables ───────────────────────────────────────────────────────────────
#>
#> ── RACEN ──
#>
#> 1=White/Caucasian
#> 2=Black/African American
#> 4=American Indian or Alaska Native
#>
#> ── SEXF ──
#>
#> 0=Male
#> 1=Female
#>
#> ── ARMCD ──
#>
#> Pbo=Placebo
#> Xan_Hi=Xanomeline High Dose
#> Xan_Lo=Xanomeline Low Dose
#>
#> ── ACTARMCD ──
#>
#> Pbo=Placebo
#> Xan_Hi=Xanomeline High Dose
#> Xan_Lo=Xanomeline Low Dose
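The unnamed formula in the call above turns a level variable name into a label variable name. As a minimal sketch of that name transformation alone, here is the base R equivalent of the stringr call, applied to the two level names that appear in the output. That the resulting label variables are ARM and ACTARM is an inference from the pattern, not something shown in the output.

```r
# Level variable names taken from the output above.
lvl_names <- c("ARMCD", "ACTARMCD")

# Remove a trailing "CD" to obtain the candidate label variable names.
lbl_names <- sub("CD$", "", lvl_names)

# Named vector: level name -> label name.
setNames(lbl_names, lvl_names)
```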
Sometimes, more than one representation of a variable is in a data set.
For example, this cov data set includes the pair of variables RACEN and
RACEC, along with the original source variable RACE.
cov %>%
  extract_decode_tbls_from_data(
    lvl_to_lbl = list(
      RACEN = "RACE"
    )
  )
#>
#> ── Decode tables ───────────────────────────────────────────────────────────────
#>
#> ── RACEN ──
#>
#> 1=WHITE
#> 2=BLACK OR AFRICAN AMERICAN
#> 4=AMERICAN INDIAN OR ALASKA NATIVE
Ideally, the level variable is numeric, but other data types are accepted. For example, one way to review derived content is to use the original source variable as the level and the label variable derived from it as the label.
cov %>%
  extract_decode_tbls_from_data(
    lvl_to_lbl = list(
      RACE = "RACEC"
    )
  )
#>
#> ── Decode tables ───────────────────────────────────────────────────────────────
#>
#> ── RACE ──
#>
#> AMERICAN INDIAN OR ALASKA NATIVE=American Indian or Alaska Native
#> BLACK OR AFRICAN AMERICAN=Black/African American
#> WHITE=White/Caucasian
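Conceptually, extracting a decode table from data amounts to collecting the distinct level/label pairs. The base R sketch below shows that idea on a small made-up data frame whose values mirror the RACEN/RACEC pairs above. It is an illustration under that assumption, not the internals of extract_decode_tbls_from_data().

```r
# Toy data for illustration only (not the cov data set).
toy <- data.frame(
  RACEN = c(2, 1, 1, 4, 2),
  RACEC = c(
    "Black/African American", "White/Caucasian", "White/Caucasian",
    "American Indian or Alaska Native", "Black/African American"
  ),
  stringsAsFactors = FALSE
)

# Distinct level/label pairs, sorted by level.
pairs <- unique(toy[c("RACEN", "RACEC")])
pairs <- pairs[order(pairs$RACEN), ]

# A level that maps to more than one label would signal inconsistent data.
stopifnot(!anyDuplicated(pairs$RACEN))

decode <- data.frame(var = "RACEN", lvl = pairs$RACEN, lbl = pairs$RACEC)
decode
```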
Constructing manually from a data frame
This can be done in many different ways. In general, use
as_decode_tbls() with a named list of data frames that contain the
variables var, lvl, and lbl. One example is below, where decodes are
defined for multiple variables, then the named list is generated with
split().
tibble::tribble(
  ~var,    ~lvl, ~lbl,
  "RACEN", 1,    "White/Caucasian",
  "RACEN", 2,    "Black/African American",
  "RACEN", 3,    "Asian",
  "RACEN", 4,    "American Indian or Alaska Native",
  "SEXF",  0,    "Male",
  "SEXF",  1,    "Female"
) %>%
  split(~ var) %>%
  as_decode_tbls()
#>
#> ── Decode tables ───────────────────────────────────────────────────────────────
#>
#> ── RACEN ──
#>
#> 1=White/Caucasian
#> 2=Black/African American
#> 3=Asian
#> 4=American Indian or Alaska Native
#>
#> ── SEXF ──
#>
#> 0=Male
#> 1=Female
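To make the intermediate structure concrete, the base R sketch below shows what splitting by var produces: a named list with one data frame per variable. It uses a plain data.frame and the grouping-vector form of split() so that it runs without tibble. The call to as_decode_tbls() is left out.

```r
decodes <- data.frame(
  var = c("RACEN", "RACEN", "SEXF", "SEXF"),
  lvl = c(1, 2, 0, 1),
  lbl = c("White/Caucasian", "Black/African American", "Male", "Female"),
  stringsAsFactors = FALSE
)

# One list element per variable, named after the variable.
by_var <- split(decodes, decodes$var)

names(by_var)    # "RACEN" "SEXF"
by_var$SEXF$lbl  # "Male" "Female"
```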
Incorporating decode_tbls as variables in data sets
To demonstrate the automation features of these utilities, consider a data set that contains only numeric variables.
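The examples in the rest of this section use an object named pk_numeric whose definition is not shown. One plausible construction, assuming it is simply pk restricted to its numeric columns, is sketched below in base R on a made-up data frame. The helper keep_numeric() is hypothetical.

```r
# Hypothetical helper: keep only the numeric columns of a data frame.
keep_numeric <- function(data) {
  data[vapply(data, is.numeric, logical(1))]
}

# Toy data for illustration only.
toy <- data.frame(
  ID = c(1, 2),
  RACEN = c(1, 2),
  RACEC = c("White/Caucasian", "Black/African American"),
  stringsAsFactors = FALSE
)

toy_numeric <- keep_numeric(toy)
names(toy_numeric)  # "ID" "RACEN"
```

With dplyr loaded, the same selection could be written as pk %>% select(where(is.numeric)).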
set_decode_factors()
Modify or create new factor variables based on decode_tbls or
requirements objects. The order of the levels is defined based on the
sort order of the level in the decode definition. This is useful for
other functions that consider the order of factor levels.
pk_numeric %>%
  set_decode_factors(requirements) %>%
  cnt(RACEN, SEXF, n_distinct_vars = ID)
#> ✔ Modified variable `BLQFN` as a factor of `BLQFN`.
#> ✔ Modified variable `DVID` as a factor of `DVID`.
#> ✔ Modified variable `EVID` as a factor of `EVID`.
#> ✔ Modified variable `MDV` as a factor of `MDV`.
#> ✔ Modified variable `NCILIV` as a factor of `NCILIV`.
#> ✔ Modified variable `RACEN` as a factor of `RACEN`.
#> ✔ Modified variable `RFCAT` as a factor of `RFCAT`.
#> ✔ Modified variable `SEXF` as a factor of `SEXF`.
#> # A tibble: 5 × 5
#> RACEN SEXF n_ID n n_cumulative
#> <fct> <fct> <int> <int> <int>
#> 1 White/Caucasian Male 104 1456 1456
#> 2 White/Caucasian Female 126 1764 3220
#> 3 Black/African American Male 6 84 3304
#> 4 Black/African American Female 17 238 3542
#> 5 American Indian or Alaska Native Male 1 14 3556
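The ordering behavior described above can be reproduced with base R's factor(). The sketch below is only an approximation of what set_decode_factors() does, using a made-up decode table that is deliberately out of order: sorting by level before building the factor is what fixes the order of the factor levels.

```r
# Decode table in arbitrary order (illustrative values from the RACEN decode).
decode <- data.frame(
  lvl = c(2, 1, 4),
  lbl = c(
    "Black/African American", "White/Caucasian",
    "American Indian or Alaska Native"
  ),
  stringsAsFactors = FALSE
)

# Sort by level so that factor levels follow the level order.
decode <- decode[order(decode$lvl), ]

racen <- c(4, 1, 2, 1)
racen_fct <- factor(racen, levels = decode$lvl, labels = decode$lbl)

levels(racen_fct)
```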
Since the resulting variables are factors, they are easy to summarize
with across(). This example provides a summary of categorical covariates
by ID.
pk_numeric %>%
  select(all_of(stationary_variables(., ID))) %>%
  set_decode_factors(requirements) %>%
  cnt(across(where(is.factor)), n_distinct_vars = ID)
#> ✔ Modified variable `NCILIV` as a factor of `NCILIV`.
#> ✔ Modified variable `RACEN` as a factor of `RACEN`.
#> ✔ Modified variable `RFCAT` as a factor of `RFCAT`.
#> ✔ Modified variable `SEXF` as a factor of `SEXF`.
#> # A tibble: 22 × 7
#> RACEN SEXF RFCAT NCILIV n_ID n n_cumulative
#> <fct> <fct> <fct> <fct> <int> <int> <int>
#> 1 White/Caucasian Male Mild Impairment (60-8… Norma… 26 364 364
#> 2 White/Caucasian Male Mild Impairment (60-8… Mild … 3 42 406
#> 3 White/Caucasian Male Mild Impairment (60-8… Mild … 1 14 420
#> 4 White/Caucasian Male Mild Impairment (60-8… Moder… 1 14 434
#> 5 White/Caucasian Male Moderate Impairment (… Norma… 65 910 1344
#> 6 White/Caucasian Male Moderate Impairment (… Mild … 1 14 1358
#> 7 White/Caucasian Male Moderate Impairment (… Mild … 4 56 1414
#> 8 White/Caucasian Male NA Norma… 3 42 1456
#> 9 White/Caucasian Female Mild Impairment (60-8… Norma… 30 420 1876
#> 10 White/Caucasian Female Mild Impairment (60-8… Mild … 3 42 1918
#> # ℹ 12 more rows
The new_names argument works similarly to the lvl_to_lbl argument in
joining functions, but mapping an existing variable to itself is allowed
by set_decode_factors().
pk_numeric %>%
  set_decode_factors(
    decode_tbls = requirements,
    new_names = list(
      "{var}FCT",
      RACEN = "RACEN",
      SEXF = "SEXFC"
    )
  ) %>%
  cnt(RACEN, SEXFC, RFCATFCT, n_distinct_vars = ID)
#> ✔ Created new variable `BLQFNFCT` as a factor of `BLQFN`.
#> ✔ Created new variable `DVIDFCT` as a factor of `DVID`.
#> ✔ Created new variable `EVIDFCT` as a factor of `EVID`.
#> ✔ Created new variable `MDVFCT` as a factor of `MDV`.
#> ✔ Created new variable `NCILIVFCT` as a factor of `NCILIV`.
#> ✔ Modified variable `RACEN` as a factor of `RACEN`.
#> ✔ Created new variable `RFCATFCT` as a factor of `RFCAT`.
#> ✔ Created new variable `SEXFC` as a factor of `SEXF`.
#> # A tibble: 12 × 6
#> RACEN SEXFC RFCATFCT n_ID n n_cumulative
#> <fct> <fct> <fct> <int> <int> <int>
#> 1 White/Caucasian Male Mild Impair… 31 434 434
#> 2 White/Caucasian Male Moderate Im… 70 980 1414
#> 3 White/Caucasian Male NA 3 42 1456
#> 4 White/Caucasian Female Mild Impair… 36 504 1960
#> 5 White/Caucasian Female Moderate Im… 84 1176 3136
#> 6 White/Caucasian Female NA 6 84 3220
#> 7 Black/African American Male Mild Impair… 4 56 3276
#> 8 Black/African American Male Moderate Im… 1 14 3290
#> 9 Black/African American Male NA 1 14 3304
#> 10 Black/African American Female Mild Impair… 8 112 3416
#> 11 Black/African American Female Moderate Im… 9 126 3542
#> 12 American Indian or Alaska Native Male Mild Impair… 1 14 3556
join_decode_labels()
Create new label variables for variables with matching names in data and
decode_tbls.
pk_numeric_with_label_vars <- pk_numeric %>%
  join_decode_labels(requirements)
#> ✔ Joined `BLQFNC` by `BLQFN`.
#> Warning: Missing values for `BLQFNC` where `BLQFN` is: NA
#>
#> ── BLQFN ──
#>
#> 0=No
#> 1=Yes
#> NA=NA
#> ✔ Joined `DVIDC` by `DVID`.
#>
#> ── DVID ──
#>
#> 0=Dose
#> 1=Xanomeline Concentration (ug/mL)
#> ✔ Joined `EVIDC` by `EVID`.
#>
#> ── EVID ──
#>
#> 0=PK or PD measure
#> 1=Dose
#> ✔ Joined `MDVC` by `MDV`.
#>
#> ── MDV ──
#>
#> 0=PK or PD measure
#> 1=Dose or Other
#> ✔ Joined `NCILIVC` by `NCILIV`.
#>
#> ── NCILIV ──
#>
#> 0=Normal Group A
#> 1=Mild Group B1
#> 2=Mild Group B2
#> 3=Moderate Group C
#> ✔ Joined `RACENC` by `RACEN`.
#>
#> ── RACEN ──
#>
#> 1=White/Caucasian
#> 2=Black/African American
#> 4=American Indian or Alaska Native
#> ✔ Joined `RFCATC` by `RFCAT`.
#> Warning: Missing values for `RFCATC` where `RFCAT` is: NA
#>
#> ── RFCAT ──
#>
#> 2=Mild Impairment (60-89 mL/min)
#> 3=Moderate Impairment (30-59 mL/min)
#> NA=NA
#> ✔ Joined `SEXFC` by `SEXF`.
#>
#> ── SEXF ──
#>
#> 0=Male
#> 1=Female
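The join shown above is, in essence, a lookup of each level in the decode table. A minimal base R sketch of that lookup follows, on made-up data that includes a missing level like the BLQFN case above. This is an illustration of the idea rather than the implementation of join_decode_labels().

```r
decode <- data.frame(
  lvl = c(0, 1),
  lbl = c("No", "Yes"),
  stringsAsFactors = FALSE
)

# Toy data for illustration only.
data <- data.frame(ID = 1:3, BLQFN = c(0, 1, NA))

# Look up the label for each level; unmatched levels yield NA.
data$BLQFNC <- decode$lbl[match(data$BLQFN, decode$lvl)]

# Levels without a label, comparable to the warning in the output above.
unmatched <- unique(data$BLQFN[is.na(data$BLQFNC)])

data
```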
By default, the label name is the level name with a C appended.
Alternatively, use lvl_to_lbl to provide a named list that can include
other glue specifications or functions to map the level names to the
label names. One unnamed element can be included in lvl_to_lbl to
provide default behavior.
pk_numeric %>%
  select(ID, RACEN, SEXF) %>%
  join_decode_labels(requirements, lvl_to_lbl = list(RACEN = "RACEC", "{var}C")) %>%
  cnt(RACEN, RACEC, SEXF, SEXFC, n_distinct_vars = ID)
#> ✔ Joined `RACEC` by `RACEN`.
#>
#> ── RACEN ──
#>
#> 1=White/Caucasian
#> 2=Black/African American
#> 4=American Indian or Alaska Native
#> ✔ Joined `SEXFC` by `SEXF`.
#>
#> ── SEXF ──
#>
#> 0=Male
#> 1=Female
#> # A tibble: 5 × 7
#> RACEN RACEC SEXF SEXFC n_ID n n_cumulative
#> <dbl> <chr> <dbl> <chr> <int> <int> <int>
#> 1 1 White/Caucasian 0 Male 104 1456 1456
#> 2 1 White/Caucasian 1 Female 126 1764 3220
#> 3 2 Black/African American 0 Male 6 84 3304
#> 4 2 Black/African American 1 Female 17 238 3542
#> 5 4 American Indian or Alaska Native 0 Male 1 14 3556
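How a specification like list(RACEN = "RACEC", "{var}C") could resolve to label names is sketched below in base R. The helper resolve_names() is hypothetical and handles only character elements: named elements take precedence, and the single unnamed element is treated as a template in which {var} is replaced by the level name. Function elements, which the package also accepts, are not covered, and the sketch assumes an unnamed default is present.

```r
# Hypothetical helper: resolve a label name for each level variable.
resolve_names <- function(vars, spec) {
  spec_names <- names(spec)
  if (is.null(spec_names)) spec_names <- rep("", length(spec))
  # The unnamed element acts as the default template.
  default <- unlist(spec[spec_names == ""], use.names = FALSE)
  vapply(vars, function(var) {
    if (var %in% spec_names) {
      spec[[var]]                                   # explicit mapping wins
    } else {
      sub("{var}", var, default[[1]], fixed = TRUE) # fill in the template
    }
  }, character(1))
}

resolved <- resolve_names(
  c("RACEN", "SEXF"),
  list(RACEN = "RACEC", "{var}C")
)
resolved
```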
Variables will not be joined if they already exist. Instead, a warning is issued and the join is skipped.
pk %>%
  select(ID, RACEN, RACEC, SEXF) %>%
  join_decode_labels(requirements, lvl_to_lbl = list(RACEN = "RACEC", "{var}C")) %>%
  cnt(RACEN, RACEC, SEXF, SEXFC, n_distinct_vars = ID)
#> Warning: `RACEC` already exists in `.data`. Skipping join for `RACEN`.
#> ✔ Joined `SEXFC` by `SEXF`.
#>
#> ── SEXF ──
#>
#> 0=Male
#> 1=Female
#> # A tibble: 5 × 7
#> RACEN RACEC SEXF SEXFC n_ID n n_cumulative
#> <dbl> <chr> <dbl> <chr> <int> <int> <int>
#> 1 1 White/Caucasian 0 Male 104 1456 1456
#> 2 1 White/Caucasian 1 Female 126 1764 3220
#> 3 2 Black/African American 0 Male 6 84 3304
#> 4 2 Black/African American 1 Female 17 238 3542
#> 5 4 American Indian or Alaska Native 0 Male 1 14 3556
join_decode_levels()
For scenarios where a label variable is already defined in a data set,
the corresponding levels can be joined with join_decode_levels().
The lvl_to_lbl structure is identical whether joining labels or levels.
pk %>%
  select(ID, RACEC, SEXFC) %>%
  join_decode_levels(requirements, lvl_to_lbl = list(RACEN = "RACEC", "{var}C")) %>%
  cnt(RACEN, RACEC, SEXF, SEXFC, n_distinct_vars = ID)
#> ✔ Joined `RACEN` by `RACEC`.
#>
#> ── RACEN ──
#>
#> 1=White/Caucasian
#> 2=Black/African American
#> 4=American Indian or Alaska Native
#> ✔ Joined `SEXF` by `SEXFC`.
#>
#> ── SEXF ──
#>
#> 0=Male
#> 1=Female
#> # A tibble: 5 × 7
#> RACEN RACEC SEXF SEXFC n_ID n n_cumulative
#> <dbl> <chr> <dbl> <chr> <int> <int> <int>
#> 1 1 White/Caucasian 0 Male 104 1456 1456
#> 2 1 White/Caucasian 1 Female 126 1764 3220
#> 3 2 Black/African American 0 Male 6 84 3304
#> 4 2 Black/African American 1 Female 17 238 3542
#> 5 4 American Indian or Alaska Native 0 Male 1 14 3556
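Joining levels is the reverse lookup: each label is matched against the decode table to recover its level. The base R sketch below illustrates this on made-up data. The uniqueness check reflects an assumption of the sketch (a reverse lookup is only unambiguous when labels are unique within a decode table), not a documented requirement of join_decode_levels().

```r
decode <- data.frame(
  lvl = c(1, 2, 4),
  lbl = c(
    "White/Caucasian", "Black/African American",
    "American Indian or Alaska Native"
  ),
  stringsAsFactors = FALSE
)

# Assumption of this sketch: labels must be unique to map back to one level.
stopifnot(!anyDuplicated(decode$lbl))

# Toy data for illustration only.
data <- data.frame(
  ID = 1:3,
  RACEC = c(
    "White/Caucasian", "American Indian or Alaska Native",
    "Black/African American"
  ),
  stringsAsFactors = FALSE
)

# Look up the level for each label.
data$RACEN <- decode$lvl[match(data$RACEC, decode$lbl)]
data
```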