distinct_stationary_variables() subsets data to one row per group, keeping only the variables that are constant within every group. stationary_variables() returns the names of these variables.

The main difference from dplyr::distinct() is that this function keeps all stationary variables, whereas distinct() keeps either only the variables in ... or (with .keep_all = TRUE) all variables.
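
To make the contrast with distinct() concrete, the following is a minimal sketch that uses only dplyr on a small hypothetical data frame (`toy`, with made-up values). It is not the package's implementation; it only illustrates what "stationary within groups" means, and it assumes that NA counts as a value when checking for constancy.

```r
library(dplyr)

# Hypothetical toy data: two subjects with three records each.
toy <- tibble(
  STUDYID = "S1",
  USUBJID = c("A", "A", "A", "B", "B", "B"),
  SEX     = c("F", "F", "F", "M", "M", "M"),
  TIME    = c(0, 1, 2, 0, 1, 2)
)

# distinct() keeps only the named variables ...
by_name <- distinct(toy, USUBJID)

# ... or every variable, taking the first row of each group.
keep_all <- distinct(toy, USUBJID, .keep_all = TRUE)

# Count the distinct values of every other variable within each group.
n_values <- toy |>
  group_by(USUBJID) |>
  summarise(across(everything(), n_distinct), .groups = "drop")

# A variable is treated as stationary if it has exactly one distinct
# value in every group (assumption: NA counts as a value).
is_stationary <- vapply(
  select(n_values, -USUBJID),
  function(x) all(x == 1L),
  logical(1)
)

# Keep the grouping variable as well, in the original column order.
stationary <- intersect(
  names(toy),
  c("USUBJID", names(is_stationary)[is_stationary])
)

# One row per group, stationary variables only.
one_row_per_group <- toy |>
  select(all_of(stationary)) |>
  distinct()
```

In this sketch, `by_name` has only USUBJID, `keep_all` also carries TIME (taken from the first row of each subject), and `one_row_per_group` has STUDYID, USUBJID, and SEX, because TIME varies within subjects.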

Usage

distinct_stationary_variables(.data, ...)

stationary_variables(.data, ...)

Arguments

.data

data frame

...

Grouping variables used to determine whether other variables are stationary. If omitted, the result is subset to the variables that are stationary throughout the entire data frame. Any existing groups are ignored.

Value

An object of the same type as .data with the same number of columns and rows or fewer, and no groups.

Examples

library(dplyr)

# already one row per group, no change.
identical(
  dmcognigen_cov,
  dmcognigen_cov |>
    distinct_stationary_variables(USUBJID)
)
#> distinct_stationary_variables: no variables removed
#> distinct_stationary_variables: no rows removed
#> [1] TRUE

# all constant variables (no groups)
dmcognigen_cov |>
  distinct_stationary_variables()
#> distinct_stationary_variables: removed 46 variables (USUBJID, ID, RACEN, …, DMDTC, and DMDY)
#> distinct_stationary_variables: removed 253 rows, 1 row remaining
#> # A tibble: 1 × 7
#>   DOMAIN STUDYID      TBILULN EGFRSCHW RFICDTC AGEU  COUNTRY
#>   <chr>  <chr>          <dbl>    <dbl> <chr>   <chr> <chr>  
#> 1 DM     CDISCPILOT01     1.2       NA NA      YEARS USA    

# names of all variables that are constant within groups
dmcognigen_cov |>
  stationary_variables(RACEN, SEXF)
#>  [1] "DOMAIN"   "STUDYID"  "RACEN"    "RACEC"    "SEXF"     "SEXFC"   
#>  [7] "ASTULN"   "SCRULN"   "TBILULN"  "EGFRSCHW" "RFICDTC"  "AGEU"    
#> [13] "SEX"      "RACE"     "COUNTRY" 

# magrittr pipe lets us pass data as `.`
# so we can modify data in a pipeline and reference the resulting stationary variables.
dmcognigen_dose %>%
  select(1:10) %>%
  cnt(across(stationary_variables(.)), n_distinct_vars = USUBJID)
#> # A tibble: 1 × 10
#>   DOMAIN STUDYID       DVID DVIDC  EVID   MDV ROUTE n_USUBJID     n n_cumulative
#>   <chr>  <chr>        <dbl> <chr> <dbl> <dbl> <chr>     <int> <int>        <int>
#> 1 EX     CDISCPILOT01     0 Dose      1     1 TRAN…       254   591          591

# or we can reference the stationary variables in another data frame.
# 
# count all stationary dose-related variables
dmcognigen_pk %>%
  select(STUDYID, USUBJID, any_of(stationary_variables(dmcognigen_dose))) %>%
  cnt(across(stationary_variables(., DVID)), n_distinct_vars = USUBJID)
#> # A tibble: 2 × 14
#>   STUDYID  DOMAIN  DVID DVIDC  EVID   MDV ROUTE EXDOSU EXDOSFRM EXDOSFRQ EXROUTE
#>   <chr>    <chr>  <dbl> <chr> <dbl> <dbl> <chr> <chr>  <chr>    <chr>    <chr>  
#> 1 CDISCPI… EX         0 Dose      1     1 TRAN… mg     PATCH    QD       TRANSD…
#> 2 CDISCPI… PC         1 Xano…     0     0 TRAN… NA     NA       NA       NA     
#> # ℹ 3 more variables: n_USUBJID <int>, n <int>, n_cumulative <int>

# count all stationary concentration-related variables
dmcognigen_pk %>%
  select(STUDYID, USUBJID, any_of(stationary_variables(dmcognigen_conc))) %>%
  cnt(across(stationary_variables(., DVID)), n_distinct_vars = USUBJID)
#> # A tibble: 2 × 19
#>   STUDYID DOMAIN  DVID DVIDC  EVID   MDV  LLOQ PCTESTCD PCTEST PCORRESU PCSTRESU
#>   <chr>   <chr>  <dbl> <chr> <dbl> <dbl> <dbl> <chr>    <chr>  <chr>    <chr>   
#> 1 CDISCP… EX         0 Dose      1     1 NA    NA       NA     NA       NA      
#> 2 CDISCP… PC         1 Xano…     0     0  0.01 XAN      XANOM… ug/ml    ug/ml   
#> # ℹ 8 more variables: PCNAM <chr>, PCSPEC <chr>, PCLLOQ <dbl>, VISIT <chr>,
#> #   VISITNUM <dbl>, n_USUBJID <int>, n <int>, n_cumulative <int>