This family of functions allows the user to explore, import, and
leverage the contents of data requirements files. Supported file types are
Excel files (xlsx, xls, xlsm) and Word files (docx only). Note that sheet = NULL
must be used to include docx files in searches.
read_requirements
Read the latest data requirements file.
as_requirements
Apply requirements attributes to a data frame.
available_requirements_table
Get available data requirements files. Returns a tibble including available data requirements paths and other information.
Usage
read_requirements(
  path = ".",
  pattern = "req",
  sheet = "specs",
  docx_header_pattern = NULL,
  date_format = c("ymd", "mdy", "dmy"),
  subset = NULL,
  variable_name_col = "variable_name",
  variable_label_col = "variable_label",
  decode_col = "format_decode",
  make_clean_names_fn = janitor::make_clean_names,
  ...
)
as_requirements(
  .data,
  variable_name_col = "variable_name",
  variable_label_col = "variable_label",
  decode_col = "format_decode"
)
available_requirements_table(
  path = ".",
  pattern = "req",
  sheet = "specs",
  date_format = c("ymd", "mdy", "dmy"),
  drop_qc = TRUE
)
Arguments
- path
a single directory path or the path to a data requirements file. For read_requirements, providing a directory path will result in the latest matching data requirements file being selected, while providing a file path will result in that file being selected. Defaults to the working directory.
- pattern
character string containing a regular expression. Only file names which match the regular expression will be returned. Defaults to "req".
- sheet
either a character vector of required Excel sheet name(s), the numeric index of the sheet position, or NULL (for no required sheet names, and to include docx files). Only one sheet name or index should be provided to read_requirements. Defaults to "specs", so `sheet = NULL` must be used to match the latest docx file.
- docx_header_pattern
one or more patterns of required table header names. Can be character or a list containing any combination of character, regex, and fixed patterns. For case-insensitive matching, use regex.
- date_format
character indicating the format of the date. Defaults to the year-month-day format "ymd".
- subset
an expression that returns a logical value and is defined in terms of the imported requirements table (like filter). If the expression results in an error, that error is reported as a warning and the subset is not applied. An example is subset = pk_ard == "x", which subsets to variables marked for inclusion in the PK Analysis Ready Dataset.
- variable_name_col, variable_label_col, decode_col
character column names in the data requirements that describe the variable names, their labels, and their decodes. These should match the results after transformations performed by make_clean_names_fn.
- make_clean_names_fn
a function/formula that cleans/transforms the original variable names. Defaults to janitor::make_clean_names.
- ...
- .data
data frame to apply requirements attributes to.
- drop_qc
logical indicating whether to remove versions of the data requirements that are used for QC. These are identified by patterns of "qc" or "marked" in the filename.
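The fallback described for subset (an error is reported as a warning and the subset is not applied) can be illustrated with a minimal base R sketch. The helper apply_subset_sketch() and the toy requirements table are hypothetical and exist only for illustration. This is not the package's implementation, just one way the documented behavior could look.

```r
# Hypothetical helper, NOT part of the documented package: evaluates a
# filter-like expression against a requirements table and returns the
# table unchanged (with a warning) if the expression errors.
apply_subset_sketch <- function(reqs, subset_expr) {
  expr <- substitute(subset_expr)  # capture the unevaluated expression
  caller <- parent.frame()         # where non-column symbols are looked up
  tryCatch(
    {
      keep <- eval(expr, envir = reqs, enclos = caller)
      # Treat NA as "do not keep", similar in spirit to filter-style helpers
      reqs[!is.na(keep) & keep, , drop = FALSE]
    },
    error = function(e) {
      warning("subset not applied: ", conditionMessage(e), call. = FALSE)
      reqs
    }
  )
}

# Toy requirements table (made-up values)
reqs <- data.frame(
  variable_name = c("ID", "TIME", "DV"),
  pk_ard = c("x", NA, "x"),
  stringsAsFactors = FALSE
)

# Keeps only the rows marked with "x" in pk_ard
kept <- apply_subset_sketch(reqs, pk_ard == "x")

# A failing expression leaves the table untouched; the warning is
# suppressed here only to keep the demonstration quiet.
unchanged <- suppressWarnings(apply_subset_sketch(reqs, no_such_column == "x"))
```

Under these assumptions, a typo in a column name does not stop the import; it returns the full table along with a warning, which is consistent with the behavior described for subset.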
Examples
if (FALSE) { # \dontrun{
# read the latest requirements file in the working directory, based on CPP defaults
reqs <- read_requirements()
# specify a particular file and sheet
reqs <- read_requirements(path = "requirements.xlsx", sheet = 1)
# read the latest docx requirements file
reqs <- read_requirements(
  pattern = "req.*docx",
  sheet = NULL,
  docx_header_pattern = stringr::regex("variable", ignore_case = TRUE)
)
# apply attributes to an existing data frame
reqs <- as_requirements(reqs_df)
# get all available requirements files
available_requirements_table()
# include docx files in search
available_requirements_table(sheet = NULL)
# only include requirements with a specs sheet
available_requirements_table(sheet = "specs")
# include QC versions
available_requirements_table(drop_qc = FALSE)
} # }
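To make the effect of pattern and drop_qc in the calls above more concrete, here is a small base R sketch of file-name filtering. The file names, the helper filter_requirement_files(), and the case-insensitive matching of "qc" and "marked" are all assumptions made for illustration. The package may match names differently.

```r
# Hypothetical file names; nothing here touches the file system.
files <- c(
  "pk-requirements-2023-01-15.xlsx",
  "pk-requirements-2023-03-02.docx",
  "pk-requirements-2023-03-02-qc.xlsx",
  "analysis-plan.docx"
)

# Assumed helper (not from the package): keep names matching `pattern`,
# optionally dropping QC copies flagged by "qc" or "marked" in the name.
# Case-insensitive QC matching is an assumption of this sketch.
filter_requirement_files <- function(files, pattern = "req", drop_qc = TRUE) {
  keep <- grepl(pattern, files)
  if (drop_qc) {
    keep <- keep & !grepl("qc|marked", files, ignore.case = TRUE)
  }
  files[keep]
}

# Default pattern with QC copies dropped
default_matches <- filter_requirement_files(files)

# Restrict to docx files, as in the pattern = "req.*docx" example
docx_matches <- filter_requirement_files(files, pattern = "req.*docx")

# Keep QC copies as well
with_qc <- filter_requirement_files(files, drop_qc = FALSE)
```

In this sketch a regular expression such as "req.*docx" narrows the candidates by file name alone. In the documented functions, sheet = NULL is still needed for docx files to be considered at all.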