Get available and latest data requirements files

This family of functions allows the user to explore, import, and leverage the contents of data requirements files. Supported files types are Excel files (xlsx, xls, xlsm) and Word files (docx only). Note that sheets = NULL must be used to include docx files in searches.

read_requirements: Read the latest data requirements file.
as_requirements: Apply requirements attributes to a data frame.
available_requirements_table: Get available data requirements files. Returns a tibble including available data requirements paths and other information.

Usage

read_requirements(
  path = ".",
  pattern = "req",
  sheet = "specs",
  docx_header_pattern = NULL,
  date_format = c("ymd", "mdy", "dmy"),
  subset = NULL,
  variable_name_col = "variable_name",
  variable_label_col = "variable_label",
  decode_col = "format_decode",
  make_clean_names_fn = janitor::make_clean_names,
  ...
)

as_requirements(
  .data,
  variable_name_col = "variable_name",
  variable_label_col = "variable_label",
  decode_col = "format_decode"
)

available_requirements_table(
  path = ".",
  pattern = "req",
  sheet = "specs",
  date_format = c("ymd", "mdy", "dmy"),
  drop_qc = TRUE
)

Arguments

path: a single directory path or the path to a data requirements file. For read_requirements, providing a directory path will result in the latest matching data requirements file being selected, while providing a file path will result in that file being selected. Defaults to the working directory.
pattern: character string containing a regular expression. Only file names which match the regular expression will be returned. Defaults to "req".
sheet: either a character vector of required Excel sheet name(s), the numeric index of the sheet position, or NULL (for no required sheet names, and to include docx files). Only one sheet name or index should be provided to read_requirements. Defaults to "specs", so `sheet = NULL` must be used to match the latest docx file.
docx_header_pattern: one or more patterns of required table header names. Can be character or a list containing any combination of character, regex, and fixed patterns. For case-insensitive, use regex.
date_format: character indicating the format of the date. Defaults to the year-month-day format "ymd".
subset: an expression that returns a logical value and is defined in the terms of the imported requirements table (like filter). If the expression results in an error, that error is reported as a warning and the subset is not applied. An example is subset = pk_ard == "x", which indicates to subset to variables marked for inclusion in the PK Analysis Ready Dataset.
variable_name_col, variable_label_col, decode_col: character column names in the data requirements that describe the variable names, their labels, and their decodes. These should match results after transformations performed by make_clean_names_fn.
make_clean_names_fn: a function/formula that cleans/transforms the original variable names. Defaults to make_clean_names.
...: optional arguments passed to either read.xlsx or read_docx.
.data: data frame to apply requirements attributes to.
drop_qc: logical indicating whether to remove versions of the data requirements that are used for QC. These are identified by patterns of "qc" or "marked" in the filename.

Examples

if (FALSE) { # \dontrun{
# read the latest requirements file in the working directory, based on CPP defaults
reqs <- read_requirements()

# specify a particular file and sheet
reqs <- read_requirements(path = "requirements.xlsx", sheet = 1)

# read the latest docx requirements file
reqs <- read_requirements(
  pattern = "req.*docx",
  sheet = NULL, 
  docx_header_pattern = stringr::regex("variable", ignore_case = TRUE)
)

# apply attributes to an existing data frame
reqs <- as_requirements(reqs_df)

# get all available requirements files
available_requirements_table()

# include docx files in search
available_requirements_table(sheet = NULL)

# only include requirements with a specs sheet
available_requirements_table(sheet = "specs")

# include QC versions
available_requirements_table(drop_qc = FALSE)
} # }