prepare_thresholds: Calculate (personalised) thresholds based on CIPs.

Description

This function prepares input for estimate_liability by calculating thresholds based on stratified cumulative incidence proportions (CIPs) with options for interpolation for ages between CIP values. Given a tibble with families and family members and (stratified) CIPs, personalised thresholds will be calculated for each individual present in .tbl. An individual may be in multiple families, but only once in the same family.

Usage

prepare_thresholds(
  .tbl,
  CIP,
  age_col,
  CIP_merge_columns = c("sex", "birth_year", "age"),
  CIP_cip_col = "cip",
  Kpop = "useMax",
  status_col = "status",
  lower_equal_upper = FALSE,
  personal_thr = FALSE,
  fid_col = "fid",
  personal_id_col = "pid",
  interpolation = NULL,
  bst.params = list(max_depth = 10, base_score = 0, nthread = 4, min_child_weight = 10),
  min_CIP_value = 1e-05,
  xgboost_itr = 30
)

Value

Tibble with (personlised) thresholds for each family member (lower & upper), the calculated cumulative incidence proportion for each individual (K_i), and population prevalence within an individuals CIP strata (K_pop; max value in stratum). The threshold and other potentially relevant information can be added to the family graphs with familywise_attach_attributes.

Arguments

.tbl: Tibble with family and personal id columns, as well as CIP_merge_columns and status.
CIP: Tibble with population representative cumulative incidence proportions. CIP must contain columns from CIP_merge_columns and cIP_cip_col.
age_col: Name of column with age at the end of follow-up or age at diagnosis for cases.
CIP_merge_columns: The columns the CIPs are subset by, e.g. CIPs by birth_year, sex.
CIP_cip_col: Name of column with CIP values.
Kpop: Takes either "useMax" to use the maximum value in the CIP strata as population prevalence, or a tibble with population prevalence values based on other information. If a tibble is provided, it must contain columns from .tbl and a column named "K_pop" with population prevalence values. Defaults to "UseMax".
status_col: Column that contains the status of each family member. Coded as 0 or FALSE (control) and 1 or TRUE (case).
lower_equal_upper: Should the upper and lower threshold be the same for cases? Can be used if CIPs are detailed, e.g. stratified by birth year and sex.
personal_thr: Should thresholds be based on stratified CIPs or population prevalence?
fid_col: Column that contains the family ID.
personal_id_col: Column that contains the personal ID.
interpolation: Type of interpolation, defaults to NULL.
bst.params: List of parameters to pass on to xgboost. See xgboost documentation for details.
min_CIP_value: Minimum cip value to allow. Too low values may lead to numerical instabilities.
xgboost_itr: Number of iterations to run xgboost for.

Examples

Run this code

tbl = data.frame(
fid = c(1, 1, 1, 1),
pid = c(1, 2, 3, 4),
role = c("o", "m", "f", "pgf"),
sex = c(1, 0, 1, 1),
status = c(0, 0, 1, 1),
age = c(22, 42, 48, 78),
birth_year = 2023 - c(22, 42, 48, 78),
aoo = c(NA, NA, 43, 45))

cip = data.frame(
age = c(22, 42, 43, 45, 48, 78),
birth_year = c(2001, 1981, 1975, 1945, 1975, 1945),
sex = c(1, 0, 1, 1, 1, 1),
cip = c(0.1, 0.2, 0.3, 0.3, 0.3, 0.4))

prepare_thresholds(.tbl = tbl, CIP = cip, age_col = "age", interpolation = NA)

Run the code above in your browser using DataLab