LTFGRS (version 1.0.0)

prepare_thresholds: Calculate (personalised) thresholds based on CIPs.

Description

This function prepares input for estimate_liability by calculating thresholds from stratified cumulative incidence proportions (CIPs), optionally interpolating for ages that fall between the available CIP values. Given a tibble of families and family members and a tibble of (stratified) CIPs, it calculates personalised thresholds for each individual present in .tbl. An individual may appear in multiple families, but only once within the same family.
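As a rough illustration of what a threshold derived from a CIP means, the sketch below assumes the usual liability threshold model with a standard normal liability, where the threshold is the upper-tail quantile of the CIP. The CIP values are made up, and the code uses base R only; it is not the package's internal implementation.

```r
# Hypothetical CIP values for three individuals (illustration only).
cip <- c(0.1, 0.2, 0.3)

# Assumption: liability ~ N(0, 1), so the threshold satisfies
# P(liability > threshold) = cip.
thr <- qnorm(cip, lower.tail = FALSE)

round(thr, 3)
# 1.282 0.842 0.524
```

A higher CIP gives a lower threshold, since a larger share of the liability distribution lies above it.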

Usage

prepare_thresholds(
  .tbl,
  CIP,
  age_col,
  CIP_merge_columns = c("sex", "birth_year", "age"),
  CIP_cip_col = "cip",
  Kpop = "useMax",
  status_col = "status",
  lower_equal_upper = FALSE,
  personal_thr = FALSE,
  fid_col = "fid",
  personal_id_col = "pid",
  interpolation = NULL,
  bst.params = list(max_depth = 10, base_score = 0, nthread = 4, min_child_weight = 10),
  min_CIP_value = 1e-05,
  xgboost_itr = 30
)

Value

Tibble with (personalised) thresholds for each family member (lower and upper), the calculated cumulative incidence proportion for each individual (K_i), and the population prevalence within each individual's CIP stratum (K_pop; the maximum value in the stratum). The thresholds and other potentially relevant information can be added to the family graphs with familywise_attach_attributes.

Arguments

.tbl

Tibble with family and personal id columns, as well as CIP_merge_columns and status.

CIP

Tibble with population-representative cumulative incidence proportions. CIP must contain the columns named in CIP_merge_columns and CIP_cip_col.

age_col

Name of column with age at the end of follow-up or age at diagnosis for cases.

CIP_merge_columns

The columns the CIPs are subset by, e.g. CIPs by birth_year, sex.

CIP_cip_col

Name of column with CIP values.

Kpop

Takes either "useMax" to use the maximum value in the CIP stratum as population prevalence, or a tibble with population prevalence values based on other information. If a tibble is provided, it must contain columns from .tbl and a column named "K_pop" with population prevalence values. Defaults to "useMax".
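The "useMax" idea can be sketched in base R as taking the largest CIP within each stratum. The table below is hypothetical, and the sketch assumes a stratum is defined by the merge columns other than age (here sex and birth_year); the package may define strata differently.

```r
# Hypothetical CIP table with two strata (illustration only).
cip_tbl <- data.frame(
  sex        = c(1, 1, 1, 0, 0),
  birth_year = c(1975, 1975, 1975, 1981, 1981),
  age        = c(43, 45, 48, 40, 42),
  cip        = c(0.25, 0.28, 0.30, 0.15, 0.20)
)

# Assumption: stratum = sex x birth_year; K_pop is the maximum CIP in the stratum.
cip_tbl$K_pop <- ave(cip_tbl$cip, cip_tbl$sex, cip_tbl$birth_year, FUN = max)

cip_tbl
```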

status_col

Column that contains the status of each family member. Coded as 0 or FALSE (control) and 1 or TRUE (case).

lower_equal_upper

Should the upper and lower threshold be the same for cases? Can be used if CIPs are detailed, e.g. stratified by birth year and sex.
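One way to picture the lower and upper thresholds is the conventional interval coding sketched below: a control's liability lies below their threshold, while a case's liability is either fixed at the threshold or only bounded from below. This is an assumption about the coding, written in base R with made-up values, and is not taken from the package source.

```r
# Hypothetical inputs: one control (status 0) and one case (status 1).
status <- c(0, 1)
thr    <- qnorm(c(0.2, 0.3), lower.tail = FALSE)

lower_equal_upper <- TRUE

# Assumed coding: controls get (-Inf, thr); cases get [thr, thr] when
# lower_equal_upper is TRUE, otherwise [thr, Inf).
lower <- ifelse(status == 1, thr, -Inf)
upper <- ifelse(status == 1, if (lower_equal_upper) thr else Inf, thr)

data.frame(status, lower, upper)
```

Setting lower_equal_upper to FALSE in this sketch leaves the case's upper bound at Inf, which is the more cautious choice when the CIPs are coarse.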

personal_thr

Should thresholds be based on stratified CIPs or population prevalence?

fid_col

Column that contains the family ID.

personal_id_col

Column that contains the personal ID.

interpolation

Type of interpolation, defaults to NULL.

bst.params

List of parameters to pass on to xgboost. See xgboost documentation for details.

min_CIP_value

Minimum CIP value to allow. Values that are too low may lead to numerical instability.
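The base R sketch below shows why very small CIP values are a problem: a CIP of 0 maps to an infinite threshold on the standard normal scale. It assumes small values are simply raised to the floor with pmax; how the package actually enforces the minimum is not stated here.

```r
# Hypothetical CIP values, including zero and a very small value.
cip <- c(0, 1e-08, 0.01)
min_CIP_value <- 1e-05

# Without a floor, a CIP of 0 gives an infinite threshold.
thr_raw <- qnorm(cip, lower.tail = FALSE)

# Assumption: values below the floor are raised to min_CIP_value.
cip_floored <- pmax(cip, min_CIP_value)
thr <- qnorm(cip_floored, lower.tail = FALSE)

is.finite(thr_raw)
is.finite(thr)
```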

xgboost_itr

Number of iterations to run xgboost for.

Examples

tbl <- data.frame(
  fid = c(1, 1, 1, 1),
  pid = c(1, 2, 3, 4),
  role = c("o", "m", "f", "pgf"),
  sex = c(1, 0, 1, 1),
  status = c(0, 0, 1, 1),
  age = c(22, 42, 48, 78),
  birth_year = 2023 - c(22, 42, 48, 78),
  aoo = c(NA, NA, 43, 45)
)

cip <- data.frame(
  age = c(22, 42, 43, 45, 48, 78),
  birth_year = c(2001, 1981, 1975, 1945, 1975, 1945),
  sex = c(1, 0, 1, 1, 1, 1),
  cip = c(0.1, 0.2, 0.3, 0.3, 0.3, 0.4)
)

prepare_thresholds(.tbl = tbl, CIP = cip, age_col = "age", interpolation = NA)
