.check_input_identifier_column: Internal function for checking consistency of the identifier columns

Description

This function checks whether an identifier column is consistent, i.e. it exists, there is only one, and there is no overlap with any user-provided feature columns, other identifier columns, or outcome columns.
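The checks described above can be sketched in base R as follows. This is only an illustration, not the package's implementation: `check_id_column_sketch`, its `feature_columns` argument, and the toy data set are hypothetical names introduced for this example.

```r
# Hypothetical helper illustrating the three consistency checks.
# This is NOT the internal familiar function.
check_id_column_sketch <- function(id_column,
                                   data,
                                   feature_columns = NULL,
                                   other_id_column = NULL,
                                   outcome_column = NULL) {
  # Assumption: an unset identifier column means there is nothing to check.
  if (is.null(id_column)) return(invisible(TRUE))

  # There should be exactly one identifier column of this type.
  if (length(id_column) != 1L) {
    stop("Only one identifier column is expected.")
  }

  # The column should exist in the data.
  if (!id_column %in% colnames(data)) {
    stop("Identifier column '", id_column, "' was not found in the data.")
  }

  # The column should not double as a feature, another identifier, or an outcome.
  overlap <- intersect(
    id_column,
    c(feature_columns, other_id_column, outcome_column)
  )
  if (length(overlap) > 0L) {
    stop("Identifier column '", id_column,
         "' overlaps with another user-specified column.")
  }

  invisible(TRUE)
}

# Toy data set with a sample identifier, a batch identifier, one feature,
# and an outcome column.
data <- data.frame(
  sample_id = 1:3,
  batch_id = c("a", "a", "b"),
  age = c(50, 61, 47),
  outcome = c(0, 1, 0)
)

check_id_column_sketch(
  id_column = "sample_id",
  data = data,
  feature_columns = "age",
  other_id_column = "batch_id",
  outcome_column = "outcome"
)
```

In this sketch, passing `id_column = "age"` together with `feature_columns = "age"` would raise an error, because the identifier would overlap with a feature column.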

Usage

.check_input_identifier_column(
  id_column,
  data,
  signature = NULL,
  exclude_features = NULL,
  include_features = NULL,
  other_id_column = NULL,
  outcome_column = NULL,
  col_type,
  check_stringency = "strict"
)

Arguments

id_column

Character string indicating the currently inspected identifier column.

data

Data set as loaded using the .load_data function.

signature

(optional) One or more names of feature columns that are considered part of a specific signature. Features specified here will always be used for modelling. Ranking from feature selection has no effect for these features.

exclude_features

(optional) Feature columns that will be removed from the data set. Cannot overlap with features in signature, novelty_features or include_features.

include_features

(optional) Feature columns that are specifically included in the data set. By default all features are included. Cannot overlap with exclude_features, but may overlap signature. Features in signature and novelty_features are always included. If both exclude_features and include_features are provided, include_features takes precedence, provided that there is no overlap between the two.

other_id_column

Character string indicating another identifier column.

outcome_column

Character string indicating the outcome column(s).

col_type

Character string indicating the type of column, i.e. sample or batch.

check_stringency

Specifies the stringency of various checks. The following values are available:

strict: default value used for summon_familiar. Thoroughly checks input data. Used internally for checking development data.
external_warn: value used for extract_data and related methods. Less stringent checks, but warns about possible issues. Used internally for checking data for evaluation and explanation.
external: value used for external methods such as predict. Less stringent checks, particularly for identifier and outcome columns, which may be completely absent. Used internally for predict.
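One way to picture how the three stringency levels could differ is to look at a single situation: an identifier column that is absent from the data. The sketch below is an assumption made for illustration. `handle_missing_id_sketch` is a hypothetical helper, and the exact behaviour of the package at each level (in particular whether `external_warn` warns for this specific case) is not stated in this documentation.

```r
# Hypothetical helper; the mapping from stringency level to behaviour is an
# assumption and not taken from the familiar source code.
handle_missing_id_sketch <- function(id_column,
                                     data,
                                     check_stringency = c("strict",
                                                          "external_warn",
                                                          "external")) {
  check_stringency <- match.arg(check_stringency)

  # Column is present: nothing to report at any stringency level.
  if (id_column %in% colnames(data)) return(TRUE)

  msg <- paste0("Identifier column '", id_column, "' was not found.")

  switch(
    check_stringency,
    strict = stop(msg),            # development data: treat as an error
    external_warn = warning(msg),  # evaluation data: warn, then continue
    external = NULL                # prediction data: column may be absent
  )

  FALSE
}

new_data <- data.frame(sample_id = 1:3, age = c(50, 61, 47))

# The batch identifier is absent; with "external" this is silently accepted.
handle_missing_id_sketch("batch_id", new_data, check_stringency = "external")
```

Under these assumptions, the same call with `check_stringency = "strict"` stops with an error, and with `"external_warn"` it issues a warning and returns `FALSE`.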