SurvivalUnstable: The survival function with exponential model

Description

The survival function with exponential model (as proposed in our paper) is representing the probability that true values of a pair of records referring to the same entity coincide. For a linked pair of record (i,j) from sources A, B respectively, P(HAik = HBjk | t_ij, ...) = exp( - exp(X.ALPHA) t_ij ). This function is only used if the PIV is unstable and evolve over time. If so the true values of a linked pair of records may not coincide. If you want to use a different survival function to model instability, you can change this function 'SurvivalUnstable' as well as the associated log(likelihood) function 'loglikSurvival'. Also see ?FlexRL::loglikSurvival.

Usage

SurvivalUnstable(Xlinksk, alphask, times)

Value

A vector (for an unstable PIV) of size number of linked records with the probabilities that true values coincide (e.g. 1 - proba to move if the PIV is postal code) defined according to the survival function with exponential model proposed in the paper

Arguments

Xlinksk: A matrix with number of linked records rows and 1+cov in A+cov in B columns, with a first column filled with 1 (intercept), and following columns filled with the values of the covariates useful for modelling instability for the linked records
alphask: A vector of size 1+cov in A+cov in B, with as first element the baseline hazard and following elements being the coefficient of the conditional hazard associated with the covariates given in X
times: A vector of size number of linked records with the time gaps between the record from each sources

Details

The simplest model (without covariates) just writes P(HAik = HBjk | t_ij, ...) = exp( - exp(alpha) . t_ij ) The more complex model (with covariates) writes P(HAik = HBjk | t_ij, ...) = exp( - exp(X.ALPHA) . t_ij ) and uses a matrix X (nrow=nbr of linked records, ncol=1 + nbr of cov from A + nbr of cov from B) where the first column is filled with 1 (intercept) and the subsequent columns are the covariates values from source A and/or from source B to be used. The ALPHA in this case is a vector of parameters, the first one being associated with the intercept is the same one than for the simplest model, the subsequent ones are associated with the covariates from A and/or from B.

Examples

Run this code

nCoefUnstable = 1
intercept = rep(1,5)
cov_k = cbind( intercept )
times = c(0.001,0.2,1.3,1.5,2)
survivalpSameH = SurvivalUnstable(cov_k, log(0.28), times)

Run the code above in your browser using DataLab