np_lpd: Nonparametric estimation of linear personalized diagnostic rules.

Description

Nonparametric estimation of a personalized diagnostic rule that finds subgroup-specific biomarkers according to a linear combination of predictors.

Usage

np_lpd(D,YA,YB,X,dirA,dirB,eps,plot,A,B,c,d)

Value

A list of class np.lin.persDx:

df: Data frame with D, YA, YB, X, tau, YC, where tau=A or B for recommending YA or YB, respectively.
AUCA: AUC for YA.
AUCB: AUC for YB.
AUC: AUC for YC.
tpfp: Data frame with cutoff, tp, fp, where tp and fp are true and false positives at the cutoff values of YC.
theta: Estimated regression parameters.
theta0: Estimated threshold parameter.
PLOT: TRUE or FALSE to show ROC curves.

Arguments

D: Binary outcome with D=1 for diseased (or case) and D=0 for non-diseased (or control) (n x 1 vector).
YA: Biomarker A, measured on a continuous scale (n x 1 vector).
YB: Biomarker B, measured on a continuous scale (n x 1 vector).
X: Predictors (n x p matrix).
dirA: Direction of the association between YA and D, where dirA="<" (or dirA=">") indicates that higher (or lower) YA is associated with higher Pr(D=1). Default is dirA="<".
dirB: Direction of the association between YB and D, where dirB="<" (or dirB=">") indicates that higher (or lower) YB is associated with higher Pr(D=1). Default is dirB="<".
eps: Tuning parameter for predictor selection. Default is eps=0.01.
plot: plot=TRUE (or FALSE) shows (or does not show) the receiver operating characteristic (ROC) curve.
A: Grid search parameter (discrete). Default is A=0.
B: Grid search parameter (discrete). Default is B=0.
c: Grid search parameter. Default is c=2.
d: Grid search parameter. Default is d=2.

Author

Yunro Chung [aut, cre]

Details

The np_lpd function estimates the personalized diagnostic rule \(\tau(X)\) by maximizing the empirical area under the ROC curve (AUC). The rule recommends \(YA\), i.e. \(\tau(X)\)=A, if \(\theta_1 X_1+...+\theta_p X_p > \theta_0\), and recommends \(YB\), i.e. \(\tau(X)\)=B, otherwise. The AUC is computed from the combined marker \(YC\) with the direction "<", i.e. higher \(YC\) is associated with higher Pr(D=1), where \(YC=YA\) if \(\tau(X)\)=A and \(YC=YB\) if \(\tau(X)\)=B. If dirA=">" (or dirB=">"), \(-YA\) (or \(-YB\)) is used in place of \(YA\) (or \(YB\)).
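
The construction of \(YC\) and its empirical AUC can be sketched in a few lines of base R. This is only an illustration of the definitions above, not the package's internal code: the helper names apply_rule, combine_markers and empirical_auc are hypothetical, the toy data and the values of theta and theta0 are made up, and counting ties as 1/2 in the AUC is an assumption.

```r
# Hypothetical helper: tau(X) = "A" if theta_1*X_1 + ... + theta_p*X_p > theta0,
# and "B" otherwise. Assumes theta is ordered like the columns of X.
apply_rule <- function(X, theta, theta0) {
  score <- as.vector(as.matrix(X) %*% theta)
  ifelse(score > theta0, "A", "B")
}

# Hypothetical helper: build YC from tau, using the negative of a marker
# whose direction is ">" so that higher YC always points toward D = 1.
combine_markers <- function(YA, YB, tau, dirA = "<", dirB = "<") {
  if (dirA == ">") YA <- -YA
  if (dirB == ">") YB <- -YB
  ifelse(tau == "A", YA, YB)
}

# Empirical AUC with direction "<": proportion of (case, control) pairs in
# which the case has the higher value; ties count as 1/2 (an assumption).
empirical_auc <- function(y, d) {
  diffs <- outer(y[d == 1], y[d == 0], "-")
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

# Toy data, made up for illustration
D  <- c(1, 1, 1, 0, 0, 0)
YA <- c(2.0, 0.1, 1.5, 0.3, 0.2, 1.0)
YB <- c(0.0, 1.2, 0.4, 0.5, 0.1, 0.9)
X  <- cbind(X1 = c(0.9, 0.2, 0.5, 0.8, 0.1, 0.4),
            X2 = c(0.5, 0.4, 0.7, 0.7, 0.2, 0.4))

# Rule X1 + X2 > 1, i.e. theta = (1, 1) and theta0 = 1 (made-up values)
tau <- apply_rule(X, theta = c(1, 1), theta0 = 1)
YC  <- combine_markers(YA, YB, tau)

auc_A <- empirical_auc(YA, D)  # AUC of YA alone
auc_B <- empirical_auc(YB, D)  # AUC of YB alone
auc_C <- empirical_auc(YC, D)  # AUC of the combined marker YC
```

With a fitted object, the returned theta and theta0 would take the place of the made-up values, provided their ordering matches the columns of X.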

A forward grid rotation (FGR) algorithm estimates \(\theta_0,\theta_1,...,\theta_p\) by sequentially adding to \(\tau(X)\) the predictor that increases the AUC the most. The algorithm stops when the AUC increase is less than or equal to eps, so eps controls the model complexity. Cross-validation can be used to choose eps.
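
The role of eps as a stopping rule can be illustrated with a generic greedy forward selection. This sketch does not implement the grid rotation step: forward_select is a hypothetical helper, score_fn stands in for whatever procedure returns the best AUC for a given set of predictors, and the toy gains are made-up numbers.

```r
# Greedy forward selection with an eps-based stopping rule.
# score_fn(active) is a HYPOTHETICAL user-supplied function that returns the
# best AUC attainable with the predictors indexed by `active`.
forward_select <- function(p, score_fn, eps = 0.01) {
  active <- integer(0)
  best_auc <- score_fn(active)          # baseline AUC with no predictors
  repeat {
    candidates <- setdiff(seq_len(p), active)
    if (length(candidates) == 0) break  # every predictor is already included
    aucs <- vapply(candidates, function(j) score_fn(c(active, j)), numeric(1))
    gain <- max(aucs) - best_auc
    if (gain <= eps) break              # stop: AUC increase is at most eps
    active <- c(active, candidates[which.max(aucs)])
    best_auc <- max(aucs)
  }
  list(active = active, auc = best_auc)
}

# Toy score with made-up additive AUC gains for three predictors
toy_gain  <- c(0.15, 0.005, 0.08)
toy_score <- function(active) 0.60 + sum(toy_gain[active])

fit_small_eps <- forward_select(3, toy_score, eps = 0.001)  # keeps all three
fit_default   <- forward_select(3, toy_score, eps = 0.01)   # drops predictor 2
fit_large_eps <- forward_select(3, toy_score, eps = 0.10)   # keeps predictor 1 only
```

In this toy setting a larger eps yields a sparser rule, which is the sense in which eps controls model complexity.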

The FGR algorithm yields a suboptimal (approximate) solution. Setting higher A, B, c and d improves the accuracy but increases the computational cost, and lower values do the reverse. We thus recommend this function when p is small (around 10 or fewer).

References

Yaliang Zhang and Yunro Chung, Nonparametric estimation of linear personalized diagnostics rules via efficient grid algorithm (submitted)

Examples

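# Simulate data: n/2 cases (D=1) followed by n/2 controls (D=0)
set.seed(1)
n=100
D=c(rep(1,n/2),rep(0,n/2))

# Three independent uniform predictors
X1=runif(n,0,1)
X2=runif(n,0,1)
X3=runif(n,0,1)
X=data.frame(X1,X2,X3)

# True rule: recommend A when X1+X2>=1, otherwise B (X3 is not used)
tau=rep("B",n)
tau[X1+X2>=1]="A"

# Among cases, YA is informative only where tau=="A" and YB only where
# tau=="B"; controls are N(0,1) for both biomarkers
YA=D*(rnorm(n,2,1)*(tau=="A")+rnorm(n,0,1)*(tau=="B"))+
   (1-D)*rnorm(n,0,1)
YB=D*(rnorm(n,1,1)*(tau=="B")+rnorm(n,0,1)*(tau=="A"))+
   (1-D)*rnorm(n,0,1)

# Fit with the default settings and print the result
fit=np_lpd(D, YA, YB, X)
fit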
