rnonRLIV: Type IV Non-Random Labeling of a Given Set of Points

Description

This function returns an object of class "SpatPatterns" (see Value).

Given the set of \(n\) points, dat, in a region, this function assigns \(n_1=\)round(n*ult.prop,0) of them as cases and the rest as controls. It first selects \(k_0=\)round(n*init.prop,0) points as the initial cases and then assigns the label case to the remaining points with infection probabilities equal to the scaled sums of bivariate normal density values at those points. If poisson=TRUE, the initial and ultimate numbers of cases are random with means round(n*init.prop,0) and round(n*ult.prop,0) (i.e., \(k_0=\)rpois(1,round(n*init.prop,0)) and \(n_1=\)rpois(1,round(n*ult.prop,0))); otherwise they are exactly \(k_0=\)round(n*init.prop,0) and \(n_1=\)round(n*ult.prop,0).

More specifically, index the data points as \(z_1,\ldots,z_n\) so that \(z_1,\ldots,z_{k_0}\) are the initial cases. For \(j=1,2,\ldots,k_0\), let \(\phi_{G,j}(z_i)\) be the value at \(z_i\) of the pdf of \(BVN(z_j,s_1,s_2,\rho)\), the bivariate normal distribution with mean \(z_j\), standard deviations \(s_1\) and \(s_2\) of the first and second components (the arguments s1 and s2), and correlation \(\rho\) between them (the argument rho); that is, the covariance matrix is \(\Sigma\) with \(\Sigma_{11}=s_1^2\), \(\Sigma_{22}=s_2^2\) and \(\Sigma_{12}=\Sigma_{21}=s_1 s_2 \rho\). Sum these pdf values as \(p_i=\sum_{j=1}^{k_0} \phi_{G,j}(z_i)\) for each \(i=1,2,\ldots,n\), and let \(p_{\max}=\max_i p_i\). The points other than the initial cases are then labeled as cases with infection probabilities prob equal to \(p_i/p_{\max}\), and the labeling stops when the number of cases first exceeds \(n_1\). The arguments s1 and s2 must be positive and rho must lie in \((-1,1)\); these conditions are required for the BVN density to be nondegenerate, and hence for prob to be well defined.

If rand.init=TRUE, the \(k_0\) initial cases are selected randomly among the data points; otherwise, the first \(k_0\) entries of dat are chosen as the initial cases.
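To make the construction of prob concrete, here is a minimal base-R sketch; it is not the package's implementation. The helper names bvn_pdf and infection_prob are hypothetical, dat is assumed to be a two-column numeric matrix of \(x\)- and \(y\)-coordinates, and init_idx is assumed to hold the row indices of the initial cases.

```r
# Hypothetical helper: pdf of BVN(mu, s1, s2, rho) evaluated at each row of xy
bvn_pdf <- function(xy, mu, s1, s2, rho) {
  u <- (xy[, 1] - mu[1]) / s1            # standardized first coordinate
  v <- (xy[, 2] - mu[2]) / s2            # standardized second coordinate
  q <- (u^2 - 2 * rho * u * v + v^2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * s1 * s2 * sqrt(1 - rho^2))
}

# Hypothetical helper: p_i = sum_j phi_{G,j}(z_i), divided by p_max = max_i p_i
infection_prob <- function(dat, init_idx, s1, s2, rho) {
  stopifnot(s1 > 0, s2 > 0, rho > -1, rho < 1)  # nondegenerate BVN
  p <- numeric(nrow(dat))
  for (j in init_idx) {
    # add the density centered at the j-th initial case, evaluated at all points
    p <- p + bvn_pdf(dat, dat[j, ], s1, s2, rho)
  }
  p / max(p)
}

set.seed(1)
dat <- cbind(runif(40), runif(40))
prob <- infection_prob(dat, init_idx = 1:4, s1 = 0.4, s2 = 0.4, rho = 0.1)
range(prob)  # the largest value is exactly 1 by construction
```

Dividing by \(p_{\max}\) guarantees that the point with the largest summed density gets probability 1 and every other point gets a value in \((0,1]\).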

Algorithmically, all points in dat are first treated as non-cases (i.e., controls or healthy subjects). The function then labels the points in the following steps:

step 0: If poisson=TRUE, \(n_1\) is generated from a Poisson distribution with mean round(n*ult.prop,0) and \(k_0\) from a Poisson distribution with mean round(n*init.prop,0), so that the average numbers of ultimate and initial cases are round(n*ult.prop,0) and round(n*init.prop,0), respectively. Otherwise, \(n_1=\)round(n*ult.prop,0) and \(k_0=\)round(n*init.prop,0).
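Step 0 can be sketched in base R as follows; draw_counts is a hypothetical helper written for illustration, not a function of the package.

```r
# Hypothetical helper for step 0: numbers of initial (k0) and ultimate (n1) cases
draw_counts <- function(n, init.prop, ult.prop, poisson = FALSE) {
  k0 <- round(n * init.prop, 0)
  n1 <- round(n * ult.prop, 0)
  if (poisson) {
    # random counts whose means are the rounded expected numbers of cases
    k0 <- rpois(1, k0)
    n1 <- rpois(1, n1)
  }
  c(k0 = k0, n1 = n1)
}

draw_counts(40, 0.1, 0.5)                  # fixed: k0 = 4, n1 = 20
set.seed(2)
draw_counts(40, 0.1, 0.5, poisson = TRUE)  # random, with means 4 and 20
```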

step 1: Initially, \(k_0\) points from dat are selected as cases. The selection of the initial cases is determined by the argument rand.init (default=TRUE): if rand.init=TRUE, the initial cases are selected randomly from the data points, and if rand.init=FALSE, the first \(k_0\) entries in the data set, dat, are selected as the initial cases.
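A minimal sketch of step 1, assuming the initial cases are represented by their row indices in dat; select_initial is a hypothetical name.

```r
# Hypothetical helper for step 1: row indices of the k0 initial cases
select_initial <- function(n, k0, rand.init = TRUE) {
  if (rand.init) {
    sample(n, k0)   # k0 distinct indices drawn at random from 1:n
  } else {
    seq_len(k0)     # the first k0 rows of dat
  }
}

select_initial(40, 4, rand.init = FALSE)  # 1 2 3 4
set.seed(3)
select_initial(40, 4)                     # 4 random distinct row indices
```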

step 2: Then it assigns the label case to each remaining point \(z_i\) with infection probability \(prob=\sum_{j=1}^{k_0} \phi_{G,j}(z_i)/p_{\max}\), which is the sum of the BVN densities at \(z_i\) scaled by the maximum of such sums over all points. See the description for the definitions of the quantities in prob.
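One plausible reading of step 2 is sketched below, under the assumption that each pass draws a Bernoulli(prob) label for every current control at once and that passes are repeated until at least \(n_1\) cases exist; the package's actual loop may differ. label_cases is a hypothetical name, and a stand-in vector of probabilities is used in the usage lines.

```r
# Hypothetical helper for step 2: Bernoulli labeling of the non-initial points
label_cases <- function(prob, init_idx, n1) {
  lab <- integer(length(prob))
  lab[init_idx] <- 1L                      # initial cases are already cases
  while (sum(lab) < n1) {                  # stop once at least n1 cases exist
    ctrl <- which(lab == 0L)
    if (length(ctrl) == 0L || all(prob[ctrl] == 0)) {
      stop("cannot reach n1 cases with the given probabilities")
    }
    # each current control becomes a case with probability prob[i]
    lab[ctrl] <- rbinom(length(ctrl), 1, prob[ctrl])
  }
  lab
}

set.seed(4)
prob <- runif(40)                          # stand-in infection probabilities
lab <- label_cases(prob, init_idx = 1:4, n1 = 20)
sum(lab)                                   # at least 20; any excess is handled in step 3
```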

step 3: The procedure ends when the number of cases \(n_c\) first exceeds \(n_1\); then \(n_c-n_1\) of the cases (other than the initial cases) are randomly selected and relabeled as controls, i.e., 0s, so that the number of cases is exactly \(n_1\).
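The relabeling in step 3 can be sketched as below; trim_cases is a hypothetical helper, and labels are assumed to be stored as a 0/1 vector with the initial cases given by their row indices.

```r
# Hypothetical helper for step 3: relabel n_c - n1 randomly chosen
# non-initial cases as controls so that exactly n1 cases remain
trim_cases <- function(lab, init_idx, n1) {
  excess <- sum(lab) - n1
  if (excess > 0) {
    cand <- setdiff(which(lab == 1), init_idx)  # cases other than the initial ones
    stopifnot(length(cand) >= excess)
    drop <- cand[sample.int(length(cand), excess)]
    lab[drop] <- 0
  }
  lab
}

lab <- c(1, 1, 1, 0, 1, 1, 0, 1)   # 6 cases; rows 1 and 2 are the initial cases
set.seed(5)
trimmed <- trim_cases(lab, init_idx = 1:2, n1 = 4)
sum(trimmed)                       # 4
```

Indexing cand with sample.int avoids the behavior of sample() when its first argument is a single number, which would otherwise sample from 1:cand instead of from cand itself.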

In the output, cases are labeled 1 and controls 0, and the initial (contagious) cases are marked with red crosses in the plot of the pattern.

See Ceyhan (2014) for more detail. The type IV non-RL pattern is case 4 of the non-RL patterns considered in Section 6 of that article, with \(n_1\) and \(k_0\) fixed as parameters; in the article, rho is represented as \(k_{pow}\) and \(rho/k_{den}=1\).

Although the non-RL pattern is described for the case-control setting, it can be adapted to any two-class setting in which one class can be treated as cases (or behaves like cases) and the other class as controls.

Usage

rnonRLIV(
  dat,
  init.prop,
  ult.prop,
  s1,
  s2,
  rho,
  rand.init = TRUE,
  poisson = FALSE
)

Value

A list with the elements

pat.type: "cc" for case-control patterns (RL or non-RL) of the given data points, dat
type: The type of the point pattern
parameters: The initial and ultimate proportions of cases after the non-RL procedure is applied to the data, together with s1, s2 and rho, the standard deviations and the correlation of the components of the bivariate normal distribution.
dat.points: The set of points to which the non-RL procedure is applied to obtain cases and controls randomly in the type IV fashion
lab: The labels of the points, 1 for cases and 0 for controls, after the type IV non-RL procedure is applied to the data set, dat. Cases are denoted as red dots and controls as black circles in the plot.
init.cases: The initial cases in the data set, dat. Marked with red crosses in the plot of the points.
gen.points,ref.points: Both are NULL for this function, since the initial set of points, dat, is provided for the non-RL procedure.
desc.pat: Description of the point pattern
mtitle: The "main" title for the plot of the point pattern
num.points: A vector of two numbers: the numbers of cases and controls.
xlimit,ylimit: The ranges of the \(x\)- and \(y\)-coordinates of the points in dat

Arguments

dat: A set of points to which the non-RL procedure is applied to obtain cases and controls randomly in the type IV fashion (see the description).
init.prop: A real number between 0 and 1 representing the initial proportion of cases in the data set, dat. The selection of the initial cases depends on the parameter rand.init and the number of initial cases depends on the parameter poisson (see the description).
ult.prop: A real number between 0 and 1 representing the ultimate proportion of cases in the data set, dat, after the non-RL assignment. The number of ultimate cases depends on the parameter poisson (see the description).
s1, s2: Positive real numbers representing the standard deviations of the first and second components of the bivariate normal distribution.
rho: A real number between -1 and 1 representing the correlation between the first and second components of the bivariate normal distribution.
rand.init: A logical argument (default is TRUE) to determine the choice of the initial cases in the data set, dat. If rand.init=TRUE, the initial cases are selected randomly from the data points, and if rand.init=FALSE, the first \(k_0\) entries in the data set, dat, are labeled as the initial cases.
poisson: A logical argument (default is FALSE) to determine whether the numbers of initial and ultimate cases, \(k_0\) and \(n_1\), will be random or fixed. If poisson=TRUE, \(k_0\) and \(n_1\) are drawn from Poisson distributions, \(k_0=\)rpois(1,round(n*init.prop,0)) and \(n_1=\)rpois(1,round(n*ult.prop,0)); otherwise they are fixed, \(k_0=\)round(n*init.prop,0) and \(n_1=\)round(n*ult.prop,0).

Author

Elvan Ceyhan

References

Examples


library(nnspat)  # provides rnonRLIV
n<-40;  #try also n<-20; n<-100;
ult<-.5; #try also .25, .75
#data generation: n points uniformly distributed on the unit square
dat<-cbind(runif(n,0,1),runif(n,0,1))

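int<-.1  # initial proportion of cases (init.prop)
s1<-s2<-.4  # standard deviations of the BVN components
rho<- .1  # correlation between the BVN components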

Xdat<-rnonRLIV(dat,int,ult,s1,s2,rho,poisson=FALSE) #labeled data, try also with poisson=TRUE
Xdat

table(Xdat$lab)

summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#data generation from the standard bivariate normal distribution
n<-40;  #try also n<-20; n<-100;
dat<-cbind(rnorm(n,0,1),rnorm(n,0,1))
ult<-.5; #try also .25, .75

int<-.1
s1<-s2<-.4
rho<-0.1

Xdat<-rnonRLIV(dat,int,ult,s1,s2,rho,poisson=FALSE) #labeled data, try also with poisson=TRUE
Xdat

table(Xdat$lab)

summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
