Fit a ctree model
tarv(pheno, geno, formula, method = c("entropy", "gini"),
family = c("binomial", "gaussian"),
direction = c("both", "positive", "negative"),
alpha = 0.01, cost = NULL)
'pheno' defines the phenotypic data, serving as the response variable.
'geno' defines the genetic variants.
'formula' specifies the model structure, such as 'disease~sex+race'.
'method' is the same as that in rtree.
'family' specifies the regression method: 'binomial' for logistic regression with a binary phenotype and 'gaussian' for linear regression with a normally distributed phenotype.
'direction' offers the option for the inclusion of genetic markers. When tarv processes the original genetic markers, it may rank their univariate effects according to their t-values in three ways, namely t ('positive'), -t ('negative'), and |t| ('both'). We recommend 'both', which favors the more statistically significant genetic markers regardless of the direction of their effect on the phenotype.
'alpha' is the same as that in rtree.
'cost' is the same as that in rtree.
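The three 'direction' options can be illustrated with a minimal base-R sketch. It is not tarv's internal code: the simulated data, the hypothetical helper rank_markers, and the use of glm to obtain a univariate statistic for each marker (a z-value under 'binomial', a t-value under 'gaussian') are assumptions made for illustration only.

```r
# Illustrative sketch only; rank_markers is a hypothetical helper, not part of macs.
set.seed(1)
n <- 500
y <- rbinom(n, 1, 0.4)                     # binary phenotype
g <- replicate(10, rbinom(n, 1, 0.05))     # 10 rare markers, one per column

# Univariate test statistic of each marker from a one-covariate regression
stat <- apply(g, 2, function(x) {
  fit <- glm(y ~ x, family = binomial)
  coef(summary(fit))["x", 3]               # third column: z value (t value for gaussian)
})

rank_markers <- function(stat, direction = c("both", "positive", "negative")) {
  direction <- match.arg(direction)
  score <- switch(direction,
                  positive = stat,         # rank by t
                  negative = -stat,        # rank by -t
                  both     = abs(stat))    # rank by |t|
  order(score, decreasing = TRUE)          # marker indices, most favoured first
}

rank_markers(stat, "both")
```

With 'both', a marker with a large negative statistic ranks as highly as one with an equally large positive statistic, which is the behaviour the recommendation above relies on.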
the total number of nodes in the tree.
the sequence number of a left daughter node for each internal node.
the sequence number of the parent node for any daughter node.
the splitting variable used to split a given node.
the cut-off value of the splitting variable above.
the table that contains the number of observations in each node.
a numerical indicator for the category of each variable. A value of '-1' points to the response variable, '1' to an ordinal variable, and an integer greater than 1 to a nominal variable whose number of levels equals that integer.
the class membership of a terminal node which depends on the choice of the misclassification cost.
the p-value of the chi-square test performed at each internal node. It forms the basis for pruning the offspring nodes of any internal node. More details are given in Recursive Partitioning and Applications [Zhang and Singer].
the call by which this object is generated.
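As a rough illustration of the chi-square p-value attached to an internal node, the following base-R sketch tests the association between a candidate split and a binary response. The simulated data, the variable names, the cut-off, and the choice of a Pearson test without continuity correction are all assumptions for illustration; the test actually used by the package may differ in detail.

```r
# Illustrative sketch only; not the package's internal computation.
set.seed(2)
n <- 400
x <- rnorm(n)                                    # hypothetical splitting variable
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * (x > 0)))  # response depends on the split

cutoff <- 0                                      # hypothetical cut-off value
left <- x <= cutoff                              # TRUE: left daughter, FALSE: right daughter

counts <- table(left = left, response = y)       # 2 x 2 table: daughter node by class
test <- chisq.test(counts, correct = FALSE)      # Pearson chi-square test, 1 df
test$p.value
```

A small p-value indicates that the two daughter nodes differ in their class distribution, which is the kind of evidence a pruning rule can compare against a significance threshold.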
'pheno' and 'geno' are processed via TARV [Song and Zhang 2014] to create new covariates, and these new variables are then used to construct a tree. tarv therefore has input and output forms similar to those of rtree. The objects returned by rtree and tarv are both 'ctree' objects and can be passed to plot.ctree and predict.ctree.
Zhang, H. and Singer, B. (1999), Recursive Partitioning in the Health Sciences, Springer Verlag.
Song, C. and Zhang, H. (2014), Tree-based Analysis of Rare Variants Identifying Risk Modifying Variants in CTNNA2 and CNTNAP2 for Alcohol Addiction.
# NOT RUN {
library(macs)
set.seed(123)
# Simulate covariates for 1000 subjects
sex <- rbinom(1000, 1, 0.5)
race <- sample(1:3, 1000, replace = TRUE, prob = c(0.4, 0.4, 0.2))
# Simulate 100 rare variants (subjects in rows), 20 per gene across 5 genes
gg <- replicate(100, rbinom(1000, 1, runif(1, 0.005, 0.05)))
annotation <- paste("gene", rep(1:5, each = 20), sep = "")
# Choose causal variants in the first two genes; x1 and x2 flag carriers of any of them
causal <- rbinom(40, 1, 0.8)
x1 <- rowSums(gg[, 1:20][, causal[1:20] > 0, drop = FALSE]) > 0
x2 <- rowSums(gg[, 21:40][, causal[21:40] > 0, drop = FALSE]) > 0
# Draw a binary phenotype from a logistic model
xb <- sex * 0.2 + (race == 2) * 0.2 + x1 * 0.6 + x2 * 0.8 - 0.7
r <- rbinom(1000, 1, exp(xb)/(1+exp(xb)))
pheno <- data.frame(disease = r, sex = sex, race = as.factor(race))
# Transpose so that variants are in rows, labelled by gene
geno <- t(gg)
rownames(geno) <- annotation
result <- tarv(pheno, geno, formula = "disease~sex+race",
               method = "entropy", family = "binomial",
               direction = "both", alpha = 0.01, cost = c(1, 1))
plot(result)
# }