# Zhi Jin

#### 8 packages on CRAN

Linear, logistic, and Cox regression can be performed with lm() and glm() from base R and coxph() from the 'survival' package. The 'rms' package offers ols(), lrm() and cph() for the same models. Each set of functions has a different focus, and in many cases both are needed in the same analysis: for example, screening all-subset models by AIC and then building a visualization for the final model. The 'base.rms' package helps you switch between the two sets of functions easily.
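As a rough illustration of the two families of fitting functions named above, the sketch below fits the same three model types both ways. It does not use 'base.rms' itself; the data sets (`mtcars`, `lung`) and the chosen predictors are assumptions made only for the example, and the 'rms' part runs only if that package is installed.

```r
library(survival)  # provides coxph(), Surv() and the lung data set

# Base-style fits: these objects work directly with AIC()
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data = lung)

aic_values <- c(lm = AIC(fit_lm), glm = AIC(fit_glm), cox = AIC(fit_cox))
print(aic_values)

# rms-style fits of the same models, run only when 'rms' is available
if (requireNamespace("rms", quietly = TRUE)) {
  fit_ols <- rms::ols(mpg ~ wt + hp, data = mtcars)
  fit_lrm <- rms::lrm(am ~ wt, data = mtcars)
  fit_cph <- rms::cph(Surv(time, status) ~ age + sex, data = lung)

  # The linear-model coefficients should agree up to numerical error
  print(max(abs(unname(coef(fit_ols)) - unname(coef(fit_lm)))))
}
```

The base-style objects are convenient for model comparison, while the 'rms' objects are the ones that 'rms' plotting tools such as nomograms expect, which is why an analysis often needs both.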

Seek significant cutoff values for a continuous variable, which is then transformed into a categorical variable, for linear regression, logistic regression, logrank analysis and Cox regression. First, all combinations of candidate cutoff points are generated with the combn() function. Next, the n.per argument (short for total number percentage) removes combinations that would produce a group containing too small a share of the data. For logistic regression, Cox regression and logrank analysis, the p.per argument (patient percentage) additionally filters out combinations in which a group has too low a proportion of patients. Finally, the p values from the regression results are used to identify the significant combinations and output the relevant parameters. There is no limit on the number of cutoff points, which can be 1, 2, 3 or more. Two methods are provided to adjust the p value: the typical Bonferroni correction and the method of Douglas G. Altman (1994) <doi: 10.1093/jnci/86.11.829>. Missing values are removed with the na.omit() function before analysis.
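The steps described above can be sketched in base R for the linear-regression case with two cutoff points. This is not the package's implementation: the simulated data, the decile candidate grid, and the name `min_prop` (a hypothetical stand-in for the role of n.per) are all assumptions, and only the Bonferroni adjustment is shown.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 2 * (x > 0.5) + rnorm(n)            # simulated outcome with a step at x = 0.5
dat <- na.omit(data.frame(x = x, y = y)) # drop missing values before analysis

# Candidate cut points (deciles) and every pair of them
candidates <- quantile(dat$x, probs = seq(0.1, 0.9, by = 0.1), names = FALSE)
combos <- combn(candidates, 2)           # each column is one pair of cut points

min_prop <- 0.15  # hypothetical minimum share of observations per group

res <- t(apply(combos, 2, function(cuts) {
  grp <- cut(dat$x, breaks = c(-Inf, cuts, Inf))
  # Skip combinations that leave any group too small
  if (min(table(grp)) / nrow(dat) < min_prop) return(c(cuts, NA))
  f <- summary(lm(y ~ grp, data = dat))$fstatistic
  c(cuts, pf(f[1], f[2], f[3], lower.tail = FALSE))  # overall F-test p value
}))
res <- as.data.frame(res)
names(res) <- c("cut1", "cut2", "p")

# Bonferroni adjustment over the combinations that were actually tested
res$p_bonf <- p.adjust(res$p, method = "bonferroni")
head(res[order(res$p_bonf), ])
```

Because many cut points are tried, the unadjusted minimum p value is optimistic, which is the reason an adjustment step is part of the procedure.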

Flexibly convert data between long and wide format using just two functions: reshape_toLong() and reshape_toWide().
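For readers unfamiliar with the two layouts, the sketch below shows a wide-to-long-to-wide round trip using base R's reshape(). It does not call reshape_toLong() or reshape_toWide(), whose arguments are not described here; the toy data frame and column names are assumptions for illustration.

```r
# Wide format: one row per subject, one column per visit
wide <- data.frame(id = 1:3,
                   score_1 = c(5, 6, 7),
                   score_2 = c(8, 9, 10))

# Long format: one row per subject and visit
long <- reshape(wide, direction = "long",
                varying = c("score_1", "score_2"),
                v.names = "score", timevar = "visit",
                times = 1:2, idvar = "id")

# Back to wide format
back <- reshape(long, direction = "wide",
                idvar = "id", timevar = "visit",
                v.names = "score", sep = "_")

print(long)
print(back)
```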

In statistical work, we need to see the structure of the data. The list.str() function shows the structure of the data quickly, and list.plot() helps you check every variable in a data frame. table_one() makes it easy to build a baseline table that includes difference tests. uv_linear(), uv_logit(), uv_cox() and uv_logrank() perform univariable regression analysis, while mv_linear(), mv_logit() and mv_cox() carry out multivariable regression analysis.
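To make the distinction between univariable and multivariable analysis concrete, here is a minimal base R sketch for the linear case. It does not use the package's uv_linear() or mv_linear(); the data set (`mtcars`) and the predictor list are assumptions chosen for the example.

```r
str(mtcars)  # quick look at the structure of the data

predictors <- c("wt", "hp", "qsec")

# Univariable analysis: one model per predictor
uv <- do.call(rbind, lapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "mpg"), data = mtcars)
  est <- summary(fit)$coefficients[v, ]
  data.frame(variable = v,
             estimate = est[["Estimate"]],
             p_value  = est[["Pr(>|t|)"]])
}))
print(uv)

# Multivariable analysis: all predictors in a single model
mv <- lm(reformulate(predictors, response = "mpg"), data = mtcars)
print(summary(mv)$coefficients)
```

Comparing the two tables shows how an estimate can change once the other predictors are adjusted for.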

It is not easy to define segments for the y-axis in a 'ggplot2' plot. The gg.gap() function in this package does this.

The risk plot may be one of the most commonly used figures in tumor genetic data analysis. It supports two kinds of conclusions. The first compares the model's predictions with actual survival, to see whether the high-risk group has a lower survival rate and a shorter survival time than the low-risk group. The second compares the heat map with the scatter plot, to see the correlation between the predictors and the outcome.
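The first comparison can be sketched numerically without drawing the plot. The example below is an assumption-laden illustration, not the package's code: it uses the `lung` data from 'survival', clinical predictors instead of gene expression, and a median split of the Cox linear predictor to define the risk groups.

```r
library(survival)

# Complete cases only; the predictors are stand-ins for gene-level features
dat <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = dat)

# Risk score = linear predictor; split at the median into two groups
dat$risk_score <- predict(fit, type = "lp")
dat$risk_group <- factor(
  ifelse(dat$risk_score > median(dat$risk_score), "high", "low"),
  levels = c("low", "high")
)

# Do the groups differ in survival? (logrank test)
lr <- survdiff(Surv(time, status) ~ risk_group, data = dat)
print(lr)

# Median survival time per group from Kaplan-Meier curves
km <- survfit(Surv(time, status) ~ risk_group, data = dat)
median_surv <- summary(km)$table[, "median"]
print(median_surv)
```

Sorting subjects by `risk_score` and plotting score, survival time and predictor values in aligned panels is what turns these numbers into a risk plot.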

A nomogram, which can be drawn with the 'rms' package, provides a graphical explanation of a prediction process. However, it is not easy to draw straight lines on it or to read points and probabilities accurately, and it is even harder to calculate total points and probabilities for all subjects. This package provides the formula_rd() and formula_lp() functions, which use polynomial regression to fit the formula for total points from raw data and from linear predictors respectively. points_cal() calculates the total points, and prob_cal() calculates the probabilities after lrm(), cph() or psm() regression. For more complex conditions, such as interactions or restricted cubic splines, TotalPoints.rms() can be used.
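The idea of linking total points to the linear predictor by polynomial regression can be illustrated in base R. This sketch is not the package's algorithm: the points scale is built by hand under the assumption of a nomogram-style 0 to 100 scaling, the model and data (`mtcars`) are chosen only for the example, and a degree-1 polynomial suffices because the model has no nonlinear terms.

```r
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
beta <- coef(fit)[c("wt", "hp")]

# Contribution of each predictor to the linear predictor, shifted to start at 0
contrib <- sweep(as.matrix(mtcars[, c("wt", "hp")]), 2, beta, `*`)
shifted <- sweep(contrib, 2, apply(contrib, 2, min), `-`)

# Scale so the predictor with the widest effect spans 0 to 100 points
scale_factor <- 100 / max(apply(shifted, 2, max))
total_points <- rowSums(shifted) * scale_factor

# Polynomial regression linking total points to the linear predictor
lp <- predict(fit, type = "link")
poly_fit <- lm(lp ~ poly(total_points, degree = 1, raw = TRUE))

# Probabilities recovered from points alone, compared with the model's own
prob_from_points <- plogis(fitted(poly_fit))
max_diff <- max(abs(prob_from_points - fitted(fit)))
print(max_diff)
```

With interactions or splines the mapping is no longer a simple straight line per predictor, which is the situation the description reserves for TotalPoints.rms().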