Plots the relationship between a predictor variable and the outcome. It supports numeric predictors (modeled linearly or with restricted cubic splines, RCS) and categorical predictors in logistic, linear, and Cox models. It can also display the distribution of the predictor as a histogram (for numeric predictors) or a bar plot (for categorical predictors).
predictor_effect_plot(
  data,
  x,
  y,
  time = NULL,
  time2 = NULL,
  covars = NULL,
  cluster = NULL,
  method = "auto",
  knot = 4,
  add_hist = TRUE,
  ref = "x_median",
  ref_digits = 3,
  show_total_n = TRUE,
  group_by_ref = TRUE,
  group_title = NULL,
  group_labels = NULL,
  group_colors = NULL,
  breaks = 20,
  line_color = "#e23e57",
  print_p_ph = TRUE,
  trans = "identity",
  save_plot = FALSE,
  create_dir = FALSE,
  filename = NULL,
  y_lim = NULL,
  hist_max = NULL,
  xlim = NULL,
  height = 6,
  width = 6,
  return_details = FALSE
)

Returns a ggplot object, or a list containing the plot and details if return_details is TRUE.
data: A data frame.
x: A character string of the predictor variable.
y: A character string of the outcome variable.
time: A character string of the time variable for Cox models. If NULL, logistic or linear regression is used.
time2: A character string of the ending time for interval-censored or counting process data.
covars: A character vector of covariate names.
cluster: A character string of the cluster variable for robust variance estimation.
method: A character string specifying the method for handling the predictor x. Can be "auto", "rcs", "linear", or "categorical". If "auto", the function decides based on the type of x.
knot: The number of knots for RCS. If NULL, AIC is used to find the optimal number.
add_hist: A logical value. If TRUE, add a distribution plot (histogram or bar plot).
ref: The reference value for numeric predictors, or the reference level for categorical predictors. For numeric x, can be "x_median", "x_mean", "ratio_min", or a numeric value.
ref_digits: The number of digits for the reference value label.
show_total_n: A logical value. If TRUE, show the total number of samples.
group_by_ref: A logical value. If TRUE and x is numeric, split the histogram at the reference value.
group_title: A character string for the group legend title.
group_labels: A character vector for group labels.
group_colors: A character vector of colors for the distribution plot. If NULL, the default colors are used. If group_by_ref is FALSE, the first color is used as the fill color.
breaks: The number of breaks for the histogram.
line_color: The color for the effect line/points.
print_p_ph: A logical value. If TRUE (and the model is Cox), print the p-value for the proportional hazards test.
trans: The transformation for the y-axis. Passed to ggplot2::scale_y_continuous(transform = trans).
save_plot: A logical value indicating whether to save the plot.
create_dir: A logical value indicating whether to create the save directory.
filename: A character string for the saved plot filename.
y_lim: The y-axis limits.
hist_max: The maximum value for the histogram y-axis.
xlim: The x-axis limits for numeric predictors. If NULL, the limits are the 0.025 and 0.975 quantiles. The actual plot range might be slightly larger than this range to fit the histogram.
height: The height of the saved plot.
width: The width of the saved plot.
return_details: A logical value indicating whether to return plot details.
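
To make the ref, xlim, and knot settings more concrete, the following is a minimal sketch in base R of the kind of quantity such a plot displays: an odds ratio for a numeric predictor relative to a reference value. It is not this function's implementation. It assumes a natural cubic spline from splines::ns() as a stand-in for the RCS term (knot placement may differ from what the function uses), and the names dat, grid, ref_value, and x_limits are illustrative only.

```r
library(splines)  # ns(): natural (restricted) cubic spline basis, ships with R

data(cancer, package = "survival")
vars <- c("status", "age", "ph.karno")
dat <- cancer[complete.cases(cancer[, vars]), vars]
dat$dead <- as.integer(dat$status == 2)

# Reference value and default x-range, analogous to ref = "x_median"
# and xlim = NULL (0.025 and 0.975 quantiles of the predictor)
ref_value <- median(dat$age)
x_limits <- quantile(dat$age, probs = c(0.025, 0.975))

# Logistic model with a spline for age, adjusted for ph.karno.
# df = 3 gives 2 interior knots plus 2 boundary knots, i.e. 4 knots in total.
fit <- glm(dead ~ ns(age, df = 3) + ph.karno, data = dat, family = binomial)

# Linear predictor over a grid of ages, holding the covariate fixed
covar_value <- median(dat$ph.karno)
grid <- data.frame(
  age = seq(x_limits[[1]], x_limits[[2]], length.out = 50),
  ph.karno = covar_value
)
ref_row <- data.frame(age = ref_value, ph.karno = covar_value)
lp_grid <- predict(fit, newdata = grid, type = "link")
lp_ref <- predict(fit, newdata = ref_row, type = "link")

# Odds ratio relative to the reference value (equal to 1 at the reference)
grid$odds_ratio <- exp(lp_grid - lp_ref)
head(grid)
```

Because the curve is expressed relative to the reference, changing ref shifts the curve on the log scale without changing its shape.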
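# Load the example data and define a binary outcome (status 2 = dead)
data(cancer, package = "survival")
cancer$dead <- cancer$status == 2

# Drop rows with a missing cluster variable
cancer <- cancer[!is.na(cancer$inst), ]

# Logistic model (no time variable): linear effect of age, adjusted for
# ph.karno, with robust variance clustered by institution
predictor_effect_plot(
  data = cancer,
  x = "age",
  y = "dead",
  method = "linear",
  covars = "ph.karno",
  add_hist = FALSE,
  trans = "log2",
  save_plot = FALSE,
  cluster = "inst"
)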
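
As a further illustration, the sketch below shows how the documented arguments might be combined for a Cox model with an RCS term. It uses only arguments listed above, but it has not been checked against the package, so treat the behavior as an assumption. This page does not name the package that provides predictor_effect_plot(), so the call is guarded and only runs if the function is already available in the session.

```r
data(cancer, package = "survival")
cancer$dead <- cancer$status == 2
cancer <- cancer[!is.na(cancer$inst), ]

# Assumption: the package providing predictor_effect_plot() is attached.
if (exists("predictor_effect_plot")) {
  res <- predictor_effect_plot(
    data = cancer,
    x = "age",
    y = "dead",
    time = "time",         # supplying a time variable selects a Cox model
    method = "rcs",
    knot = 4,
    covars = "ph.karno",
    ref = "x_median",
    return_details = TRUE  # return a list with the plot and details
  )
  str(res, max.level = 1)
}
```

With return_details = TRUE the result is a list rather than a bare ggplot object, so inspect its structure before extracting the plot.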