probstat: probstat

Description

Computes marginal, conditional, and information-theoretic summaries for a binary outcome `y` against one or more predictors in `x`. Inference uses either Fisher's exact test or a generalized linear mixed model (GLMM), as selected by `test`.

Usage

probstat(y, x, test = "Fisher", ri, nfolds, seed = 10101)

Value

A data frame with one row per evaluated predictor (or pair) and the following columns:

xprob: Marginal probability of \(X=1\).
yprob: Marginal probability of \(Y=1\).
cprob: Conditional probability \(P(Y=1 \mid X=1)\).
cprobx: Conditional probability \(P(X=1 \mid Y=1)\).
cprobi: Inverse conditional probability \(P(Y=1 \mid X=0)\).
cpdif: Difference \(P(Y=1 \mid X=1) - P(Y=1)\).
cpdifper: Percent difference relative to \(P(Y=1)\).
xent: Entropy of \(X\).
yent: Entropy of \(Y\).
ce: Conditional entropy of \(Y \mid X\).
cedif: Difference between marginal and conditional entropy of \(Y\).
cedifper: Percent difference in entropy.
p: p-value from Fisher's exact test or the GLMM (as applicable).
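
As a rough illustration of how the columns above relate to one another, the following base-R sketch computes them for a single binary predictor. It is not the package's implementation: the toy vectors are made up, the entropy log base (bits), the sign of `cedif`, and the denominators of the two percent columns are all assumptions, and only the Fisher case is shown.

```r
# Toy binary vectors (made up for illustration; not package data).
y <- c(1, 1, 0, 1, 0, 0, 1, 0)
x <- c(1, 1, 1, 0, 0, 0, 1, 0)

# Shannon entropy of a probability vector, in bits.
# ASSUMPTION: probstat() may use a different log base.
entropy <- function(p) {
  p <- p[p > 0]              # treat 0 * log(0) as 0
  -sum(p * log2(p))
}

xprob  <- mean(x == 1)            # P(X = 1)
yprob  <- mean(y == 1)            # P(Y = 1)
cprob  <- mean(y[x == 1] == 1)    # P(Y = 1 | X = 1)
cprobx <- mean(x[y == 1] == 1)    # P(X = 1 | Y = 1)
cprobi <- mean(y[x == 0] == 1)    # P(Y = 1 | X = 0)

cpdif    <- cprob - yprob         # P(Y = 1 | X = 1) - P(Y = 1)
cpdifper <- 100 * cpdif / yprob   # ASSUMPTION: percent of P(Y = 1)

xent <- entropy(c(xprob, 1 - xprob))   # H(X)
yent <- entropy(c(yprob, 1 - yprob))   # H(Y)

# H(Y | X): entropy of Y within each level of X, weighted by P(X).
ce <- xprob * entropy(c(cprob, 1 - cprob)) +
  (1 - xprob) * entropy(c(cprobi, 1 - cprobi))

cedif    <- yent - ce             # ASSUMPTION: marginal minus conditional
cedifper <- 100 * cedif / yent    # ASSUMPTION: percent of H(Y)

# Two-sided Fisher's exact test on the 2 x 2 table of x by y.
p <- fisher.test(table(x, y))$p.value

data.frame(xprob, yprob, cprob, cprobx, cprobi, cpdif, cpdifper,
           xent, yent, ce, cedif, cedifper, p)
```

Under these assumptions `cedif` is the mutual information between \(X\) and \(Y\), so it is never negative and equals zero when the two are independent.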

Arguments

y: A binary outcome vector (logical or numeric coded as 0/1). Length `n`.
x: A data frame of predictors (typically the expanded data returned by [pairmi()]). Must have `n` rows; columns are treated as candidate predictors.
test: Character string selecting the inferential method; one of `c("fisher", "glmm")`. Defaults to `"fisher"` if missing.
ri: Optional vector/factor giving the grouping variable for a random intercept in the GLMM. Must be length `n`. Ignored if `test = "fisher"`.
nfolds: Integer; number of folds used for cross-validation.
seed: Integer seed for fold randomization.

Examples


# Expand the predictors with pairmi(), then summarize each against y
pairmiresult <- pairmi(misimdata[, 2:6])
probstat(misimdata$y, pairmiresult$expanded.data, nfolds = 5)
