Internal function to validate and process both individual-level and summary-level input data
colocboost_validate_input_data(
X = NULL,
Y = NULL,
sumstat = NULL,
LD = NULL,
dict_YX = NULL,
dict_sumstatLD = NULL,
effect_est = NULL,
effect_se = NULL,
effect_n = NULL,
overlap_variables = FALSE,
M = 500,
min_abs_corr = 0.5
)

A list containing:
Processed list of genotype matrices
Processed list of phenotype vectors
Dictionary mapping Y to X
List of variable names for each X matrix
Processed list of summary statistics data.frames
Processed list of LD matrices
Dictionary mapping sumstat to LD
List of variant names for each sumstat
List of z-scores for each outcome
List of sample sizes for each outcome
List of phenotype variances for each outcome
List of standard errors for each outcome
Updated M value (may be changed if LD is not provided)
Updated min_abs_corr value (may be changed if LD is not provided)
Updated jk_equiv_corr value
Updated jk_equiv_loglik value
Updated func_simplex value
X: A list of genotype matrices for different outcomes, or a single matrix if all outcomes share the same genotypes.
Y: A list of outcome vectors, or an N by L matrix when multiple outcomes share the same X.
sumstat: A list of data.frames of summary statistics.
LD: A list of correlation matrices giving the LD matrix for each genotype.
dict_YX: An L by 2 dictionary matrix for X and Y, used when subsets of outcomes correspond to the same X matrix.
dict_sumstatLD: An L by 2 dictionary matrix for sumstat and LD, used when subsets of outcomes correspond to the same sumstat.
effect_est: Matrix of variable regression coefficients (i.e. regression beta values) in the genomic region.
effect_se: Matrix of standard errors associated with the beta values.
effect_n: A scalar or a vector of sample sizes used for estimating the regression coefficients.
overlap_variables: If TRUE, colocalization is performed only in the overlapping region (default is FALSE).
M: The maximum number of gradient boosting rounds for each outcome (default is 500).
min_abs_corr: Minimum absolute correlation allowed in a confidence set (default is 0.5).
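
The input shapes described above can be sketched with simulated data. This is a minimal illustration in base R only, not the package's own example. The dimensions, the variable names (snp1, snp2, ...), the helper make_X, the column order used for dict_YX (outcome index first, then the index of its genotype matrix), and the z-score computed as effect_est / effect_se are all assumptions made for this sketch.

```r
set.seed(1)

n <- 100   # number of samples (hypothetical)
p <- 20    # number of variables in the region (hypothetical)
L <- 3     # number of outcomes (hypothetical)

# Hypothetical helper: simulate an n by p genotype matrix with named columns
make_X <- function(n, p) {
  X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p)
  colnames(X) <- paste0("snp", seq_len(p))
  X
}

# Two genotype matrices: the first is shared by outcomes 1 and 2,
# the second belongs to outcome 3
X <- list(make_X(n, p), make_X(n, p))

# One outcome vector per outcome
Y <- lapply(seq_len(L), function(l) rnorm(n))

# L by 2 dictionary: column 1 = outcome index, column 2 = index into X
# (the column order is an assumption for this sketch)
dict_YX <- cbind(seq_len(L), c(1, 1, 2))

# Summary-level effects: p by L matrices of estimates and standard errors
effect_est <- matrix(rnorm(p * L, sd = 0.1), nrow = p, ncol = L)
effect_se  <- matrix(runif(p * L, min = 0.05, max = 0.15), nrow = p, ncol = L)
effect_n   <- rep(n, L)

# z-scores as estimate divided by standard error (assumed here, not
# confirmed to be how the function derives them)
z <- effect_est / effect_se

# LD supplied as a list of correlation matrices
LD <- list(cor(X[[1]]))
```

Since the function is documented as internal, objects like these would normally be passed to the package's exported interface rather than to this function directly.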