SparseGRMFile for GRAB.NullModel.
If the sample size in the analysis is greater than 100,000, we recommend using a sparse GRM (instead of a dense GRM) to adjust for sample relatedness.
This function uses GCTA to make a SparseGRMFile to be passed to the function GRAB.NullModel.
This function supports only Linux and PLINK files, as required by the GCTA software. Making a SparseGRMFile takes two steps. See the Details section for more information.
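Because the function depends on Linux and on the GCTA executable, it can help to check the environment before starting. The following is a minimal sketch in base R; checkGctaEnvironment is a hypothetical helper (not part of GRAB), and it assumes the executable is named gcta64 and is expected on the PATH.

```r
# Hypothetical pre-flight helper (not part of GRAB): reports whether the
# session runs on Linux and whether an executable named `gctaName` is on PATH.
checkGctaEnvironment <- function(gctaName = "gcta64") {
  isLinux <- identical(Sys.info()[["sysname"]], "Linux")
  # Sys.which() returns "" when the executable cannot be found
  gctaPath <- unname(Sys.which(gctaName))
  list(
    isLinux  = isLinux,
    gctaPath = gctaPath,
    ready    = isLinux && nzchar(gctaPath)
  )
}

env <- checkGctaEnvironment()
if (!env$ready) {
  message("GCTA prerequisites not met: Linux = ", env$isLinux,
          ", gcta64 path = '", env$gctaPath, "'")
}
```

If env$ready is FALSE, the two-step workflow described in Details is unlikely to run on this machine.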
getSparseGRM(
PlinkFile,
nPartsGRM,
SparseGRMFile,
tempDir = NULL,
relatednessCutoff = 0.05,
minMafGRM = 0.01,
maxMissingGRM = 0.1,
rm.tempFiles = FALSE
)
A character string containing a message with the path to the output file where the sparse Genetic Relationship Matrix (SparseGRM) has been stored.
PlinkFile: a path to PLINK binary files (without file extension). Note that the current version (gcta_1.93.1beta) of the GCTA software does not support different prefix names for the BIM, BED, and FAM files.
nPartsGRM: a numeric value (e.g. 250). The GCTA software can split subjects into multiple parts. For UK Biobank data analysis, it is recommended to set nPartsGRM=250.
SparseGRMFile: a path to the output file to be passed to GRAB.NullModel.
tempDir: a path to the directory that stores the temp files from getTempFilesFullGRM. This should be consistent with the input of getTempFilesFullGRM. Default is system.file("SparseGRM", "temp", package = "GRAB").
relatednessCutoff: a cutoff for the sparse GRM; only kinship coefficients greater than this cutoff are retained in the sparse GRM. (default=0.05)
minMafGRM: minimal MAF cutoff used to select markers (from PLINK files) to make the sparse GRM. (default=0.01)
maxMissingGRM: maximal missing rate used to select markers (from PLINK files) to make the sparse GRM. (default=0.1)
rm.tempFiles: a logical value indicating whether the temp files generated by getTempFilesFullGRM will be deleted. (default=FALSE)
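Since the BED, BIM, and FAM files must share one prefix, a quick check of PlinkFile before calling GCTA can catch path mistakes early. The sketch below uses only base R; checkPlinkPrefix is a hypothetical helper (not part of GRAB), and the demo prefix points at empty placeholder files created purely for illustration.

```r
# Hypothetical helper (not part of GRAB): verifies that <prefix>.bed,
# <prefix>.bim and <prefix>.fam all exist, and stops with a message otherwise.
checkPlinkPrefix <- function(PlinkFile) {
  files <- paste0(PlinkFile, c(".bed", ".bim", ".fam"))
  missingFiles <- files[!file.exists(files)]
  if (length(missingFiles) > 0) {
    stop("Missing PLINK file(s): ", paste(missingFiles, collapse = ", "))
  }
  invisible(files)
}

# Demo with empty placeholder files in the session's temporary directory
prefix <- file.path(tempdir(), "demoPlink")
file.create(paste0(prefix, c(".bed", ".bim", ".fam")))
checked <- checkPlinkPrefix(prefix)
```

The helper only checks that the three files exist; it does not validate their contents.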
# Input data (We recommend setting nPartsGRM=250 for UKBB with N=500K):
GenoFile = system.file("extdata", "simuPLINK.bed", package = "GRAB")
PlinkFile = tools::file_path_sans_ext(GenoFile)
nPartsGRM = 2
# For Linux, get the file path of gcta64 with the `which` command:
gcta64File <- system("which gcta64", intern = TRUE)
# For Windows, set the file path directly instead (uncomment to use):
# gcta64File <- "C:\\path\\to\\gcta64.exe"
# The temp outputs (may be large) will be in system.file("SparseGRM", "temp", package = "GRAB") by default:
for(partParallel in 1:nPartsGRM) getTempFilesFullGRM(PlinkFile, nPartsGRM, partParallel, gcta64File)
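tempDir = system.file("SparseGRM", "temp", package = "GRAB")
# Write the output next to the temp directory; file.path() avoids replacing "temp" elsewhere in the path
SparseGRMFile = file.path(dirname(tempDir), "SparseGRM.txt")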
getSparseGRM(PlinkFile, nPartsGRM, SparseGRMFile)
Step 1: Run getTempFilesFullGRM to save temporary files to tempDir.
Step 2: Run getSparseGRM to combine the temporary files to make a SparseGRMFile to be passed to function GRAB.NullModel.
Users can customize parameters including (minMafGRM, maxMissingGRM, nPartsGRM), but functions getTempFilesFullGRM and getSparseGRM should use the same ones. Otherwise, package GRAB cannot accurately identify temporary files.
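One way to keep the two steps consistent is to define the shared settings once and pass the same list to both functions. The following is a minimal sketch, not the package's prescribed workflow: runBothSteps and the paths are hypothetical, and it assumes getTempFilesFullGRM accepts tempDir, minMafGRM, and maxMissingGRM as named arguments in the same way getSparseGRM does.

```r
# Shared settings, defined once so both steps receive identical values.
shared <- list(
  PlinkFile     = "path/to/plinkPrefix",                   # hypothetical path
  nPartsGRM     = 2,
  tempDir       = file.path(tempdir(), "SparseGRM_temp"),  # hypothetical directory
  minMafGRM     = 0.01,
  maxMissingGRM = 0.1
)

# Hypothetical wrapper (not part of GRAB). Defining it does not require GRAB;
# calling it requires GRAB and the GCTA executable to be installed.
runBothSteps <- function(shared, gcta64File, SparseGRMFile) {
  # Step 1: one call per part, all with the same shared settings
  for (partParallel in seq_len(shared$nPartsGRM)) {
    do.call(
      GRAB::getTempFilesFullGRM,
      c(shared, list(partParallel = partParallel, gcta64File = gcta64File))
    )
  }
  # Step 2: combine the temporary files using the same shared settings
  do.call(GRAB::getSparseGRM, c(shared, list(SparseGRMFile = SparseGRMFile)))
}
```

Because both calls draw on the same list, a change to minMafGRM, maxMissingGRM, or nPartsGRM applies to both steps at once.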