load_GWAS: Easy loading of GWAS results files

Description

load_GWAS is a wrapper function for read.table that makes loading large GWAS results files less of a hassle. It automatically unpacks .zip and .gz files and uses load_test to determine which column separator the file uses.

Usage

load_GWAS(filename, dir = getwd(),
          column_separators = c("\t", " ", "", ",", ";"),
          test_nrows = 1000,
          header = TRUE, nrows = -1,
          comment.char = "", na.strings = c("NA", "."),
          stringsAsFactors = FALSE, ...)
load_test(filename, dir = getwd(),
          column_separators = c("\t", " ", "", ",", ";"),
          test_nrows = 1000, ...)

Arguments

filename

character string; the complete filename of the file to be loaded. Note that compressed files (.gz or .zip files) can only be unpacked if the filename of the archive contains the extension of the archived file. For example, if the archived file is named "data1.csv", the archive should be "data1.csv.zip".

dir

character string; the directory containing the file. Note that R uses forward slash (/) where Windows uses backslash (\).

column_separators

character string or vector of the column-separators to be tried by load_test. White-space can be specified by "", but it is recommended you try tab ("\t") and space (" ") first.

test_nrows

integer; the number of lines that load_test checks in the trial load. A smaller number means faster loading, but also makes it more likely that errors slip through. To check the entire dataset, set to -1.

header, nrows, comment.char, na.strings, stringsAsFactors, …

Arguments passed to read.table.
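
The naming convention described under filename can be illustrated with a short sketch. The helper inner_name below is hypothetical and not part of the package; it only assumes that the name of the archived file is obtained by dropping a trailing .zip or .gz from the archive name.

```r
# Hypothetical helper (not part of the package): derive the name of the
# archived file by stripping a trailing .zip or .gz extension.
inner_name <- function(filename) {
  sub("\\.(zip|gz)$", "", filename, ignore.case = TRUE)
}

inner_name("data1.csv.zip")        # name of the file expected inside the archive
inner_name("GWA_results1.txt.gz")
inner_name("plain.txt")            # uncompressed names are left unchanged
```

An archive named "data1.zip" would yield "data1" under this assumption, which no longer matches an archived file called "data1.csv".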

Value

load_GWAS returns the table imported from the specified file.

load_test returns a list with 4 components:

success

logical; whether load_test was able to load a dataset with five or more columns.

error

character string; if unable to load the file, this returns the error-message of the last column separator to be tried.

file_type

character string; the last three characters of filename.

sep

the first column-separator that succeeded in loading a dataset with five or more columns.

Details

load_test determines the correct column separator by trying each candidate in turn until it finds one that works, that is, one that yields a dataset with an equal number of cells in every row and at least five columns. If none works, it reports the error message generated by the last column separator tried.

The column separators are tried in the order specified by the column_separators argument.
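
The trial-and-error approach described above can be sketched in base R. This is a minimal illustration, not the package's actual implementation: the function name guess_separator, its min_cols argument, and the shape of its return value are assumptions made for the example.

```r
# Hypothetical sketch (not the package's code): try each separator in order
# and return the first one that loads a table with at least min_cols columns.
guess_separator <- function(path,
                            separators = c("\t", " ", "", ",", ";"),
                            test_nrows = 1000, min_cols = 5) {
  last_error <- NA_character_
  for (sep in separators) {
    # read.table raises an error when rows have unequal numbers of fields;
    # capture the message instead of stopping.
    result <- tryCatch(
      read.table(path, sep = sep, header = TRUE, nrows = test_nrows,
                 comment.char = "", stringsAsFactors = FALSE),
      error = function(e) conditionMessage(e)
    )
    if (is.data.frame(result) && ncol(result) >= min_cols) {
      return(list(success = TRUE, error = NA_character_, sep = sep))
    }
    if (is.character(result)) last_error <- result
  }
  list(success = FALSE, error = last_error, sep = NA_character_)
}

# Usage on a small, made-up comma-separated file
tmp <- tempfile(fileext = ".csv")
writeLines(c("MARKER,CHR,POS,BETA,P",
             "rs1,1,100,0.1,0.5",
             "rs2,1,200,0.2,0.6"), tmp)
guess <- guess_separator(tmp)
unlink(tmp)
```

In this made-up file, tab, space and white-space each produce a single column, so the comma is the first separator in the default order to pass the five-column check.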

By default, load_test only checks the first 1000 lines (adjustable by the test_nrows argument); if the problem lies further down in the dataset, it will not catch it. In such a case, load_GWAS and QC_GWAS will crash when attempting to load the dataset.

A common problem is employing white-space ("") as column separator for a file that uses empty fields to indicate missing values. The separators surrounding an empty field are adjacent, so R parses them as a single column separator. In this particular example, specifying a single space (" ") or tab ("\t") as column separator solves the problem (this is why the default setting of column_separators puts these values before white-space).
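
The effect can be reproduced with base R's count.fields on a small, made-up tab-separated file in which one value is missing. The file contents and variable names are illustrative only.

```r
# Two lines of a hypothetical tab-separated GWAS file; the BETA value in the
# data row is missing and is represented by an empty field.
lines <- c("MARKER\tCHR\tPOS\tBETA\tP",
           "rs1\t1\t100\t\t0.5")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Number of fields found on each line under the two separator settings
fields_whitespace <- count.fields(tmp, sep = "")    # adjacent tabs collapse
fields_tab        <- count.fields(tmp, sep = "\t")  # empty field is kept

unlink(tmp)
```

With white-space as separator the data row comes up one field short of the header, whereas the tab separator finds five fields on both lines.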

Examples

# NOT RUN {
  ## As the function requires a GWAS file to work,
  ## the following code should be adjusted before execution.
  ## Because this is a demonstration, the nrows argument is used
  ## to read only the first 100 rows.

  data_GWAS <-
    load_GWAS("GWA_results1.txt.zip",
              dir = "C:/GWAS_results",
              nrows = 100)
# }
