genDataRead: Reading the genetic data from a file

Description

This function reads genetic data from a file in PED or Haplin format.

Usage

genDataRead(file.in = stop("Filename must be given!", call. = FALSE),
  file.out = NULL, dir.out = ".",
  format = stop("Format parameter is required!"), header = FALSE, n.vars,
  cov.header, allele.sep = ";", na.strings = "NA", col.sep = "",
  cov.file.in, overwrite = NULL)

Arguments

file.in

The name of the input file.

file.out

The base for the output filename (by default, constructed from the input file name).

dir.out

The path to the directory where the output files will be saved.

format

Format of data (will influence how data is processed) - choose from:

haplin - data already in one row per family,
ped - data from .ped file, each row represents an individual.

header

Whether the first line of the input file contains column names; default: FALSE.

n.vars

The number of columns with covariate data (if any).

cov.header

A character vector containing the names of the covariate columns, covering both the columns in the main data and those in the file with additional covariate data (if given by the 'cov.file.in' argument).

allele.sep

Character: separator between two alleles (default: ";").

na.strings

Character or NA: how the missing data is coded (default: "NA").

col.sep

Character: separator between the columns (i.e., markers; default: any whitespace character).

cov.file.in

Name of the file containing additional covariate data, if any. Caution: unless the 'cov.header' argument is used, it is assumed that the first line of this file contains the header (i.e., the column names of the additional data).

overwrite

Whether to overwrite the output files: if NULL (default), the user is prompted for an answer; if TRUE, any existing files are overwritten automatically; if FALSE, the function stops if the output files exist.
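
The roles of 'allele.sep' and 'na.strings' can be illustrated with a few lines of base R. This is only a sketch of the idea, not Haplin's actual parser: the genotype strings below are made up, and whether missing genotypes appear as one code per allele is an assumption made for the illustration.

```r
# Illustration only (base R, not Haplin's internal code).
# Hypothetical genotype fields, two alleles per field.
fields <- c("A;G", "NA;NA", "C;C")
allele.sep <- ";"    # same as the genDataRead default
na.strings <- "NA"   # same as the genDataRead default

# Split each field into its two alleles; one row per genotype.
alleles <- do.call(rbind, strsplit(fields, allele.sep, fixed = TRUE))

# Recode the missing-data string as a proper NA.
alleles[alleles %in% na.strings] <- NA
alleles
```

With 'allele.sep = ""', as in the haplin-format example in the Examples section, the two alleles of a genotype are written without any separating character.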

Value

A list object with three elements:

cov.data - a data.frame with covariate data (if available in the input file)
gen.data - a list with chunks of the genetic data; the data is divided column-wise, using 10,000 columns per chunk; each element of this list is an ff matrix
aux - a list with meta-data and important parameters.
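
To make the column-wise chunking concrete, the sketch below computes which columns would fall into which chunk for a given number of columns. The helper 'chunk_columns' is hypothetical and not part of Haplin; it only mirrors the "10,000 columns per chunk" rule stated above and assumes at least one column.

```r
# Hypothetical helper (not a Haplin function): column ranges per chunk.
# Assumes n.cols >= 1.
chunk_columns <- function(n.cols, chunk.size = 10000) {
  starts <- seq(1, n.cols, by = chunk.size)
  ends <- pmin(starts + chunk.size - 1, n.cols)
  data.frame(chunk = seq_along(starts), first = starts, last = ends)
}

# 25,000 genetic-data columns would be split into three chunks,
# the last one being shorter than the others.
chunks <- chunk_columns(25000)
chunks
```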

Details

The function reads in all the data in the file and creates ff objects to store the genetic information and a data.frame to store the covariate data (if any). These objects are saved in .RData and .ffData files, which can later be loaded back into R (with genDataLoad) and re-used.
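
A read-then-reload round trip might look like the sketch below. It assumes that the Haplin package is installed, that genDataLoad takes the base file name and the directory through its 'filename' and 'dir.in' arguments, and that the example file 'exmpl_data.ped' ships with the package; the output is written to a temporary directory purely for illustration.

```r
# Sketch only: runs the Haplin calls only if the package is installed.
haplin.available <- requireNamespace("Haplin", quietly = TRUE)

if (haplin.available) {
  out.dir <- tempdir()  # illustrative choice; any writable directory works
  ped.file <- file.path(system.file("extdata", package = "Haplin"),
                        "exmpl_data.ped")

  # Read the PED file; the .RData and .ffData files are written to out.dir.
  ped.data <- Haplin::genDataRead(ped.file, file.out = "exmpl_ped_data",
                                  dir.out = out.dir, format = "ped",
                                  overwrite = TRUE)

  # Later (e.g., in a new session), reload using the same base name.
  ped.data.loaded <- Haplin::genDataLoad(filename = "exmpl_ped_data",
                                         dir.in = out.dir)
}
```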

Examples

# NOT RUN {
  # The argument 'overwrite' is set to TRUE!
  examples.dir <- system.file( "extdata", package = "Haplin" )
  # ped format:
  example.file2 <- paste0( examples.dir, "/exmpl_data.ped" )
  ped.data.read <- genDataRead( example.file2, file.out = "exmpl_ped_data", 
   format = "ped", overwrite = TRUE )
  ped.data.read
  # haplin format:
  example.file1 <- paste0( examples.dir, "/HAPLIN.trialdata2.txt" )
  haplin.data.read <- genDataRead( file.in = example.file1, dir.out = ".",
   file.out = "exmpl_haplin_data", format = "haplin", allele.sep = "", n.vars = 2, 
   cov.header = c( "smoking", "sex" ), overwrite = TRUE )
  haplin.data.read

# }