synbreed (version 0.9-2)

create.gpData: Create genomic prediction data object

Description

This function combines all raw data sources in a single, unified data object of class gpData. This is a list with elements for phenotypic, genotypic, marker map, pedigree and further covariate data. All elements are optional.

Usage

create.gpData(pheno = NULL, geno = NULL, map = NULL, pedigree = NULL,
              family = NULL, covar = NULL, reorderMap = TRUE, 
              map.unit = "cM", repeated  = NULL, modCovar = NULL)

Arguments

pheno
data.frame with individuals organized in rows and traits organized in columns. For unrepeated measures unique rownames should identify individuals. For repeated measures, the first column identifies individuals and a second col
geno
matrix with individuals organized in rows and markers organized in columns. Genotypes can be coded arbitrarily. Missing values should be coded as NA. Columns or rows with only missing values are not allowed. Unique rownames
map
data.frame with one row for each marker and two columns (named chr and pos). The first column gives the chromosome (numeric or character but not factor) and the second column the posit
pedigree
Object of class pedigree.
family
data.frame assigning individuals to families, with the names of individuals as rownames. This information can be used to replace missing values with the function codeGeno.
covar
data.frame with further covariates, e.g. sex or age, for all individuals that appear in pheno, geno or pedigree$ID. rownames must be specified to identify individuals. Typically this
reorderMap
logical. Should markers in geno and map be reordered by chromosome number and position within chromosome according to map (default = TRUE)?
map.unit
character. Unit of position in map, either 'cM' for genetic distance or 'bp' for physical distance (default = 'cM').
repeated
Name of the column in pheno that identifies the replications of the phenotypic values. Its unique values become the names of the third dimension of the pheno object in the gpData. This argument is only required for repeated measurements.
modCovar
vector with colnames which identify columns with covariables in pheno. This argument is only required for repeated measurements.
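
The relationship between a long-format pheno with a replication column and the three-dimensional pheno array described under Value can be illustrated in base R. This is only a sketch of the idea, not the package's internal code; the toy data and the names pheno_long and pheno_arr are made up for illustration.

```r
# Toy long-format phenotypes: 3 individuals, 2 traits, 2 replications
# (hypothetical values, for illustration only)
pheno_long <- data.frame(
  ID     = rep(c("a", "b", "c"), times = 2),
  rep    = rep(1:2, each = 3),
  Yield  = c(201, 198, 205, 199, 202, 207),
  Height = c(100, 101, 99, 100.5, 100.8, 99.4),
  stringsAsFactors = FALSE
)

ids    <- sort(unique(pheno_long$ID))
reps   <- sort(unique(pheno_long$rep))
traits <- setdiff(names(pheno_long), c("ID", "rep"))

# Empty array: individuals x traits x replications
pheno_arr <- array(NA_real_,
                   dim = c(length(ids), length(traits), length(reps)),
                   dimnames = list(ids, traits, as.character(reps)))

# Fill one replication slice at a time, matching rows by individual name
for (r in reps) {
  block <- pheno_long[pheno_long$rep == r, ]
  pheno_arr[as.character(block$ID), traits, as.character(r)] <-
    as.matrix(block[, traits])
}

dim(pheno_arr)            # individuals, traits, replications
dimnames(pheno_arr)[[3]]  # unique values of the replication column
```

Here the unique values of the rep column become the names of the third dimension, which mirrors what the repeated argument is documented to do.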

Value

Object of class gpData, which is a list with the following elements:

  • covar: data.frame with information on individuals
  • pheno: array (individuals x traits x replications) with phenotypic data
  • geno: marker matrix containing genotypic data. Columns (markers) are in the same order as in map (if reorderMap = TRUE)
  • pedigree: object of class pedigree
  • map: data.frame with columns 'chr' and 'pos' and markers sorted by 'pos' within 'chr'
  • phenoCovars: array with phenotypic covariates
  • info: list with additional information on data (coding of data, unit in map)
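
As a quick way to see these elements, the returned list can be inspected with standard base R tools. The following is a minimal sketch, assuming the synbreed package is installed; it passes only pheno, relying on the statement that all elements are optional, and the toy data are made up.

```r
library(synbreed)  # assumes the synbreed package is installed

set.seed(1)
# Toy phenotypes for 4 individuals (hypothetical values)
pheno <- data.frame(Yield = rnorm(4, 200, 5), Height = rnorm(4, 100, 1))
rownames(pheno) <- letters[1:4]

# Only phenotypic data are supplied; all other sources stay NULL
gp <- create.gpData(pheno = pheno)

class(gp)       # class of the returned object
names(gp)       # names of the list elements
dim(gp$pheno)   # dimensions of the phenotype array
gp$covar        # per-individual information
str(gp$info)    # additional information on the data
```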

Details

The class gpData is designed to provide a unified framework for data related to genomic prediction analysis. Every data source can be omitted; in that case, the corresponding argument must be NULL. By default (reorderMap = TRUE), markers in geno are ordered by their position in map. Individuals are sorted alphabetically. An object of class gpData can contain different subsets of individuals or markers in the elements pheno, geno and pedigree. In this case, the id column in covar comprises all individuals that appear in pheno, geno or pedigree. Two additional columns in covar, named phenotyped and genotyped, are generated automatically to identify the individuals that have phenotypic or genotypic data, respectively, in the gpData object.
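
The bookkeeping described above can be sketched in base R: take the union of the individuals from each source and flag where each one appears. This is an illustration of the idea under made-up data, not the package's implementation; the names pheno_ids, geno_ids and covar_sketch are hypothetical.

```r
# Hypothetical individual names: pheno and geno overlap only partly
pheno_ids <- c("a", "b", "c", "d")
geno_ids  <- c("c", "d", "e")

# All individuals that appear in any source, in alphabetical order
all_ids <- sort(unique(c(pheno_ids, geno_ids)))

# One row per individual with logical flags for each data source
covar_sketch <- data.frame(
  id         = all_ids,
  phenotyped = all_ids %in% pheno_ids,
  genotyped  = all_ids %in% geno_ids,
  stringsAsFactors = FALSE
)
covar_sketch

# Individuals with both phenotypes and genotypes, e.g. a training set
both <- covar_sketch$id[covar_sketch$phenotyped & covar_sketch$genotyped]
both
```

Flags of this kind make it easy to select, for example, only the individuals that have both phenotypic and genotypic records.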

See Also

codeGeno, summary.gpData, gpData2data.frame

Examples

library(synbreed)
set.seed(123)
# 9 plants with 2 traits
n <- 9  # only for n > 6
pheno <- data.frame(Yield = rnorm(n,200,5), Height=rnorm(n,100,1))
rownames(pheno) <- letters[1:n]

# marker matrix
geno <- matrix(sample(c("AA","AB","BB",NA),size=n*12,replace=TRUE,
prob=c(0.6,0.2,0.1,0.1)),nrow=n)
rownames(geno) <-  letters[n:1]
colnames(geno) <- paste("M",1:12,sep="")

# genetic map
# one SNP is not mapped (M5) and will therefore be removed
map <- data.frame(chr=rep(1:3,each=4),pos=rep(1:12))
map  <- map[-5,]
rownames(map) <- paste("M",c(1:4,6:12),sep="") 

# simulate pedigree
ped <- simul.pedigree(3,c(3,3,n-6))

# combine in one object
gp <- create.gpData(pheno,geno,map,ped)
summary(gp)


# 9 plants with 2 traits, 3 replications
n <- 9
pheno <- data.frame(ID = rep(letters[1:n],3), rep = rep(1:3,each=n), 
                    Yield = rnorm(3*n,200,5), Height=rnorm(3*n,100,1))

# combine in one object
gp2 <- create.gpData(pheno,geno,map,repeated="rep")
summary(gp2)
