opticont4mb: Calculates Optimum Contributions of Selection Candidates using Multi-Breed Genotype Data

Description

Calculates optimum genetic contributions for selection candidates from one breed using multi-breed genotype data. Genotype data from multiple breeds may be used in order to increase the genetic distance between the breed of interest (thisBreed) and other breeds.

Usage

opticont4mb(method, K, phen, bc, thisBreed=names(bc)[1], con=list(), 
    solver="cccp", quiet=FALSE, make.definite=solver=="csdp", ...)

Arguments

method

Possible values are "min.VAR" and "max.VAR", where VAR is the name of a column in data frame phen, and "min.KIN" and "min.KIN.acrossBreeds", where KIN is the name of a kinship as defined by function kinlist. Use help.opticont4mb to see the available objective functions. If kinship KIN is available for all animals from the multi-breed population, then "min.KIN.acrossBreeds" minimizes the kinship in the multi-breed population by optimizing the contributions of selection candidates from the breed of interest.

K

List created by function kinlist containing kinships of genotyped individuals.

phen

Data frame with one row for each animal from the multi-breed population that is to be included in the analysis. The animal IDs are in column 1 (named Indiv) and the sex is in column 2 (named Sex). The sex is coded as 'male' and 'female'. Column Breed contains the breed name of every genotyped animal. Further columns contain e.g. breeding values or migrant contributions that may be used for defining linear constraints.
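
A minimal sketch of a data frame with this layout may help. The animal IDs and the values in the hypothetical columns BV (breeding values) and MC (migrant contributions) are invented for illustration and are not data from the package.

```r
# Hypothetical animals; all values are invented for illustration.
phen <- data.frame(
  Indiv = c("A1", "A2", "H1", "H2", "F1", "F2"),
  Sex   = c("male", "female", "male", "female", "male", "female"),
  Breed = c("Angler", "Angler", "Holstein", "Holstein", "Fleckvieh", "Fleckvieh"),
  BV    = c(0.4, -0.1, 0.2, 0.0, 0.3, -0.2),  # assumed breeding values
  MC    = c(0.30, 0.45, NA, NA, NA, NA),      # assumed migrant contributions
  stringsAsFactors = FALSE
)

# Layout checks that mirror the description of argument phen
stopifnot(identical(names(phen)[1:2], c("Indiv", "Sex")))
stopifnot(all(phen$Sex %in% c("male", "female")))
stopifnot("Breed" %in% names(phen))
```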

bc

Named vector containing the proportion of every genotyped breed in the hypothetical multi-breed offspring population. The names of the components are the breed names. Note that only the contributions of selection candidates from thisBreed will be optimized. Animals from other breeds have fixed contributions.
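
As a small illustration of argument bc, the sketch below builds such a named vector by hand. The proportions are example values, and the check that they sum to one is an assumption about what a sensible input looks like rather than a documented requirement.

```r
# Hypothetical breed proportions; the values are illustrative only.
bc <- c(Angler = 0.2, Holstein = 0.4, Fleckvieh = 0.4)

# Assumed sanity check: proportions of a population should sum to one
stopifnot(isTRUE(all.equal(sum(bc), 1)))

# The default breed of interest is the first name, as in the usage
thisBreed <- names(bc)[1]
thisBreed
```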

thisBreed

The breed to which the selection candidates belong.

con

List defining the constraints. The components are described in the Details section. If a component is missing, then the respective constraint is not applied. Use help.opticont4mb to see the available constraints.

solver

Name of the algorithm for optimization. Available solvers are "alabama", "cccp", "cccp2", "csdp", and "slsqp". The default is "cccp". The solvers are described in the Details section.

quiet

If quiet=FALSE then detailed information is shown.

make.definite

If make.definite=TRUE then all non-definite matrices are approximated by positive definite matrices before optimization. This is the default setting for the solver csdp.

...

Tuning parameters of the solver. The available parameters depend on the solver and will be printed when function opticont is used with default values. An overview is given in the Details section.

Value

A list with class "opticont" which has component parent. This is the data frame phen but includes only the animals from the breed of interest. It has the additional column oc containing the optimum genetic contribution of each selection candidate to the next generation, lb containing the lower bounds of the optimum contributions, and ub containing the upper bounds.

Details

The function computes optimum genetic contributions for the selection candidates from one breed using multi-breed marker data. Marker data from multiple breeds may be used in order to increase the genetic distance between the breed of interest (thisBreed) and the other breeds.

In this case a hypothetical subdivided population is considered consisting of purebred offspring of genotyped individuals. That is, the offspring population consists of several breeds with specified breed proportions (e.g. 20% Angler cattle, 40% Holstein cattle, and 40% Fleckvieh cattle). Only the contributions of the selection candidates from thisBreed will be optimized. Animals from other breeds have equal contributions.

The aim is to reduce the average genomic relationship in this multi-breed population since this causes the genetic distance between thisBreed and other breeds to increase. This may increase the conservation value of the breed.
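
The quantity being reduced can be illustrated with a toy calculation. The sketch below assumes that the mean kinship in the offspring population is the quadratic form c'Kc, where c holds the contributions of all animals, scaled so that the contributions within each breed sum to that breed's proportion. The kinship matrix, breed labels, and contributions are invented, and sex-specific contributions are ignored for simplicity. This is not the package's internal computation.

```r
# Toy kinship matrix for 4 animals: 2 from the breed of interest, 2 from another breed
K <- matrix(c(0.50, 0.10, 0.02, 0.01,
              0.10, 0.50, 0.03, 0.02,
              0.02, 0.03, 0.50, 0.20,
              0.01, 0.02, 0.20, 0.50), nrow = 4, byrow = TRUE)
breed <- c("Angler", "Angler", "Holstein", "Holstein")
bc    <- c(Angler = 0.5, Holstein = 0.5)

# Mean kinship across breeds for within-breed contributions `oc` of the two
# Angler animals; the two Holstein animals get equal (fixed) contributions.
meanKin <- function(oc) {
  within <- c(oc, rep(1 / 2, 2))        # contributions sum to 1 within each breed
  cc     <- within * unname(bc[breed])  # scale by the breed proportions
  drop(t(cc) %*% K %*% cc)              # quadratic form c'Kc
}

meanKin(c(0.5, 0.5))  # both candidates used equally
meanKin(c(1.0, 0.0))  # only the first candidate used
```

In this toy example, using both candidates equally gives the lower mean kinship.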

If managing diversity across breeds is not intended then function opticont could be used instead.

Constraints

A list possibly containing the following components providing the constraints:

ub.KIN: Upper bound for the mean kinship in the offspring, where KIN must be replaced by the name of a kinship as defined by function kinlist. Use help.opticont4mb to see available methods.

ub.KIN.acrossBreeds: Upper bound for the mean kinship in the next generation of the multi-breed population, where KIN must be replaced by the name of a kinship as defined by function kinlist. Use help.opticont4mb to see available methods.

lb: Either a named vector of the form c(M=a, F=b) containing lower bounds for the contributions of males (a) and females (b) from thisBreed, or a named vector containing the minimum permissible contribution of each selection candidate. The default is c(M=0, F=0).

ub: Either a named vector of the form c(M=a, F=b) containing upper bounds for the contributions of males (a) and females (b) from thisBreed, or a named vector containing the maximum permissible contribution of each selection candidate. For M=-1 (F=-1) it is assumed that all males (females) have equal contributions to the offspring. If a number is NA then the number of offspring for that sex/individual is not bounded. The default is c(M=NA, F=NA).

lb.VAR: Lower bound for the mean value of variable VAR from data frame phen in the offspring from thisBreed. For example lb.BV=a defines a lower bound for the mean breeding value in the offspring from thisBreed to be a if data frame phen has column BV with breeding values of the parents. Lower bounds for an arbitrary number of variables can be defined.

ub.VAR: Upper bound for the mean value of variable VAR from data frame phen in the offspring from thisBreed. For example ub.MC=a defines the upper bound for the genetic contributions from migrant breeds in the offspring of thisBreed to be a if data frame phen has column MC with migrant contributions for the parents. Upper bounds for an arbitrary number of variables can be defined.

eq.VAR: Equality constraint for the mean value of variable VAR from data frame phen in the offspring from thisBreed. For example eq.MC=a forces the genetic contribution from migrant breeds in the offspring from thisBreed to be a if data frame phen has column MC with migrant contributions for the parents. Equality constraints for an arbitrary number of variables can be defined.
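
The components above can be combined in a single list. The sketch below assumes that the kinlist object contains a kinship named G and that phen has columns BV and MC; the bounds are example values only.

```r
# Hypothetical constraint list; kinship "G" and columns "BV" and "MC" are
# assumed to exist in the kinlist object and in phen, respectively.
con <- list(
  ub.G  = 0.05,               # upper bound for mean kinship G in the offspring
  lb.BV = 0.10,               # lower bound for the mean breeding value
  ub.MC = 0.32,               # upper bound for the mean migrant contribution
  lb    = c(M = 0, F = 0),    # lower bounds for males and females
  ub    = c(M = NA, F = -1)   # males unbounded, females contribute equally
)
names(con)
```

A component that is left out of the list is simply not applied, so constraints can be added or removed one at a time while exploring feasible settings.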

Solver

"alabama": The augmented Lagrangian minimization algorithm auglag from package alabama is used. That is, the method combines the objective function and a penalty for each constraint into a single function. This modified objective function is then passed to another optimization algorithm with no constraints. If the constraints are violated by the solution of this sub-problem, then the size of the penalties is increased and the process is repeated. The default method for the unconstrained optimization in the inner loop is the quasi-Newton method called BFGS. The available parameters used for the outer loop are described in the details section of the help page of function auglag. The available parameters used for the inner loop are described in the details section of the help page of function optim.

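The outer/inner loop idea behind an augmented Lagrangian method can be sketched in base R on a toy problem: minimize x1^2 + x2^2 subject to x1 + x2 = 1. This is an illustrative sketch under simplifying assumptions (one equality constraint, a fixed doubling of the penalty), not the implementation used by auglag.

```r
f <- function(x) sum(x^2)      # toy objective
h <- function(x) sum(x) - 1    # equality constraint h(x) = 0

x <- c(0, 0); lambda <- 0; rho <- 1
for (iter in 1:10) {
  # Inner loop: unconstrained minimization of the augmented objective with BFGS
  aug    <- function(x) f(x) + lambda * h(x) + rho / 2 * h(x)^2
  aug_gr <- function(x) 2 * x + lambda + rho * h(x)   # analytic gradient
  x <- optim(x, aug, aug_gr, method = "BFGS",
             control = list(reltol = 1e-12))$par
  if (abs(h(x)) < 1e-6) break      # constraint (nearly) satisfied: stop
  lambda <- lambda + rho * h(x)    # update the multiplier estimate
  rho    <- 2 * rho                # increase the penalty
}
round(x, 3)
```

The iterates approach the constrained minimum at c(0.5, 0.5); each outer iteration tightens the penalty only while the constraint is still violated.
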
"cccp", "cccp2": Function cccp from package cccp for solving cone constrained convex programs is used. For cccp quadratic constraints are defined as second order cone constraints. This solver is not suitable if computation of the Cholesky decomposition fails. For cccp2 quadratic constraints are defined by functions. The implemented algorithms are partially ported from CVXOPT. The parameters are those from function ctrl. They are among others the maximum count of iterations as an integer value (maxiters), the feasible level of convergence to be achieved (feastol) and whether the solver's progress during the iterations is shown (trace). If numerical problems are encountered increase the optimization parameter feastol or reduce parameter stepadj.

"csdp": The problem is reformulated as a semidefinite programming problem and solved with the CSDP library. Non-definite matrices are approximated by positive definite matrices. This solver is not suitable when the objective is to minimize kinship at native alleles. Available parameters are described in the CSDP User's Guide: https://projects.coin-or.org/Csdp/export/49/trunk/doc/csdpuser.pdf .

"slsqp": The sequential (least-squares) quadratic programming (SQP) algorithm slsqp for gradient-based optimization from package nloptr is used. The algorithm optimizes successive second-order (quadratic/least-squares) approximations of the objective function, with first-order (affine) approximations of the constraints. Available parameters are described in nl.opts.

Remark

If the function does not provide a valid result due to numerical problems then try the following modifications:

* modify the optimization parameters,
* use another solver,
* change the order of the kinship constraints if more than one kinship is constrained,
* define upper or lower bounds instead of equality constraints,
* increase the upper bounds for the kinships.

Validity of the result can be checked with function summary.opticont. Use help.opticont4mb to see available objective functions and constraints.

References

Borchers, B. (1999). CSDP, A C Library for Semidefinite Programming. Optimization Methods and Software 11(1):613-623. http://euler.nmt.edu/~brian/csdppaper.pdf

Kraft, D. (1988). A software package for sequential quadratic programming, Technical Report DFVLR-FB 88-28, Institut fuer Dynamik der Flugsysteme, Oberpfaffenhofen, July 1988.

Lange, K. (2004). Optimization. Springer.

Madsen, K., Nielsen, H.B., Tingleff, O. (2004). Optimization With Constraints. IMM, Technical University of Denmark.

Examples

# NOT RUN {
data(map) 
data(Cattle)
dir  <- system.file("extdata", package = "optiSel")
files<- file.path(dir, paste("Chr", 1:2, ".phased", sep=""))

### Compute genomic kinship and genomic kinship at native segments
G    <- segIBD(files, map, minSNP=20, minL=3.0)
GN   <- segIBDatN(files, Cattle, map, thisBreed="Angler", refBreeds="others", 
           ubFreq=0.02, minSNP=20, minL=3.0, lowMem=TRUE)
Kin  <- kinlist(G=G, GN=GN)

### Compute migrant contributions of selection candidates 
Haplo<- haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="others",
           minSNP=20, minL=3.0, ubFreq=0.02, what="match")
Comp <- segBreedComp(Haplo$match, map)
Cattle$MC <- NA
Cattle[rownames(Comp), "MC"] <- 1-Comp$native
apply(Comp[,-1],2,mean)
#     native           F           H           R 
#0.551844104 0.009739393 0.202216271 0.236200232 


########################################
#  Find optimum breed contributions    #
########################################
lb <- c(Angler=0.10, Holstein=0.20, Fleckvieh=0.20)
bc <- opticomp(G, Breed=Cattle$Breed, obj.fun="NGD", lb=lb)$bc
round(bc,3)
#   Angler Fleckvieh  Holstein   Rotbunt 
#    0.355     0.445     0.200     0.000 

########################################
#  Check available objective functions #
#  and constraints                     #
########################################

help.opticont4mb(Kin, Cattle)


##################################################################
#   Compute the minimum segment based kinship achievable         #
#  across breeds while constraining it within the breed          #
##################################################################

con  <- list(ub.G=0.05, ub=c(M=NA, F=-1))
minG <- opticont4mb("min.G.acrossBreeds", Kin, Cattle, bc, thisBreed="Angler", con=con, trace=FALSE)
minG.s <- summary(minG)
minG.s[,c("G.acrossBreeds")]
#[1] 0.02289039

##################################################################
# Compute the genetic progress achievable while constraining     #
#      segment based kinship  within and across breeds           #
#                     and migrant contributions                  #
##################################################################

con          <- list(ub.G=0.05, ub.G.acrossBreeds=0.026, ub.MC=0.32,  ub=c(M=NA, F=-1))
maxBV.G.MC   <- opticont4mb("max.BV", Kin, Cattle, bc, thisBreed="Angler", con=con, trace=FALSE)
maxBV.G.MC.s <- summary(maxBV.G.MC)
maxBV.G.MC.s$meanBV 
# [1] 0.2851194

##################################################################
#    Compute the minimum achievable kinship at native alleles    #
#    while constraining kinship within and across breeds         #
#    and migrant contributions                                   #
##################################################################

con   <- list(ub.G=0.05, ub.G.acrossBreeds=0.026, ub.MC=0.32, ub=c(M=NA, F=-1))
minGN <- opticont4mb("min.GN", Kin, Cattle, bc, thisBreed="Angler", con=con, solver="slsqp")
minGN.s <- summary(minGN)
minGN.s$GN
#[1] 0.04114953


##################################################################
# Summary statistics from different optimizations                #
# can be combined in a data frame. The most important parameters #
# are printed for comparison:                                    #
##################################################################
Res <- rbind(minG.s, maxBV.G.MC.s, minGN.s)
format(Res[,c("valid","meanBV", "meanMC", "G.acrossBreeds", "G", "GN")],digits=4)

#           valid  meanBV meanMC G.acrossBreeds       G      GN
#minG        TRUE -0.4329 0.3379        0.02289 0.03451 0.03986
#maxBV.G.MC  TRUE  0.2851 0.3200        0.02475 0.05000 0.06830
#minGN       TRUE -0.5321 0.3200        0.02312 0.03600 0.04115

cor(cbind(minG$parent$oc, maxBV.G.MC$parent$oc, minGN$parent$oc))
#          [,1]      [,2]      [,3]
#[1,] 1.0000000 0.2741736 0.8540842
#[2,] 0.2741736 1.0000000 0.3608900
#[3,] 0.8540842 0.3608900 1.0000000
# }
