lgmr_model: Latent Gamma Regression Model

Description

Baldur has the option to use the Latent Gamma Regression Model (LGMR). This model attempts to estimate local Mean-Variance (M-V) trend for each peptide (or protein, PTM, etc.). To this end is describes the M-V trend as a regression with a common link function and a latent one. While there may not be a latent trend in real data, it allows for a mathematical description that can estimate the local trends of each peptide. The model describes the standard deviation (S) as a gamma random variable with mean dependency described by the sample mean ($\bar{y}$): $$\boldsymbol{S}\sim\Gamma(\alpha,\beta),\quad\boldsymbol{\beta}=\frac{\alpha}{\boldsymbol{\mu}}$$ $$\boldsymbol{\mu}=\exp(\gamma_0-\gamma_{\bar{y}}\,f(\boldsymbol{\bar{y}})) + \kappa \exp(\boldsymbol{\theta}(\gamma_{0L}-\gamma_{\bar{y}L}\,f(\boldsymbol{\bar{y}})),\quad f(\boldsymbol{x})=\frac{\boldsymbol{x}-\mu_{\bar{y}}}{\sigma_{\bar{y}}}$$ Here, $\exp(\gamma_0-\gamma_{\bar{y}}f(\boldsymbol{\bar{y}}))$ is the common trend of the regression. Then, the mixing variable, $\boldsymbol{\theta}$, describes the proportion of the mixture that each peptide has, and $\kappa$ is just some small constant such that when $\theta$ is zero the latent term is small.

Arguments

Value

A stanmodel that can be used in fit_lgmr.

Code

The Stan code for this model is given by:

lgmr_model
S4 class stanmodel 'lgmr_model' coded as follows:
functions {
  // mu
  vector reg_function(
    vector x,
    vector theta,
    real I,
    real I_L,
    real S,
    real S_L,
    int N
    ) {
    vector[N] exp_beta  = .001*exp(theta .* (I_L - S_L*x));
              exp_beta +=      exp(I - S*x);
    return exp_beta;
  }
}
data {
  int<lower=0> N;       // Number of observations
  vector<lower=0>[N] y; // sd
  vector[N] x;          // mean
}
transformed data {
  real v_y = variance(y);                      // for NRMSE
  vector[N] x_star = (x - mean(x))/sd(x);      // f(y_bar)
  vector[3] lb = [0, 0, negative_infinity()]'; // Boundaries normal coefs.
}
parameters {
  real<lower = 0> alpha;                 // Shape
  vector<lower = lb>[3] eta;             // For S, S_L, I
  vector<lower = 0, upper = 1>[N] theta; // Mixture parameter
  real I_L;                              // Intercept Latent
}
transformed parameters {
  real<lower = 0> S   = eta[1]; // Slope common
  real<lower = 0> S_L = eta[2]; // Slope Latent
  real I              = eta[3]; // Intercep common
}
model {
  //Priors
  alpha         ~ cauchy(0, 25);
  eta           ~ std_normal();
  I_L           ~ skew_normal(2, 15,  35);
  theta         ~ beta(1, 1);
  {
    vector[N] exp_beta = reg_function(x_star, theta, I, I_L, S, S_L, N);
    // Likelihood
    y ~ gamma(alpha, alpha ./ exp_beta);
  }
}
generated quantities {
  // NRMSE calculations
  real nrmse;
  {
    vector[N] se = reg_function(x_star, theta, I, I_L, S, S_L, N);
    se -= y;
    se  = square(se);
    nrmse = mean(se) / v_y;
  }
  nrmse = sqrt(nrmse);
}

Details

Next we will describe the hierarchical prior setup for the regression variables. For $\alpha$ baldur uses a half-Cauchy as a weakly informative prior: $$\alpha\sim\text{Half-Cauchy}(25)$$ For the latent contribution to the i:th peptide, $\theta_i$, has an uniform distribution: $$\theta_i\sim\mathcal{U}(0, 1)$$ Further, it can be seen that Baldur assumes that S always decreases (on average) with the sample mean. To this end, both slopes ($\gamma_{\bar{y}},\gamma_{\bar{y}L}$) are defined on the real positive line. Hence, we used Half-Normal (HN) distributions to describe the slope parameters: $$\gamma_{\bar{y}}\sim HN(1)$$ $$\gamma_{\bar{y}L}\sim HN(1)$$ For the intercepts, we assume a standard normal prior for the common intercept. Then, we use a skew-normal to model the latent intercept. The reason for this is two-fold, one, $\kappa$ will decrease the value of the latent term and, two, to push the latent trend upwards. The latter reason is so that the latent intercept is larger than the common and so that the priors prioritize a shift in intercept over a increase in slope. For the intercepts, Baldur uses a standard normal prior for the common intercept. $$\gamma_0\sim\mathcal{N}(0,1)$$ While for the latent trend, it uses a skew-normal (SN) to push up the second trend and to counteract the shrinkage of $\kappa$. $$\gamma_{0L}\sim\mathcal{SN}(2,15,35)$$