Help file for writing your own scoring function for
`logreg`

!

`logreg.myown()`

Ingo Ruczinski ingo@jhu.edu and Charles Kooperberg clk@fredhutch.org.

You can write your own scoring function for `logreg`

! This may
be useful if you have a model other than those which we already
programmed in.

Essentially you need to provide two routines in the file
**Myownscoring.f**:

(i) A routine *Myownfitting* which fits your model: it
provides a coefficient (beta) for each of the logic trees and provides
a score of how good the model is. Low scores are good. (So add a minus
sign if your score is a log-likelihood.)

(ii) A routine *Myownscoring* which - given the betas -
provides the score of your model. [If you don't use cross-validation,
this second routine is not needed, though some dummy routine to
satisfy the compiler should still be provided.]

After recompilation, you can fit your model using the option ```
type = 0
```

in `logreg`

. Below we give an example for a version
of the My.own functions for conditional logistic regression which are
also provided as **inst/condlogic.ff** when you downloaded the files.

**PROGRAMMING DETAILS**

Below is a list of variables that are passed on. Most of them are as
you expect - response, predictors (binary ones and continuous ones),
number of cases, number of predictors. In addition there are two
columns - `dcph`

and `weight`

- that can either be used to pass on an
auxiliary variable for each case (discrete for
`dcph`

and continuous for
`weight`

), or even some overall auxiliary variables - as these numbers
are not used anywhere else. If you do not need any of the variables -
just ignore them!

`prtr`

:

the predictions of the logic trees in the current model: this is an
integer matrix of size `n1`

times `ntr`

- although only the
first `nop`

columns contain useful information.

`rsp`

:

the response variable: this is a real (single precision) vector of
length `n1`

.

`dcph`

:

censor times: this is an integer vector of length `n1`

this could
be used as an auxiliary (integer) vector - as it is just passed on.
(There is no check that this is a 0/1 variable, when you use your own
scoring function.) For example, you could use this to pass on
something like cluster membership.

`weight`

:

weights for the cases this is a real vector of length `n1`

. this
could be used as an auxiliary (real) vector - as it is just passed on.
There is no check that these numbers are positive, when you choose
your own scoring function.

`ordrs`

:

the order (by response size) of the cases This is an integer vector of
length `n1`

. For the case with the smallest response this one is
1, for the second smallest 2, and so on. Ties are resolved arbitrary.
Always computed, although only used for proportional hazards
models. Use it as you wish.

`n1`

:

the total number of cases in the data.

`ntr`

:

the number of logic trees ALLOWED in the tree.

`nop`

:

the number of logic trees in the CURRENT model. The subroutines
should work if `nop`

is 0.

`wh`

:

the index of the tree that has been edited in the last move - i.e. the
column of `prtr`

that has changes since the last call.

`nsep`

:

number of variables that get fit a separate parameter The subroutines
should work if `nsep`

is 0.

`seps`

:

array of the above variables - this is a single precision matrix of
size `nsep`

times `n1`

. Note that `seps`

and
`prtr`

are stored in different directions.

For *Myownfitting* you should return:

`betas`

:

a vector of parameters of the model that you fit. `betas(0)`

should be the parameter for the intercept `betas(1:nsep)`

should
be the parameters for the continuous variables in seps
`betas((nsep+1):(nsep+nop))`

should be the parameters for the
binary trees in prtr if you have more parameters, use `dcph`

, or
`weight`

; these variables will not be printed however.

`score`

:

whatever score you assign to your model small should be good
(i.e. deviance or -log.likelihood).

`reject`

:

an indicator whether or not to reject the proposed move *regardless*
of the score (for example when an iteration necessary to determine the
score failed to converge (0 = move is OK ; 1 = reject move) set this
one to 0 if there is no such condition.

You are allowed to change the values of dcph, and weight.

For *Myownscoring* additional input is:

`betas`

:
the coefficients

You should return:

`score`

:
whatever score you assign to your model small should be good
(i.e. deviance or -log.likelihood). If the model "crashes", you should
simply return a very large number.

While we try to prevent that models are singular, it is possible that
for your model a single or degenerate model is passed on for
evaluation. For Myownfitting you can pass the model back with
`reject = 1`

, for Myownscoring you can pass it on with a very large
value for `score`

. Currently Myownscoring.f contains empty frames for
the scoring functions; condlogic.ff contains an example with
conditional logistic regression.

The logic regression program is written in Fortran 77.

**CONDITIONAL LOGISTIC REGRESSION**

A function for a conditional logistic regression score function is attached as an example function on how to write your own scoring function for Logic Regression. Obviously you can also use it if you have conditional logistic data.

Conditional logistic regression is common model fitting technique for matched case-control studies, in which each case is matched with one or more controls. (In conditional logistic regression several cases could be matched to several controls, in the implementation provided here only one case can be matched with each group of controls.) Conditional logistic regression models are parameterized like regular logistic regression models, except that the intercept is no longer identifiable. See, for example, Breslow and Day - Volume 1 (1990, Statistical Methods in Cancer Research, International Agency for Research on Cancer, Lyon) for details. Conditional logistic regression models are most easily fit using a stratified proportional hazards model (if there is one-to-one case-control matching it can also be fit using logistic regression, but that method breaks down if there is more than one control per case). Each group of a case and controls is one stratum. All cases get an arbitrarily event time of 1.00, and all controls get a censoring time of 2.00.

In our implementation we use the response column to indicate the matching. For all controls this column is 0, for a case it is k, indicating that the next k records are the matched controls for the current case. Thus, we order our cases so that each case is followed by its controls. Cases with a negative response are put in a stratum -1, which is not used in any computations. This has implications for cross-validation. See below.

In *Myownfitting* and *Myownscoring* we first allocate
various vectors (strata, index, censoring variable) that are local, as
well as some work arrays that are used by our fitting routines. (We
need to set some of the parameters for that, see the help page of
`logreg`

for details.) We then define `idx(j)=j`

for
`j=1,n1`

, and we define the `strata`

and `delta`

vectors. We use slightly modified versions of the proportional
hazards routines that are already used otherwise in the Logic
Regression program, to include stratification. After the model is
fitted, we assign minus the partial likelihood to `score(1)`

and
(for *Myownfitting*) we pass on the betas.

Recompile after replacing *Myownscoring.f* by
*condlogic.ff*

The permutation and null model versions are not directly usable (we
could do some permutation tests, but they require more programming),
but we can use cross-validation. Obviously we should keep cases and
controls match. To that extend, we would run permutation with a
negative seed (see `logreg`

) and we would take care ourselves
that case-control groups are in a random order, and that every block
has the same number of records. We achieve the later by adding some
records with response -1. In particular, suppose that we have 19
pairs of case- (single) control data, and that we want to do 3-fold
cross validation. We would permute the sequence of the 19 pairs, and
add two records with response -1 after the 13th pair, and two records
with -1 at the end of the file, so that the total data file would have
42 records.

Ruczinski I, Kooperberg C, LeBlanc ML (2003). Logic Regression,
*Journal of Computational and Graphical Statistics*, **12**, 475-511.

Ruczinski I, Kooperberg C, LeBlanc ML (2002). Logic Regression -
methods and software. *Proceedings of the MSRI workshop on
Nonlinear Estimation and Classification* (Eds: D. Denison, M. Hansen,
C. Holmes, B. Mallick, B. Yu), Springer: New York, 333-344.

Kooperberg C, Ruczinski I, LeBlanc ML, Hsu L (2001). Sequence Analysis
using Logic Regression, *Genetic Epidemiology*, **21**,
S626-S631.

Selected chapters from the dissertation of Ingo Ruczinski, available from https://research.fredhutch.org/content/dam/stripe/kooperberg/ingophd-logic.pdf

`logreg`

```
logreg.myown() # displays this help file
help(logreg.myown) # equivalent
```

Run the code above in your browser using DataCamp Workspace