Control of simulated annealing parameters needed in logreg.

logreg.anneal.control(start=0, end=0, iter=0, earlyout=0, update=0)
A list with arguments start, end, iter, earlyout, and update, that can be used as the value of the argument anneal.control of logreg.
start: the upper temperature (on a log10 scale) of the annealing chain. I.e., if start = 3, the annealing chain starts at temperature 1000. The acceptance function is the usual min(1, exp(diff(scores)/temp)), so any temperature larger than the possible score differences between any two models pretty much generates a random walk in the beginning, and means that you need to wait longer for results. Too low a starting temperature means that the chain may end up in a locally optimal (rather than globally optimal) solution. If you leave both start and end at the default of 0, the program will attempt to find reasonable numbers itself (it is known to be only moderately successful at this, though).
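As a quick illustration (a Python sketch, not part of the package; it assumes lower scores are better, which is how the scores in the traces further below behave), the acceptance rule works like this:

```python
import math
import random

def accept(old_score, new_score, temp, rng=random.random):
    """Metropolis-style rule: accept with probability
    min(1, exp(diff(scores)/temp)), where diff = old - new,
    so an improving move (lower new score) is always accepted."""
    diff = old_score - new_score
    if diff >= 0:
        return True                    # improvement: always accept
    return rng() < math.exp(diff / temp)

# At start = 3 (temp = 1000) even a 10-point worsening is accepted
# ~99% of the time -- essentially a random walk; at a very low
# temperature the same move is essentially never accepted.
```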
end: the lower temperature (on a log10 scale) of the annealing chain. I.e., if end is -2, the annealing chain ends at temperature 0.01. If this temperature is very low, one can use the early-out possibility described below, as otherwise the chain may run longer than desired!
iter: the total number of iterations in the annealing chain. This is the total over all annealing chains, not the number of iterations of a chain at a given temperature. If this number is too small the chain may not find a good (or the best) solution; if it is too large the program may take longer than necessary.
earlyout: if the end temperature is very low, the simulated annealing algorithm may not move any more, but one still needs to wait for all possible moves to be evaluated (and rejected)! An early-out possibility is therefore offered. If during five consecutive blocks of earlyout iterations, 10 or fewer moves (for which the score changes) are accepted in each block, the program terminates. This is a desirable option once you are convinced the program otherwise runs fine; it can be dangerous on the first run.
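The stopping rule can be sketched as follows (a hypothetical Python helper, just to make the "5 blocks, 10 or fewer acceptances each" condition concrete):

```python
def early_out(accepted_per_block, n_blocks=5, max_accepted=10):
    """True once the last `n_blocks` completed blocks of `earlyout`
    iterations each accepted `max_accepted` or fewer score-changing moves."""
    if len(accepted_per_block) < n_blocks:
        return False
    return all(a <= max_accepted for a in accepted_per_block[-n_blocks:])

# e.g. early_out([634, 306, 62, 9, 4, 0, 2, 1]) -> True: the last five
# blocks (9, 4, 0, 2, 1) are all at or below 10 acceptances
```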
update: the number of iterations between updates of the scores. I.e., if update = 1000, a score gets printed every 1000 iterations, so with iter = 100000 iterations there will be 100 updates on your screen. If update = 0, a one-line summary is printed for each fitted model. If update = -1, there is virtually no printed output.
Ingo Ruczinski ingo@jhu.edu and Charles Kooperberg clk@fredhutch.org.
Missing arguments take their defaults. If the argument start is a list with arguments start, end, iter, earlyout, and update, those values take precedence over directly specified values.
This is a rough outline how the automated simulated annealing works: The algorithm starts running at a very high temperature, and decreases the temperature until the acceptance ratio of moves is below a certain threshold (in the neighborhood of 95%). At this point we run longer chains at fixed temperatures, and stop the search when the last "n" consecutive moves have been rejected. If you think that the search was either not sufficiently long or excessively long (both of which can very well happen since it is pretty much impossible to specify default values that are appropriate for all sorts of data and models), you can overwrite the default values.
If you want more detailed information continue reading....
These are some more detailed suggestions on how to set the parameters for the beginning temperature, end temperature, and number of iterations for the Logic Regression routine. Note that if the start temperature and end temperature are both zero, the routine uses its default values; the number of iterations iter is irrelevant in this case. In our opinion, the default values are OK, but not great, and you can usually do better if you're willing to invest time in learning how to set the parameters.
The starting temperature is 10 to the power start, i.e., if start is 2 the iterations start at a temperature of 100. The end temperature is again on a log10 scale. The temperatures visited over the iterations are equidistant on a log scale.
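For concreteness, here is one way such an equidistant log-temperature schedule could be computed (a hypothetical Python sketch; the package does this internally). With start = 2, end = 1, and 50 reporting blocks it reproduces the log-temperatures 2.00, 1.98, ..., 1.00 seen in trace (A) below:

```python
def log_temp_schedule(start, end, n_points):
    """Equidistant log10 temperatures from `start` down to `end`."""
    step = (start - end) / (n_points - 1)
    return [start - i * step for i in range(n_points)]

sched = log_temp_schedule(2.0, 1.0, 51)   # initial point + 50 blocks
# sched[0] = 2.0, sched[1] = 1.98, ..., sched[-1] = 1.0
# the actual temperature at each point is 10 ** logtemp
```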
Considerations in setting these parameters.....
1) start temperature. If this is too high you're "wasting time", as the algorithm is effectively just making a random walk at high temperatures. If the starting temperature is too low, you may already be in a (too) localized region of the search space, and never reach a good solution. Typically a starting temperature that gives you 90% or so acceptances (ignoring the rejected attempts, see below) is good. Better a bit too high than too low, but don't waste too much time.
2) end temperature. By the time you reach the end temperature, the number of accepted iterations should be only a few per 1000, and the best score should no longer change. Even zero acceptances is fine. If there are many more acceptances, lower end. If there are zero acceptances for many cycles in a row, raise it a bit. You can set a lower end temperature than needed using the earlyout test: if in 5 consecutive cycles of 1000 iterations there are fewer than a specified number of acceptances per cycle, the program terminates.
3) number of iterations. What really counts is the number of iterations in the "crunch time", when the number of acceptances is, say, more than 5% but fewer than 40% of the iterations. If you print summary statistics in blocks of 1000, you want to see as many blocks with such acceptance numbers as possible. Obviously within what is reasonable.
Here are two examples, with my analysis....
(A) logreg.anneal.control(start = 2, end = 1, iter = 50000, update = 1000)
The first few lines are (cutting off some of the last columns...)
logtemp  current score  best score  acc /  rej /  sing  current parameters 
2.000  1198.785  1198.785  0  0  0  0.508 0.368 0.144 
1.980  1197.962  1175.311  719(18)  34  229  1.273 0.275 0.109 
1.960  1197.909  1168.159  722(11)  38  229  0.416 0.345 0.173 
1.940  1181.545  1168.159  715(19)  35  231  0.416 0.345 0.173 
...  
1.020  1198.258  1167.578  663(16)  128  193  1.685 0.216 0.024 
1.000  1198.756  1167.578  641(23)  104  232  1.685 0.216 0.024 
1.000  1198.756  1167.578  1( 0)  0  0  1.685 0.216 0.024 
Ignore the last line; it just shows a refitting of the best model. Otherwise, this suggests:
(i) end is ***way*** too high, as there are still more than 600 acceptances in blocks of 1000. It is hard to judge what end should be from this run.
(ii) The initial number of acceptances is really high: (719+18)/(719+18+34) = 95%, but by the time logtemp 1.00 is reached it's still at about 85%. One could change start to 1, or keep it at 2 and play it safe.
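The percentages in (ii) come straight from the trace columns; a small Python check (assuming, as in the computation above, that the parenthesized count is accepted moves that leave the score unchanged and the next column is rejections):

```python
def acceptance_ratio(acc, acc_same, rej):
    """Fraction of evaluated moves accepted in one block of iterations."""
    return (acc + acc_same) / (acc + acc_same + rej)

first = acceptance_ratio(719, 18, 34)    # first block of trace (A)
last = acceptance_ratio(641, 23, 104)    # block at logtemp 1.000
# first is roughly 0.96 (the ~95% quoted), last roughly 0.86 (~85%)
```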
(B) logreg.anneal.control(start = 2, end = -2, iter = 50000, update = 1000) (a different dataset/problem)
The first few lines are
logtemp  current score  best score  acc /  rej /  sing  current parameters 
2.000  1198.785  1198.785  0( 0)  0  0  0.50847 0.36814 
1.918  1189.951  1172.615  634(23)  22  322  0.38163 0.28031 
1.837  1191.542  1166.739  651(24)  32  293  1.75646 0.22451 
1.755  1191.907  1162.902  613(30)  20  337  1.80210 0.32276 
The last few are
logtemp  current score  best score  acc /  rej /  sing  current parameters 
-1.837  1132.731  1131.866  0(18)  701  281  0.00513 0.45994 
-1.918  1132.731  1131.866  0(25)  676  299  0.00513 0.45994 
-2.000  1132.731  1131.866  0(17)  718  265  0.00513 0.45994 
-2.000  1132.731  1131.866  0( 0)  0  1  0.00513 0.45994 
But there really weren't any acceptances since
logtemp  current score  best score  acc /  rej /  sing  current parameters 
-0.449  1133.622  1131.866  4(21)  875  100  0.00513 0.45994 
-0.531  1133.622  1131.866  0(19)  829  152  0.00513 0.45994 
-0.612  1133.622  1131.866  0(33)  808  159  0.00513 0.45994 
Going down from 400 to fewer than 10 acceptances went pretty fast....
logtemp  current score  best score  acc /  rej /  sing  current parameters 
0.776  1182.156  1156.354  464(31)  258  247  1.00543 0.26602 
0.694  1168.504  1150.931  306(17)  355  322  1.56695 0.43351 
0.612  1167.747  1150.931  230(38)  383  349  1.56695 0.43351 
0.531  1162.085  1145.920  124(12)  571  293  1.15376 0.15223 
0.449  1143.841  1142.321  63(20)  590  327  2.20150 0.43795 
0.367  1176.152  1142.321  106(21)  649  224  2.20150 0.43795 
0.286  1138.384  1131.866  62(18)  731  189  0.00513 0.45994 
0.204  1138.224  1131.866  11(27)  823  139  0.00513 0.45994 
0.122  1150.370  1131.866  15(12)  722  251  0.00513 0.45994 
0.041  1144.536  1131.866  30(19)  789  162  0.00513 0.45994 
-0.041  1137.898  1131.866  21(25)  911  43  0.00513 0.45994 
-0.122  1139.403  1131.866  12(30)  883  75  0.00513 0.45994 
What does this tell me?
(i) start was probably a bit high; no real harm done.
(ii) end was lower than needed. Since there really weren't any acceptances after log10(T) was about -0.5, an ending log-temperature of -1 would have been fine.
(iii) there were far too few iterations. The crunch time didn't take more than about 10 cycles (10000 iterations). You see that this is the time the "best model" decreased quite a bit, from 1156 to 1131. I would want to spend considerably more than 10000 iterations during this period for a larger problem (how many depends very much on the size of the problem). So, I'd pick something like (A): logreg.anneal.control(start = 2, end = -1, iter = 200000, update = 5000). Since the total range is reduced from 2-(-2)=4 to 2-(-1)=3, over a range of log10 temperatures of 1 there will be 200000/3 (about 67000) rather than 50000/4 = 12500 iterations. I would repeat this run a couple of times.
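The arithmetic behind that comparison, restated as a quick Python sketch:

```python
def iters_per_log_unit(start, end, total_iters):
    """Iterations spent per unit of log10 temperature range."""
    return total_iters / (start - end)

# run (B): start = 2, end = -2, 50000 iterations
assert iters_per_log_unit(2, -2, 50000) == 12500.0
# suggested rerun: start = 2, end = -1, 200000 iterations
assert round(iters_per_log_unit(2, -1, 200000)) == 66667   # ~67000
```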
In general I may sometimes run several models, and check the scores of the best models. If those are all the same, I'm very happy, if they're similar but not identical, it's OK, though I may run one or two longer chains. If they're very different, something is wrong. For the permutation test and crossvalidation I am usually less picky on convergence.
Ruczinski I, Kooperberg C, LeBlanc ML (2003). Logic Regression. Journal of Computational and Graphical Statistics, 12, 475-511.
Ruczinski I, Kooperberg C, LeBlanc ML (2002). Logic Regression - methods and software. Proceedings of the MSRI workshop on Nonlinear Estimation and Classification (Eds: D. Denison, M. Hansen, C. Holmes, B. Mallick, B. Yu), Springer: New York, 333-344.
Selected chapters from the dissertation of Ingo Ruczinski, available from https://research.fredhutch.org/content/dam/stripe/kooperberg/ingophdlogic.pdf
logreg, logreg.mc.control, logreg.tree.control
myannealcontrol <- logreg.anneal.control(start = 2, end = -2, iter = 50000, update = 1000)