Control of simulated annealing parameters needed in logreg.

logreg.anneal.control(start=0, end=0, iter=0, earlyout=0, update=0)
A list with arguments start, end, iter, earlyout, and update, that can be used as the value of the argument anneal.control of logreg.
start: the upper temperature (on a log10 scale) of the annealing chain. I.e., if start = 3, the annealing chain starts at temperature 1000. The acceptance function is the usual min(1, exp(diff(scores)/temp)), so any temperature larger than the possible score differences between any two models pretty much generates a random walk in the beginning, and means that you need to wait longer for results. Too low a starting temperature means that the chain may end up in a locally optimal (rather than globally optimal) solution. If you leave both start and end at the default of 0, the program will attempt to find reasonable numbers itself (it is known to be only moderately successful at this, though).
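As a quick illustration (a Python sketch, not part of the package; it assumes lower scores are better, which is how the scores in the traces further below behave), the acceptance rule works like this:

```python
import math
import random

def accept(old_score, new_score, temp, rng=random.random):
    """Metropolis-style rule: accept with probability
    min(1, exp(diff(scores)/temp)), where diff = old - new,
    so an improving move (lower new score) is always accepted."""
    diff = old_score - new_score
    if diff >= 0:
        return True                    # improvement: always accept
    return rng() < math.exp(diff / temp)

# At start = 3 (temp = 1000) even a 10-point worsening is accepted
# ~99% of the time -- essentially a random walk; at a very low
# temperature the same move is essentially never accepted.
```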
end: the lower temperature (on a log10 scale) of the annealing chain. I.e., if end is -2, the annealing chain ends at temperature 0.01. If this temperature is very low, one can use the early-out possibility described below, as otherwise the chain may run longer than desired!
iter: the total number of iterations in the annealing chain. This is the total over all annealing chains, not the number of iterations of a chain at a given temperature. If this number is too small the chain may not find a good (or the best) solution; if it is too large the program may take longer than necessary.
earlyout: if the end temperature is very low, the simulated annealing algorithm may not move any more, but one still needs to wait for all possible moves to be evaluated (and rejected)! An early-out possibility is therefore offered. If during five consecutive blocks of earlyout iterations, 10 or fewer moves (for which the score changes) are accepted in each block, the program terminates. This is a desirable option once you are convinced the program otherwise runs fine; it can be dangerous on the first run.
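The stopping rule can be sketched as follows (a hypothetical Python helper, just to make the "5 blocks, 10 or fewer acceptances each" condition concrete):

```python
def early_out(accepted_per_block, n_blocks=5, max_accepted=10):
    """True once the last `n_blocks` completed blocks of `earlyout`
    iterations each accepted `max_accepted` or fewer score-changing moves."""
    if len(accepted_per_block) < n_blocks:
        return False
    return all(a <= max_accepted for a in accepted_per_block[-n_blocks:])

# e.g. early_out([634, 306, 62, 9, 4, 0, 2, 1]) -> True: the last five
# blocks (9, 4, 0, 2, 1) are all at or below 10 acceptances
```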
update: the number of iterations between updates of the scores. I.e., if update = 1000, a score gets printed every 1000 iterations, so with iter = 100000 iterations there will be 100 updates on your screen. If update = 0, a one-line summary is printed for each fitted model. If update = -1, there is virtually no printed output.
Ingo Ruczinski ingo@jhu.edu and Charles Kooperberg clk@fredhutch.org.
Missing arguments take their defaults. If the argument start is a list with arguments start, end, iter, earlyout, and update, those values take precedence over directly specified values.
This is a rough outline how the automated simulated annealing works: The algorithm starts running at a very high temperature, and decreases the temperature until the acceptance ratio of moves is below a certain threshold (in the neighborhood of 95%). At this point we run longer chains at fixed temperatures, and stop the search when the last "n" consecutive moves have been rejected. If you think that the search was either not sufficiently long or excessively long (both of which can very well happen since it is pretty much impossible to specify default values that are appropriate for all sorts of data and models), you can overwrite the default values.
If you want more detailed information continue reading....
These are some more detailed suggestions on how to set the parameters for the beginning temperature, end temperature, and number of iterations for the Logic Regression routine. Note that if the start temperature and end temperature are both zero, the routine uses its default values; the number of iterations iter is irrelevant in this case. In our opinion, the default values are OK, but not great, and you can usually do better if you're willing to invest time in learning how to set the parameters.
The starting temperature is 10 to the power start, i.e., if start is 2 the iterations start at a temperature of 100. The end temperature is again on a log10 scale. The temperatures visited over the iterations are equidistant on a log scale.
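For concreteness, here is one way such an equidistant log-temperature schedule could be computed (a hypothetical Python sketch; the package does this internally). With start = 2, end = 1, and 50 reporting blocks it reproduces the log-temperatures 2.00, 1.98, ..., 1.00 seen in trace (A) below:

```python
def log_temp_schedule(start, end, n_points):
    """Equidistant log10 temperatures from `start` down to `end`."""
    step = (start - end) / (n_points - 1)
    return [start - i * step for i in range(n_points)]

sched = log_temp_schedule(2.0, 1.0, 51)   # initial point + 50 blocks
# sched[0] = 2.0, sched[1] = 1.98, ..., sched[-1] = 1.0
# the actual temperature at each point is 10 ** logtemp
```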
Considerations in setting these parameters.....
1) start temperature. If this is too high you're "wasting time", as the algorithm is effectively just making a random walk at high temperatures. If the starting temperature is too low, you may already be in a (too) localized region of the search space, and never reach a good solution. Typically a starting temperature that gives you 90% or so acceptances (ignoring the rejected attempts, see below) is good. Better a bit too high than too low, but don't waste too much time.
2) end temperature. By the time you reach the end temperature, the number of accepted iterations should be only a few per 1000, and the best score should no longer change. Even zero acceptances is fine. If there are many more acceptances, lower end. If there are zero acceptances for many cycles in a row, raise it a bit. You can set a lower end temperature than needed using the earlyout test: if in 5 consecutive cycles of 1000 iterations there are fewer than a specified number of acceptances per cycle, the program terminates.
3) number of iterations. What really counts is the number of iterations in the "crunch time", when the number of acceptances is, say, more than 5% but fewer than 40% of the iterations. If you print summary statistics in blocks of 1000, you want to see as many blocks with such acceptance numbers as possible. Obviously within what is reasonable.
Here are two examples, with my analysis....
(A) logreg.anneal.control(start = 2, end = 1, iter = 50000, update = 1000)
The first few lines are (cutting off some of the last columns...)
logtemp  current score  best score  acc /  rej /  sing  current parameters 
2.000  1198.785  1198.785  0  0  0  0.508 0.368 0.144 
1.980  1197.962  1175.311  719(18)  34  229  1.273 0.275 0.109 
1.960  1197.909  1168.159  722(11)  38  229  0.416 0.345 0.173 
1.940  1181.545  1168.159  715(19)  35  231  0.416 0.345 0.173 
...  
1.020  1198.258  1167.578  663(16)  128  193  1.685 0.216 0.024 
1.000  1198.756  1167.578  641(23)  104  232  1.685 0.216 0.024 
1.000  1198.756  1167.578  1( 0)  0  0  1.685 0.216 0.024 
Ignore the last line; it just shows a refitting of the best model. Otherwise, this suggests:
(i) end is ***way*** too high, as there are still more than 600 acceptances in blocks of 1000. It is hard to judge what end should be from this run.
(ii) The initial number of acceptances is really high: (719+18)/(719+18+34) = 95%, but by the time logtemp 1.00 is reached it's still at about 85%. One could change start to 1, or keep it at 2 and play it safe.
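The percentages in (ii) come straight from the trace columns; a small Python check (assuming, as in the computation above, that the parenthesized count is accepted moves that leave the score unchanged and the next column is rejections):

```python
def acceptance_ratio(acc, acc_same, rej):
    """Fraction of evaluated moves accepted in one block of iterations."""
    return (acc + acc_same) / (acc + acc_same + rej)

first = acceptance_ratio(719, 18, 34)    # first block of trace (A)
last = acceptance_ratio(641, 23, 104)    # block at logtemp 1.000
# first is roughly 0.96 (the ~95% quoted), last roughly 0.86 (~85%)
```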
(B) logreg.anneal.control(start = 2, end = -2, iter = 50000, update = 1000) (a different dataset/problem)
The first few lines are
logtemp  current score  best score  acc /  rej /  sing  current parameters 
2.000  1198.785  1198.785  0( 0)  0  0  0.50847 0.36814 
1.918  1189.951  1172.615  634(23)  22  322  0.38163 0.28031 
1.837  1191.542  1166.739  651(24)  32  293  1.75646 0.22451 
1.755  1191.907  1162.902  613(30)  20  337  1.80210 0.32276 
The last few are
logtemp  current score  best score  acc /  rej /  sing  current parameters 
-1.837  1132.731  1131.866  0(18)  701  281  0.00513 0.45994 
-1.918  1132.731  1131.866  0(25)  676  299  0.00513 0.45994 
-2.000  1132.731  1131.866  0(17)  718  265  0.00513 0.45994 
-2.000  1132.731  1131.866  0( 0)  0  1  0.00513 0.45994 
But there really weren't any acceptances since
logtemp  current score  best score  acc /  rej /  sing  current parameters 
-0.449  1133.622  1131.866  4(21)  875  100  0.00513 0.45994 
-0.531  1133.622  1131.866  0(19)  829  152  0.00513 0.45994 
-0.612  1133.622  1131.866  0(33)  808  159  0.00513 0.45994 
Going down from 400 to fewer than 10 acceptances went pretty fast....
logtemp  current score  best score  acc /  rej /  sing  current parameters 
0.776  1182.156  1156.354  464(31)  258  247  1.00543 0.26602 
0.694  1168.504  1150.931  306(17)  355  322  1.56695 0.43351 
0.612  1167.747  1150.931  230(38)  383  349  1.56695 0.43351 
0.531  1162.085  1145.920  124(12)  571  293  1.15376 0.15223 
0.449  1143.841  1142.321  63(20)  590  327  2.20150 0.43795 
0.367  1176.152  1142.321  106(21)  649  224  2.20150 0.43795 
0.286  1138.384  1131.866  62(18)  731  189  0.00513 0.45994 
0.204  1138.224  1131.866  11(27)  823  139  0.00513 0.45994 
0.122  1150.370  1131.866  15(12)  722  251  0.00513 0.45994 
0.041  1144.536  1131.866  30(19)  789  162  0.00513 0.45994 
-0.041  1137.898  1131.866  21(25)  911  43  0.00513 0.45994 
-0.122  1139.403  1131.866  12(30)  883  75  0.00513 0.45994 
What does this tell me?
(i) start was probably a bit high; no real harm done.
(ii) end was lower than needed. Since there really weren't any acceptances after log10(T) was about -0.5, an ending log-temperature of -1 would have been fine.
(iii) there were far too few iterations. The crunch time didn't take more than about 10 cycles (10000 iterations). You see that this is the time the "best model" decreased quite a bit, from 1156 to 1131. I would want to spend considerably more than 10000 iterations during this period for a larger problem (how many depends very much on the size of the problem). So, I'd pick something like (A): logreg.anneal.control(start = 2, end = -1, iter = 200000, update = 5000). Since the total range is reduced from 2-(-2)=4 to 2-(-1)=3, over a range of log10 temperatures of 1 there will be 200000/3 (about 67000) rather than 50000/4 = 12500 iterations. I would repeat this run a couple of times.
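The arithmetic behind that comparison, restated as a quick Python sketch:

```python
def iters_per_log_unit(start, end, total_iters):
    """Iterations spent per unit of log10 temperature range."""
    return total_iters / (start - end)

# run (B): start = 2, end = -2, 50000 iterations
assert iters_per_log_unit(2, -2, 50000) == 12500.0
# suggested rerun: start = 2, end = -1, 200000 iterations
assert round(iters_per_log_unit(2, -1, 200000)) == 66667   # ~67000
```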
In general I may sometimes run several models, and check the scores of the best models. If those are all the same, I'm very happy, if they're similar but not identical, it's OK, though I may run one or two longer chains. If they're very different, something is wrong. For the permutation test and crossvalidation I am usually less picky on convergence.
Ruczinski I, Kooperberg C, LeBlanc ML (2003). Logic Regression. Journal of Computational and Graphical Statistics, 12, 475-511.
Ruczinski I, Kooperberg C, LeBlanc ML (2002). Logic Regression - methods and software. Proceedings of the MSRI workshop on Nonlinear Estimation and Classification (Eds: D. Denison, M. Hansen, C. Holmes, B. Mallick, B. Yu), Springer: New York, 333-344.
Selected chapters from the dissertation of Ingo Ruczinski, available from https://research.fredhutch.org/content/dam/stripe/kooperberg/ingophdlogic.pdf
logreg, logreg.mc.control, logreg.tree.control
myannealcontrol <- logreg.anneal.control(start = 2, end = -2, iter = 50000, update = 1000)