Simulate Comparisons For Use in Sequential Markov Longitudinal Clinical Trial Simulations

```
estSeqMarkovOrd(
y,
times,
initial,
absorb = NULL,
intercepts,
parameter,
looks,
g,
formula,
ppo = NULL,
yprevfactor = TRUE,
groupContrast = NULL,
cscov = FALSE,
timecriterion = NULL,
coxzph = FALSE,
sstat = NULL,
rdsample = NULL,
maxest = NULL,
maxvest = NULL,
nsim = 1,
progress = FALSE,
pfile = ""
)
```

A data frame with number of rows equal to the product of `nsim`, the length of `looks`, and the length of `parameter`, with variables `sim`, `parameter`, `look`, `est` (log odds ratio for group), and `vest` (the variance of the latter). If `timecriterion` is specified the data frame also contains `loghr` (Cox log hazard ratio for group), `lrchisq` (chi-square from Cox test for group), and if `coxzph=TRUE`, `phchisq`, the chi-square for testing proportional hazards. The attribute `etimefreq` is also present if `timecriterion` is present, and it provides the frequency distribution of derived event times by group and censoring/event indicator. If `sstat` is given, the attribute `sstat` is also present, and it contains an array with dimensions corresponding to simulations, parameter values within simulations, `id`, and a two-column subarray with columns `group` and `y`, the latter being the summary measure computed by the `sstat` function. The returned data frame also has attribute `lrmcoef`, which holds the last-look logistic regression coefficient estimates over the `nsim` simulations and the parameter settings, and an attribute `failures`, which is a data frame containing the variables `reason` and `frequency` cataloging the reasons for unsuccessful model fits.
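
As a rough illustration of the layout of the returned data frame, the sketch below builds a stand-in with the same core variables and checks the row count. The settings are hypothetical, the `est` and `vest` values are random placeholders rather than output of `estSeqMarkovOrd`, and the row ordering is an assumption.

```r
# Hypothetical settings; not taken from a real run
nsim      <- 3
looks     <- c(50, 100, 200)
parameter <- c(0, -0.5)

# One row per simulation x parameter value x look (row order is an assumption)
res <- expand.grid(look = looks, parameter = parameter, sim = seq_len(nsim))

# Placeholder variances and estimates standing in for real model output
set.seed(1)
res$vest <- 4 / res$look
res$est  <- rnorm(nrow(res), mean = res$parameter, sd = sqrt(res$vest))

# Row count equals nsim * length(looks) * length(parameter)
n_expected <- nsim * length(looks) * length(parameter)
stopifnot(nrow(res) == n_expected)

# A z-statistic per row, then the proportion of |z| > 1.96 by parameter and look
res$z      <- res$est / sqrt(res$vest)
res$reject <- abs(res$z) > 1.96
reject_rate <- aggregate(reject ~ parameter + look, data = res, FUN = mean)
```

Summaries of this kind (one row per `parameter` and `look` combination) are one way the per-look estimates might be post-processed; the 1.96 cutoff is only an example.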

- y
vector of possible y values in order (numeric, character, factor)

- times
vector of measurement times

- initial
a vector of probabilities summing to 1.0 that specifies the frequency distribution of initial values to be sampled from. The vector must have names that correspond to values of `y` representing non-absorbing states.

- absorb
vector of absorbing states, a subset of `y`. The default is no absorbing states. Observations are truncated when an absorbing state is simulated. May be numeric, character, or factor.

- intercepts
vector of intercepts in the proportional odds model. There must be one fewer of these than the length of `y`.

- parameter
vector of true parameter (effects; group differences) values. These are group 2:1 log odds ratios in the transition model, conditioning on the previous `y`.

- looks
integer vector of ID numbers at which maximum likelihood estimates and their estimated variances are computed. For a single look specify a scalar value for `looks` equal to the number of subjects in the sample.

- g
a user-specified function of three or more arguments which in order are `yprev` (the value of `y` at the previous time), the current time `t`, the `gap` between the previous time and the current time, an optional (usually named) covariate vector `X`, and optional arguments such as a regression coefficient value to simulate from. The function needs to allow `yprev` to be a vector and `yprev` must not include any absorbing states. The `g` function returns the linear predictor for the proportional odds model aside from `intercepts`. The returned value must be a matrix with row names taken from `yprev`. If the model is a proportional odds model, the returned value must be one column. If it is a partial proportional odds model, the value must have one column for each distinct value of the response variable Y after the first one, with the levels of Y used as optional column names. So columns correspond to `intercepts`. The different columns are used for `y`-specific contributions to the linear predictor (aside from `intercepts`) for a partial or constrained partial proportional odds model. Parameters for partial proportional odds effects may be included in the `...` arguments.

- formula
a formula object given to the `lrm()` function using variables with these names: `y`, `time`, `yprev`, and `group` (factor variable having values '1' and '2'). The `yprev` variable is converted to a factor before fitting the model unless `yprevfactor=FALSE`.

- ppo
a formula specifying the part of `formula` for which proportional odds is not to be assumed, i.e., that specifies a partial proportional odds model. Specifying `ppo` triggers the use of `VGAM::vglm()` instead of `rms::lrm()` and will make the simulations run slower.

- yprevfactor
see `formula`

- groupContrast
omit this argument if `group` has only one regression coefficient in `formula`. Otherwise if `ppo` is omitted, provide `groupContrast` as a list of two lists that are passed to `rms::contrast.rms()` to compute the contrast of interest and its standard error. The first list corresponds to group 1, the second to group 2, to get a 2:1 contrast. If `ppo` is given and the group effect is not just a simple regression coefficient, specify as `groupContrast` a function of a `vglm` fit that computes the contrast of interest and its standard error and returns a list with elements named `Contrast` and `SE`. For the latter type you can optionally have formal arguments `n1`, `n2`, and `parameter` that are passed to `groupContrast` to compute the standard error of the group contrast, where `n1` and `n2` respectively are the sample sizes for the two groups and `parameter` is the true group effect parameter value.

- cscov
applies if `ppo` is not used. Set to `TRUE` to use the cluster sandwich covariance estimator of the variance of the group comparison.

- timecriterion
a function of a time-ordered vector of simulated ordinal responses `y` that returns a vector of `FALSE` or `TRUE` values denoting whether the current `y` level met the condition of interest. For example `estSeqMarkovOrd` will compute the first time at which `y >= 5` if you specify `timecriterion=function(y) y >= 5`. This function is only called at the last data look for each simulated study. To have more control, instead of `timecriterion` returning a logical vector have it return a numeric 2-vector containing, in order, the event/censoring time and the 1/0 event/censoring indicator.

- coxzph
set to `TRUE` if `timecriterion` is specified and you want to compute a statistic for testing proportional hazards at the last look of each simulated dataset

- sstat
set to a function of the time vector and the corresponding vector of ordinal responses for a single group if you want to compute a Wilcoxon test on a derived quantity such as the number of days in a given state.

- rdsample
an optional function to do response-dependent sampling. It is a function of these arguments, which are vectors that stop at any absorbing state: `times` (ascending measurement times for one subject), `y` (vector of ordinal outcomes at these times for one subject). The function returns `NULL` if no observations are to be dropped, and otherwise returns the vector of new times to sample.

- maxest
maximum acceptable absolute value of the contrast estimate, ignored if `NULL`. Any values exceeding `maxest` will result in the estimate being set to `NA`.

- maxvest
like `maxest` but for the estimated variance of the contrast estimate

- nsim
number of simulations (default is 1)

- progress
set to `TRUE` to send current iteration number to `pfile` every 10 iterations. Each iteration will really involve multiple simulations, if `parameter` has length greater than 1.

- pfile
file to which to write progress information. Defaults to `''` which is the console. Ignored if `progress=FALSE`.
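
The user-supplied functions described in the argument list can be sketched in base R as follows. This is only an illustration under stated assumptions: `y` is taken to be `1:4` with state 4 absorbing, the treatment is assumed to reach `g` as `X["group"]`, all coefficient values are hypothetical, and the state probabilities assume the `rms::lrm()` convention that the j-th intercept plus the linear predictor gives the logit of P(Y >= y[j+1]).

```r
# Assumed setup: y takes values 1:4 and state 4 is absorbing
y_levels   <- 1:4
intercepts <- c(1, 0, -1)   # hypothetical; length(y_levels) - 1 values

# g: linear predictor (aside from intercepts) for a proportional odds model.
# Assumption: X is a named vector carrying the treatment as X["group"] (1 or 2).
g <- function(yprev, t, gap, X, parameter = 0) {
  prev_effect <- c("1" = -1, "2" = 0, "3" = 1)  # hypothetical previous-state effects
  lp <- unname(prev_effect[as.character(yprev)]) +
    parameter * as.numeric(X[["group"]] == 2)
  # One column (proportional odds), row names taken from yprev
  matrix(lp, ncol = 1, dimnames = list(as.character(yprev), NULL))
}

lp <- g(yprev = c(1, 2, 3), t = 2, gap = 1, X = c(group = 2), parameter = -0.5)

# State probabilities given yprev = 2, under the assumed convention
exceed <- plogis(intercepts + lp["2", 1])   # P(Y >= y[j+1] | yprev = 2)
probs  <- -diff(c(1, exceed, 0))            # P(Y = y[j] | yprev = 2)
names(probs) <- y_levels

# timecriterion: TRUE where the current y meets the condition of interest
timecrit <- function(y) y >= 4

# sstat: a derived summary, here the number of measurement times in state 1
sstat <- function(times, y) sum(y == 1)

# rdsample: hypothetical rule that stops sampling after the first y <= 1
rdsample <- function(times, y) {
  hit <- which(y <= 1)
  if (length(hit) == 0) return(NULL)  # no observations to drop
  times[seq_len(hit[1])]              # new times to sample
}
```

For a partial proportional odds model, `g` would instead return one column per intercept, as described under `g` above.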

Frank Harrell

Simulates sequential clinical trials of longitudinal ordinal outcomes using a first-order Markov model. Looks are done sequentially after subject ID numbers given in the vector `looks`, with the earliest possible look being after subject 2. At each look, a subject's repeated records are either all used or all ignored depending on the subject's sequential ID number. For each true effect parameter value, simulation, and look, runs a function to compute the estimate of the parameter of interest along with its variance. For each simulation, data are first simulated for the last look, and these data are sequentially revealed for earlier looks. The user provides a function `g` that has extra arguments specifying the true effect `parameter` and the treatment `group`, with treatments expected to be coded 1 and 2. `parameter` is usually on the scale of a regression coefficient, e.g., a log odds ratio. Fitting is done using the `rms::lrm()` function, unless non-proportional odds is allowed, in which case `VGAM::vglm()` is used. If `timecriterion` is specified, the function also, for the last data look only, computes the first time at which the criterion is satisfied for the subject or uses the event time and event/censoring indicator computed by `timecriterion`. The Cox/logrank chi-square statistic for comparing groups on the derived time variable is saved. If `coxzph=TRUE`, the `survival` package correlation coefficient `rho` from the scaled partial residuals is also saved so that the user can later determine to what extent the Markov model resulted in the proportional hazards assumption being violated when analyzing on the time scale. `vglm` is accelerated by saving the first successful fit for the largest sample size and using its coefficients as starting values for further `vglm` fits for any sample size for the same setting of `parameter`.
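
The "simulate once for the last look, then reveal sequentially" idea can be sketched with a toy long-format data set. The data below are random placeholders with hypothetical names (`full`, `revealed`), no absorbing-state truncation is applied, and the subsetting rule is an assumption about how records are selected, not the function's actual code.

```r
set.seed(2)
n_max <- 6      # hypothetical number of subjects at the last look
times <- 1:3    # hypothetical measurement times

# Long-format data for all subjects, generated once
full <- data.frame(
  id   = rep(seq_len(n_max), each = length(times)),
  time = rep(times, times = n_max),
  y    = sample(1:4, n_max * length(times), replace = TRUE)  # placeholder outcomes
)

# At look k, all records of subjects with id <= k are used; later subjects are ignored
looks    <- c(2, 4, 6)
revealed <- lapply(looks, function(k) full[full$id <= k, ])

# Number of records available at each look
rows_per_look <- sapply(revealed, nrow)
```

Because every look is a subset of the same simulated data set, estimates at successive looks are correlated in the way they would be in a real sequential trial.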

`gbayesSeqSim()`, `simMarkovOrd()`, https://hbiostat.org/R/Hmisc/markov/