# tweights

##### Function `tweights`

Returns a vector `p` of resampling probabilities such that the column means of `tboot(dataset = dataset, p = p)` equal `target` on average.
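In concrete terms, if rows are drawn with probabilities \(q_i\), each resampled row has expected value \(\sum_i q_i x_i\), so the expected column means are `colSums(q * X)`. The base-R sketch below checks this by simulation with `sample.int()`. The probability vector `q` is an arbitrary illustration and is not output from `tweights`.

```r
# Numeric columns of iris; any numeric matrix works
X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")])
n <- nrow(X)

# Illustrative resampling probabilities (NOT produced by tweights):
# rows with larger Sepal.Length are sampled more often
q <- X[, "Sepal.Length"] / sum(X[, "Sepal.Length"])

# Exact expected column means when rows are drawn with probabilities q
expected <- colSums(q * X)

# Monte Carlo check: average the column means of many weighted resamples
set.seed(1)
B <- 2000
sim_means <- replicate(B, {
  idx <- sample.int(n, size = n, replace = TRUE, prob = q)
  colMeans(X[idx, , drop = FALSE])
})
average_of_means <- rowMeans(sim_means)

print(rbind(expected, average_of_means))
```

Any single resample will miss `expected`, but the average over many resamples converges to it, which is the sense in which the target is met "on average".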

##### Usage

```
tweights(
dataset,
target = apply(dataset, 2, mean),
distance = "klqp",
maxit = 1000,
tol = 1e-08,
warningcut = 0.05,
silent = FALSE,
Nindependent = 0
)
```

##### Arguments

- dataset
Data frame or matrix to use to find row weights.

- target
Numeric vector of target column means. If `target` is named, every element of `names(target)` must be a column name in `dataset`.

- distance
The distance to minimize. Must be one of 'euchlidean', 'klqp' or 'klpq'; the last two are the two directions of the Kullback-Leibler divergence. 'klqp', which corresponds to exponential tilting, is recommended.

- maxit
Maximum number of iterations used when optimizing a Kullback-Leibler distance.

- tol
Tolerance. If the achieved mean is too far from the target (as defined by `tol`), an error is thrown.

- warningcut
Cutoff for deciding when a large weight triggers a warning.

- silent
Allows silencing some messages.

- Nindependent
Assumes the input also includes `Nindependent` samples with independent columns. See Details.

##### Details

Let \(p_i = 1/n\) be the probability of sampling subject \(i\) from a dataset with \(n\) individuals (i.e. rows of the dataset) in the classic resampling-with-replacement scheme.
Also, let \(q_i\) be the probability of sampling subject \(i\) in the new resampling scheme, and let \(d(q,p)\) be a distance between the two resampling schemes. The `tweights` function seeks to solve the problem:
$$\hat{q} = \operatorname{argmin}_{q} \, d(q,p)$$
subject to the constraints
$$\sum_i q_i = 1$$
and
$$\text{dataset}^\top q = \text{target},$$
where dataset is the \(n \times K\) matrix of variables supplied to the function.

$$d_{\text{euclidean}}(q,p) = \sqrt{\sum_i (p_i - q_i)^2}$$
$$d_{\text{klqp}}(q,p) = \sum_i q_i \log\frac{q_i}{p_i}, \qquad d_{\text{klpq}}(q,p) = \sum_i p_i \log\frac{p_i}{q_i}$$
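Why 'klqp' amounts to exponential tilting can be seen from a standard Lagrange-multiplier argument, sketched below as a textbook derivation rather than a description of the package's code. Here \(x_i\) is the \(i\)-th row of dataset (a \(K\)-vector), and \(\lambda \in \mathbb{R}^K\) and \(\mu\) are the multipliers for the two constraints. Minimizing \(\sum_i q_i \log(q_i/p_i)\) subject to \(\sum_i q_i x_i = \text{target}\) and \(\sum_i q_i = 1\) gives

$$\mathcal{L}(q,\lambda,\mu) = \sum_i q_i \log\frac{q_i}{p_i} - \lambda^\top\Big(\sum_i q_i x_i - \text{target}\Big) - \mu\Big(\sum_i q_i - 1\Big)$$

$$\frac{\partial \mathcal{L}}{\partial q_i} = \log\frac{q_i}{p_i} + 1 - \lambda^\top x_i - \mu = 0 \quad\Longrightarrow\quad q_i = \frac{p_i \exp(\lambda^\top x_i)}{\sum_j p_j \exp(\lambda^\top x_j)}$$

where the denominator comes from \(\sum_i q_i = 1\). With \(p_i = 1/n\) the \(p_i\) cancel, so \(q_i \propto \exp(\lambda^\top x_i)\): the uniform bootstrap weights are exponentially tilted, and \(\lambda\) is the value that makes \(\sum_i q_i x_i = \text{target}\).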

Optimization for the euclidean distance is a quadratic program and uses the `ipop` function in kernlab. The euclidean solution provides a starting value, which is then used with the `constrOptim` function and Lagrange multipliers to solve the Kullback-Leibler distance optimization. The output is the optimal probability vector.
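The package reaches the Kullback-Leibler solution through a quadratic-programming start followed by constrained optimization. To show what the 'klqp' solution looks like, the base-R sketch below uses a different, swapped-in technique: Newton's method on the convex dual of the exponential-tilting problem, with a backtracking line search. The helper `tilt_weights` and its arguments are hypothetical and not part of tboot. It assumes the target lies strictly inside the convex hull of the rows; otherwise no positive tilted weights exist and the loop will not converge.

```r
# Hypothetical helper (not part of tboot): exponential-tilting ('klqp')
# weights found by Newton's method on the convex dual problem.
tilt_weights <- function(X, target, maxit = 100, tol = 1e-10) {
  X <- as.matrix(X)
  Z <- sweep(X, 2, target)   # center each column at its target value
  lambda <- rep(0, ncol(Z))  # lambda = 0 gives the uniform weights 1/n

  # Tilted weights q_i proportional to exp(z_i' lambda)
  weights_at <- function(lam) {
    eta <- drop(Z %*% lam)
    w <- exp(eta - max(eta))  # subtract the max to avoid overflow
    w / sum(w)
  }
  # Dual objective log(mean(exp(Z lambda))), computed stably
  dual <- function(lam) {
    eta <- drop(Z %*% lam)
    m <- max(eta)
    m + log(mean(exp(eta - m)))
  }

  for (iter in seq_len(maxit)) {
    q <- weights_at(lambda)
    grad <- colSums(q * Z)  # achieved mean minus target
    if (max(abs(grad)) < tol) break
    H <- crossprod(Z * sqrt(q)) - tcrossprod(grad)  # weighted covariance of Z
    step <- solve(H, grad)
    # Backtracking: halve the step until the dual objective does not increase
    step_size <- 1
    while (dual(lambda - step_size * step) > dual(lambda) && step_size > 1e-10) {
      step_size <- step_size / 2
    }
    lambda <- lambda - step_size * step
  }
  q <- weights_at(lambda)
  list(weights = q, lambda = lambda, achievedMean = colSums(q * X))
}

# Usage with the same target as the tweights example
X <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]
target <- c(Sepal.Length = 5.5, Sepal.Width = 2.9, Petal.Length = 3.4)
fit <- tilt_weights(X, target)
print(fit$achievedMean)
```

Centering the columns at `target` makes the dual gradient equal to the achieved mean minus the target, so the stopping rule measures the constraint violation directly, much as `tol` does for `tweights`.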

The `Nindependent` option augments the dataset with a specified number of additional patients. These patients are assumed to be a random bootstrap sample drawn from the dataset separately for each variable (i.e. marginally), so their variables are independent.
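A literal simulation of the `Nindependent` description, sketched below in base R, appends rows in which every column is bootstrapped on its own. The helper `augment_independent` is hypothetical, and tboot may account for these rows differently internally (for example without generating them).

```r
# Hypothetical helper (not part of tboot): append Nindependent rows in which
# each column is bootstrapped separately, so columns are independent there.
augment_independent <- function(X, Nindependent) {
  X <- as.matrix(X)
  if (Nindependent == 0) return(X)
  # Resample every column on its own; row-wise pairings are not preserved
  extra <- apply(X, 2, function(col) sample(col, Nindependent, replace = TRUE))
  rbind(X, extra)
}

set.seed(42)
X <- iris[, c("Sepal.Length", "Petal.Length")]
augmented <- augment_independent(X, Nindependent = 5000)
print(dim(augmented))
# Correlation in the original rows vs. the appended independent rows
print(cor(augmented[1:150, ])[1, 2])
print(cor(augmented[-(1:150), ])[1, 2])
```

In the appended rows the correlation between the two columns is close to zero, whereas in iris itself Sepal.Length and Petal.Length are strongly correlated.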

##### Value

An object of type `tweights` containing the following components:

- weights
Tilted weights for resampling.

- originalTarget
Will be `NULL` if the target was not changed.

- target
Actual target that was attempted.

- achievedMean
Achieved mean from tilting.

- dataset
Input dataset.

- X
Reformatted dataset.

- Nindependent
Input `Nindependent` option.

##### Examples

```
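library(tboot)
# Target means for three of the iris columns
target = c(Sepal.Length = 5.5, Sepal.Width = 2.9, Petal.Length = 3.4)
w = tweights(dataset = iris, target = target, silent = TRUE)
# Simulate 1000 rows using the tilted resampling weights
simulated_data = tboot(nrow = 1000, weights = w)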
```

*Documentation reproduced from package tboot, version 0.2.0, License: GPL-3*