truncate_seq_pair: Truncate a sequence pair to the maximum length.
Description
Truncates a pair of token sequences so that their combined length does not
exceed max_length.
This is a simple heuristic that always truncates the longer sequence, one
token at a time (or the first sequence in case of a tie -JDB). This makes
more sense than truncating an equal percentage of tokens from each sequence:
if one sequence is very short, each token truncated from it likely carries
more information than a token truncated from the longer sequence.
Usage
truncate_seq_pair(tokens_a, tokens_b, max_length)
Arguments
tokens_a
Character; a vector of tokens in the first input sequence.
tokens_b
Character; a vector of tokens in the second input sequence.
max_length
Integer; the maximum total length of the two sequences.
Value
A list containing two character vectors: trunc_a and trunc_b.
Details
The original Python code truncated the sequences in place, relying on the
fact that Python passes lists by reference. R does not modify a caller's
vectors from inside a function, so the truncated sequences are returned in a
list instead.
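Examples
The following is a minimal sketch of the heuristic described above, not the
package source. The name truncate_pair_sketch is hypothetical, and two
details are assumptions: tokens are dropped from the end of a sequence, and
a tie is broken by truncating the first sequence, as the Description states.
It also shows that the inputs are left untouched and the results are read
from the returned list.

```r
# Hypothetical re-implementation for illustration only.
truncate_pair_sketch <- function(tokens_a, tokens_b, max_length) {
  # Guard against a negative limit, which could never be satisfied.
  stopifnot(max_length >= 0)
  trunc_a <- tokens_a
  trunc_b <- tokens_b
  # Drop one token at a time until the pair fits.
  while (length(trunc_a) + length(trunc_b) > max_length) {
    if (length(trunc_a) >= length(trunc_b)) {
      # Longer sequence, or a tie: shorten the first sequence
      # (assumption: tokens are removed from the end).
      trunc_a <- utils::head(trunc_a, -1)
    } else {
      trunc_b <- utils::head(trunc_b, -1)
    }
  }
  list(trunc_a = trunc_a, trunc_b = trunc_b)
}

tokens_a <- c("the", "quick", "brown", "fox", "jumps")
tokens_b <- c("hello", "world")

res <- truncate_pair_sketch(tokens_a, tokens_b, max_length = 5L)
res$trunc_a       # "the" "quick" "brown"
res$trunc_b       # "hello" "world"
length(tokens_a)  # 5; the original vector is not modified

# Tie: both sequences have two tokens, so the first one is shortened.
tie <- truncate_pair_sketch(c("a1", "a2"), c("b1", "b2"), max_length = 3L)
tie$trunc_a       # "a1"
tie$trunc_b       # "b1" "b2"
```

Only the longer sequence loses tokens in the first call, so the short second
sequence is kept whole.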