
RBERT (version 0.1.11)

truncate_seq_pair: Truncate a sequence pair to the maximum length.

Description

Truncates a sequence pair to the maximum length. This is a simple heuristic that always truncates the longer sequence one token at a time (or the first sequence in case of a tie -JDB). This makes more sense than truncating an equal percentage of tokens from each sequence: if one sequence is very short, each token truncated from it likely carries more information than a token truncated from the longer sequence.
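The heuristic above can be sketched as a short loop. This is a hypothetical re-implementation (truncate_pair_sketch is an illustrative name, not an RBERT function). It follows the tie rule stated above and assumes, as in Google's original Python BERT code, that tokens are removed from the end of a sequence.

```r
# Hypothetical sketch of the truncation heuristic; not RBERT's own source.
# Assumption: tokens are dropped from the END of a sequence.
truncate_pair_sketch <- function(tokens_a, tokens_b, max_length) {
  stopifnot(max_length >= 0)  # a negative limit could never be satisfied
  while (length(tokens_a) + length(tokens_b) > max_length) {
    if (length(tokens_a) >= length(tokens_b)) {
      # tokens_a is longer, or the lengths tie: drop the last token of tokens_a
      tokens_a <- tokens_a[-length(tokens_a)]
    } else {
      # tokens_b is strictly longer: drop the last token of tokens_b
      tokens_b <- tokens_b[-length(tokens_b)]
    }
  }
  list(trunc_a = tokens_a, trunc_b = tokens_b)
}
```

With the example inputs below (two sequences of 4 tokens, max_length = 5), the lengths of the sketch's sequences go 4 + 4, then 3 + 4, then 3 + 3, then 2 + 3, so it returns trunc_a = c("a", "b") and trunc_b = c("w", "x", "y"). When one sequence is short, for instance 2 tokens against 10 with max_length = 8, only the longer sequence is trimmed, which is the behaviour the rationale above argues for.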

Usage

truncate_seq_pair(tokens_a, tokens_b, max_length)

Arguments

tokens_a

Character; a vector of tokens in the first input sequence.

tokens_b

Character; a vector of tokens in the second input sequence.

max_length

Integer; the maximum total length of the two sequences.

Value

A list containing two character vectors, trunc_a and trunc_b: the truncated versions of tokens_a and tokens_b, respectively.

Details

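The original Python code truncates the sequences in place, relying on the fact that Python lists passed to a function are mutable and can be modified by it. R arguments behave like copies (copy-on-modify), so changes made inside the function would not be visible to the caller; instead, this function returns the truncated sequences in a list.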
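The difference can be seen with two small helpers. Both are hypothetical and exist only for illustration (they are not RBERT functions): the first tries to shorten its argument in place, which R's copy-on-modify semantics prevent, and the second returns the shortened vector so that the caller can reassign it.

```r
# Hypothetical helper: shortens its argument, but only the local copy changes.
drop_last_in_place <- function(tokens) {
  tokens <- tokens[-length(tokens)]
  invisible(NULL)
}

# Hypothetical helper: returns the shortened vector instead.
drop_last <- function(tokens) {
  tokens[-length(tokens)]
}

tokens_a <- c("a", "b", "c", "d")
drop_last_in_place(tokens_a)
print(length(tokens_a))  # 4: the caller's vector is untouched

tokens_a <- drop_last(tokens_a)
print(length(tokens_a))  # 3: the change is kept by reassigning the result
```

The same pattern applies to truncate_seq_pair: to keep the truncated sequences, assign the components of the returned list back, e.g. tokens_a <- result$trunc_a and tokens_b <- result$trunc_b.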

Examples

tokens_a <- c("a", "b", "c", "d")
tokens_b <- c("w", "x", "y", "z")
# The combined length is 8, so 3 tokens must be removed to reach max_length = 5.
truncate_seq_pair(tokens_a, tokens_b, 5)
