Computes the total number of tokens in a text file, reading the file in batches of lines. Because the file is processed one batch at a time, the function can be used on very large text files. It returns a numeric value giving the total number of tokens in the file.
num_tokens_file(filename, batch_size = 1000, encoding = "cl100k_base")
Value: a numeric value indicating the total number of tokens in the text file
filename: character string giving the name of the text file to read in
batch_size: integer giving the number of lines to read in per batch (default is 1000)
encoding: character string naming the token encoding to use (default is "cl100k_base")
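
The batch-wise reading described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `count_tokens_in_batches` and `count_tokens_whitespace` are hypothetical names, and the whitespace splitter is only a stand-in for the real tokenizer that `encoding` selects, so its counts will differ from those of "cl100k_base".

```r
# Hypothetical stand-in tokenizer: counts whitespace-separated words.
# The documented function would use the tokenizer named by `encoding` instead.
count_tokens_whitespace <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]          # drop empty lines
  if (length(lines) == 0) return(0)
  sum(lengths(strsplit(lines, "\\s+")))
}

# Hypothetical sketch of batch-wise counting: read `batch_size` lines at a
# time from an open connection and accumulate the per-batch token counts.
count_tokens_in_batches <- function(filename, batch_size = 1000,
                                    count_fun = count_tokens_whitespace) {
  con <- file(filename, open = "r")
  on.exit(close(con))                    # always release the connection
  total <- 0
  repeat {
    batch <- readLines(con, n = batch_size, warn = FALSE)
    if (length(batch) == 0) break        # end of file reached
    total <- total + count_fun(batch)
  }
  total
}

# Small demonstration on a temporary file, using a batch size of 2 lines.
tmp <- tempfile(fileext = ".txt")
writeLines(c("one two three", "", "four five"), tmp)
result <- count_tokens_in_batches(tmp, batch_size = 2)
unlink(tmp)
```

Only one batch of lines is held in memory at a time, which is why this pattern suits large files. A smaller `batch_size` lowers peak memory use at the cost of more read calls.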