is_chinese_char

<p>A unicode codepoint, as an integer.</p>

<p>(R implementation of BasicTokenizer._is_chinese_char from
BERT: tokenization.py. From that file:
 This defines a "chinese character" as anything in the CJK Unicode block:
  https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)</p>

internal

Use pretrained models from Google Research's BERT in R.

Jonathan Bratt

RBERT

is_chinese_char: Check whether cp is the codepoint of a CJK character.

Description

Usage

Arguments

Value

Details