space_cjk {piecemaker}    R Documentation
To tokenize Chinese, Japanese, and Korean (CJK) ideographs, it is convenient to add spaces around each one so that a whitespace-based tokenizer splits them into separate tokens.
space_cjk(text)
text: A character vector to clean.
A character vector the same length as the input text, with spaces added between ideographs.
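The idea behind the function can be sketched with a single regular-expression substitution. This is a hypothetical illustration, not piecemaker's actual implementation: the name `space_cjk_sketch` is invented here, "ideograph" is assumed to mean the CJK Unified Ideographs block (U+4E00 to U+9FFF) plus Extension A (U+3400 to U+4DBF), and collapsing repeated whitespace is an assumed cleanup step.

```r
# Hypothetical sketch, NOT piecemaker's actual code.
# Assumption: "ideographs" = CJK Unified Ideographs (U+4E00-U+9FFF)
# plus Extension A (U+3400-U+4DBF); other CJK blocks are ignored.
space_cjk_sketch <- function(text) {
  cjk <- "([\\x{3400}-\\x{4DBF}\\x{4E00}-\\x{9FFF}])"
  # Put a space on each side of every matched ideograph.
  out <- gsub(cjk, " \\1 ", text, perl = TRUE)
  # Assumed cleanup: collapse whitespace runs, then trim the ends.
  out <- gsub("\\s+", " ", out, perl = TRUE)
  trimws(out)
}

space_cjk_sketch(intToUtf8(13312:13320))
```

Because `gsub()` and `trimws()` are vectorized, the result has the same length as the input, matching the return value described above.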
to_space <- intToUtf8(13312:13320)
to_space
space_cjk(to_space)
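The example input is easier to read in hexadecimal: decimal 13312 to 13320 is U+3400 to U+3408, the first nine code points of the CJK Unified Ideographs Extension A block. The snippet below inspects that input with base R only and shows why spacing matters for whitespace tokenization. It does not call `space_cjk()`; the `spaced` string is built by hand under the assumption that the function leaves exactly one space between adjacent ideographs.

```r
# The example input: nine consecutive code points starting at decimal 13312.
to_space <- intToUtf8(13312:13320)

# Show them in U+XXXX notation: "U+3400" through "U+3408".
codes <- sprintf("U+%04X", utf8ToInt(to_space))
codes

# intToUtf8() returns ONE string here, so splitting on whitespace gives 1 token.
length(strsplit(to_space, "\\s+")[[1]])

# Assumed shape of the spaced output (built by hand, not via space_cjk()):
# one space between ideographs, which splits into 9 tokens.
spaced <- paste(intToUtf8(13312:13320, multiple = TRUE), collapse = " ")
length(strsplit(spaced, "\\s+")[[1]])
```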