The number of tokens in a string or vector of strings

Usage

ntokens(
  x,
  model = getOption("pangoling.causal.default"),
  add_special_tokens = NULL,
  config_tokenizer = NULL
)

Arguments

x

Character input: a string or a vector of strings.

model

Name of a pre-trained model or folder.

add_special_tokens

Whether to include special tokens. If NULL (the default), the default of the AutoTokenizer method in Python is used.

config_tokenizer

List with other arguments that control how the tokenizer from Hugging Face is accessed.

Value

The number of tokens in a string or vector of strings.

See also

Other token-related functions: tokenize_lst(), transformer_vocab()

Examples

if (FALSE) { # interactive()
ntokens(x = c("The apple doesn't fall far from the tree."), model = "gpt2")
}
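
The following is a minimal sketch of how the add_special_tokens argument could be explored on a vector of strings. It is not taken from the package documentation: the second sentence and the variable names are illustrative, and it assumes that pangoling and its Python dependencies are installed and that the "gpt2" model can be downloaded. It also assumes one count is returned per input string. Whether the two results differ depends on the tokenizer of the chosen model.

```r
# Illustrative input (hypothetical): two strings to count tokens for.
sentences <- c(
  "The apple doesn't fall far from the tree.",
  "Don't judge a book by its cover."
)

# Guarded so nothing runs unless the session is interactive and
# pangoling is installed (assumption: Python backend is also set up).
if (interactive() && requireNamespace("pangoling", quietly = TRUE)) {
  # add_special_tokens left as NULL: the tokenizer's own default applies.
  n_default <- pangoling::ntokens(x = sentences, model = "gpt2")

  # Explicitly ask the tokenizer to add its special tokens, if it has any.
  n_special <- pangoling::ntokens(
    x = sentences,
    model = "gpt2",
    add_special_tokens = TRUE
  )

  print(n_default)
  print(n_special)
}
```

Comparing the two printed results is a quick way to check whether a given model's tokenizer adds special tokens when asked to.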