
For each mask in a sentence, get the possible tokens and their log probabilities using a masked transformer.


Usage

masked_tokens_tbl(
  masked_sentences,
  model = getOption("pangoling.masked.default"),
  add_special_tokens = NULL,
  config_model = NULL,
  config_tokenizer = NULL
)


Arguments

masked_sentences
Masked sentences.

model
Name of a pre-trained model or folder.

add_special_tokens
Whether to include special tokens. It has the same default as the AutoTokenizer method in Python.

config_model
List with other arguments that control how the model from Hugging Face is accessed.

config_tokenizer
List with other arguments that control how the tokenizer from Hugging Face is accessed.


Value

A table with the masked sentences, the tokens (token), their log probabilities (lp), and the respective mask number (mask_n).
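For instance, a minimal sketch (assuming the model has been downloaded, and using the dplyr package, which pangoling does not require) of how these columns can be used to keep the most likely candidates per mask:

library(dplyr)

fills <- masked_tokens_tbl("The [MASK] doesn't fall far from the tree.")
fills |>
  group_by(mask_n) |>     # one group per [MASK] in the sentence
  slice_max(lp, n = 3) |> # three highest log probabilities per mask
  ungroup()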


Details

A masked language model (also called BERT-like, or encoder model) is a type of large language model that can be used to predict the content of a mask in a sentence.

If not specified, the masked model used is the one set in the global option pangoling.masked.default; this can be accessed via getOption("pangoling.masked.default") (by default, "bert-base-uncased"). To change the default option, use options(pangoling.masked.default = "newmaskedmodel").
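For example (the model name below is only illustrative; any masked model identifier from Hugging Face works):

# Check the current default masked model.
getOption("pangoling.masked.default")

# Use a multilingual BERT for all subsequent calls in this session.
options(pangoling.masked.default = "bert-base-multilingual-cased")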

A list of possible masked models can be found on the Hugging Face website.

Using the config_model and config_tokenizer arguments, it's possible to control how the model and tokenizer from Hugging Face are accessed; see the Python method from_pretrained for details. In case of errors, check the status of https://status.huggingface.co/.
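As a hedged sketch, revision is one from_pretrained argument that can be passed this way, pinning the model and tokenizer to a specific revision on the Hugging Face Hub:

masked_tokens_tbl(
  "The [MASK] doesn't fall far from the tree.",
  model = "bert-base-uncased",
  config_model = list(revision = "main"),
  config_tokenizer = list(revision = "main")
)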

More examples

See the online article on the pangoling website for more examples.

See also

Other masked model functions: masked_config(), masked_lp(), masked_preload()


Examples

if (FALSE) { # interactive()
masked_tokens_tbl("The [MASK] doesn't fall far from the tree.",
  model = "bert-base-uncased"
)
}
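A further sketch (not run): with more than one [MASK] in a sentence, the mask_n column distinguishes the candidates for each mask.

if (FALSE) { # interactive()
masked_tokens_tbl("The [MASK] doesn't fall far from the [MASK].",
  model = "bert-base-uncased"
)
}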