Get the log probability of a target word (or phrase) given a left and right context
Source: R/tr_masked.R
masked_lp.Rd
Get the log probability of each target word (or phrase) in a vector, given vectors of left and right contexts, using a masked transformer.
Usage
masked_lp(
l_contexts,
targets,
r_contexts,
ignore_regex = "",
model = getOption("pangoling.masked.default"),
add_special_tokens = NULL,
config_model = NULL,
config_tokenizer = NULL
)
Arguments
- l_contexts
Left context of the target word.
- targets
Target words.
- r_contexts
Right context of the target word.
- ignore_regex
Regular expression for characters to ignore when calculating the log probabilities. For example,
^[[:punct:]]$
will ignore all punctuation that stands alone in a token.
- model
Name of a pre-trained model or folder.
- add_special_tokens
Whether to include special tokens. It has the same default as the AutoTokenizer method in Python.
- config_model
List with other arguments that control how the model from Hugging Face is accessed.
- config_tokenizer
List with other arguments that control how the tokenizer from Hugging Face is accessed.
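To make the ignore_regex example concrete, the base R sketch below shows which strings the pattern ^[[:punct:]]$ matches. The token vector is hypothetical and chosen only for illustration; how masked_lp() applies the pattern to model tokens internally is not shown here.

```r
# Hypothetical tokens, chosen only to illustrate the pattern.
tokens <- c(".", "tree", "...", "don't", ",")

# TRUE where the whole string is exactly one punctuation character.
standalone_punct <- grepl("^[[:punct:]]$", tokens)

# Only "." and "," match: "..." has three characters, and "don't"
# contains letters around the apostrophe.
print(setNames(standalone_punct, tokens))
```

Because the pattern is anchored with ^ and $, punctuation inside a longer token is left alone.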
Details
A masked language model (also called BERT-like, or encoder model) is a type of large language model that can be used to predict the content of a mask in a sentence.
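A minimal sketch of a call, using only the arguments listed under Usage. The sentences are illustrative, and the vectors are assumed here to be paired element by element. Running the call requires pangoling together with its Python backend, and the model is downloaded from Hugging Face on first use, so the call is guarded.

```r
# Two target words evaluated in the same sentence frame (illustrative data).
l_contexts <- c("The", "The")
targets    <- c("apple", "pear")
r_contexts <- rep("doesn't fall far from the tree.", 2)

# Assumption: one left context, one target, and one right context per item.
stopifnot(
  length(l_contexts) == length(targets),
  length(targets) == length(r_contexts)
)

# Guarded so the script still runs where pangoling is not installed.
if (requireNamespace("pangoling", quietly = TRUE)) {
  lp <- pangoling::masked_lp(
    l_contexts = l_contexts,
    targets = targets,
    r_contexts = r_contexts,
    model = "bert-base-uncased"
  )
  print(lp)
}
```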
If not specified, the masked model used is the one set in the global option pangoling.masked.default, which can be accessed via getOption("pangoling.masked.default") (by default "bert-base-uncased"). To change the default, use options(pangoling.masked.default = "newmaskedmodel").
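The following base R sketch shows one way to switch the default model for part of a session and then restore the previous setting. The model name "bert-base-multilingual-cased" is only an example choice; any masked model name could be used in its place.

```r
# options() returns the previous values, which makes restoring easy.
old <- options(pangoling.masked.default = "bert-base-multilingual-cased")

# Calls that do not pass `model` explicitly would now pick up this value.
current_default <- getOption("pangoling.masked.default")
print(current_default)

# Restore whatever was set before.
options(old)
```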
A list of possible masked models can be found on the Hugging Face website.
Using the config_model and config_tokenizer arguments, it's possible to control how the model and tokenizer from Hugging Face are accessed; see the Python method from_pretrained for details. In case of errors, check the Hugging Face status page at
https://status.huggingface.co/
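As a hedged sketch of the configuration arguments, the call below pins the model and tokenizer to a specific repository revision. It assumes that list elements are forwarded by name to from_pretrained, where revision is a documented parameter; the sentence is illustrative, and the call is guarded because it needs pangoling and its Python backend.

```r
# Assumption: each list element is passed by name to from_pretrained.
config_model <- list(revision = "main")
config_tokenizer <- list(revision = "main")

# Guarded so the script still runs where pangoling is not installed.
if (requireNamespace("pangoling", quietly = TRUE)) {
  lp <- pangoling::masked_lp(
    l_contexts = "The",
    targets = "apple",
    r_contexts = "doesn't fall far from the tree.",
    model = "bert-base-uncased",
    config_model = config_model,
    config_tokenizer = config_tokenizer
  )
  print(lp)
}
```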
More examples
See the online article on the pangoling website for more examples.
See also
Other masked model functions: masked_config(), masked_preload(), masked_tokens_tbl()