About

pangoling pangoling-package
pangoling: Access to Large Language Model Predictions

Causal (or GPT-like) modeling

causal_next_tokens_pred_tbl()
Generate next tokens after a context and their predictability using a causal transformer model
causal_pred_mats()
Generate a list of predictability matrices using a causal transformer model
causal_words_pred() causal_tokens_pred_lst() causal_targets_pred()
Compute predictability using a causal transformer model

Masked (or BERT-like) modeling

masked_targets_pred()
Get the predictability of a target word (or phrase) given a left and right context
masked_tokens_pred_tbl()
Get the possible tokens and their log probabilities for each mask in a sentence

Helper functions for causal and masked models

causal_config()
Returns the configuration of a causal model
causal_preload()
Preloads a causal language model
masked_config()
Returns the configuration of a masked model
masked_preload()
Preloads a masked language model
install_py_pangoling()
Install the Python packages needed for pangoling
set_cache_folder()
Set cache folder for HuggingFace transformers

Vocabulary and tokenization

ntokens()
The number of tokens in a string or vector of strings
tokenize_lst()
Tokenize an input
transformer_vocab()
Returns the vocabulary of a model

Others

perplexity_calc()
Calculates perplexity
df_jaeger14
Self-Paced Reading Dataset on Chinese Relative Clauses
df_sent
Example dataset: Two word-by-word tokenized sentences