Package index • pangoling

About

pangoling, pangoling-package: Access to Large Language Model Predictions

Causal (or GPT-like) modeling

causal_next_tokens_pred_tbl(): Generate next tokens after a context and their predictability using a causal transformer model

causal_pred_mats(): Generate a list of predictability matrices using a causal transformer model

causal_words_pred(), causal_tokens_pred_lst(), causal_targets_pred(): Compute predictability using a causal transformer model

Masked (or BERT-like) modeling

masked_targets_pred(): Get the predictability of a target word (or phrase) given a left and right context

masked_tokens_pred_tbl(): Get the possible tokens and their log probabilities for each mask in a sentence

Helper functions for causal and masked models

causal_config(): Returns the configuration of a causal model

causal_preload(): Preloads a causal language model

masked_config(): Returns the configuration of a masked model

masked_preload(): Preloads a masked language model

install_py_pangoling(): Install the Python packages needed for pangoling

set_cache_folder(): Set cache folder for HuggingFace transformers

Vocabulary and tokenization

ntokens(): The number of tokens in a string or vector of strings

tokenize_lst(): Tokenize an input

transformer_vocab(): Returns the vocabulary of a model

Others

perplexity_calc(): Calculates perplexity

df_jaeger14: Self-Paced Reading Dataset on Chinese Relative Clauses

df_sent: Example dataset: Two word-by-word tokenized sentences
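
As a rough illustration of the quantity behind the perplexity_calc() entry above, the sketch below computes perplexity in base R using the standard definition: the exponential of the negative mean log-probability. It does not call pangoling. The log-probabilities are invented for illustration, and whether perplexity_calc() matches this computation in every detail (for example, its handling of missing values or of the log base) is an assumption to check against the function's own help page.

```r
# Hypothetical per-word probabilities for a four-word sentence.
# These numbers are made up for illustration; they are not model output.
word_probs <- c(0.25, 0.5, 0.125, 0.25)

# Natural-log probabilities, the scale assumed in this sketch.
log_probs <- log(word_probs)

# Standard definition: exp of the negative mean log-probability.
perplexity <- exp(-mean(log_probs))

print(perplexity)
```

Lower values mean the words were, on average, more predictable: a perplexity of k corresponds to the model being as uncertain as if it were choosing uniformly among k equally likely words at each position.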