Introduced transformer_vocab() with an optional decode argument that returns the vocabulary as decoded, human-readable words (see the example after this list).
New dataset df_jaeger14: Self-paced reading data on Chinese relative clauses.
New dataset df_sent: Two example sentences tokenized word by word (both new datasets are previewed below).
New vignette: A worked-out example using a causal language model.
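A minimal sketch of the decode option on transformer_vocab(), assuming the "gpt2" checkpoint (the model name here is an illustration, not necessarily the package default):

```r
library(pangoling)

# Raw vocabulary entries: byte-pair tokens, possibly with markers such as "Ġ"
vocab_raw <- transformer_vocab(model = "gpt2")

# decode = TRUE returns the vocabulary as decoded, human-readable words
vocab_decoded <- transformer_vocab(model = "gpt2", decode = TRUE)

head(vocab_decoded)
```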
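Both datasets ship with the package and can be previewed directly; the comments describe their contents, not exact column layouts:

```r
library(pangoling)

head(df_jaeger14)  # self-paced reading times, Chinese relative clauses
head(df_sent)      # two example sentences tokenized word by word
```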
Enhancements:
Added a sep argument to causal_words_pred() to support languages written without spaces between words, such as Chinese (see the sketch after this list).
New log.p argument across multiple functions to control the scale of the returned predictability values: natural log (base e), log base 2 for bits, or raw probabilities (example below).
Improved tokenization utilities: tokenize_lst() now supports decoded output via the decode argument (example below).
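A sketch of the sep argument on causal_words_pred(); the Chinese GPT-2 checkpoint named below is an assumption for illustration:

```r
library(pangoling)

# Chinese is written without spaces, so words are concatenated with
# sep = "" (rather than the space separator) before the model scores them.
causal_words_pred(
  x     = c("他", "喜欢", "苹果"),  # "he", "likes", "apples"
  sep   = "",
  model = "uer/gpt2-chinese-cluecorpussmall"  # assumed Hugging Face model
)
```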
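A sketch of the log.p argument, shown on causal_words_pred(); the exact set of accepted values (logical vs. numeric base) is an assumption here and follows each function's documentation:

```r
library(pangoling)

words <- c("The", "apple", "doesn't", "fall", "far")

causal_words_pred(x = words, log.p = TRUE)   # natural log, base e
causal_words_pred(x = words, log.p = 2)      # assumed: numeric value sets the log base (bits)
causal_words_pred(x = words, log.p = FALSE)  # raw probabilities
```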
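And the decode option on tokenize_lst(), again assuming the "gpt2" checkpoint:

```r
library(pangoling)

# Raw byte-pair tokens
tokenize_lst(x = "The apple doesn't fall far.", model = "gpt2")

# decode = TRUE returns decoded, human-readable strings instead
tokenize_lst(x = "The apple doesn't fall far.", model = "gpt2", decode = TRUE)
```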