Introduced transformer_vocab() with an optional decode argument that returns the vocabulary as decoded, human-readable words (see the example after this list).
New dataset df_jaeger14: Self-paced reading data on Chinese relative clauses.
New dataset df_sent: Two example sentences tokenized word by word (both new datasets are previewed below).
New vignette: A worked-out example using a causal language model.
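A minimal sketch of the decode option on transformer_vocab(), assuming the "gpt2" checkpoint (the model name here is an illustration, not necessarily the package default):

```r
library(pangoling)

# Raw vocabulary entries: byte-pair tokens, possibly with markers such as "Ġ"
vocab_raw <- transformer_vocab(model = "gpt2")

# decode = TRUE returns the vocabulary as decoded, human-readable words
vocab_decoded <- transformer_vocab(model = "gpt2", decode = TRUE)

head(vocab_decoded)
```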
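Both datasets ship with the package and can be previewed directly; the comments describe their contents, not exact column layouts:

```r
library(pangoling)

head(df_jaeger14)  # self-paced reading times, Chinese relative clauses
head(df_sent)      # two example sentences tokenized word by word
```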
Enhancements:
Added a sep argument to causal_words_pred() to support languages written without spaces between words, such as Chinese (see the sketch after this list).
New log.p argument across multiple functions to control the scale of the returned predictability values: natural log (base e), log base 2 for bits, or raw probabilities (example below).
Improved tokenization utilities: tokenize_lst() now supports decoded output via the decode argument (example below).
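A sketch of the sep argument on causal_words_pred(); the Chinese GPT-2 checkpoint named below is an assumption for illustration:

```r
library(pangoling)

# Chinese is written without spaces, so words are concatenated with
# sep = "" (rather than the space separator) before the model scores them.
causal_words_pred(
  x     = c("他", "喜欢", "苹果"),  # "he", "likes", "apples"
  sep   = "",
  model = "uer/gpt2-chinese-cluecorpussmall"  # assumed Hugging Face model
)
```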
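A sketch of the log.p argument, shown on causal_words_pred(); the exact set of accepted values (logical vs. numeric base) is an assumption here and follows each function's documentation:

```r
library(pangoling)

words <- c("The", "apple", "doesn't", "fall", "far")

causal_words_pred(x = words, log.p = TRUE)   # natural log, base e
causal_words_pred(x = words, log.p = 2)      # assumed: numeric value sets the log base (bits)
causal_words_pred(x = words, log.p = FALSE)  # raw probabilities
```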
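And the decode option on tokenize_lst(), again assuming the "gpt2" checkpoint:

```r
library(pangoling)

# Raw byte-pair tokens
tokenize_lst(x = "The apple doesn't fall far.", model = "gpt2")

# decode = TRUE returns decoded, human-readable strings instead
tokenize_lst(x = "The apple doesn't fall far.", model = "gpt2", decode = TRUE)
```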