This dataset contains tokenized words from two example sentences, split word by word. It is structured to demonstrate the use of the pangoling package for processing text data.

Usage

df_sent

Format

A data frame with 15 rows and 2 columns:

sent_n

(integer) Sentence number, indicating which sentence each word belongs to.

word

(character) Tokenized words from the sentences.

See also

Other datasets: df_jaeger14

Examples

# Load the dataset
data("df_sent")
df_sent
#> # A tidytable: 15 × 2
#>    sent_n word   
#>     <int> <chr>  
#>  1      1 The    
#>  2      1 apple  
#>  3      1 doesn't
#>  4      1 fall   
#>  5      1 far    
#>  6      1 from   
#>  7      1 the    
#>  8      1 tree.  
#>  9      2 Don't  
#> 10      2 judge  
#> 11      2 a      
#> 12      2 book   
#> 13      2 by     
#> 14      2 its    
#> 15      2 cover.
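
Because each row holds one word and sent_n marks the sentence it belongs to, the original sentences can be recovered by grouping on sent_n. The following is a minimal sketch in base R, not part of the package's own examples. The name df_sent_copy is hypothetical: it is a hand-built copy of the rows printed above, so the snippet runs without pangoling installed.

```r
# Hypothetical stand-in for df_sent, rebuilt from the printed rows above.
# With pangoling installed, data("df_sent") should provide the same content.
df_sent_copy <- data.frame(
  sent_n = rep(1:2, times = c(8L, 7L)),
  word = c(
    "The", "apple", "doesn't", "fall", "far", "from", "the", "tree.",
    "Don't", "judge", "a", "book", "by", "its", "cover."
  ),
  stringsAsFactors = FALSE
)

# Paste the words of each sentence back together, one string per sent_n.
sentences <- tapply(
  df_sent_copy$word,
  df_sent_copy$sent_n,
  paste,
  collapse = " "
)

# Count how many words each sentence contributes.
words_per_sentence <- table(df_sent_copy$sent_n)
```

The result sentences is indexed by sentence number (as the names "1" and "2"), which is the same grouping that sent_n provides when the words are processed sentence by sentence.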