This dataset contains data from a self-paced reading experiment on Chinese relative clause comprehension. It is structured to support analysis of reading times, comprehension accuracy, and surprisal values across the four conditions of a 2x2 fully crossed factorial design (modification type by relative clause type; see Details).

Usage

data(df_jaeger14)

Format

A tibble with 8,624 rows and 15 variables:

subject

Participant identifier, a character vector.

item

Trial item number, an integer.

cond

Experimental condition, a character vector indicating variations in sentence structure (e.g., "a", "b", "c", "d").

word

Chinese word presented in each trial, a character vector.

wordn

Position of the word within the sentence, an integer.

rt

Reading time in milliseconds for each word or merged region, an integer.

region

Sentence region or phrase type (e.g., "hd1", "Det+CL"), a character vector.

question

Comprehension question associated with the trial, a character vector.

accuracy

Binary accuracy score for the comprehension question (1 = correct, 0 = incorrect).

correct_answer

Expected correct answer for the comprehension question, a character vector ("Y" or "N").

question_type

Type of comprehension question, a character vector.

experiment

Name of the experiment, indicating self-paced reading, a character vector.

list

Experimental list number, for counterbalancing item presentation, an integer.

sentence

Full sentence used in the trial with words marked for analysis, a character vector.

surprisal

Model-derived surprisal values for each word, a numeric vector.

Region codes in the dataset (column region):

  • N: Main clause subject (in object-modifications only)

  • V: Main clause verb (in object-modifications only)

  • Det+CL: Determiner+classifier

  • Adv: Adverb

  • VN: RC-verb+RC-object (subject relatives) or RC-subject+RC-verb (object relatives)

    • Note: These two words were merged into one region after the experiment; they were presented as separate regions during the experiment.

  • FreqP: Frequency phrase/durational phrase

  • DE: Relativizer "de"

  • head: Relative clause head noun

  • hd1: First word after the head noun

  • hd2: Second word after the head noun

  • hd3: Third word after the head noun

  • hd4: Fourth word after the head noun (only in subject-modifications)

  • hd5: Fifth word after the head noun (only in subject-modifications)

Notes on reading times (column rt):

  • The reading time of the merged relative clause region (code "VN": verb+noun in subject relatives, noun+verb in object relatives) was computed by summing the reading times of the relative clause verb and noun.

  • The verb and noun were presented as two separate regions during the experiment.
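The merging described in these notes can be sketched on toy data. This is only an illustration under stated assumptions: the pre-merge labels "RC_V" and "RC_N", the column region_raw, and all reading times below are made up, and the dataset itself already contains the merged "VN" region.

```r
# Hypothetical pre-merge rows for one participant and one item.
# "RC_V" and "RC_N" are invented labels for the separately presented
# relative clause verb and noun; the rt values are invented as well.
raw <- data.frame(
  subject = "s1",
  item = 1L,
  region_raw = c("Det+CL", "Adv", "RC_V", "RC_N", "FreqP"),
  rt = c(400L, 380L, 200L, 150L, 310L),
  stringsAsFactors = FALSE
)

# Map both relative clause words onto the merged region code "VN".
raw$region <- ifelse(raw$region_raw %in% c("RC_V", "RC_N"), "VN", raw$region_raw)

# Sum reading times within subject, item, and (merged) region.
merged <- aggregate(rt ~ subject + item + region, data = raw, FUN = sum)

# The merged region's reading time is the sum of its two parts (200 + 150).
print(merged$rt[merged$region == "VN"])
```

Regions that consist of a single word are unaffected by the sum, so the same call handles merged and unmerged regions alike.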

Source

Jäger, L., Chen, Z., Li, Q., Lin, C.-J. C., & Vasishth, S. (2015). The subject-relative advantage in Chinese: Evidence for expectation-based processing. Journal of Memory and Language, 79–80, 97–120. https://doi.org/10.1016/j.jml.2014.10.005

Details

  • Factor I: Modification type (subject modification; object modification)

  • Factor II: Relative clause type (subject relative; object relative)

Condition labels:

  • a) subject modification; subject relative

  • b) subject modification; object relative

  • c) object modification; subject relative

  • d) object modification; object relative
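For analysis it is often convenient to split the single cond label into the two design factors. The following is a minimal sketch on a toy data frame; the column names modification, rc_type, mod_c, and rc_c are illustrative and not part of the dataset, and the sign convention of the ±0.5 coding is an arbitrary choice.

```r
# Toy stand-in for the `cond` column of df_jaeger14.
toy <- data.frame(cond = c("a", "b", "c", "d"), stringsAsFactors = FALSE)

# Conditions a and b are subject modifications; c and d are object modifications.
toy$modification <- ifelse(toy$cond %in% c("a", "b"), "subject", "object")

# Conditions a and c are subject relatives; b and d are object relatives.
toy$rc_type <- ifelse(toy$cond %in% c("a", "c"), "subject", "object")

# Sum-coded predictors (+0.5 for "subject", -0.5 for "object"); assumed convention.
toy$mod_c <- ifelse(toy$modification == "subject", 0.5, -0.5)
toy$rc_c <- ifelse(toy$rc_type == "subject", 0.5, -0.5)

print(toy)
```

The same ifelse() calls can be applied directly to df_jaeger14$cond once the dataset is loaded.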

See also

Other datasets: df_sent

Examples

# Basic exploration
head(df_jaeger14)
#> # A tidytable: 6 × 14
#>   subject  item cond  word   wordn    rt region question accuracy correct_answer
#>   <chr>   <int> <chr> <chr>  <int> <int> <fct>  <chr>       <int>          <int>
#> 1 1m1         1 a     那個       1   360 Det+CL 那個顧客聽說過…        1              1
#> 2 1m1         1 a     昨晚       2   359 Adv    那個顧客聽說過…        1              1
#> 3 1m1         1 a     揍了服務生…     3   344 VN     那個顧客聽說過…        1              1
#> 4 1m1         1 a     一頓       4   313 FreqP  那個顧客聽說過…        1              1
#> 5 1m1         1 a     的         5   297 DE     那個顧客聽說過…        1              1
#> 6 1m1         1 a     顧客       6   312 head   那個顧客聽說過…        1              1
#> # ℹ 4 more variables: question_type <int>, experiment <chr>, list <int>,
#> #   sentence <chr>

# Summarize reading times by region
library(tidytable)
df_jaeger14 |>
  group_by(region) |>
  summarize(mean_rt = mean(rt, na.rm = TRUE))
#> # A tidytable: 13 × 2
#>    region mean_rt
#>    <fct>    <dbl>
#>  1 N         614.
#>  2 V         538.
#>  3 Det+CL    513.
#>  4 Adv       541.
#>  5 VN        618.
#>  6 FreqP     603.
#>  7 DE        439.
#>  8 head      653.
#>  9 hd1       604.
#> 10 hd2       538.
#> 11 hd3       704.
#> 12 hd4       479.
#> 13 hd5       865.
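
Because the data have one row per word, the accuracy value is repeated on every row of a trial. A sketch of a per-condition accuracy summary that counts each trial once is shown below. It assumes accuracy is constant within a subject-item pair; the helper accuracy_by_cond() and the toy data are hypothetical and not part of the package.

```r
# Hypothetical helper: mean comprehension accuracy per condition,
# counting each subject-item trial once rather than once per word.
accuracy_by_cond <- function(df) {
  df <- as.data.frame(df)
  # Collapse the word-level rows to one row per trial.
  trials <- unique(df[, c("subject", "item", "cond", "accuracy")])
  aggregate(accuracy ~ cond, data = trials, FUN = mean)
}

# Toy data: two participants, two items, two word-level rows per trial.
toy <- data.frame(
  subject = rep(c("s1", "s2"), each = 4),
  item = rep(c(1L, 1L, 2L, 2L), times = 2),
  cond = rep(c("a", "a", "b", "b"), times = 2),
  accuracy = c(1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

res <- accuracy_by_cond(toy)
print(res)
```

With the dataset loaded, the same helper could be called as accuracy_by_cond(df_jaeger14).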