Chapter 13 Meta-analysis and measurement error models
In this chapter, we introduce two relatively underutilized models that are potentially very important for cognitive science: meta-analysis and measurement-error models.
Meta-analysis can be very informative when carrying out systematic reviews, and measurement-error models are able to take into account uncertainty in one’s dependent or independent variable (or both). What’s common to these two classes of model is that they both assume that the \(n\)-th measured data point \(y_n\) has a location parameter, say \(\zeta_n\) (pronounced zeta en), that is measured with some uncertainty that can be represented by the standard error \(SE_n\) of the measurement \(y_n\):
\(y_n \sim \mathit{Normal}(\zeta_n,SE_n)\)
In both classes of model, the goal is to obtain a posterior distribution of a latent parameter \(\zeta\) which is assumed to generate the \(\zeta_n\), with some standard deviation \(\tau\). The parameter \(\tau\) quantifies the noise in the measurement process or the between-study variability in a meta-analysis.
\(\zeta_n \sim \mathit{Normal}(\zeta,\tau)\)
The main parameter of interest is usually \(\zeta\), but the posterior distributions of \(\tau\) and \(\zeta_n\) can also be informative. The above model specification should remind you of the hierarchical models we saw in earlier chapters.
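To make this generative process concrete, here is a minimal simulation sketch (our own illustration; the variable names are hypothetical and not from any data set used below):
# A latent overall effect zeta generates latent effects zeta_n,
# which are measured with known standard errors SE_n:
set.seed(123)
N <- 10
zeta <- 15  # latent overall effect
tau <- 10   # standard deviation of the latent effects
SE <- runif(N, min = 5, max = 30)          # known standard errors
zeta_n <- rnorm(N, mean = zeta, sd = tau)  # latent effects
y <- rnorm(N, mean = zeta_n, sd = SE)      # observed measurements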
13.1 Meta-analysis
Once a number of studies have accumulated on a particular topic, it can be very informative to synthesize the data. Here we describe a commonly used approach: a random-effects meta-analysis.
13.1.1 A meta-analysis of similarity-based interference in sentence comprehension
The model is set up as follows. For each study \(n\), let effect\(_n\) be the effect of interest, and let \(SE_n\) be the standard error of the effect. A concrete example of a recent meta-analysis is the effect of similarity-based interference in sentence comprehension (Jäger, Engelmann, and Vasishth 2017); when two nouns are more similar to each other, there is greater processing difficulty (i.e., longer reading times in milliseconds) when an attempt is made to retrieve one of the nouns to complete a linguistic dependency (such as a subject-verb dependency). The estimate of the effect and its standard error is the information we have from each study \(n\).
First, load the data, and add an id variable that identifies each experiment.
data("df_sbi")
(df_sbi <- df_sbi %>%
   mutate(study_id = 1:n()))
## # A tibble: 12 × 4
## publication effect SE study_id
## <chr> <int> <int> <int>
## 1 VanDyke07E1LoSem 13 30 1
## 2 VanDyke07E2LoSem 37 21 2
## 3 VanDyke07E3LoSem 20 11 3
## # ℹ 9 more rows
The effect size and standard errors were estimated from published summary statistics in the respective article. In some cases, this involved a certain amount of guesswork; the details are documented in the online material accompanying Jäger, Engelmann, and Vasishth (2017).
We begin with the assumption that there is a true (unknown) effect \(\zeta_n\) that lies behind each of these studies. Each of the observed effects has an uncertainty associated with it, \(SE_n\). We can therefore assume that each observed effect, effect\(_n\), is generated as follows:
\[\begin{equation} \text{effect}_n \sim \mathit{Normal}(\zeta_n,SE_n) \end{equation}\]
Each study is assumed to have a different true effect \(\zeta_n\) because each study will have been carried out under different conditions: in a different lab with different protocols and workflows, with different subjects, with different languages, with slightly different experimental designs, etc.
Further, each of the true underlying effects \(\zeta_n\) has behind it some true unknown value \(\zeta\). The parameter \(\zeta\) represents the underlying effect of similarity-based interference across experiments. Our goal is to obtain the posterior distribution of this overall effect.
We can write the above statement as follows:
\[\begin{equation} \zeta_n \sim\mathit{Normal}(\zeta,\tau) \end{equation}\]
\(\tau\) is the between-study standard deviation; this expresses the assumption that there will be some variability between the true effects \(\zeta_n\).
To summarize the model:
- effect\(_n\) is the observed effect (in this example, in milliseconds) in the \(n\)-th study.
- \(\zeta_n\) is the true (unknown) effect in each study.
- \(\zeta\) is the true (unknown) effect of the experimental manipulation, namely, the similarity-based interference effect.
- Each \(SE_n\) is estimated from the standard error available from study \(n\).
- The parameter \(\tau\) represents between-study standard deviation.
We can construct a hierarchical model as follows:
\[\begin{equation} \begin{aligned} \text{effect}_n \sim & \mathit{Normal}(\zeta_n, SE_n) \quad n=1,\dots, N_{studies}\\ \zeta_n \sim & \mathit{Normal}(\zeta, \tau) \\ \zeta \sim & \mathit{Normal}(0,100)\\ \tau \sim & \mathit{Normal}_+(0,100) \end{aligned} \tag{13.1} \end{equation}\]
The priors are based on domain knowledge; it seems reasonable to allow the effect to range a priori from \(-200\) to \(+200\) ms with probability \(95\)%. Of course, a sensitivity analysis is necessary (but skipped here).
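The 95% range implied by the \(\mathit{Normal}(0,100)\) prior can be double-checked with base R (a quick sketch):
# Central 95% interval of the Normal(0, 100) prior on zeta,
# in milliseconds; this is roughly -196 to +196:
qnorm(c(0.025, 0.975), mean = 0, sd = 100)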
This model can be implemented in `brms` in a relatively straightforward way as shown below. We show the Stan version later in the chapter (section 13.1.1.2); the Stan version presents some interesting challenges that can be useful for the reader interested in deepening their Stan modeling knowledge.
13.1.1.1 The `brms` version of the meta-analysis model
First, define the priors:
priors <- c(prior(normal(0, 100), class = Intercept),
            prior(normal(0, 100), class = sd))
Fit the model as follows. Because of our relatively uninformative priors and the few data points, the models of this chapter require us to tune the `control` parameter, increasing `adapt_delta` and `max_treedepth`.
fit_sbi <- brm(effect | resp_se(SE, sigma = FALSE) ~
                 1 + (1 | study_id),
               data = df_sbi,
               prior = priors,
               control = list(adapt_delta = .99, max_treedepth = 10))
The posteriors of \(\zeta\) and \(\tau\) are summarized below as `Intercept` and `sd(Intercept)`.
fit_sbi
## ...
## Group-Level Effects:
## ~study_id (Number of levels: 12)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept) 11.51 7.68 0.52 29.52 1.00 860
## Tail_ESS
## sd(Intercept) 1474
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 13.27 6.23 2.62 27.09 1.00 1105 966
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.00 0.00 0.00 0.00 NA NA NA
##
## ...
The `sigma` parameter does not play any role in this model, but appears in the `brms` output anyway. In the model specification, `sigma` was explicitly removed by writing `sigma = FALSE`. For this reason, we can ignore that parameter in the model summary output above. Box 13.1 explains what happens if we set `sigma = TRUE`.
As theory predicts, the overall effect from these studies has a positive sign.
One advantage of such a meta-analysis is that the posterior distribution can now be used as an informative prior for a future study; this allows the researcher to build on what is known so far from the published record. It is especially useful when carrying out an analysis using Bayes factors.
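For example, one could approximate the meta-analytic posterior of \(\zeta\) by a normal distribution and use it as a prior in the analysis of a new study (a sketch; the rounded values 13 and 6 are taken from the summary of fit_sbi shown above, and rt, c_cond, subj, and df_new are hypothetical placeholders):
# Hypothetical follow-up study: use the meta-analytic posterior
# (approximately Normal(13, 6)) as an informative prior on the
# interference effect:
prior_ma <- prior(normal(13, 6), class = b, coef = c_cond)
# fit_new <- brm(rt ~ c_cond + (c_cond | subj),
#                data = df_new, prior = prior_ma)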
Box 13.1 What happens if we set `sigma = TRUE`?
If we set `sigma = TRUE`, we won't be able to get estimates for \(\zeta_{n}\), since they are handled implicitly. The model presented formally in Equation (13.1) is equivalent to the following one in Equation (13.2). A critical difference is that \(\zeta_n\) does not appear any more.
\[\begin{equation} \begin{aligned} \text{effect}_n \sim &\mathit{Normal}(\zeta, \sqrt{\tau^2 + SE_n^2} )\\ \zeta \sim &\mathit{Normal}(0,100)\\ \tau \sim &\mathit{Normal}_+(0,100) \end{aligned} \tag{13.2} \end{equation}\]
This works because of the following property of normally distributed random variables:
If \(X\) and \(Y\) are two independent random variables, and
\[\begin{equation} \begin{aligned} X &\sim \mathit{Normal}(\mu_X, \sigma_X)\\ Y &\sim \mathit{Normal}(\mu_Y, \sigma_Y) \end{aligned} \tag{13.3} \end{equation}\]
then, \(Z\), the sum of these two random variables is:
\[\begin{equation} Z = X + Y \tag{13.4} \end{equation}\]
The distribution of \(Z\) has the following form:
\[\begin{equation} Z \sim\mathit{Normal}\left(\mu_X + \mu_Y, \sqrt{\sigma_X^2 + \sigma_Y^2}\right) \tag{13.5} \end{equation}\]
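This property is easy to verify with a quick simulation (a sketch, not part of the analysis):
# The sum of two independent normal random variables:
set.seed(123)
X <- rnorm(1e5, mean = 2, sd = 3)
Y <- rnorm(1e5, mean = 5, sd = 4)
Z <- X + Y
mean(Z)  # close to 2 + 5 = 7
sd(Z)    # close to sqrt(3^2 + 4^2) = 5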
In our case, let
\[\begin{equation} \begin{aligned} U_{n} &\sim \mathit{Normal}(0 , SE_n)\\ \zeta_n &\sim \mathit{Normal}(\zeta, \tau) \end{aligned} \tag{13.6} \end{equation}\]
Analogous to Equations (13.4) and (13.5), effect\(_n\) can be expressed as a sum of two independent random variables:
\[\begin{equation} \text{effect}_n = U_n + \zeta_n \end{equation}\]
The distribution of effect\(_n\) will be
\[\begin{equation} \text{effect}_n \sim\mathit{Normal}\left(\zeta, \sqrt{SE_n^2 + \tau^2}\right) \tag{13.7} \end{equation}\]
We can fit this in `brms` as follows. In this model specification, one should not include the `+ (1 | study_id)` term, and the prior for \(\tau\) should now be specified for `sigma`.
priors2 <- c(prior(normal(0, 100), class = Intercept),
             prior(normal(0, 100), class = sigma))
fit_sbi_sigma <- brm(effect | resp_se(SE, sigma = TRUE) ~ 1,
                     data = df_sbi,
                     prior = priors2,
                     control = list(adapt_delta = .99,
                                    max_treedepth = 10))
There are slight differences with `fit_sbi` due to the different parameterization and the sampling process, but the results are very similar:
posterior_summary(fit_sbi_sigma,
variable = c("b_Intercept", "sigma"))
## Estimate Est.Error Q2.5 Q97.5
## b_Intercept 13.4 6.50 2.606 27.2
## sigma 11.5 7.61 0.559 28.1
If we are not interested in the underlying effects in each study, this parameterization of the meta-analysis can be faster and more robust (i.e., it has fewer potential convergence issues). A major drawback is that we can no longer display a forest plot as we do in Figure 13.1.
Another interesting by-product of a random-effects meta-analysis is the possibility of displaying a forest plot (Figure 13.1). A forest plot shows the meta-analytic estimate (the parameter `b_Intercept` in `brms`) alongside the original estimates effect\(_n\) (and their SE\(_n\)) and the posterior distributions of the \(\zeta_n\) for each study (we reconstruct these estimates by adding `b_Intercept` to the parameters starting with `r_` in `brms`). The original estimates are the ones fed to the model as data, and the posterior distributions of the \(\zeta_n\) are calculated, as in previous hierarchical models, after the information from all studies is pooled together. The \(\zeta_n\) estimates are shrunken estimates of each study's (unknown) true effect, shrunken towards the grand mean \(\zeta\), and weighted by the standard error observed in each study \(n\). The \(\zeta_n\) for a particular study is shrunk more towards the grand mean \(\zeta\) when the study's standard error is large (i.e., when the estimate is very imprecise). The code below shows how to build a forest plot step by step.
First, change the format of the data so that it looks like the output of `brms`:
df_sbi <- df_sbi %>%
  mutate(Q2.5 = effect - 2 * SE,
         Q97.5 = effect + 2 * SE,
         Estimate = effect,
         type = "original")
Extract the meta-analytical estimate:
df_Intercept <- posterior_summary(fit_sbi,
                                  variable = c("b_Intercept")) %>%
  as.data.frame() %>%
  mutate(publication = "M.A. estimate", type = "")
For the pooled estimated effect (or fitted value) of the individual studies, we need the sum of the meta-analytical estimate (intercept) and each of the by-study adjustments. Obtain this with the `fitted()` function:
df_model <- fitted(fit_sbi) %>%
  # Convert matrix to data frame:
  as.data.frame() %>%
  # Add a column to identify the estimates,
  # and another column to identify the publication:
  mutate(type = "adjusted",
         publication = df_sbi$publication)
Bind the observed effects, the meta-analytical estimate, and the fitted values of the studies together, and plot the data:
# Bind the observed effects,
# the adjusted estimates and the meta-analysis estimate:
bind_rows(df_sbi, df_model, df_Intercept) %>%
# Plot:
ggplot(aes(x = Estimate,
y = publication,
xmin = Q2.5,
xmax = Q97.5,
color = type)) +
geom_point(position = position_dodge(.5)) +
geom_errorbarh(position = position_dodge(.5)) +
# Add the meta-analytic estimate and Credible Interval:
geom_vline(xintercept = df_Intercept$Q2.5,
linetype = "dashed",
alpha = .3) +
geom_vline(xintercept = df_Intercept$Q97.5,
linetype = "dashed",
alpha = .3) +
geom_vline(xintercept = df_Intercept$Estimate,
linetype = "dashed",
alpha = .5) +
scale_color_discrete(breaks = c("adjusted", "original"))
It is important to keep in mind that a meta-analysis is always going to yield biased estimates as long as we have publication bias: if a field has a tendency to allow only “big news” studies to be published, then the literature that will appear in the public domain will be biased, and any meta-analysis based on such information will be biased. Despite this limitation, a meta-analysis is still a useful way to synthesize the known evidence; one just has to remember that the estimate from the meta-analysis is almost certain to be biased.
13.1.1.2 Stan version of the meta-analysis model
Even though `brms` can handle meta-analyses, fitting them directly in Stan allows us more flexibility, which might be necessary in some cases. As a first attempt, we can build a model that closely follows the formal specification given in Equation (13.1).
data {
  int<lower=1> N;
  vector[N] effect;
  vector[N] SE;
  vector[N] study_id;
}
parameters {
  real zeta;
  real<lower = 0> tau;
  vector[N] zeta_n;
}
model {
  target += normal_lpdf(effect | zeta_n, SE);
  target += normal_lpdf(zeta_n | zeta, tau);
  target += normal_lpdf(zeta | 0, 100);
  target += normal_lpdf(tau | 0, 100)
    - normal_lccdf(0 | 0, 100);
}
Fit the model as follows:
system.file("stan_models",
ma0 <-"meta-analysis0.stan",
package = "bcogsci")
list(N = nrow(df_sbi),
ls_sbi <-effect = df_sbi$effect,
SE = df_sbi$SE,
study_id = df_sbi$study_id)
stan(ma0,
fit_sbi0 <-data = ls_sbi,
control = list(adapt_delta = .999, max_treedepth = 12))
## Warning: There were 3 divergent transitions after
## warmup. See
## https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate
## them.
## Warning: There were 1 chains where the estimated
## Bayesian Fraction of Missing Information was low. See
## https://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling
## problems
## Warning: Bulk Effective Samples Size (ESS) is too low,
## indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low,
## indicating posterior variances and tail quantiles may be
## unreliable. Running the chains for more iterations may
## help. See
## https://mc-stan.org/misc/warnings.html#tail-ess
We see that there are warnings. As discussed in section 11.1.2, we can use pairs plots to uncover pathologies in the sampling. Here we see that the samples of `zeta` and `tau` are highly correlated:
pairs(fit_sbi0, pars = c("zeta", "tau"))
We face a similar problem as we faced in section 11.1.2, namely, the sampler cannot properly explore the neck of the funnel-shaped space, because of the strong correlation between the parameters. The solution is, as in section 11.1.2, a non-centered parameterization. Re-write (13.1) as follows:
\[\begin{equation} \begin{aligned} z_n & \sim \mathit{Normal}(0, 1)\\ \zeta_n &= z_n \cdot \tau + \zeta \\ \text{effect}_n & \sim \mathit{Normal}(\zeta_n, SE_n)\\ \zeta &\sim \mathit{Normal}(0,100)\\ \tau &\sim \mathit{Normal}_+(0,100) \end{aligned} \tag{13.8} \end{equation}\]
This works because if \(X \sim \mathit{Normal}(a, b)\) and \(Y \sim \mathit{Normal}(0, 1)\), then \(X\) has the same distribution as \(a + Y \cdot b\). You can revisit section 11.1.2 for more details.
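This location-scale property can also be checked by simulation (a quick sketch):
# Scaling and shifting a standard normal variable recovers
# Normal(a, b); here a = 13 and b = 11:
set.seed(123)
z <- rnorm(1e5)
x <- 13 + z * 11
c(mean(x), sd(x))  # close to 13 and 11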
Translate Equation (13.8) into Stan code as follows in `meta-analysis1.stan`:
data {
  int<lower=1> N;
  vector[N] effect;
  vector[N] SE;
  vector[N] study_id;
}
parameters {
  real zeta;
  real<lower = 0> tau;
  vector[N] z;
}
transformed parameters {
  vector[N] zeta_n = z * tau + zeta;
}
model {
  target += normal_lpdf(effect | zeta_n, SE);
  target += std_normal_lpdf(z);
  target += normal_lpdf(zeta | 0, 100);
  target += normal_lpdf(tau | 0, 100)
    - normal_lccdf(0 | 0, 100);
}
The model converges with values virtually identical to the ones of the `brms` model.
system.file("stan_models",
ma1 <-"meta-analysis1.stan",
package = "bcogsci")
stan(ma1,
fit_sbi1 <-data = ls_sbi,
control = list(adapt_delta = .999,
max_treedepth = 12))
print(fit_sbi1, pars = c("zeta", "tau"))
## mean 2.5% 97.5% n_eff Rhat
## zeta 13.4 2.71 28.3 739 1
## tau 11.6 0.55 30.2 729 1
We can also reparameterize the model slightly differently: if we set \(U_{n} \sim \mathit{Normal}(0, SE_n)\), then
\[\begin{equation} \text{effect}_n = U_n + \zeta_n \end{equation}\]
Then, given that \(\zeta_n \sim \mathit{Normal}(\zeta, \tau)\),
\[\begin{equation} \text{effect}_n \sim \mathit{Normal}(\zeta, \sqrt{SE_n^2 + \tau^2}) \tag{13.9} \end{equation}\]
See Box 13.1 if it's not clear why this reparameterization works.
This is equivalent to the `brms` model where `sigma = TRUE`. As with `brms`, we lose the possibility of estimating the posterior of the true effect of the individual studies.
Write this in Stan as follows; this code is available in the file `meta-analysis2.stan` within the `bcogsci` package:
data {
  int<lower=1> N;
  vector[N] effect;
  vector[N] SE;
  vector[N] study_id;
}
parameters {
  real zeta;
  real<lower = 0> tau;
}
model {
  target += normal_lpdf(effect | zeta,
                        sqrt(square(SE) + square(tau)));
  target += normal_lpdf(zeta | 0, 100);
  target += normal_lpdf(tau | 0, 100)
    - normal_lccdf(0 | 0, 100);
}
Fit the model:
system.file("stan_models",
ma2 <-"meta-analysis2.stan",
package = "bcogsci")
stan(ma2,
fit_sbi2 <-data = ls_sbi,
control = list(adapt_delta = .9))
print(fit_sbi2, pars = c("zeta", "tau"))
## mean 2.5% 97.5% n_eff Rhat
## zeta 13.7 3.23 29.6 822 1
## tau 11.7 0.70 29.3 1000 1
This summary could be reported in an article by displaying the posterior means and 95% credible intervals of the parameters.
13.2 Measurement-error models
Measurement-error models deal with the situation where some predictor or the dependent variable, or both, are observed with measurement error. This measurement error could arise because a variable is an average (and thus has an associated standard error that can be estimated), or because we know that our measurement is noisy due to limitations of our equipment (e.g., delays in the signal from the keyboard to the motherboard, impedance in the electrodes of an EEG system, etc.).
13.2.1 Accounting for measurement error in individual differences in working memory capacity and reading fluency
As a motivating example, consider the following data from Nicenboim, Vasishth, et al. (2018). For each subject, we have the partial-credit unit (PCU) scores of an operation span task as a measure of their working memory capacity (Conway et al. 2005) along with their standard error. In addition, the reading fluency of each subject is calculated from a separate set of data based on the mean reading speeds (character/second) in a rapid automatized naming task (RAN, Denckla and Rudel 1976); the standard error of the reading speed is also available.
Of interest here is the extent of the association between working memory capacity (measured as PCU) and reading fluency (measured as reading speed in characters per second). We avoid making any causal claims: it could be that our measure of working memory capacity really affects reading fluency, or it could be the other way around. A third possibility is that there is a third variable (or several) that affects both reading fluency and working memory capacity. A treatment of causality in Bayesian models can be found in chapters 5 and 6 of McElreath (2020).
data("df_indiv")
df_indiv
## # A tibble: 100 × 5
## subj mean_rspeed se_rspeed mean_pcu se_pcu
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.0521 0.00113 0.738 0.0648
## 2 2 0.0479 0.00121 0.292 0.0315
## 3 3 0.0601 0.00117 0.408 0.0900
## # ℹ 97 more rows
At first glance, we see a relationship between mean PCU scores and mean reading speed; see Figure 13.2. However, this relationship seems to be driven by two extreme data points on the top left corner of the plot.
df_indiv <- df_indiv %>%
  mutate(c_mean_pcu = mean_pcu - mean(mean_pcu))
ggplot(df_indiv, aes(x = c_mean_pcu, y = mean_rspeed)) +
  geom_point() +
  geom_smooth(method = "lm")
A simple linear model shows a somewhat weak association between mean reading speed and centered mean PCU. The priors are relatively arbitrary, but they are of the right order of magnitude, given that reading speeds are quite short and well below 1.
priors <- c(prior(normal(0, 0.5), class = Intercept),
            prior(normal(0, 0.5), class = b),
            prior(normal(0, 0.5), class = sigma))
fit_indiv <- brm(mean_rspeed ~ c_mean_pcu,
                 data = df_indiv,
                 family = gaussian(),
                 prior = priors)
fit_indiv
## ...
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept 0.06 0.00 0.05 0.06 1.00 5090
## c_mean_pcu -0.01 0.01 -0.03 0.00 1.00 2208
## Tail_ESS
## Intercept 2966
## c_mean_pcu 1845
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.01 0.00 0.01 0.01 1.00 2002 2008
##
## ...
# Proportion of samples below zero
(Pb <- mean(as_draws_df(fit_indiv)$b_c_mean_pcu < 0))
## [1] 0.943
Figure 13.3(a) shows the posterior distribution of the slope in this model. Most of the probability mass is negative (approximately 94%), suggesting that a better PCU score is associated with slower reading speed rather than faster; that is, a larger working memory capacity is associated with less reading fluency. This is not a very intuitive result, and it could be driven by the two extreme data points. Rather than removing these data points, we'll examine what happens when the uncertainty of the measurements is taken into account.
mcmc_plot(fit_indiv,
variable = "^b_c",
regex = TRUE,
type = "hist")
Taking this measurement uncertainty into account is important: in many practical research problems, researchers will take average measurements like these and examine the correlation between them. However, each of those data points is measured with some error (uncertainty), and this error is ignored when we work with the averaged values. Ignoring this uncertainty leads to over-enthusiastic inferences. A measurement-error model solves this issue by taking the uncertainty into account.
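One classic consequence of ignoring measurement error in a predictor, attenuation of the slope toward zero, can be demonstrated with a small simulation (our own sketch, unrelated to the PCU data):
# The true slope relating x_true to y is 0.5, but x is observed
# with noise; regressing on the noisy version biases the slope
# estimate toward zero:
set.seed(123)
N <- 300
x_true <- rnorm(N)
y <- rnorm(N, mean = 0.5 * x_true, sd = 0.5)
x_obs <- rnorm(N, mean = x_true, sd = 1)  # noisy measurements
coef(lm(y ~ x_true))["x_true"]  # close to 0.5
coef(lm(y ~ x_obs))["x_obs"]    # attenuated toward zero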
The measurement error model is stated as follows. There is assumed to be a true unobserved value \(y_{n,TRUE}\) for the dependent variable, and a true unobserved value \(x_{n,TRUE}\) for the predictor, where \(n\) is indexing the observation number. The observed values \(y_n\) and the predictor \(x_n\) are assumed to be generated with some error:
\[\begin{equation} \begin{aligned} y_n &\sim\mathit{Normal}(y_{n,TRUE},SE_{y_n}) \\ x_n &\sim\mathit{Normal}(x_{n,TRUE},SE_{x_n}) \end{aligned} \end{equation}\]
The regression is fit to the (unknown) true values of the dependent and independent variables:
\[\begin{equation} y_{n,TRUE} \sim\mathit{Normal}(\alpha + \beta x_{n,TRUE},\sigma) \tag{13.10} \end{equation}\]
In addition, there is an unknown standard deviation (standard error), \(\tau\), of the latent means that generate the underlying PCU scores. That is, we assume that each of the observed centered PCU scores is normally distributed around an underlying mean, \(\chi\), with standard deviation \(\tau\). This is very similar to the meta-analysis situation we saw earlier, \(\zeta_n \sim\mathit{Normal}(\zeta,\tau)\), where \(\zeta_n\) was the location parameter of each study, \(\zeta\) was the (unknown) location parameter representing the effect of interest, and \(\tau\) was the between-study variability.
\[\begin{equation} x_{n,TRUE} \sim\mathit{Normal}(\chi,\tau) \end{equation}\]
The goal of the modeling is to obtain posterior distributions for the intercept and slope \(\alpha\) and \(\beta\) (and the residual error standard deviation \(\sigma\)).
We need to decide on priors for all the parameters now. We use relatively vague priors, which can still be considered regularizing priors based on our knowledge of the order of magnitude of the measurements. In situations where not much is known about a research question, one could use such vague priors.
\[\begin{equation} \begin{aligned} \alpha &\sim\mathit{Normal}(0, 0.5)\\ \beta &\sim\mathit{Normal}(0, 0.5)\\ \chi &\sim\mathit{Normal}(0, 0.5)\\ \sigma &\sim\mathit{Normal}_+(0, 0.5)\\ \tau &\sim\mathit{Normal}_+(0, 0.5) \end{aligned} \tag{13.11} \end{equation}\]
13.2.1.1 The `brms` version of the measurement error model
In `brms`, the model specification would be as follows:
priors_me <- c(prior(normal(0, 0.5), class = Intercept),
               prior(normal(0, 0.5), class = b),
               prior(normal(0, 0.5), class = meanme),
               prior(normal(0, 0.5), class = sdme),
               prior(normal(0, 0.5), class = sigma))
Here, the parameters with classes `meanme` and `sdme` refer to the unknown mean and standard deviation (standard error) of the latent means that generate the underlying PCU scores, \(\chi\) and \(\tau\) in Equation (13.11). Once we decide on the priors, we use `resp_se(.)` with `sigma = TRUE` (i.e., we don't estimate \(y_{n,TRUE}\) explicitly), and we use `me(c_mean_pcu, se_pcu)` to indicate that the predictor `c_mean_pcu` is measured with error and that `se_pcu` is its SE.
fit_indiv_me <- brm(mean_rspeed | resp_se(se_rspeed, sigma = TRUE) ~
                      me(c_mean_pcu, se_pcu),
                    data = df_indiv,
                    family = gaussian(),
                    prior = priors_me)
fit_indiv_me
## ...
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept 0.05 0.00 0.05 0.06 1.00 3345
## mec_mean_pcuse_pcu -0.00 0.01 -0.01 0.01 1.00 5426
## Tail_ESS
## Intercept 2758
## mec_mean_pcuse_pcu 3244
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.01 0.00 0.01 0.01 1.00 7954 3059
##
## ...
# Proportion of samples below zero
# Parameter names can be found out with `variables(fit_indiv_me)`
(Pb_me <- mean(as_draws_df(fit_indiv_me)$bsp_mec_mean_pcuse_pcu < 0))
## [1] 0.64
The posterior for the slope is plotted in Figure 13.4; this figure shows that the association between PCU scores and reading speed is much weaker once measurement error is taken into account: The posterior is much more uncertain (much more widely distributed) than in the simple linear model we fit above (compare Figure 13.4 with Figure 13.3), and the direction of the association is now unclear, with 64% of the probability mass below zero, rather than 94%.
Figure 13.5 visualizes the main reason why there is no clear association in the measurement-error analysis: the two points at the top left of the plot that were driving the effect have very large SEs for the measurement of reading speed. The code below produces the figure by overlaying 250 regression lines, corresponding to different samples from the posterior distribution, on the measurements of reading speed and PCU.
df_reg <- as_draws_df(fit_indiv_me) %>%
  select(alpha = b_Intercept, beta = bsp_mec_mean_pcuse_pcu) %>%
  slice(1:250)
ggplot(df_indiv, aes(x = c_mean_pcu, y = mean_rspeed)) +
geom_point() +
geom_errorbarh(aes(xmin = c_mean_pcu - 2 * se_pcu,
xmax = c_mean_pcu + 2 * se_pcu),
alpha = .5, linetype = "dotted") +
geom_errorbar(aes(ymin = mean_rspeed - 2 * se_rspeed,
ymax = mean_rspeed + 2 * se_rspeed),
alpha = .5, linetype = "dotted") +
geom_abline(aes(intercept = alpha, slope = beta),
data = df_reg,
alpha = .05)
Of course, the conclusion here cannot be that there is no association between PCU scores and reading speed. In order to claim the absence of an effect, we would need to use Bayes factors (see chapter 15) or cross-validation (see chapter 16).
13.2.1.2 The Stan version of the measurement error model
As with the meta-analysis, the main difficulty in fitting measurement-error models directly in Stan is that we need to reparameterize the models to avoid correlations between the samples of different parameters. The two changes that we need to make to the parameterization of our model presented in Equation (13.11) are the following.
- Sample from an auxiliary parameter \(z_n\) rather than directly from \(x_{n,TRUE}\), as we did in Equation (13.8):
\[\begin{equation} \begin{aligned} z_n & \sim\mathit{Normal}(0, 1)\\ x_{n,TRUE} &= z_n \cdot \tau + \chi \\ x_n & \sim \mathit{Normal}(x_{n,TRUE}, SE_{x_n}) \end{aligned} \end{equation}\]
- Don’t model \(y_{n,TRUE}\) explicitly as in Equation (13.10); rather, take into account the SE and the variation in \(y_{n,TRUE}\) in the following way:
\[\begin{equation} y_{n} \sim\mathit{Normal}\left(\alpha + \beta x_{n,TRUE},\sqrt{SE_{y_n}^2 + \sigma^2}\right) \end{equation}\]
We are now ready to write this in Stan; the code is in the model called `me.stan`:
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] SE_x;
  vector[N] y;
  vector[N] SE_y;
}
parameters {
  real alpha;
  real beta;
  real chi;
  real<lower = 0> sigma;
  real<lower = 0> tau;
  vector[N] z;
}
transformed parameters {
  vector[N] x_true = z * tau + chi;
}
model {
  target += normal_lpdf(x | x_true, SE_x);
  target += normal_lpdf(y | alpha + beta * x_true,
                        sqrt(square(SE_y) + square(sigma)));
  target += std_normal_lpdf(z);
  target += normal_lpdf(alpha | 0, 0.5);
  target += normal_lpdf(beta | 0, 0.5);
  target += normal_lpdf(chi | 0, 0.5);
  target += normal_lpdf(sigma | 0, 0.5)
    - normal_lccdf(0 | 0, 0.5);
  target += normal_lpdf(tau | 0, 0.5)
    - normal_lccdf(0 | 0, 0.5);
}
Fit the model:
system.file("stan_models",
me <-"me.stan",
package = "bcogsci")
list(N = nrow(df_indiv),
ls_me <-y = df_indiv$mean_rspeed,
SE_y = df_indiv$se_rspeed,
x = df_indiv$c_mean_pcu,
SE_x = df_indiv$se_pcu)
stan(me, data = ls_me) fit_indiv_me_stan <-
print(fit_indiv_me_stan, pars = c("alpha", "beta", "sigma"))
## mean 2.5% 97.5% n_eff Rhat
## alpha 0.05 0.05 0.06 3610 1
## beta 0.00 -0.01 0.01 5216 1
## sigma 0.01 0.01 0.01 5592 1
The posterior distributions are similar to those that we obtained with `brms`.
13.3 Summary
This chapter introduced two statistical tools that are potentially of great relevance to cognitive science: random-effects meta-analysis and measurement-error models. Despite the inherent limitations of meta-analysis, it should be used routinely to accumulate knowledge through systematic evidence synthesis. Measurement-error models can also prevent the over-enthusiastic conclusions that are often drawn from noisy data.
13.4 Further reading
For some examples of Bayesian meta-analyses in psycholinguistics, see Vasishth et al. (2013), Jäger, Engelmann, and Vasishth (2017), Nicenboim, Roettger, and Vasishth (2018), Nicenboim, Vasishth, and Rösler (2020), Bürki et al. (2020), Cox et al. (2022), and Bürki, Alario, and Vasishth (2023). A frequentist meta-analysis of priming effects in psycholinguistics appears in Mahowald et al. (2016). Sutton et al. (2012) and Higgins and Green (2008) are two useful general introductions that discuss systematic reviews, meta-analysis, and evidence synthesis; these two references are from medicine, where meta-analysis is more widely used than in cognitive science. A potentially important article for meta-analysis introduces a methodology for modeling bias, to adjust for different kinds of bias in the data (Turner et al. 2008). Meta-analyses have important limitations and should not be taken at face value; this point is discussed in, e.g., Spector and Thompson (1991).
13.5 Exercises
Exercise 13.1 Extracting estimates from published papers
Researchers often do not release the data that lie behind the published paper. This creates the problem that one has to figure out the estimated effect size and standard error from published statistics. This exercise gives some practice on how to do this by considering some typical cases that are encountered in repeated measures designs.
Suppose that in a repeated measures reading study, the observed \(t\)-value from a paired \(t\)-test is 2.3 with degrees of freedom 23. Suppose also that the estimated standard deviation is 150 ms. What is the estimated standard error and the estimated effect size?
A repeated measures reading study reports an F-score from an analysis of variance as \(F(1,34)=6.1\), with an effect size of 25 ms. What is the estimated standard error?
Exercise 13.2 A meta-analysis of picture-word interference data
Load the following data set:
data("df_buerki")
head(df_buerki)
## study d se study_id
## 1 Collina 2013 Exp.1 a 24 13.09 1
## 2 Collina 2013 Exp.1 b -25 17.00 2
## 3 Collina 2013 Exp.2 46 22.79 3
## 4 Mahon 2007 Exp.1 17 12.24 4
## 5 Mahon 2007 Exp.2 57 13.96 5
## 6 Mahon 2007 Exp. 4 17 8.01 6
df_buerki <- subset(df_buerki, se > 0.60)
The data are from Bürki et al. (2020). We have a summary of the effect estimates (d) and standard errors (se) of the estimates from 162 published experiments on a phenomenon called semantic picture-word interference. We removed an implausibly low SE in the code above; the results don't change regardless of whether we keep it or not, because we have data from a lot of studies.
In this experimental paradigm, subjects are asked to name a picture while ignoring a distractor word (which is either related or unrelated to the picture). The word can be printed on the picture itself, or presented auditorily. The dependent measure is the response latency, or the time interval between the presentation of the picture and the onset of the vocal response. Theory says that distractors that come from the same semantic category as the picture to be named lead to a slower response than distractors from a different semantic category.
Carry out a random-effects meta-analysis using `brms` and display the posterior distribution of the effect, along with the posterior of the between-study standard deviation.
Choose \(\mathit{Normal}(0,100)\) priors for the intercept and the between-study standard deviation. You can also try more vague priors (as a sensitivity analysis). Examples would be:
- \(\mathit{Normal}(0,200)\)
- \(\mathit{Normal}(0,400)\)
Exercise 13.3 Measurement error model for English VOT data
Load the following data:
data("df_VOTenglish")
head(df_VOTenglish)
## subject meanVOT seVOT meanvdur sevdur
## 1 F01 108.1 4.56 171 11.7
## 2 F02 92.5 4.62 189 12.7
## 3 F03 82.6 3.13 171 10.0
## 4 F04 88.3 3.21 168 11.8
## 5 F05 94.6 3.67 166 15.0
## 6 F06 75.9 3.70 176 12.9
You are given mean voice onset time (VOT) data (with SEs) in milliseconds for English, along with mean vowel durations (with SEs) in milliseconds. Fit a measurement-error model investigating the effect of mean vowel duration on mean VOT duration. First plot the relationship between the two variables; does it look like there is an association between the two?
Then use `brms` with measurement error included in both the dependent and independent variables. Do a sensitivity analysis to check the influence of the priors on the posteriors of the relevant parameters.
References
Bürki, Audrey, Francois-Xavier Alario, and Shravan Vasishth. 2023. “When Words Collide: Bayesian Meta-Analyses of Distractor and Target Properties in the Picture-Word Interference Paradigm.” Quarterly Journal of Experimental Psychology 76 (6): 1410–30. https://doi.org/10.1177/17470218221114644.
Bürki, Audrey, Shereen Elbuy, Sylvain Madec, and Shravan Vasishth. 2020. “What Did We Learn from Forty Years of Research on Semantic Interference? A Bayesian Meta-Analysis.” Journal of Memory and Language 114. https://doi.org/10.1016/j.jml.2020.104125.
Conway, Andrew R. A., Michael J. Kane, Michael F. Bunting, D. Zach Hambrick, Oliver Wilhelm, and Randall W. Engle. 2005. “Working Memory Span Tasks: A Methodological Review and User’s Guide.” Psychonomic Bulletin & Review 12 (5): 769–86.
Cox, Christopher Martin Mikkelsen, Tamar Keren-Portnoy, Andreas Roepstorff, and Riccardo Fusaroli. 2022. “A Bayesian Meta-Analysis of Infants’ Ability to Perceive Audio–Visual Congruence for Speech.” Infancy 27 (1): 67–96.
Denckla, Martha Bridge, and Rita G. Rudel. 1976. “Rapid ‘Automatized’ Naming (R.A.N.): Dyslexia Differentiated from Other Learning Disabilities.” Neuropsychologia 14 (4): 471–79. https://doi.org/10.1016/0028-3932(76)90075-0.
Higgins, Julian, and Sally Green. 2008. Cochrane Handbook for Systematic Reviews of Interventions. New York: Wiley-Blackwell.
Jäger, Lena A., Felix Engelmann, and Shravan Vasishth. 2017. “Similarity-Based Interference in Sentence Comprehension: Literature review and Bayesian meta-analysis.” Journal of Memory and Language 94: 316–39. https://doi.org/10.1016/j.jml.2017.01.004.
Mahowald, Kyle, Ariel James, Richard Futrell, and Edward Gibson. 2016. “A Meta-Analysis of Syntactic Priming in Language Production.” Journal of Memory and Language 91: 5–27. https://doi.org/10.1016/j.jml.2016.03.009.
McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Boca Raton, Florida: Chapman; Hall/CRC.
Nicenboim, Bruno, Timo B. Roettger, and Shravan Vasishth. 2018. “Using Meta-Analysis for Evidence Synthesis: The case of incomplete neutralization in German.” Journal of Phonetics 70: 39–55. https://doi.org/10.1016/j.wocn.2018.06.001.
Nicenboim, Bruno, Shravan Vasishth, Felix Engelmann, and Katja Suckow. 2018. “Exploratory and Confirmatory Analyses in Sentence Processing: A case study of number interference in German.” Cognitive Science 42 (S4). https://doi.org/10.1111/cogs.12589.
Nicenboim, Bruno, Shravan Vasishth, and Frank Rösler. 2020. “Are Words Pre-Activated Probabilistically During Sentence Comprehension? Evidence from New Data and a Bayesian Random-Effects Meta-Analysis Using Publicly Available Data.” Neuropsychologia 142. https://doi.org/10.1016/j.neuropsychologia.2020.107427.
Spector, Tim D., and Simon G. Thompson. 1991. “The Potential and Limitations of Meta-Analysis.” Journal of Epidemiology and Community Health 45 (2): 89.
Sutton, Alexander J., Nicky J. Welton, Nicola Cooper, Keith R. Abrams, and A. E. Ades. 2012. Evidence Synthesis for Decision Making in Healthcare. Vol. 132. John Wiley & Sons.
Turner, R. M., David J. Spiegelhalter, G. Smith, and Simon G. Thompson. 2008. “Bias Modelling in Evidence Synthesis.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 172 (1): 21–47. https://doi.org/10.1111/j.1467-985X.2008.00547.x.
Vasishth, Shravan, Zhong Chen, Qiang Li, and Gueilan Guo. 2013. “Processing Chinese Relative Clauses: Evidence for the Subject-Relative Advantage.” PLoS ONE 8 (10): 1–14. https://doi.org/10.1371/journal.pone.0077006.