Density plots: revision
In the Absolute Beginners’ Guide to R and in the revision worksheet we used a Bayesian between-subjects t-test to look at the strength of evidence for the presence (or absence) of a difference between groups. For example, in the revision worksheet, we looked at whether the way you experienced a word (silently read, or spoken out loud) affected how good your memory of it was. We found a Bayes Factor close to 1, indicating there was neither good evidence for, nor good evidence against, there being a difference (an indecisive result). Bayes Factors greater than 3 are generally considered to be evidence for a difference worth listening to, while Bayes Factors less than one-third (0.33) are generally considered to be evidence for the absence of a difference that is worth listening to. Others have provided some additional guidelines on how to interpret Bayes Factors; the following comes from Andraszewicz et al. (2014):
A rule-of-thumb interpretation of Bayes Factors:

| Bayes Factor | Interpretation |
|---|---|
| > 100 | Extreme evidence for the alternative hypothesis |
| 30 – 100 | Very strong evidence for the alternative hypothesis |
| 10 – 30 | Strong evidence for the alternative hypothesis |
| 3 – 10 | Moderate evidence for the alternative hypothesis |
| 1 – 3 | Anecdotal evidence for the alternative hypothesis |
| 0.33 – 1 | Anecdotal evidence for the null hypothesis |
| 0.10 – 0.33 | Moderate evidence for the null hypothesis |
| 0.03 – 0.10 | Strong evidence for the null hypothesis |
| 0.01 – 0.03 | Very strong evidence for the null hypothesis |
| < 0.01 | Extreme evidence for the null hypothesis |

Note that these are BF10 values, which means that the larger the Bayes Factor is, the more evidence we have for the alternative hypothesis.
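To make the direction of these numbers concrete: a BF10 below 1 can always be flipped into the equivalent evidence for the null by taking its reciprocal. A quick illustration in R:

```r
# A Bayes Factor (BF10) of 0.25 favours the null hypothesis
bf10 <- 0.25

# Its reciprocal expresses the same result as evidence FOR the null (BF01)
bf01 <- 1 / bf10
bf01  # 4 - 'moderate' evidence for the null on the table above
```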
In this worksheet, we’ll cover how to calculate a Bayes Factor for a simple within-subjects experiment. As we covered in the statistical power worksheet, a within-subjects experiment is one where all the conditions are done by all the participants. Within-subjects designs are often a really good idea, because they are more efficient. In other words, they tend to allow us to give confident answers to questions while testing fewer people than would be needed in a between-subjects experiment.
The technique we’re going to use is called ANOVA, short for ANalysis Of VAriance. It’s probably the most used analysis technique in psychology. Its name (ANOVA) has a long history that isn’t really worth getting into here, particularly as it comes from our history in psychology of using p-values as evidence terms. As we’ve covered previously, p-values have no interpretation that is simultaneously accurate, simple, and useful. In this worksheet, we focus on a form of ANOVA that calculates Bayes Factors, which are more easily interpretable. If you ever need to calculate a traditional ANOVA, take a look at the traditional ANOVA worksheet.
In the preprocessing worksheet, you loaded data from a git repository into an R project called `rminr-data`. Go back to that project now in RStudio. This is very important: make sure you are in the `rminr-data` project, changing your RStudio project using the drop-down list at the top right of RStudio if you need to (change to the `rminr-data` project you used for the preprocessing worksheet). If you do not do this, the rest of this worksheet will not work.
Now that you're in the `rminr-data` project, create a new script called **anova1.R**, put all the comments and commands you use in this worksheet into that file, and save regularly.
You will notice that your Environment still contains all the data frames from the previous worksheet. You can leave them there if you like, but some people prefer to start with a "clean slate". You can remove data frames with the `rm` command (as in *remove*). For example, to remove the `dat` data frame you'd type `rm(dat)`.

The `rm` command can also be used with the `ls()` command, which gives you a list of the names of the things in your Environment (it's called `ls` because it *lists* them). We can use `ls()` and `rm()` together to remove everything from our Environment, as follows.
Add the following comments and commands to your script and run them:
```r
# Within-subject differences

# Clear environment
rm(list = ls())
```
Notice that your Environment tab in RStudio is now empty. This command only clears the Environment; it leaves all the files in your project as they were (you can still see them in the Files panel).
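If you want to see how `ls()` and `rm()` behave before clearing anything important, you can experiment with a couple of throwaway objects (the names `a` and `b` here are just examples):

```r
# Create two throwaway objects
a <- 1
b <- "hello"

# ls() returns a character vector of object names - "a" and "b" will be in it
ls()

# Remove a single object by name
rm(a)

# Remove everything that remains
rm(list = ls())
ls()  # now empty
```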
Now add these comments and commands to your script, and run them, to load this new data set:
```r
# Load tidyverse
library(tidyverse)

# Load data into 'words'
words <- read_csv("wordnaming2.csv")
```
Now click on the `words` data frame in the Environment tab, and take a look. You'll see that it's a data frame with 10 columns and 37,800 rows.
We’ll be using this data set over the next few worksheets, so it’s important to take some time here to understand the experiment it comes from.
The data frame contains simulated data for a word-naming experiment. In the experiment, participants just had to name (i.e. speak out loud) each written word as it appeared on the screen. However, on every trial, there was also a picture. Sometimes the picture matched the word (e.g. the word 'car' and a picture of a car). This is called a congruent trial. Other times, the picture and word were different (e.g. the word 'boat' and a picture of a carrot). This is called an incongruent trial. Participants were instructed to ignore the picture, but were they able to do so? If they can't help processing the picture, then they should be slower to name the word on incongruent trials than on congruent trials. There is also a third type of trial, the neutral trial, on which just the word appears, with no picture.
In this experiment, the word naming task is being used to study the effects of meditation practice. Some psychologists have suggested that learning how to meditate can improve people’s ability to ignore irrelevant information. If so, we should be able to improve people’s performance on this word naming task by training them how to meditate first.
So, in this experiment, 140 participants were randomly allocated to a meditation-training condition, in which they received a week’s training on how to meditate, before doing the word naming experiment. Another 140 were assigned to a no-training condition. These people just did the word naming experiment, with no meditation training. Finally, another 140 people were assigned to a relaxation-training condition, in which they received a week’s training, but on some relaxation techniques, instead of meditation.
Within each group of participants, half were male and half were female. Each participant completed 90 trials of the word naming task. They did 30 trials, took a short break, then did another 30 trials, took another short break, and did the final 30 trials. This gives us 37,800 word naming trials in total. On each trial, the participant either named the word correctly, or they didn’t. Either way, they took a certain amount of time (measured in milliseconds) to name the word.
Here’s a summary of what each of the columns in the data set contains:
| Column | Description | Values |
|---|---|---|
| subj | Unique anonymous participant number | 1–420 |
| sex | Biological sex of the participant | "male", "female" |
| medit | Whether the participant has had meditation, relaxation, or no training | "meditate", "relax", "control" |
| block | Block of trials | 1–3 |
| trial | Trial number (per participant) | 1–90 |
| word | The word presented on this trial | e.g. "cup" |
| pic | The name of the picture presented | e.g. "cake" |
| congru | Experimental condition: congruent, incongruent, or neutral trial? | "cong", "incong", "neutral" |
| acc | Accuracy of naming response | 1 = correct, 0 = incorrect |
| rt | Reaction time | in milliseconds |
DISCLAIMER: This data set is loosely based on the design of experiments that have really been done, and is realistic in terms of its size and complexity. However, it’s important to remember that this is generated data, not data actually collected from participants. So, you should not take the analyses you do of this simulated data as evidence for the effectiveness of meditation in improving attentional control. As is often the case, real data in this area is less clear, and more controversial.
As covered in the preprocessing worksheet, the first thing we need to do with realistic data sets like this is to preprocess them, so the data is in a format that we can analyse. The two steps we need to do for our first analysis are filtering and summarising.
In our first analyses, we’re going to take a look at the participants who had no pre-training in meditation or relaxation (i.e. the control group), so we get a sense of what performance on this task ‘normally’ looks like. So, enter the following comment into your script:
```r
# Filter control condition into 'wordctrl'
```

and then add and run a command that uses `filter` to extract just the control condition, and put it into a new data frame called `wordctrl`. If you need a reminder of how the `filter` command works, take a look at the preprocessing worksheet.
EXPECTED OUTPUT: There will be a new data frame in your Environment window that has 12,600 rows and 10 columns and is called `wordctrl`.
Also, for now, we’re not going to look at the neutral trials, just the congruent and incongruent ones. So, add this comment to your script:
```r
# Filter out neutral trials, pass remainder to 'wordctrlCI'
```

and add and run a command to filter out the neutral trials from `wordctrl` and put the remaining data into `wordctrlCI`. You may find that the operator `!=` is the most efficient way to do this. For example, if you were trying to exclude the relaxation condition, you could use `filter(medit != "relax")`. The operator `!=` means 'not equal to'.
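If it helps, here's `!=` inside `filter` on a small made-up data frame (this is just an illustration, not the worksheet data or the exercise answer):

```r
library(tidyverse)

# A tiny made-up data frame with three trial types
demo <- tibble(
  congru = c("cong", "incong", "neutral", "cong", "neutral"),
  rt     = c(450, 520, 480, 430, 500)
)

# Keep every row where congru is NOT equal to "neutral"
demo %>% filter(congru != "neutral")  # 3 rows remain
```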
EXPECTED OUTPUT: There will be a new data frame in your Environment window that has 8,400 rows and 10 columns and is called `wordctrlCI`.
The data frame is still very long, because it contains data for every trial. What we want to know, for each participant, is whether they are quicker to name the words on the congruent trials (where the picture and word match) than on the incongruent trials (where the picture and word are different). So, we need to summarise each person’s data down to just two numbers - their mean reaction time on congruent trials, and their mean reaction time on incongruent trials. So, add the following comment to your script:
```r
# Group by 'subj' and 'congru'; calculate mean RT; pass to 'wordctrlCIsum'
```

and then write and run a command to group the data by participant (`subj`) and experimental condition (`congru`) using the `group_by` command, get the mean reaction times using the `summarise` command, and put this summarised data into a new data frame called `wordctrlCIsum`. If you need a reminder of how to group and summarise data, take a look at the preprocessing worksheet.
Column names with `summarise`: There is one new thing to learn here, though, relative to the previous worksheet. The column containing the RT data in this new data frame will need to have the name `rt`, because we use this name later on. You can set the column name within the `summarise` command. For example, `summarise(acc = mean(acc))` takes the mean of the column `acc` and puts the answer into a column called `acc` in your new data frame. If you just use `summarise(mean(acc))` then the column name you end up with is `mean(acc)`, which would be a pain to use in later commands, so we rename the column as `acc` by typing `summarise(acc = mean(acc))` instead.
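Here's the naming point on a tiny made-up data frame (the names and numbers are invented for illustration):

```r
library(tidyverse)

demo <- tibble(
  subj = c(1, 1, 2, 2),
  acc  = c(1, 0, 1, 1)
)

# Without a name, the new column is awkwardly called 'mean(acc)'
demo %>% group_by(subj) %>% summarise(mean(acc))

# With a name, the new column is simply called 'acc'
demo %>% group_by(subj) %>% summarise(acc = mean(acc))
```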
EXPECTED OUTPUT: There will be a new data frame in your Environment window called `wordctrlCIsum`. It will have 280 rows - two reaction times for each of the 140 participants: one for the congruent trials, and one for the incongruent trials. You'll notice that the incongruent reaction times tend to be larger, as expected.
Add the following comment to your script:
```r
# Density plot of RT, by 'congru'
```

Now, as you did in the Absolute Beginners' Guide to R, write and run a command to create a density plot of this summarised data (`wordctrlCIsum`), showing the distribution of reaction times in the two conditions. Click on the link above if you need to revise how to do this.
If you get it right, it should look like this:
We can see a few different things in this plot. First, there’s quite a lot of variation between people in how fast they react, with mean reaction time for incongruent trials anywhere from about 200 milliseconds to 800 milliseconds. Second, reaction times seem to vary less on congruent trials (averages around 400-600 milliseconds). Third, the peak of the incongruent distribution is slightly to the right of the congruent distribution, suggesting that, overall, incongruent trials might take a bit longer to react to than congruent trials.
But…looking at the data this way misses the most important thing about the design of this experiment, which is that it is within subjects. This is not two different groups of people, one who see only congruent trials, and the other who only see incongruent trials. Every participant sees both types. So, there’s a better way to look at this data, particularly when (as here) people differ a lot in their average reaction time.
What we can do is calculate a difference score for each participant. In other words, how much longer each participant takes to respond to incongruent trials, than congruent ones.
In order to do this, we have to learn some new types of data preprocessing - pivoting and mutating.
Pivoting is where you take a long data file (lots of rows, few columns) and make it wider, or where you take a wide data file (lots of columns, few rows) and make it longer. In this case, we need to make `wordctrlCIsum` wider, for reasons that will become clearer in the next step.
To make a data frame wider, we can use the `pivot_wider` command from the `tidyverse` package. To illustrate, enter the following comment and command into your R script and run it:

```r
# Pivot wordctrlCIsum into a wider data frame; place in 'ctrl'
ctrl <- wordctrlCIsum %>% pivot_wider(names_from = congru, values_from = rt)
```
Look at `ctrl` in the Environment window; you can see that it now has fewer rows (140). It has the same number of columns, but the column names are different. Specifically, we now have one row for each participant, with one column for their congruent reaction time and one column for their incongruent reaction time.
Explanation of command: The `pivot_wider` command has two components: `names_from` and `values_from`. The first one, `names_from`, tells R which column you want to use as the column names in your new, wider data frame. In this case, we want to take them from the `congru` column of `wordctrlCIsum`. The `values_from` part tells R which column contains the data you want to put in these new columns. In this case, it's the `rt` column of `wordctrlCIsum`.
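To see the reshaping in isolation, here is `pivot_wider` applied to a tiny made-up long data frame (two participants, two conditions - illustrative data, not the worksheet's):

```r
library(tidyverse)

# Long format: one row per participant per condition
long <- tibble(
  subj   = c(1, 1, 2, 2),
  congru = c("cong", "incong", "cong", "incong"),
  rt     = c(450, 520, 430, 560)
)

# Wide format: one row per participant, one column per condition
long %>% pivot_wider(names_from = congru, values_from = rt)
```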
Mutation in this context means to take some things and change them to make something new. In this case, we want to mutate the congruent and incongruent reaction times into a difference score, i.e. incongruent RT minus congruent RT. We do this using the `mutate` command, as follows.

Enter the following comment and command into your R script and run it:

```r
# Calculate RT difference; place in 'ctrldiff'
ctrldiff <- ctrl %>% mutate(diff = incong - cong)
```
Look at `ctrldiff` in the Environment window - you will see that it has a fourth column, `diff`, and that each row contains the incongruent reaction time minus the congruent reaction time. For most participants, the difference is positive, meaning that incongruent trials are slower. However, some participants are the other way around (e.g. participant 146).
Explanation of command: It's the `mutate` command that's doing all the work here. We give it a name for the new column we want to create, in this case `diff`. We then tell it how to calculate that new column: in this case, by subtracting the congruent reaction time from the incongruent reaction time, `incong - cong`.
Now we have the difference scores, we can take a look at their distribution. We do this using a density plot, as we have on several previous occasions.
Enter the following comment and commands into your R script and run them:
```r
# Display density plot of RT differences
ctrldiff %>% ggplot(aes(diff)) +
  geom_density(aes(y = ..scaled..)) +
  geom_vline(xintercept = 0, colour = 'red')
```
The only slightly unusual thing about this command is the `geom_vline` part, which we use to draw a vertical line on the plot that intercepts (hits) the x axis at zero (`xintercept = 0`) - see also the Risk Taking worksheet in the Putting R to Work guide.
In this case, the vertical line allows us to quickly see that most of the distribution is to the right of zero. In other words, for most people, incongruent trials take longer to respond to than congruent trials. On average, it takes around 100 ms longer to react to incongruent trials, although the range is anything from around -200 to +300 milliseconds.
It is these differences, rather than the absolute reaction times for incongruent or congruent trials, that are critical to this within-subjects experiment. In the next section, we use Bayesian analysis to work out how good the evidence is for the claim that incongruent trials are on average slower than congruent trials (i.e. that the difference scores are on average positive).
We can use the command `anovaBF` to look at the evidence for this within-subjects effect of congruency. This command takes data in the format of our `wordctrlCIsum` data frame, so we're almost ready to go. But, before we can do this analysis, we have to tell R which columns of this data frame are *factors*.
The word *factor* is a jargon term for "column that contains information about the experimental design". To understand the meaning of this term, take a look at the `wordctrlCIsum` data frame in your Environment. In `wordctrlCIsum`, the column `congru` is a factor because it tells us whether the reaction time comes from a congruent trial (`cong`) or an incongruent trial (`incong`). The column `subj` is also a factor, because it tells us which participant the reaction time comes from. The `rt` column is *not* a factor, because it contains the data (reaction times) that we're analysing.
We tell R this in the following way.
Enter this comment and these commands into your R script and run them:
```r
# Set 'congru' and 'subj' as factors
wordctrlCIsum$congru <- factor(wordctrlCIsum$congru)
wordctrlCIsum$subj <- factor(wordctrlCIsum$subj)
```
Explanation of command - The first command means "turn the `congru` column of the `wordctrlCIsum` data frame into a factor". More specifically:

- `wordctrlCIsum$congru` - this means the `congru` column of the `wordctrlCIsum` data frame.

- `factor(wordctrlCIsum$congru)` - the `factor` command tells R that the column `wordctrlCIsum$congru` is a factor.

- `<-` - this means put the thing on the right of `<-` into the thing on the left. In other words, once `factor` has labelled `wordctrlCIsum$congru` as a factor, record that change back into `wordctrlCIsum$congru` (rather than, for example, sending the data to a different column).
The second command works in the same way as the first.
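You can check what `factor` does with a couple of base R commands (a quick sketch on a throwaway vector):

```r
# A plain character vector is not a factor...
x <- c("cong", "incong", "cong")
is.factor(x)  # FALSE

# ...but after conversion, R records it as a factor with two levels
f <- factor(x)
is.factor(f)  # TRUE
levels(f)     # "cong" "incong"
```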
Now we’ve given R this information about factors, we’re ready to calculate a Bayes Factor for the difference between congruent and incongruent trials. To do this, first we load the BayesFactor package as before.
Enter this comment and command into your R script and run it:

```r
# Load BayesFactor package
library(BayesFactor)
```
and then we use the following command.
Enter this comment and command into your R script and run it:

```r
# Calculate w/subj Bayesian ANOVA for effect of 'congru' on RT
anovaBF(formula = rt ~ congru + subj,
        data = data.frame(wordctrlCIsum),
        whichRandom = "subj")
```

```
Bayes factor analysis
--------------
[1] congru + subj : 43224861 ±1.12%

Against denominator:
  rt ~ subj
---
Bayes factor type: BFlinearModel, JZS
```
Much of this `anovaBF` command is the same as for a Bayesian t-test. For example, `formula = rt ~ congru` means we want to know the effect of the congruency condition (`congru`) on the dependent measure of reaction time (`rt`).
The first part that's new is that the formula contains `+ subj`. Unlike for `ttestBF`, we always need to include this for `anovaBF`, so it knows which data came from which participant. What we include is the name of the column that contains the subject IDs - in this case, `subj`.
The other part that's new is `whichRandom = "subj"`. The option `whichRandom` means "tell me which factors are *random* factors". A random factor is one where, if you ran the experiment again, the levels of the factor would be different. So, if you ran this experiment again, you'd almost certainly want to test different people, so the levels of the `subj` factor would be different if you re-ran the experiment. This makes it a *random* factor. In contrast, if you wanted to re-run this experiment in the same way as before, you would still have congruent and incongruent trials, so the levels of the `congru` factor aren't random. They're generally described as *fixed*. The command `anovaBF` assumes factors are fixed unless you tell it they are random.
The remaining part, `data = data.frame(wordctrlCIsum)`, just tells R where to find the data.
The first part of the output is just a way of showing a ‘progress bar’ on a worksheet. R shows you a progress bar going from 0% to 100% to let you know it’s working on the answer. This analysis may take several seconds to run, so you may end up looking at this progress bar for a while.
The second part of the output gives the Bayes Factor. The key figure here is `43224861`, which is the Bayes Factor. Note that your number might be slightly different. This is what the `±1.12%` part of the output indicates: the number `43224861` is the correct answer to within about plus or minus 1%. It's like saying "this person is 2 metres tall, give or take 5 cm". The command `anovaBF` can take a while to run, so it prioritises speed over accuracy to some extent. If it was important to get the Bayes Factor precisely correct, we could let R work on the problem for much longer (e.g. overnight). However, the extra time is usually not worth it, and we don't cover how to increase the accuracy of the output in this worksheet.
The Bayes Factor on this occasion is around 40 million, meaning there is really very strong evidence in support of an effect of congruency. Pretty much however unlikely we thought it was beforehand that an incongruent picture would slow down word naming, we should now be pretty confident that it does.
The rest of the output is not crucial to understand right now, but if you are curious take a look at the more on Bayes factors worksheet.
In the above Bayes Factor calculation, we looked at just the congruent and incongruent trials of the experiment. However, this experiment had a third level on the condition factor - there were also neutral trials, where no picture was shown. These data are in the `wordctrl` data frame that you made. In this section, we're going to do an analysis using all three conditions.
Unlike `ttestBF`, `anovaBF` can cope with a factor that has more than two levels. This is true for both between-subject and within-subject factors, although we'll focus on within-subject factors in this worksheet.
Add the following comment to your script:

```r
# Group 'wordctrl' by 'subj' and 'congru'; calculate mean RT; place in 'wordctrlsum'
```

and then write and run a command to create a subject-level summary of mean reaction times, as you did before. However, this time you'll need all three conditions in that summary, so use your `wordctrl` data frame, rather than your `wordctrlCI` data frame. Put your summary into a data frame called `wordctrlsum`.
EXPECTED OUTPUT: There will be a new data frame `wordctrlsum` in your Environment, with 420 rows and 3 columns.
Add the following comment to your script:

```r
# Display density plot of RT, by 'congru'
```

and then write and run a command that produces a density plot that contains all three conditions. You can do this the same way as before - just use `wordctrlsum` instead of `wordctrlCIsum`.
EXPECTED OUTPUT: This is what the graph should look like:
The main thing to notice about this density plot is that the reaction times for neutral trials are intermediate between the congruent and incongruent trials, both in terms of their mean, and in terms of their variance.
It’s still the case that it is the difference between conditions that is key here, because this is a within-subjects experiment. However, there are three differences we could work out here (incongruent - neutral, neutral - congruent, incongruent - congruent), so we’re going to leave further graphing until a bit later.
The command `anovaBF` allows us to calculate a Bayes Factor across all three conditions; all we need to do is give it the appropriate subject-level summary (`wordctrlsum` in this case). However, don't forget that, before `anovaBF` can work, it needs to know which columns of `wordctrlsum` are factors (you will need to set `subj` and `congru` as factors). So add the following comments to your script:
```r
# Set 'congru' and 'subj' as factors
# Calculate w/subj Bayesian ANOVA for effect of 'congru' on RT
```
and then insert and run some commands to perform a Bayesian ANOVA that includes all three levels of the congruency condition.
EXPECTED OUTPUT: The correct output is as follows (your Bayes Factor may be slightly different):
```
Bayes factor analysis
--------------
[1] congru + subj : 1.741829e+14 ±0.66%

Against denominator:
  rt ~ subj
---
Bayes factor type: BFlinearModel, JZS
```
The Bayes Factor here is approximately `1.8e+14`, which is read as \(1.8 \times 10^{14}\) (click here if you need a reminder about scientific notation). So, the Bayes Factor is overwhelmingly in support of the conclusion that there is a difference between these three conditions.
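If scientific notation is unfamiliar, R itself can confirm what `1.8e+14` means:

```r
# 'e+14' is shorthand for 'times ten to the power 14'
1.8e+14 == 1.8 * 10^14  # TRUE

# Written out in full, it's 180 trillion
format(1.8e+14, big.mark = ",", scientific = FALSE)
```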
It's important to be clear what the above Bayes Factor tells us. It gives us evidence that the three conditions (congruent, incongruent, neutral) are not all the same. This is worth knowing, but often we want evidence for more specific hypotheses. For example, does the presentation of a congruent picture speed responding? If so, there should be evidence that congruent trials are faster than neutral trials. The Bayes Factor we have just calculated does not tell us that - it just tells us the evidence level for the claim that the three conditions are different.
Calculating the evidence for a particular pair in an experiment is called doing a pairwise comparison. You’ve in fact already done this – you removed the neutral trials from the data set and calculated a Bayes Factor for the congruent vs. incongruent comparison. You can use the same techniques to look at any pair.
The goal in this exercise is to look at the reaction time difference between congruent and neutral trials in the control condition of this experiment.
Specifically, the task is to write a new R script that contains all the necessary commands, and only the necessary commands, to do all of the following, in this order:
```r
# Within-subject differences
# EXERCISE
```
Start your script with the command `rm(list = ls())` (preceded by a suitable comment). This command clears your Environment and so ensures your script is not relying on commands you have previously run.
Load the packages you need (`tidyverse` and `BayesFactor`). Precede with an appropriate comment.
Load the data file **wordnaming2.csv** (using `read_csv`). Precede with an appropriate comment.
Preprocess the data to include only the control condition, and exclude the incongruent trials. Use the `filter` command to do this. Precede with an appropriate comment.
Generate a subject-level summary of these filtered data (use the `group_by` and `summarise` commands). Precede with an appropriate comment.
Show a density plot of the RT differences for congruent versus neutral trials. You'll need to use `pivot_wider` and `mutate` to get the data in the right format, and then use `ggplot` to produce the graph. Precede with appropriate comments.
Calculate a Bayes Factor for this difference. Use `anovaBF` to do this, and don't forget to use the `factor` command to tell R which columns of your data frame are factors. Precede with appropriate comments.
Other requirements:

- Give your data frames names that describe their contents. For example, don't call a data frame `ctrlCI` if in fact it contains the congruent and neutral trial types (call it e.g. `ctrlCN`).

Your output should look like the example below. Note that your Bayes Factor may be slightly different to the one reported here, for the reasons we covered earlier.
Once it does, copy and paste your R script for this exercise into PsycEL. REMEMBER: This should be all the code you need to complete this exercise, and only the code you need to complete this exercise (not any other code from this worksheet). Even with comments, it shouldn’t be longer than around 25 lines of code. Your code has to run, produce the correct answer, and not include irrelevant code, to pass this exercise.
```
Bayes factor analysis
--------------
[1] congru + subj : 7050364 ±2.15%

Against denominator:
  rt ~ subj
---
Bayes factor type: BFlinearModel, JZS
```
Just before we finish, some notes on the issue of multiple comparisons, just because this is something you may hear about elsewhere.
To explain the concept of multiple comparisons, I'd like you to imagine you had a factor with five levels. The command `anovaBF` would cope just fine with this. But, with five levels, there are ten pairwise comparisons you could do (1 versus 2, 1 versus 3, etc.). In traditional analysis techniques (i.e. techniques resulting in p values) you will sometimes hear people talking about 'correcting for multiple comparisons'. This is because the more tests you do, the more likely it is that one of them will be 'significant' (\(p < .05\)), even if there is in fact no difference between any of the levels of the factor. One common suggestion is to use a stricter significance level (e.g. \(p < .01\)) if you're doing a lot of tests.
This is not an approach covered in these worksheets. This is for two reasons:
First, we're using Bayesian methods, not traditional ANOVA. Although issues around multiple comparisons still occur with Bayesian techniques, the issues are not the same, the solutions are not the same, and the whole thing is just a bit too complex for this intermediate-level worksheet.
Second, correcting for multiple comparisons is seldom the answer anyway. If you knew you needed to compare those two conditions before you looked at the data, this is known as a confirmatory analysis, and we don’t generally adjust for multiple comparisons for such analyses. Alternatively, you might be exploring your data after you collected it. For example, you might observe a difference you didn’t expect to see, and do a test on this. This is called exploratory analysis, and it’s fine — it’s good to fully explore your data. But, results from exploratory analysis should normally be replicated before we are particularly convinced by them, whatever the p value (or Bayes Factor). In other words, you should run the study again, predict you would see the difference again, and then test for it. This is because, if you look long and hard enough, you’ll always find some pattern in a set of data, even if there is not actually anything to be found.
This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.