Before starting this exercise, you should have completed all the relevant Absolute Beginners’, Part 1 worksheets. Each section below indicates which of the earlier worksheets are relevant.
Relevant worksheet: Intro to RStudio
In this exercise, you’ll be analysing some data that you and your peers recently collected. To get this data into R, follow these steps:
Open an RStudio project for this analysis and, within that, create a script file.
Upload the CSV file you have been given for this activity into your RStudio project folder. If you want to try out this worksheet without that data file, you can use this example CSV file instead. You can only complete your PsycEL activity if you use the CSV file you were sent.
Load the tidyverse package, and then load your data into R.
# Load tidyverse
library(tidyverse)
# Load data
data <- read_csv("green.csv")
Note: In the example above, you’ll need to replace `green.csv` with the name of the CSV file you just uploaded into your RStudio project.
Look at the data by clicking on it in the Environment tab in RStudio.
Each row is a rating by one participant in this study of creativity. Groups of participants came up with a creative solution to a problem, while either taking a walk in an urban environment or a nature environment. Each of these solutions has been rated for creativity by a set of raters.
Will the nature environment lead to more creative ideas than the urban environment?
Column | Description | Values |
---|---|---|
Solution | Number of the Solution | a number |
Rater | Reference number of the person rating the solution | a number |
Cond | Which environment was the creator in? | “Urban”, “Nature” |
score | How creative was the idea rated to be? | 0-100, higher numbers = more creative |
Relevant worksheet: Group Differences
We start by “pre-processing” our data, in order to make it easier to analyse. We do this in two steps:
In some cases, a participant did not provide a rating of a solution; this is represented in the dataset as `NA`. R uses `NA` to specify that this data point is missing, in this case because the participant didn’t respond. Although it’s good to explicitly record that a response was not made, keeping these `NA` values in the dataset will cause problems later on, so we’re going to remove them:
# Remove NAs
data <- data %>% drop_na(score)
The command `drop_na(score)` is new. It just means remove the rows of the dataset where the score is recorded as `NA`.
The rest of the command uses things we covered in the Group Differences worksheet: the dataframe `data` is sent (i.e., piped, `%>%`) to the `drop_na()` command, which removes the `NA` rows, and the results are stored (`<-`) back in the `data` dataframe.
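To see what `drop_na(score)` does, here is a minimal sketch on a small made-up data frame. The name `toy` and all of its values are assumptions for illustration only; they are not the study data.

```r
library(tidyverse)

# Made-up ratings for illustration only (hypothetical values)
toy <- tibble(
  Solution = c(1, 1, 2, 2),
  Rater    = c(101, 102, 101, 102),
  Cond     = c("Nature", "Nature", "Urban", "Urban"),
  score    = c(60, NA, 45, 50)
)

# Count how many scores are missing before removing them
n_missing <- sum(is.na(toy$score))

# Remove the rows where 'score' is NA; other rows are kept unchanged
toy_clean <- toy %>% drop_na(score)

print(n_missing)
print(nrow(toy_clean))
```

Counting the missing values first is a useful habit: it tells you how many ratings you are about to discard.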
Each solution was rated by several people. We’re going to take the average (mean) of those ratings, so we’re left with one creativity score per solution. We use the `group_by`, `summarise`, and `mean` commands we used in the Group Differences worksheet to do this:
# Group by 'Cond' and 'Solution', calculate mean score; place results into 'creative'
creative <- data %>% group_by(Cond, Solution) %>% summarise(score = mean(score))
`summarise()` has grouped output by 'Cond'. You can override using the
`.groups` argument.
As before, you can safely ignore the “grouped output” message that you receive.
If we look at this summarised data, by clicking on `creative` in the Environment tab of RStudio, we can see that we now have one creativity score per solution.
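If you would rather not see the grouping message at all, one option is the `.groups` argument that the message itself mentions. The sketch below uses a made-up data frame (`toy` and its values are assumptions, not the study data) and drops the grouping after summarising:

```r
library(tidyverse)

# Made-up ratings for illustration only (hypothetical values)
toy <- tibble(
  Solution = c(1, 1, 2, 2, 3, 3),
  Cond     = c("Nature", "Nature", "Urban", "Urban", "Nature", "Nature"),
  score    = c(60, 70, 45, 55, 30, 50)
)

# One mean score per solution; .groups = "drop" returns an ungrouped
# result, so no message about grouped output is printed
toy_creative <- toy %>%
  group_by(Cond, Solution) %>%
  summarise(score = mean(score), .groups = "drop")

print(toy_creative)
```

This is optional. Leaving `.groups` out gives the same scores, and the later steps in this worksheet work either way.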
Relevant worksheets: Group Differences, Evidence
We start by looking to see how the mean creativity scores differ for those who were in a nature or an urban environment. We can do this using the `group_by` and `summarise` functions in a similar way to before, but on our preprocessed data, which we have stored in the data frame `creative`:
# Group by 'Cond', calculate mean score.
creative %>% group_by(Cond) %>% summarise(mean(score))
# A tibble: 2 × 2
Cond `mean(score)`
<chr> <dbl>
1 Nature 42.7
2 Urban 39.2
Your output will look similar to this, but the numbers will probably be different. In this example, it looks like there’s a small difference, with the creativity ratings slightly higher in the Nature environment – but how does this between-group difference compare to the within-group variability? As we covered in the Group Differences worksheet, this is most easily looked at with a scaled density plot:
# Display density plots of 'score', by 'Cond'
creative %>% ggplot(aes(score, colour = factor(Cond))) + geom_density(aes(y = ..scaled..)) + xlim(0, 100)
Explanation of command: The only new part here is `xlim(0, 100)`, which sets limits on the x-axis of your graph. Specifically, it forces the lowest value on the x-axis to be 0 and the highest value to be 100. Without `xlim`, R chooses limits that it thinks are sensible. Like all computer programs, R isn’t that bright, so often it makes sense to tell it more precisely what you want.
In this example, the graph tells a somewhat different story to the means: although a difference between groups is visible, it is small compared to the variability within each group.
We can express the size of the difference in means, relative to the within-group variability, as an effect size. As we said in the Group Differences worksheet, we calculate an effect size in R like this:
# Load a package that calculates effect sizes
library(effsize)
# Calculate Cohen's d for the effect of 'Cond' on 'score'
cohen.d(creative$score ~ creative$Cond)
Cohen's d
d estimate: 0.2145072 (small)
95 percent confidence interval:
lower upper
-0.5973451 1.0263596
In this example, the effect size is around 0.21, which is typically described as a small effect. The effect size for your data may be different.
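To make “difference in means, relative to the within-group variability” concrete, here is a minimal sketch that works out Cohen's d by hand using the standard pooled-standard-deviation formula. The two vectors are made-up scores (assumptions for illustration), and the sketch is not a description of how the `effsize` package is implemented internally.

```r
# Made-up creativity scores for two groups (hypothetical values)
nature <- c(40, 50, 60)
urban  <- c(30, 40, 50)

# Group sizes
n1 <- length(nature)
n2 <- length(urban)

# Pooled standard deviation: a weighted average of the two group variances
pooled_sd <- sqrt(((n1 - 1) * var(nature) + (n2 - 1) * var(urban)) / (n1 + n2 - 2))

# Cohen's d: difference in means divided by the pooled standard deviation
d <- (mean(nature) - mean(urban)) / pooled_sd

print(d)  # 1 for these made-up numbers: the means differ by one pooled SD
```

In practice you would still use `cohen.d()`, which also reports a confidence interval.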
At this point, the most pressing question is probably whether the difference observed in the mean scores is likely to be real, or whether it’s more likely down to chance. As we saw in the Evidence worksheet, the best way to look at this is with a Bayesian t-test:
# Load BayesFactor package
library(BayesFactor, quietly = TRUE)
# Calculate Bayesian t-test for effect of 'Cond' on 'score'
ttestBF(formula = score ~ Cond, data = data.frame(creative))
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 0.4053154 ±0%
Against denominator:
Null, mu1-mu2 = 0
---
Bayes factor type: BFindepSample, JZS
The Bayes Factor in this case is approximately 1/2 (0.41, to be more precise), meaning it’s about twice as likely that there isn’t a difference as that there is. Your number will likely be a bit different.
Enter the mean creativity score for each condition, the effect size, and the Bayes Factor for the difference, into PsycEL.
Using the convention that there is a difference if BF > 3, there isn’t a difference if BF < 0.33, and if it’s between 0.33 and 3, we’re unsure, select difference, no difference, or unsure, on PsycEL.
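The convention above can be written down as a small function. This is only a sketch: `interpret_bf` is a hypothetical helper name and is not part of the BayesFactor package. The value 0.41 is the Bayes Factor from the worked example in this worksheet, not from your data.

```r
# Hypothetical helper that applies the BF > 3 / BF < 0.33 convention
interpret_bf <- function(bf) {
  if (bf > 3) {
    "difference"
  } else if (bf < 0.33) {
    "no difference"
  } else {
    "unsure"
  }
}

# Bayes Factor from the worked example in this worksheet
example_bf <- 0.41

print(interpret_bf(example_bf))  # "unsure"

# Taking the reciprocal expresses the evidence in favour of the null
print(1 / example_bf)
```

The reciprocal is a little over 2, which is where the “about twice as likely” description of a Bayes Factor near 1/2 comes from.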
This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.