Before starting this exercise, you should have completed all the Absolute Beginners’ workshop exercises. If not, take a look at those exercises before continuing. Each section below also indicates which of the earlier worksheets are relevant.
Relevant worksheet: Intro to RStudio
You’ll need to complete the Psych:EL exercise to get the CSV file containing your data. You’ll also need to get an older person to complete the same Psych:EL exercise to get a second CSV file containing their data.
Once you have downloaded your CSV files, open a project on RStudio Server for this analysis, create a script file, and upload your CSV to your project.
Plymouth University students: Create/open your project named psyc415; within that, create a script file called risk-rat.R. Enter all commands into that script and run them from there.
Relevant worksheet: Exploring data
Load the tidyverse package, and load your own data, and that of your older participant.
# Risk-taking
# Load tidyverse
library(tidyverse)
# Load my data into 'risk.me'
risk.me <- read_csv("riskrat.csv")
# Load older participant's data into 'risk.other'
risk.other <- read_csv("riskrat-other.csv")
Note: Everyone’s CSV files have different names. For example, yours might be called 10435678you.csv and 10435678other.csv. In the example above, you’ll need to replace riskrat.csv and riskrat-other.csv with the names of your personal CSV files.
Look at the data by clicking on it in the Environment tab in RStudio. Each row is one person’s rating for one question. Here’s what each of the columns in the data set contains:
Column | Description | Values |
---|---|---|
SRN | Your Student Reference Number | |
who | Is this data from you, or from the other (older) person you tested? | “you”, “other” |
group | Which sort of risk-taking behaviour is this question about? | “ethical”, “financial”, “health”, “social”, “recreation” |
qu | This number uniquely identifies the question that was asked | 1 - 26, e.g. qu. 19 is “Taking a skydiving class” |
rating | The rating given in response to this question | 1 - 7, higher numbers = more likely to engage in the risky behaviour described in the question. |
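If you’d like to check this structure from your script as well as by clicking, one option is to count the rows for each type of behaviour. The sketch below is only an illustration: risk.toy is a made-up data frame with the same column names as the real file, and count is a tidyverse command that may not have appeared in the earlier worksheets.

```r
library(tidyverse)

# Toy data with the same columns as the real file (all values are made up)
risk.toy <- tibble(
  SRN    = 10435678,
  who    = "you",
  group  = c("ethical", "ethical", "financial", "health", "health", "health"),
  qu     = 1:6,
  rating = c(6, 7, 4, 5, 5, 6)
)

# Count how many rows (one row per question) there are for each type of behaviour
group.counts <- risk.toy %>% count(group)
group.counts
```

With your own data, you would replace risk.toy with risk.me.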
Relevant worksheets: Group Differences
How highly did you score on each of the types of risk-taking behaviour (e.g. ethical, financial, …)?
To look at this, we take the average (mean) rating you made for each type of behaviour. To do this, we use the group_by and summarise commands you learned in the Group Differences worksheet.
# Group by 'group', calculate mean of 'rating'
risk.me %>% group_by(group) %>% summarise(mean(rating))
# A tibble: 5 × 2
group `mean(rating)`
<chr> <dbl>
1 ethical 6.5
2 financial 3.8
3 health 5.2
4 recreation 4.17
5 social 2.83
As before, you can safely ignore the “ungrouping” message that you receive.
NOTE: Your output should look similar to that shown above, but the numbers will be different.
Which types of risk-taking behaviours did you score highest on? And lowest on?
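If you find it hard to spot the highest and lowest scores by eye, you could give the summary column a name and sort by it. This is a minimal sketch on made-up data: risk.toy and the column name mean_rating are assumptions chosen for illustration, and arrange and desc are tidyverse commands that may be new to you.

```r
library(tidyverse)

# Toy data standing in for 'risk.me' (values are made up for illustration)
risk.toy <- tibble(
  group  = c("ethical", "ethical", "social", "social", "health", "health"),
  rating = c(6, 7, 2, 3, 5, 4)
)

# Name the summary column 'mean_rating', then sort from highest to lowest
risk.means <- risk.toy %>%
  group_by(group) %>%
  summarise(mean_rating = mean(rating)) %>%
  arrange(desc(mean_rating))

risk.means
```

The first row of the result is the type of behaviour with the highest mean rating, and the last row is the lowest.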
Relevant worksheets: Group Differences
People tend to be less likely to take risks as they get older. Is this the case for you and the older adult you tested? In order to answer this question, we first have to put your data, and that of your older adult, together in one data frame.
We can use the bind_rows command to combine two data frames, like this:
# Combine 'risk.me' and 'risk.other' into one data frame
risk <- bind_rows(risk.me, risk.other)
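To see what bind_rows does, it can help to try it on two very small data frames first. The example below uses made-up data (toy.me and toy.other are hypothetical names, not your real files) and checks that the combined data frame has all the rows from both.

```r
library(tidyverse)

# Two small made-up data frames with the same columns
toy.me    <- tibble(who = "you",   qu = 1:3, rating = c(5, 6, 4))
toy.other <- tibble(who = "other", qu = 1:3, rating = c(3, 2, 4))

# Stack the rows of 'toy.other' underneath the rows of 'toy.me'
toy.both <- bind_rows(toy.me, toy.other)

# The combined data frame has 3 + 3 = 6 rows
nrow(toy.both)

# Count the rows belonging to each person
toy.both %>% count(who)
```

You could run the same two checks on your real risk data frame to confirm that both sets of ratings are present.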
Now we can compare you and your older adult on your overall mean risk-taking score. We do this by grouping by the who variable in the risk data frame. If you get this right, your output will look a bit like this, although the exact numbers will be different:
# Group 'risk' by 'who', calculate mean of 'rating'
risk %>% group_by(who) %>% summarise(mean(rating))
# A tibble: 2 × 2
who `mean(rating)`
<chr> <dbl>
1 other 3.54
2 you 4.35
Who scores higher on risk taking – you, or your older adult?
Enter the exact risk-taking score for you and your older adult into your lab book.
This part of the exercise can only be completed once a sufficient number of people have completed the risk-taking questionnaire on Psych:EL. When this happens, you will be able to download everyone’s data from Psych:EL as a CSV file. Download that file, and copy it into your RStudio project (the project you created at the beginning of this exercise).
Relevant worksheet: Exploring data
Load the tidyverse package, and load everyone’s data.
# Load data into 'risk.all'
risk.all <- read_csv("riskrat-all.csv")
Look at the data by clicking on it in the Environment tab in RStudio. You’ll see it has the same columns as the other data files; it just has a lot more rows (because it contains a lot of participants).
Relevant worksheet: Group Differences
Let’s start by looking at the range of scores your peers got on this questionnaire. The first thing we’ll need to do is filter the data so it only contains your classmates, not the older adults. This is because older adults tend to score lower on risk taking than younger adults, and so it’s best to compare your score to people who are closer to your own age. We do this using the filter command you learned in the Group Differences worksheet. Here, we want to keep all ratings where the column who says “you”, because these are the ratings for when your peers are answering the questionnaire themselves. We can filter like this:
# Filter the 'you' data into 'risk.young'
risk.young <- risk.all %>% filter(who == "you")
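A quick way to check that a filter has done what you intended is to count what is left afterwards. The sketch below uses a tiny made-up data frame (toy.all is a hypothetical stand-in for risk.all) so you can see the effect clearly.

```r
library(tidyverse)

# Made-up data: two students, each with a rating from themselves and an older adult
toy.all <- tibble(
  SRN    = c(1, 1, 2, 2),
  who    = c("you", "other", "you", "other"),
  rating = c(5, 3, 6, 2)
)

# Keep only the rows where 'who' is "you"
toy.young <- toy.all %>% filter(who == "you")

# After filtering, "you" should be the only value left in 'who'
toy.young %>% count(who)
```

If the same check on risk.young shows any rows for “other”, the filter has not worked as intended.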
Now we can look at the range of scores given by your peers. A density plot, which you learned to produce in the Group Differences worksheet, is a good choice for this. Here, we’re going to make a density plot of the data in column rating of the risk.young data frame:
# Display density plot of 'rating'
risk.young %>% ggplot(aes(rating)) + geom_density(aes(y = ..scaled..), adjust = 2)
Note: You may have noticed the addition of adjust = 2 in the above command, which we didn’t use in the Group Differences worksheet. The adjust argument changes how smooth the density plot looks, with higher numbers making for smoother plots. Try changing the value to see what effect it has on your plot.
In this particular plot, a rating of around 5 is the most common, with higher and lower ratings becoming increasingly less likely. But where does your score fit on this distribution? You’ve already calculated your overall score, so you can make this comparison manually, but we can also draw a line on this density plot representing your score, which is more immediately interpretable.
To do this, we use the command geom_vline (vline being short for “vertical line”) to draw a line on the plot to show your score. Replace the number 4.35 in the command below with your score:
# Plot as above, with vertical line added
risk.young %>% ggplot(aes(rating)) + geom_density(aes(y = ..scaled..), adjust = 2) +
geom_vline(xintercept = 4.35)
In the above example, the individual’s score is close to the centre of the distribution. Are you towards the bottom, towards the top, or near the middle?
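Typing your score in by hand works well, but you could also let R calculate it and pass the result to geom_vline. The following is a sketch under some assumptions: toy.me and toy.young are made-up stand-ins for risk.me and risk.young, my.score is a name chosen for illustration, and the $ symbol (which picks out one column of a data frame) may not have been covered in the earlier worksheets.

```r
library(tidyverse)

# Made-up stand-ins for 'risk.me' and 'risk.young'
toy.me    <- tibble(rating = c(5, 4, 6, 3))
toy.young <- tibble(rating = c(1, 3, 4, 4, 5, 5, 5, 6, 6, 7))

# Calculate the overall mean rating and store it in 'my.score'
my.score <- mean(toy.me$rating)

# Density plot as in the worksheet, with the line placed at the stored score
toy.plot <- toy.young %>% ggplot(aes(rating)) +
  geom_density(aes(y = ..scaled..), adjust = 2) +
  geom_vline(xintercept = my.score)

toy.plot
```

The advantage of this approach is that the line moves automatically if the data change, so there is no number to retype.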
Finally, we’ll make this plot a bit prettier by adding some colour. Here, I’ve used some fairly ugly colours; for your plot, use a lightblue fill and a red line:
# Plot as above, with a green fill and yellow line
risk.young %>% ggplot(aes(rating)) +
geom_density(aes(y = ..scaled..), adjust = 2, fill = "green") +
geom_vline(xintercept = 4.35, colour = 'yellow')
Use RStudio to export your light blue and red graph as an Image, and upload it to your lab book.
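As an alternative to the export menu, ggplot2 also provides the ggsave command, which writes a plot to a file from your script. This is only a sketch: the data are made up, and the file name risk-plot.png and the width and height (in inches) are assumptions you would change to suit your lab book.

```r
library(tidyverse)

# Made-up stand-in for 'risk.young'
toy.young <- tibble(rating = c(1, 3, 4, 4, 5, 5, 5, 6, 6, 7))

# Light blue fill and red line, stored in 'toy.plot' rather than displayed
toy.plot <- toy.young %>% ggplot(aes(rating)) +
  geom_density(aes(y = ..scaled..), adjust = 2, fill = "lightblue") +
  geom_vline(xintercept = 4.35, colour = "red")

# Save the stored plot as a PNG image in the project folder
ggsave("risk-plot.png", plot = toy.plot, width = 6, height = 4)
```

The saved file should then appear in the Files tab of your RStudio project, from where you can download it.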
This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.