Before starting this exercise, you should have completed all the Absolute Beginners’ workshop exercises. If not, take a look at those exercises before continuing. Each section below also indicates which of the earlier worksheets are relevant.
Relevant worksheets: Intro to RStudio, Exploring data.
In order to complete this worksheet, you’ll need to have downloaded your CSV file from the PsycEL exercise. See the instructions on PsycEL for how to do this.
Once you have downloaded your CSV file, open a project on RStudio Server for this analysis, create a script file, and upload your CSV to your project.
Plymouth University students: Create/open your project named psyc412; within that, create a script file called memories.R. Enter all commands into that script and run them from there.
Finally, load the tidyverse package, and load your data.
# Memories from life
# Load package
library(tidyverse)
# Load data
mems <- read_csv("memories-single.csv")
Your CSV file may have a different name to the example above. If so, you will need to change memories-single.csv to the name of your file.
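Once the data are loaded, it can help to check that they look as expected before plotting. The sketch below is only an illustration: it writes a tiny made-up CSV to a temporary file and reads it back, so that it runs without your own data. The names example_file and example_mems, and the subj column, are hypothetical; only the period column name comes from this worksheet.

```r
library(tidyverse)

# Hypothetical example data, written to a temporary file so the sketch is self-contained.
# Only the 'period' column name is taken from the worksheet; 'subj' is made up.
example_file <- tempfile(fileext = ".csv")
write_csv(tibble(subj = c(1, 1, 2), period = c(1, 3, 5)), example_file)

# Load the file in the same way as the worksheet loads memories-single.csv
example_mems <- read_csv(example_file)

# Show the column names and types, plus the first few values of each column
glimpse(example_mems)
```

With your own data, running glimpse(mems) should show a similar summary, including a period column.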
Relevant worksheet: Exploring data.
Are memories from all time periods about equally common? Or are recent memories more common than remote ones? Or perhaps some other pattern? A histogram can help us to answer this question by visualising our data. You covered how to make a histogram in the Exploring Data worksheet. In this case, our data of interest are in the period column of the mems data frame, so the command we use is:
# Plot a histogram of 'period'
mems %>% ggplot(aes(period)) + geom_histogram(binwidth=.5)
Your histogram will look something like the above, but the heights of the bars will likely be somewhat different.
The binwidth has been set to .5 here to make a gap between each bar in the histogram. Try changing binwidth to 1 to see what effect it has on your plot.
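The heights of the bars are simply the number of rows at each value of period, so you can check a histogram against a table of counts. The following is a minimal sketch using made-up data; example_mems and its values are hypothetical stand-ins for your own mems data frame.

```r
library(tidyverse)

# Hypothetical stand-in for 'mems'; only the 'period' column name comes from the worksheet
example_mems <- tibble(period = c(1, 1, 2, 3, 3, 3, 5, 5))

# Count how many memories fall in each period;
# these counts correspond to the bar heights in the histogram
period_counts <- example_mems %>% count(period)
print(period_counts)
```

Note that a period with no memories (period 4 in this made-up example) does not appear in the table at all.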
Not bad… but it could be better. In particular, having the time periods labelled as numbers doesn’t make for a very readable graph; it would be better if we used more meaningful labels. We can use the scale_x_continuous command of ggplot to add our own labels to a histogram:
# Plot the histogram with labels on the x-axis
mems %>% ggplot(aes(period)) +
  geom_histogram(binwidth = .5) +
  scale_x_continuous(limits = c(0.75, 5.25), breaks = 1:5,
                     labels = c("Fred", "Wilma", "Barney", "Betty", "Pebbles"))
Explanation: The command scale_x_continuous contains the words breaks, labels and limits.
breaks = 1:5 tells R we want a tick mark on the x axis for each of the periods 1, 2, 3, 4 and 5, one for each bar.
labels gives the label for each of those tick marks, in order.
limits tells R what to use as the minimum and maximum values for the x axis. It is important to include this as well as setting breaks, because otherwise the axis only covers the values found in your data, so a period with no memories (a zero-height bar) at either end of the range would be left off the plot. We set the range from 0.75 to 5.25 (rather than from 1 to 5) because we need to leave room for the width of the bars (which we set to .5 using binwidth).
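To see what limits does, you can compare the x axis range of a plot with and without it. The sketch below is an illustration under assumptions: example_mems is made-up data with no memories in period 5, and p_default, p_limited, default_limits and limited_limits are hypothetical names. It uses ggplot2's layer_scales function to look up the x axis scale that a plot ends up with.

```r
library(tidyverse)

# Hypothetical data with no memories in period 5
example_mems <- tibble(period = c(1, 1, 2, 3, 3, 3, 4))

# Histogram without limits: the x axis is based on the data alone
p_default <- example_mems %>%
  ggplot(aes(period)) +
  geom_histogram(binwidth = .5)

# Same histogram with limits and breaks set, as in the worksheet
p_limited <- p_default +
  scale_x_continuous(limits = c(0.75, 5.25), breaks = 1:5)

# Look up the minimum and maximum of the x axis scale for each plot
default_limits <- layer_scales(p_default)$x$get_limits()
limited_limits <- layer_scales(p_limited)$x$get_limits()

print(default_limits)
print(limited_limits)
```

In the first plot the axis stops short of period 5, because there are no data there; in the second, the axis runs to 5.25, so the empty period still has its place on the plot.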
# EXERCISE
Add the above to your script and then complete the exercise below.
Export your histogram, using the Export icon on RStudio’s Plots window, and selecting “Save as image…”. Give it a meaningful file name (e.g. “memories-hist”) and click ‘Save’.
Download your histogram from RStudio Server (see these instructions for a reminder of how to do this).
Upload your histogram to PsycEL (see the PsycEL activity for instructions on how to do this).
This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.