Before starting this exercise, you should have completed all the relevant Absolute Beginners’ worksheets (Parts 1 and 2). Each section below indicates which of the earlier worksheets are relevant.
Relevant worksheet: Intro to RStudio
You and your partner must first complete the behaviour coding exercise. You’ll then get a CSV file that contains both your ratings.
Once you have your CSV file, open a project on RStudio Server for this analysis, create a script file, and upload your CSV to your project.
Plymouth University students: Create/open your project named psyc416; within that create a script file called lions.R. Enter all commands into that script and run them from there.
Relevant worksheet: Exploring data
Load the tidyverse package, and load your data.
# Animal welfare
# Load tidyverse
library(tidyverse)
# Load data into 'animals'
animals <- read_csv("animals.csv")
Note: Everyone’s CSV file will have a different name. In the example above, you’ll need to replace animals.csv with the name of your personal CSV file.
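If you are unsure of the exact name of your file, one way to check is to list the CSV files in your project. This is a minimal sketch, assuming you uploaded the file to the top-level folder of your project (R's working directory); csv_files is just an illustrative name.

```r
# List every file in the working directory whose name ends in ".csv"
csv_files <- list.files(pattern = "\\.csv$")

# Display the names, so you can copy the right one into read_csv()
csv_files
```

If nothing is listed, the upload probably went to a different folder.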
Look at the data by clicking on it in the Environment tab in RStudio. Each row is one time point in the video you coded. Here’s what each of the columns in the data set contains:
Column | Description | Values |
---|---|---|
time | Time point. | 1 - 10 |
period | How long before feeding time? (in minutes). | 10 or 180 |
behav.r1 | Rater 1’s coding of the animal’s behaviour at each time point. | In the example file, you’ll find: “pacing”, “sleeping”, “standing”, “lying”, “running”. Your codes may be different. |
behav.r2 | Rater 2’s coding of the animal’s behaviour at each time point. | as above. |
loc.r1 | Rater 1’s coding of the animal’s location at each time point. | In the example file, you’ll find: “zone_1”, “zone_2”, “zone_3”, “zone_4”. Your codes may be different. |
loc.r2 | Rater 2’s coding of the animal’s location at each time point. | as above. |
Relevant worksheet: Inter-rater reliability
To what extent did you and your workshop partner agree on how each behaviour should be coded? As we covered in the inter-rater reliability worksheet, to look at this, we first have to select the relevant columns of the data frame. For example, to look at inter-rater reliability for the behaviour category, we select:
# Select columns; insert into 'behav'
behav <- animals %>% select(behav.r1, behav.r2)
We can now use the agree command to work out percentage agreement:
# Load inter-rater-reliability package
library(irr)
# Calculate percentage agreement
agree(behav)
Percentage agreement (Tolerance=0)
Subjects = 20
Raters = 2
%-agree = 70
NOTE: If you get an error here, type install.packages("irr"), wait for the package to finish installing, and try again.
The key result here is %-agree, which is your percentage agreement. The term Subjects here is a bit misleading. It doesn’t mean the number of animals you observed (this data file contains your ratings of one animal); it means the number of time points you recorded an observation for.
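To see what percentage agreement measures, it can help to work it out by hand. The sketch below uses a small made-up data frame (toy, not your real data) with the same column names as the worksheet's data; it simply counts the proportion of time points where the two raters gave the same code. The same approach applies to the location columns.

```r
# Hypothetical codings for six time points (made up for illustration)
toy <- data.frame(
  behav.r1 = c("pacing", "sleeping", "sleeping", "lying", "standing", "pacing"),
  behav.r2 = c("pacing", "sleeping", "lying", "lying", "standing", "sleeping"),
  loc.r1   = c("zone_1", "zone_1", "zone_2", "zone_2", "zone_3", "zone_4"),
  loc.r2   = c("zone_1", "zone_1", "zone_2", "zone_3", "zone_3", "zone_4")
)

# Percentage of time points where both raters gave the same behaviour code
behav_agree <- mean(toy$behav.r1 == toy$behav.r2) * 100
behav_agree

# The same calculation for the location codes
loc_agree <- mean(toy$loc.r1 == toy$loc.r2) * 100
loc_agree
```

With your own data, you would select loc.r1 and loc.r2 in the same way as the behaviour columns and pass them to agree.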
Enter the percentage agreement for your behaviour and location codings into PsycEL.
Relevant worksheet: Inter-rater reliability
One problem with the percentage agreement measure is that people will sometimes agree purely by chance. Jacob Cohen thought it would be much neater if we could have a measure of agreement where zero always meant the level of agreement expected by chance, and 1 always meant perfect agreement. To calculate his measure, Cohen’s kappa, in R we use the command kappa2:
# Calculate Cohen's Kappa
kappa2(behav)
Cohen's Kappa for 2 Raters (Weights: unweighted)
Subjects = 20
Raters = 2
Kappa = 0.559
z = 4.17
p-value = 3.1e-05
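The idea behind kappa can be illustrated with a calculation by hand. This is a sketch using made-up codings and the standard unweighted formula, kappa = (observed agreement - chance agreement) / (1 - chance agreement); it is not how you should do your analysis (use kappa2 for that), and all the variable names are illustrative.

```r
# Hypothetical codings from two raters for six time points
r1 <- c("pacing", "sleeping", "sleeping", "lying", "standing", "pacing")
r2 <- c("pacing", "sleeping", "lying", "lying", "standing", "sleeping")

# Use the same set of codes for both raters, so the table is square
codes <- sort(unique(c(r1, r2)))
tab <- table(factor(r1, levels = codes), factor(r2, levels = codes))
n <- sum(tab)

# Observed agreement: proportion of time points on the diagonal
p_observed <- sum(diag(tab)) / n

# Chance agreement: based on how often each rater used each code
p_chance <- sum(rowSums(tab) * colSums(tab)) / n^2

# Kappa: 0 means chance-level agreement, 1 means perfect agreement
kappa_by_hand <- (p_observed - p_chance) / (1 - p_chance)
kappa_by_hand
```

Here the raters agree on 4 of 6 time points, but a quarter of that agreement would be expected by chance alone, so kappa is lower than the raw proportion of agreement.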
Enter the Cohen’s kappa values for your behaviour and location codings into PsycEL.
There are some words that psychologists sometimes use to describe the level of agreement between raters, based on the value of kappa they get. These descriptions are listed in the inter-rater reliability worksheet, in the section “Describing Cohen’s kappa”.
On PsycEL, select the correct term to describe the kappa values for your behaviour and location codings.
If either of those descriptions is ‘moderate’ or lower, reflect on why that might be. For example, is there a problem with the definitions of the behavioural categories you used? What else might have caused the lack of agreement?
Write a few sentences into PsycEL summarizing your reflections.
Relevant worksheet: Relationships
Does the animal behave differently when it’s close to feeding time? To look at this, we need to calculate the frequency of each behaviour at our two time periods (10 minutes before feeding, and 180 minutes before feeding). You can use the table command we learned in the Relationships worksheet to do this, but you’re going to have to choose which behaviour, and which of your two raters, to look at. That’s because it’s likely you will have had at least a few disagreements. But if both of you were looking at the same behaviour, how can we decide who was ‘right’? There are a few possible solutions, but for now we will take the simplest: flip a coin to decide which of your raters’ data you will use.
If you choose “behaviour” and rater 2, the commands would be:
# Create contingency table for 'period' by 'behav.r2'
cont <- table(animals$period, animals$behav.r2)
# Display contingency table
cont
      lying pacing sleeping standing
  10      3      1        5        1
  180     2      1        6        1
What you have just done here, as we covered in the relationships worksheet, is to convert your data frame, called animals, into a contingency table, called cont. This contingency table shows how often each behaviour occurs at each time period. Recall that table(rows, columns) is the command used in R for producing contingency tables. We replace the word rows with the name of the variable we want to appear on the rows of the table, and we replace the word columns with the name of the variable we want to appear in the columns of the table.
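As a small illustration of table(rows, columns), the sketch below builds a contingency table from two short made-up vectors (period and behaviour are hypothetical stand-ins for the columns of your data frame).

```r
# Six hypothetical observations: three in each time period
period <- c(10, 10, 10, 180, 180, 180)
behaviour <- c("pacing", "pacing", "lying", "sleeping", "lying", "sleeping")

# 'period' goes on the rows, 'behaviour' on the columns
toy_cont <- table(period, behaviour)
toy_cont

# Row totals: the number of observations in each period
rowSums(toy_cont)
```

Each cell counts how many observations share that combination of period and behaviour; combinations that never occur show as zero.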
Relevant worksheet: Face recognition
To visualize the relationship between behaviour and feeding time, we’re going to use a bar chart. We covered bar charts in the Face recognition worksheet; here we’re going to extend that example to create a bar chart that shows our two different time periods on the same axes.
# Turn contingency table into a data frame, 'df'
df <- data.frame(cont)
# Name the columns of the data frame
colnames(df) <- c("Period", "Behaviour", "Frequency")
# Display a bar chart of 'Frequency' by 'Behaviour', grouping by 'Period'
df %>% ggplot(aes(x = Behaviour, y = Frequency, fill = Period)) + geom_col(position = "dodge")
This graph command goes a bit beyond what we’ve covered in previous worksheets, so here’s an explanation of how the new bits work:
df <- data.frame(cont) - A data frame is the standard way R stores data (e.g. animals is a data frame). The ggplot command expects to get a data frame, and gets upset if it gets something else, like a contingency table. So, the first thing we do is make a data frame version of cont (our contingency table), and give it a name (df in this case).
If you click on df in the Environment tab of RStudio straight after creating it (before running the colnames command), you’ll see that the rows of the contingency table have been called “Var1” and the columns have been called “Var2”, with the counts in a third column called “Freq”. These are not very meaningful labels, so we use the colnames command (short for “column names”) to give them more meaningful names. This will make our graph clearer. We do this using the command: colnames(df) <- c("Period", "Behaviour", "Frequency").
ggplot(aes(x = Behaviour, y = Frequency, fill = Period)) - As in previous bar graphs you’ve made, you need to tell ggplot which data is on the x axis, and which is on the y axis. The new bit here is that we also tell ggplot to produce two different colours of bars, with the colour depending on Period.
geom_col(position="dodge") - As before, geom_col is the command for a “column” plot (a.k.a. a bar chart). The new part here is position="dodge"; this tells ggplot that you want the two different colours of bars to be placed side-by-side, rather than directly on top of each other (i.e. you want them to “dodge” each other).
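The conversion and renaming steps described above can be tried out on a tiny made-up contingency table. This is a sketch in base R only; toy_cont and toy_df are illustrative names, not part of your analysis.

```r
# A small hypothetical contingency table: period by behaviour
toy_cont <- table(c(10, 10, 180, 180), c("lying", "pacing", "lying", "lying"))

# Convert it to a data frame: one row per combination of period and behaviour
toy_df <- data.frame(toy_cont)

# The default column names are not very meaningful
default_names <- colnames(toy_df)
default_names

# Replace them with clearer names, as in the worksheet
colnames(toy_df) <- c("Period", "Behaviour", "Frequency")
toy_df
```

Note that the data frame includes a row for every combination, even those with a frequency of zero, which is why empty bars can appear in the chart.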
In the above example graph, we can see that the animal was pacing and standing equally often in the two time periods, but was lying slightly more often 10 minutes before feeding time than 180 minutes before, and sleeping slightly more often 180 minutes before feeding time than 10 minutes before. What do your data show? Did proximity to feeding time have an effect on behaviour? If so, which behaviours were most affected?
Enter a few sentences into PsycEL describing what your data show.
Export your graph, using the Export icon on RStudio’s Plots window, and selecting “Save as image…”. Give it a meaningful file name (e.g. “feed-time”) and click ‘Save’.
Download your graph from RStudio server - see these instructions for a reminder of how to do this.
Upload your graph to PsycEL (see the PsycEL activity for instructions on how to do this).
This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.