This is an advanced worksheet, which assumes you have completed the Absolute Beginners’ Guide to R course, the Research Methods in Practice (Quantitative section) course, and the Intermediate Guide to R course.
This worksheet describes a full analysis pipeline for an undergraduate student dissertation on children’s language development. This study was an experiment which evaluated the Words in Game (WinG) test. WinG consists of a set of picture cards which are used in four tests: noun comprehension, noun production, predicate comprehension, and predicate production. The Italian and English versions of the WinG cards use different pictures to depict the associated words.
An earlier study found a difference between the English and Italian cards, for adults’ ratings of how well each picture represented the underlying construct. In this study, the researchers hypothesised that this difference would influence children’s WinG task scores, depending on which set of cards they were tested with. The experiment compared WinG performance of English-speaking children, aged approximately 30 months, tested with either the Italian or English cards. Therefore, this was a 4 (WinG task) x 2 (cards) mixed design.
Open the rminr-data project we used previously. Ensure you have the latest files by asking git to “pull” the repository. Select the Git tab, which is located in the row of tabs that includes the Environment tab. Click the Pull button (it has a downward-pointing arrow). A window will open showing the files which have been pulled from the repository. Close the Git pull window. The case-studies folder should contain a folder named allegra-cattani.

Next, create a new, empty R script and save it in the rminr-data folder as wing.R. Put all the commands from this worksheet into this file, and run them from there. Save your script regularly.
We start by reading the data.
rm(list = ls()) # clear the environment
library(tidyverse)
# read data
demographics <- read_csv('case-studies/allegra-cattani/demographics.csv')
nc <- read_csv('case-studies/allegra-cattani/noun_comprehension.csv')
np <- read_csv('case-studies/allegra-cattani/noun_production.csv')
pc <- read_csv('case-studies/allegra-cattani/predicate_comprehension.csv')
pp <- read_csv('case-studies/allegra-cattani/predicate_production.csv')
Explanation of commands:
We clear the environment with rm(list = ls()) and load the tidyverse package. We then use read_csv() to read the demographics file, and the files for the four WinG tasks, into separate data frames.

Next we preprocess the demographics data.
# preprocess demographics data
demographics <- demographics %>%
set_names(~ str_to_lower(.)) %>%
mutate(subj = factor(subj), gender = factor(gender)) %>%
mutate(gender = recode_factor(.$gender, Male = 'male',
Female = 'female')) %>%
select(subj, gender, cdi_u, cdi_s)
Explanation of commands:
We convert the column names to lower case using set_names(~ str_to_lower(.)). We convert subj and gender to factors, and recode the levels of gender to lower case. Finally, we select() the columns we want to keep: subj, gender, cdi_u and cdi_s. These last two columns contain scores for the Communicative Development Inventory (CDI). The CDI is a list of 100 words with columns ‘understands’ and ‘says’. For each word, the child’s parent placed a tick in these columns if they thought their child could understand and/or say the word. The scores for ‘understands’ (cdi_u) and ‘says’ (cdi_s) are the totals of the ticked boxes in the two columns.
Our demographics data frame now contains the child’s subject number, their gender and their parents’ CDI scores:
subj | gender | cdi_u | cdi_s |
---|---|---|---|
1 | female | 62 | 38 |
2 | male | 60 | 59 |
3 | female | 97 | 85 |
Next we preprocess the WinG data.
# preprocess noun comprehension data
nc <- nc %>%
set_names(~ str_to_lower(.)) %>%
mutate(subj = factor(subj), cards = factor(cards)) %>%
select(subj, cards, mountain:wellyboots)
Explanation of commands:
We convert the column names to lower case. The cards column indicates whether the child was tested with the Italian or English cards, so we convert it to a factor along with subj. We select these columns, and the columns containing the children’s coded responses for the words used in the noun comprehension task. These are in the column range mountain:wellyboots.
Here are the first few rows of nc:
subj | cards | mountain | motorbike | penguin | box | apple | iron | cow | doll | hat | watch | sofa | clouds | hammer | glasses | elephant | backyard | spoon | bib | sink | wellyboots |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | english | D | C | C | C | C | P | C | C | C | P | D | C | D | C | C | P | C | P | P | C |
2 | italian | C | C | C | C | C | C | C | C | C | C | C | C | C | NR | C | D | C | C | C | C |
3 | english | C | C | C | C | C | C | C | C | C | C | C | C | C | C | NR | NR | C | C | C | C |
Before we go any further, we need to apply some exclusion criteria to our data. When the experimenters test children with WinG, they use a textual coding scheme to record each child’s responses. The codes we’re interested in here are:

- N/A (not to be confused with the R data type NA) - researchers used this code to indicate that the task was interrupted for some reason (e.g. the child began crying).
- C or C* - C indicates that the child responded correctly for the picture on the card; C* indicates that the response was a correct synonym. In this experiment, both of these values are considered correct responses.
- NTS - this stands for “non-target but semantically related”. This code is used in the noun and predicate production tests, for example if the picture on the card was a house but the child said “hut”.

In this experiment, the exclusion criteria were as follows:

- Children whose task was interrupted (coded N/A) at word 17 or earlier were excluded from that task.
- Children whose total of correct responses was an outlier relative to the group mean (a two-standard-deviation criterion) were excluded from that task.
Although these exclusion criteria are quite easy to describe, writing the code to apply them is a bit more complex. Furthermore, we need to apply these criteria separately for each of the four WinG tasks. Writing the same code four times would make our program harder to read and is error prone. If we were to discover that we’d made a mistake, we would have to fix it in four different places. Fortunately, we can just write the code once and then reuse it. We do this using a function.
Functions are reusable pieces of code which perform a particular function (hence the name), and are one of the most powerful features of programming languages such as R. At this point you will be familiar with using functions. For example, rm() is a function for removing objects from the R environment, and is part of the base R language. The rm() function is reusable in the sense that you can use the same function to do slightly different things. For example, rm('foo') would remove the single object foo from your environment. The string 'foo' is called an argument, and is the data processed by rm(). The same function can also remove multiple objects provided in a list, so calling rm(list = ls()) first calls ls(), which lists all objects in your environment, and passes the result as the list argument to rm(), which removes them, thereby clearing your environment of all objects (this is exactly the first line of our script). Similarly, you’ve used read_csv() with different filename arguments to read different data files. The read_csv() function is part of the readr package, which is loaded when you call library(tidyverse) (library() is another base R function).
Removing objects from the environment is such a common task that it’s a function included with R. Reading files is almost as common, which is why it is part of the readr package that you load with library(tidyverse). However, R also includes a special function called function(), which you can use to define your own functions. This is not particularly complicated, but requires a little explanation if you’ve never encountered the idea before (you can find more details in the R manual). We’ll explain the steps by writing a function called exclude(), which allows us to apply our exclusion criteria to each of the four WinG tests.
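If you’ve never defined a function before, here is the smallest useful example (a toy function for illustration, not part of this analysis):

# a toy function: returns its argument multiplied by two
double <- function(x) {
  x * 2 # the value of the last expression is returned
}
double(21) # returns 42

Our exclude() function follows exactly the same pattern: a name, an argument list, and a body that returns a result.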
# apply WinG exclusion criteria
exclude <- function(df) {
# exclude if N/A in item 17 or lower
logical_matrix <- df == 'N/A'
q17 <- logical_matrix %>%
which(arr.ind = TRUE) %>%
data.frame() %>%
group_by(row) %>%
summarise(min = min(col)) %>%
mutate(subj = factor(row)) %>%
select(subj, min)
q17 <- left_join(df, q17, by='subj') %>%
replace_na(list(min = 20))
q17 <- q17 %>% filter(min > 17) %>% select(-min)
# calculate total correct and semantically related
q17 <- q17 %>% mutate(correct = rowSums(. == 'C' | . == 'C*'),
related = rowSums(. == 'NTS'))
# exclude participants with outlying scores (2 sd criterion)
q17 <- q17 %>% filter(correct < mean(correct) + 2 * sd(correct))
return(q17)
}
Explanation of commands:

- The exclude() function accepts a data frame argument df, containing the data for a single WinG task.
- logical_matrix <- df == 'N/A' converts df to a matrix containing the value TRUE where a cell in df contained the string N/A, and FALSE otherwise.
- which(arr.ind = TRUE) creates a matrix with columns row (this will be the same as our subject number) and col, containing the row and column numbers of any cells with the value TRUE (the cells that originally contained N/A).
- group_by(row) groups these cells by row.
- summarise(min = min(col)) summarises the grouped data by creating a single row containing the lowest column number that contained N/A.
- mutate(subj = factor(row)) %>% select(subj, min) creates a factor subj from row, and selects this along with the number of the first column that contained N/A. We have now converted our original data into a data frame with a row for each subject who had at least one N/A, giving the number of the first word where the task was interrupted.
- q17 <- left_join(df, q17, by='subj') joins this data frame to the data frame we passed to exclude(). For rows where there is no matching subj, min gets the value NA. replace_na(list(min = 20)) converts these NAs to the value 20, indicating that the child provided an answer for all words.
- q17 <- q17 %>% filter(min > 17) %>% select(-min) keeps only children who provided answers beyond word 17, and then removes the min column, as it’s no longer needed.
- q17 <- q17 %>% mutate(correct = rowSums(. == 'C' | . == 'C*')) creates a new column correct, with a value for each row which is the number of columns containing C or C* (| means ‘or’). Cells containing C indicate that the child responded correctly for the picture on the card for the word in that column. Cells containing C* indicate that the response was a correct synonym. In this experiment, both of these values are considered correct responses. Similarly, we count the columns containing NTS and store this total in the column related. We’ll use this value in a later analysis.
- q17 %>% filter(correct < mean(correct) + 2 * sd(correct)) calculates the mean and standard deviation of correct, and keeps only subjects whose correct score falls below the cut-off of two standard deviations above the mean.

Now we can use exclude() to apply the exclusion criteria to our data.
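The command follows the same pattern we use for the other three tasks later in the worksheet:

nc_by_subj <- nc %>% exclude() %>% select(subj, correct, related)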
`summarise()` ungrouping output (override with `.groups` argument)
Explanation of commands:
We pipe nc into exclude(), which applies our exclusion criteria. As a side-effect, it also calculates the correct and related scores for each subject.
Looking at nc_by_subj, we can see that subject 18 was excluded from the noun comprehension task. The related scores are all 0, because this column is only relevant for the word production tasks.
subj | correct | related |
---|---|---|
1 | 12 | 0 |
2 | 18 | 0 |
3 | 18 | 0 |
4 | 17 | 0 |
5 | 17 | 0 |
6 | 18 | 0 |
7 | 17 | 0 |
8 | 18 | 0 |
9 | 20 | 0 |
10 | 7 | 0 |
11 | 19 | 0 |
12 | 19 | 0 |
13 | 19 | 0 |
14 | 17 | 0 |
15 | 20 | 0 |
16 | 17 | 0 |
17 | 19 | 0 |
19 | 16 | 0 |
We repeat these steps to apply the exclusion criteria for the noun production, predicate comprehension and predicate production tasks.
np <- np %>%
set_names(~ str_to_lower(.)) %>%
mutate(subj = factor(subj), cards = factor(cards)) %>%
select(subj, cards, beach:gloves)
np_by_subj <- np %>% exclude() %>% select(subj, correct, related)
`summarise()` ungrouping output (override with `.groups` argument)
pc <- pc %>%
set_names(~ str_to_lower(.)) %>%
mutate(subj = factor(subj), cards = factor(cards)) %>%
select(subj, cards, big:pulling)
pc_by_subj <- pc %>% exclude() %>% select(subj, correct, related)
`summarise()` ungrouping output (override with `.groups` argument)
pp <- pp %>%
set_names(~ str_to_lower(.)) %>%
mutate(subj = factor(subj), cards = factor(cards)) %>%
select(subj, cards, small:pushing)
pp_by_subj <- pp %>% exclude() %>% select(subj, correct, related)
`summarise()` ungrouping output (override with `.groups` argument)
Explanation of commands:
This code is the same as for the noun comprehension task, apart from the range of columns we select() for each task. For noun production, the scores are in beach:gloves, for predicate comprehension they’re in big:pulling, and for predicate production they’re in small:pushing.
Having preprocessed the demographics data and applied our exclusions to the WinG data, we now join these data frames together in preparation for calculating our descriptive statistics.
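A command along these lines creates the combined data frame (a sketch: we assume a full_join() on subj; any join which keeps all 19 children would work):

# join card set and demographics for all subjects (join type assumed)
task_by_subj <- nc %>%
  select(subj, cards) %>%
  full_join(demographics, by = 'subj')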
Explanation of commands:
We start by creating a data frame containing subj and cards for all participants, and joining it with the demographics data frame. We use nc to create the cards data, as this was a data frame before we applied any exclusion criteria. If we didn’t do this, in the next stage we would end up with NA values in cards for excluded participants.
Here’s task_by_subj:
subj | cards | gender | cdi_u | cdi_s |
---|---|---|---|---|
1 | english | female | 62 | 38 |
2 | italian | male | 60 | 59 |
3 | english | female | 97 | 85 |
4 | italian | male | 82 | 45 |
5 | english | female | 66 | 66 |
6 | italian | male | 47 | 32 |
7 | english | male | 39 | 27 |
8 | italian | female | 35 | 31 |
9 | english | male | 22 | 39 |
10 | italian | male | 34 | 10 |
11 | english | female | 49 | 28 |
12 | italian | female | 98 | 85 |
13 | english | female | 50 | 36 |
14 | italian | female | 62 | 56 |
15 | english | female | 81 | 60 |
16 | italian | male | 83 | 59 |
17 | english | female | 87 | 88 |
18 | italian | female | 49 | 19 |
19 | italian | female | 63 | 63 |
We join task_by_subj to the other WinG task data frames:
task_by_subj <- task_by_subj %>%
full_join(., nc_by_subj, by='subj') %>%
full_join(., np_by_subj, by='subj', suffix = c('_nc','_np')) %>%
full_join(., pc_by_subj, by='subj') %>%
full_join(., pp_by_subj, by='subj', suffix = c('_pc', '_pp')) %>%
mutate(nc = correct_nc, np = correct_np, pc = correct_pc, pp = correct_pp) %>%
select(subj, gender, cards, nc, np, pc, pp, cdi_u, cdi_s,
related_nc, related_np, related_pc, related_pp)
Explanation of commands:
Each full_join() joins an additional data frame by matching values in subj. The suffix = c('_nc','_np') argument adds suffixes to disambiguate columns with the same name, in this case correct and related. We use mutate() with select() to remove the correct_ prefix from the ‘correct’ columns. We leave the related_ prefix on the ‘semantically related’ columns.
Our data is now fully preprocessed. Notice how the join sets cells to the value NA for subjects who were excluded from that particular task.
subj | gender | cards | nc | np | pc | pp | cdi_u | cdi_s | related_nc | related_np | related_pc | related_pp |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | female | english | 12 | 4 | NA | NA | 62 | 38 | 0 | 5 | NA | NA |
2 | male | italian | 18 | 12 | 17 | 9 | 60 | 59 | 0 | 2 | 0 | 3 |
3 | female | english | 18 | 13 | 17 | 9 | 97 | 85 | 0 | 3 | 0 | 0 |
4 | male | italian | 17 | 11 | 15 | 12 | 82 | 45 | 0 | 4 | 0 | 2 |
5 | female | english | 17 | 15 | 15 | 10 | 66 | 66 | 0 | 2 | 0 | 0 |
6 | male | italian | 18 | 11 | 15 | 7 | 47 | 32 | 0 | 2 | 0 | 1 |
7 | male | english | 17 | 10 | 13 | 9 | 39 | 27 | 0 | 7 | 0 | 3 |
8 | female | italian | 18 | 14 | 19 | 11 | 35 | 31 | 0 | 2 | 0 | 3 |
9 | male | english | 20 | 14 | 16 | 9 | 22 | 39 | 0 | 2 | 0 | 6 |
10 | male | italian | 7 | 2 | NA | NA | 34 | 10 | 0 | 1 | NA | NA |
11 | female | english | 19 | 12 | 14 | 8 | 49 | 28 | 0 | 4 | 0 | 4 |
12 | female | italian | 19 | 14 | 16 | 6 | 98 | 85 | 0 | 2 | 0 | 5 |
13 | female | english | 19 | 10 | 16 | 7 | 50 | 36 | 0 | 5 | 0 | 3 |
14 | female | italian | 17 | 11 | 17 | 5 | 62 | 56 | 0 | 3 | 0 | 2 |
15 | female | english | 20 | 16 | 17 | 10 | 81 | 60 | 0 | 3 | 0 | 1 |
16 | male | italian | 17 | 13 | 13 | 5 | 83 | 59 | 0 | 3 | 0 | 2 |
17 | female | english | 19 | 13 | 13 | 8 | 87 | 88 | 0 | 3 | 0 | 1 |
18 | female | italian | NA | NA | NA | NA | 49 | 19 | NA | NA | NA | NA |
19 | female | italian | 16 | 11 | 18 | 10 | 63 | 63 | 0 | 4 | 0 | 1 |
For our first analysis, we want to check that there were no differences in language ability between the children assigned to the Italian and English card groups. We do this using between-subjects Bayesian t-tests on their parents’ CDI ratings. Bayesian t-tests were introduced in the Evidence worksheet.
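The commands are along these lines (a sketch, assuming the default settings of ttestBF(); passing the tibble directly produces the coercion warnings shown below):

library(BayesFactor)

# compare CDI 'understands' for the two card sets
ttestBF(formula = cdi_u ~ cards, data = task_by_subj)
# compare CDI 'says' for the two card sets
ttestBF(formula = cdi_s ~ cards, data = task_by_subj)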
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
************
Welcome to BayesFactor 0.9.12-4.2. If you have questions, please contact Richard Morey (richarddmorey@gmail.com).
Type BFManual() to open the manual.
************
Warning: data coerced from tibble to data frame
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 0.404659 ±0%
Against denominator:
Null, mu1-mu2 = 0
---
Bayes factor type: BFindepSample, JZS
Warning: data coerced from tibble to data frame
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 0.4518936 ±0%
Against denominator:
Null, mu1-mu2 = 0
---
Bayes factor type: BFindepSample, JZS
Explanation of commands:
First we load the BayesFactor package. Next, we run a t-test which compares CDI ‘understands’ (cdi_u) for the two card sets. We run another t-test which compares CDI ‘says’ (cdi_s) for the two card sets.
Explanation of output:
Here, we’re hoping to find evidence for the null hypothesis, i.e. no difference in the means for the two groups. Our Bayes factors (CDI ‘understands’ = 0.40; CDI ‘says’ = 0.45) indicate that it’s 2-2.5 times more likely that there’s no difference than that there is a difference in language ability between the children tested using the Italian and English cards.
We would also like to check that there were no gender differences on any of the tasks. We’ll create a table of descriptive statistics with four rows corresponding to the WinG tasks, and columns containing accuracy means and standard deviations for male and female children. This table is described in detail in the Better tables worksheet. It’s convenient to see differences between groups next to the descriptive statistics. We’ll do this by adding a column to our table. Each cell will contain the result of a t-test comparing male and female accuracy for the WinG task in that table row.
We start by calculating means and standard deviations for each WinG task by gender.
# table of descriptives and t-tests for WinG by gender
task_by_subj_l <- task_by_subj %>%
pivot_longer(cols = c(nc, np, pc, pp),
names_to = 'task',
values_to = 'correct')
descript <- task_by_subj_l %>%
group_by(task, gender) %>%
summarise(mean = mean(correct, na.rm = TRUE), sd = sd(correct, na.rm = TRUE))
`summarise()` regrouping output by 'task' (override with `.groups` argument)
Explanation of commands:
We convert task_by_subj to long format by using pivot_longer() on the correct-score columns nc, np, pc and pp. We then group the long data by gender within task, and use summarise() to calculate a mean and standard deviation for each group. When calculating these summary statistics, we need the argument na.rm = TRUE to ignore missing data for participants who were excluded from any of the tests.
Our table of descriptive statistics now looks like this:
task | gender | mean | sd |
---|---|---|---|
nc | male | 16.29 | 4.231 |
nc | female | 17.64 | 2.203 |
np | male | 10.43 | 3.952 |
np | female | 12.09 | 3.239 |
pc | male | 14.83 | 1.602 |
pc | female | 16.2 | 1.814 |
pp | male | 8.5 | 2.345 |
pp | female | 8.4 | 1.955 |
Next we do some preparation for displaying our table in APA format.
descript_table <- descript %>%
pivot_wider(names_from = gender, values_from = c(mean, sd))
descript_table <- descript_table %>% select(task, mean_female, sd_female, mean_male, sd_male)
descript_table %>% pander()
task | mean_female | sd_female | mean_male | sd_male |
---|---|---|---|---|
nc | 17.64 | 2.203 | 16.29 | 4.231 |
np | 12.09 | 3.239 | 10.43 | 3.952 |
pc | 16.2 | 1.814 | 14.83 | 1.602 |
pp | 8.4 | 1.955 | 8.5 | 2.345 |
Explanation of commands:
We widen the table, using the pivot_wider() command we used previously in the within-subject differences worksheet. We use select() to order the columns so that the means are next to their associated standard deviations.
Now we run some Bayesian t-tests to test for gender differences on each of the tasks. Between subjects t-tests were covered in the Evidence worksheet.
# t-test for accuracy by gender
t_gender <- function(df, group) {
df <- drop_na(df) # remove data for excluded subjects
bf <- ttestBF(formula=correct ~ gender, data = data.frame(df))
extractBF(bf)
}
Explanation of commands:
We’ll be doing the same t-test four times, once for the data from each task. Like the code for applying the exclusion criteria that you saw earlier, this makes it a good candidate for a function. Line 1 begins the definition of the t_gender() function. The function will run the t-test using the data it receives in its df argument. We won’t use the group argument, but we’ll explain why it’s there when we call the function. Line 2 uses drop_na() to remove rows for participants who were excluded from the task in df. Recall that we set these values to NA. Line 3 uses ttestBF() to run a Bayesian t-test which compares task accuracy (correct) between genders. Line 4 uses extractBF() to return the Bayes factor as a data frame.
Now we can use the function to run the four t-tests.
gender_bf <- task_by_subj_l %>%
group_by(task) %>%
group_modify(t_gender) %>%
select(task, bf)
gender_bf %>% pander()
task | bf |
---|---|
nc | 0.4897 |
np | 0.6805 |
pc | 0.9101 |
pp | 0.4377 |
Line 3 calls t_gender() for each of the groups created by group_by(task) in line 2. group_modify() calls a function for each group of a grouped data frame. It always passes the function two values: the first is a data frame containing the subset of rows for the group; the second is the name of the group. The function must accept two arguments, which explains why we defined group in t_gender(), even though we didn’t use it. The data supplied by group_modify() is replaced with the data frame returned by the function it calls. The column which defines the groups is preserved. So here, we’ve used t_gender() to replace the data for each task with a data frame containing the Bayes factor for the associated t-test.
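As an aside, here’s a toy example of this two-argument contract (illustration only, not part of the analysis):

# count the rows in each group; the second argument receives the grouping key
mtcars %>%
  group_by(cyl) %>%
  group_modify(function(df, group) { data.frame(n = nrow(df)) })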
Finally, we join this to our descript_table data frame, using the task values to match up the rows.
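A command along these lines performs the join (a sketch; we assume a left_join() on task):

# add the Bayes factors to the table of descriptives (join type assumed)
descript_table <- descript_table %>% left_join(gender_bf, by = 'task')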
task | mean_female | sd_female | mean_male | sd_male | bf |
---|---|---|---|---|---|
nc | 17.64 | 2.203 | 16.29 | 4.231 | 0.4897 |
np | 12.09 | 3.239 | 10.43 | 3.952 | 0.6805 |
pc | 16.2 | 1.814 | 14.83 | 1.602 | 0.9101 |
pp | 8.4 | 1.955 | 8.5 | 2.345 | 0.4377 |
We can now tidy the table up ready for printing.
task_names <- c(
nc = 'Noun Comprehension',
np = 'Noun Production',
pc = 'Predicate Comprehension',
pp = 'Predicate Production'
)
descript_table$task <- descript_table$task %>% recode(!!!task_names)
colnames(descript_table) <-
c("Task", "Female (M)", "Female (SD)", "Male (M)", "Male (SD)", "BF")
Explanation of commands:
Lines 1-7 use the approach for renaming factor levels that was introduced in the cleaning up questionnaire data worksheet. The last 2 lines give the columns more meaningful headings.
Now we can print our finished table.
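The printing commands are along these lines (a sketch using kable() and kable_styling(), which are described below):

library(kableExtra)

descript_table %>%
  kable(digits = 2) %>%
  kable_styling()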
Task | Female (M) | Female (SD) | Male (M) | Male (SD) | BF |
---|---|---|---|---|---|
Noun Comprehension | 17.64 | 2.20 | 16.29 | 4.23 | 0.49 |
Noun Production | 12.09 | 3.24 | 10.43 | 3.95 | 0.68 |
Predicate Comprehension | 16.20 | 1.81 | 14.83 | 1.60 | 0.91 |
Predicate Production | 8.40 | 1.96 | 8.50 | 2.35 | 0.44 |
Explanation of commands:
We load the kableExtra package and pipe our data into kable(). The digits = 2 argument ensures that every number is reported to two decimal places. kable_styling() prints the table to the Viewer window in RStudio. This table could be included in a report by copying and pasting it into a word processor, and then styling it according to APA guidelines.
Explanation of output:
The Bayes factors mean that it’s about twice as likely that there are no gender differences than that there are, for noun comprehension and predicate production. For noun production, it’s about 1.5 times as likely that there isn’t a gender difference than that there is. For predicate comprehension, the evidence for or against a gender difference is about equal.
We would also like to know whether parents’ ratings of their child’s ability on the CDI were related to their child’s performance on the WinG tasks. We start by looking at correlations between CDI ‘understands’ scores and comprehension scores on the noun and predicate tests.
# correlations between CDI and WinG
cdi_u_nc_cor <- cor(method = 'spearman', task_by_subj$cdi_u,
task_by_subj$nc, use="complete.obs")
cdi_u_nc_cor
[1] 0.06225014
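The Bayes factor below comes from correlationBF() in the BayesFactor package; a call along these lines (argument order assumed) produces it:

# Bayes factor for the CDI 'understands' / noun comprehension correlation
correlationBF(task_by_subj$cdi_u, task_by_subj$nc)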
Ignored 1 rows containing missing observations.
Bayes factor analysis
--------------
[1] Alt., r=0.333 : 0.6982793 ±0%
Against denominator:
Null, rho = 0
---
Bayes factor type: BFcorrelation, Jeffreys-beta*
cdi_u_pc_cor <- cor(method = 'spearman', task_by_subj$cdi_u,
task_by_subj$pc, use="complete.obs")
cdi_u_pc_cor
[1] -0.100163
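Again, a correlationBF() call along these lines produces the Bayes factor below:

correlationBF(task_by_subj$cdi_u, task_by_subj$pc)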
Ignored 3 rows containing missing observations.
Bayes factor analysis
--------------
[1] Alt., r=0.333 : 0.5543404 ±0%
Against denominator:
Null, rho = 0
---
Bayes factor type: BFcorrelation, Jeffreys-beta*
Explanation of commands:
We use a Spearman correlation to test the relationship between task_by_subj$cdi_u and task_by_subj$nc (noun comprehension). This was introduced in the More on relationships, part 2 worksheet. The option use = "complete.obs" deletes any cases with a value of NA. (Remember that we assigned the value NA to cells for subjects who were excluded from a test.) We also calculate a Bayes factor to test the reliability of the relationship. We use similar commands to test the relationship between cdi_u and predicate comprehension (pc).
Explanation of output:
There doesn’t appear to be a relationship between parents’ CDI ‘understands’ scores and children’s noun comprehension (rs = 0.06), although with a Bayes factor close to 1 (BF = 0.7), the evidence for this conclusion is weak. There seems to be a weak negative correlation between parents’ CDI ‘understands’ scores and children’s predicate comprehension (rs = -0.1); however, the Bayes factor of 0.55 means it’s almost twice as likely that there isn’t a relationship between these two variables than that there is.
Now we look at correlations between parents’ ratings of their child’s ability to say words (CDI ‘says’), and the children’s noun and predicate production scores.
cdi_s_np_cor <- cor(method = 'spearman', task_by_subj$cdi_s,
task_by_subj$np, use="complete.obs")
cdi_s_np_cor
[1] 0.5673528
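As before, a correlationBF() call along these lines produces the Bayes factor below:

correlationBF(task_by_subj$cdi_s, task_by_subj$np)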
Ignored 1 rows containing missing observations.
Bayes factor analysis
--------------
[1] Alt., r=0.333 : 5.094892 ±0%
Against denominator:
Null, rho = 0
---
Bayes factor type: BFcorrelation, Jeffreys-beta*
cdi_s_pp_cor <- cor(method = 'spearman', task_by_subj$cdi_s,
task_by_subj$pp, use="complete.obs")
cdi_s_pp_cor
[1] -0.02759308
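And similarly for predicate production (assumed call):

correlationBF(task_by_subj$cdi_s, task_by_subj$pp)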
Ignored 3 rows containing missing observations.
Bayes factor analysis
--------------
[1] Alt., r=0.333 : 0.5874633 ±0%
Against denominator:
Null, rho = 0
---
Bayes factor type: BFcorrelation, Jeffreys-beta*
Explanation of commands:
These calculations are identical to those above, except that we are looking at the evidence for relationships between cdi_s and task_by_subj$np (noun production), and between cdi_s and task_by_subj$pp (predicate production).
Explanation of output:
There is evidence of a strong, positive correlation between parents’ CDI ‘says’ scores and children’s noun production (rs = 0.57, BF = 5.09). There doesn’t appear to be a relationship between CDI ‘says’ and children’s predicate production (rs = -0.03, BF = 0.59).
In summary, noun production was the only task in which parents’ ratings of their children’s ability matched the children’s accuracy.
We’re now ready to test our main hypothesis, which predicts that there will be a difference in WinG task scores depending on which set of cards the children were tested with. We’ll start by creating plots to show the distribution of scores for the two card sets on the WinG tasks.
# plot WinG accuracy by card set
task_by_subj_l$task <- task_by_subj_l$task %>% recode(!!!task_names)
source('themeapa.R')
library(see)
task_by_subj_l %>% ggplot(aes(x = task, y = correct, fill = cards)) +
geom_violinhalf(position = position_identity(), alpha=0.7, size=0) +
scale_fill_grey() +
theme_APA +
theme(axis.text = element_text(size = 10)) +
xlab('WinG Task') + ylab('Accuracy (max = 20)')
Warning: Removed 8 rows containing non-finite values (stat_ydensity).
Explanation of commands:
Line 2 recodes the task factor levels, to make them more meaningful on the plot’s x axis. Line 3 loads our APA plot theme, and line 4 loads the see package, which provides the geom_violinhalf() function. Line 5 defines the x axis of our plot to be the WinG tasks, the y axis to be task accuracy (0-20), and uses the cards factor for the fill colour. Line 6 creates a “half violin” plot. As the name suggests, this shows one half of a violin plot. position = position_identity() plots the two distributions on top of each other, making it easy to see how much they overlap. alpha=0.7 changes the transparency, again to help us see the overlapping area. size=0 removes the outline around the distributions. Line 7 uses a grey palette for filling in the two distributions. Lines 8-9 apply the APA theme and adjust the axis text size, and line 10 gives our axes meaningful labels.
Explanation of output:
This plot gives a visual indication of whether there were differences between the Italian and English cards on each of the tests. Given the extensive overlap in scores between the card sets, this seems unlikely. The plot also shows that the data were slightly skewed on the noun tasks due to some low scores.
Although the plot suggests that the predicate data were normally distributed, given the small sample sizes, non-parametric tests were considered most appropriate for analysing these data. We’ll use Mann-Whitney U tests to compare each task score for the Italian and English cards. The Mann-Whitney U test is explained in more detail in the Traditional non-parametric tests worksheet.
We start by creating some summary statistics.
# compare WinG accuracy by card set
task_by_subj_rank <- task_by_subj_l %>%
group_by(task) %>%
mutate(rank = rank(correct))
task_summary <- task_by_subj_rank %>%
group_by(task, cards) %>%
drop_na('correct') %>%
summarise(n = n(),
median = median(correct),
mean_rank = mean(rank),
sum_rank = sum(rank))
`summarise()` regrouping output by 'task' (override with `.groups` argument)
# A tibble: 8 x 6
# Groups: task [4]
task cards n median mean_rank sum_rank
<fct> <fct> <int> <dbl> <dbl> <dbl>
1 Noun Comprehension english 9 19 11.4 103
2 Noun Comprehension italian 9 17 7.56 68
3 Noun Production english 9 13 10.3 92.5
4 Noun Production italian 9 11 8.72 78.5
5 Predicate Comprehension english 8 15.5 7.12 57
6 Predicate Comprehension italian 8 16.5 9.88 79
7 Predicate Production english 8 9 9 72
8 Predicate Production italian 8 8 8 64
Explanation of commands:
The first pipeline ranks the accuracy scores within each of the four tasks. The second pipeline groups the data by cards within task, removes any rows where correct contains the value NA (these were the children we excluded earlier), and then uses summarise() to calculate, for each group, the number of children, the median score, and the mean and sum of the ranks.
Now we run Mann-Whitney tests to make a direct comparison between the Italian and English cards for each WinG task.
# independent 2-group Mann-Whitney U Test
mann_whitney <- function(df, group) {
df <- df %>% drop_na('correct') # remove excluded subjects
wilcox.test(correct ~ cards, df) %>%
with(data.frame(U = statistic, p = round(p.value, 2)))
}
wing_mann_whitney <- task_by_subj_l %>%
group_by(task) %>%
group_modify(mann_whitney)
wing_mann_whitney
# A tibble: 4 x 3
# Groups: task [4]
task U p
<fct> <dbl> <dbl>
1 Noun Comprehension 58 0.13
2 Noun Production 47.5 0.56
3 Predicate Comprehension 21 0.26
4 Predicate Production 36 0.71
Explanation of commands:
The Mann-Whitney U test is a non-parametric equivalent of a t-test. It’s explained in detail in the Traditional non-parametric tests worksheet. Lines 2-6 define a function called mann_whitney(), which runs a Mann-Whitney test comparing the correct column against the factor cards. The function returns the values for U and p as a data frame. Lines 8-10 group our data by task, then use group_modify(mann_whitney) to replace the data for each group with the Mann-Whitney test results. Line 11 displays the results.
Explanation of output:
As this is a traditional statistical test, the p-value indicates whether there was a significant difference between the cards on any of the tasks. All p values were > 0.05, suggesting there were no differences, contrary to our hypothesis. (Note that we’ve removed the cannot compute exact p-value with ties
warnings to make the output easier to read.)
In summary, these data suggest that WinG scores should be comparable regardless of whether the English or Italian picture cards are used. They also suggest that adults’ ratings of how well the pictures map onto the underlying concepts are quite different from children’s.
This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.