It can be helpful to present data in tables, rather than text,
especially when you need to refer to the same data in different parts of
a report. Although tables can be produced manually using a word
processor, generating them directly from your data ensures they are
up-to-date, and reduces copy-paste errors. This worksheet explains how
to use R
to produce some of the types of table used to
report psychological research.
To prepare for this worksheet:
Open the rminr-data project we used previously.
If you don’t see a folder named going-further, it means you created your project before the data required for this worksheet was added to the rminr-data git repository. You can get the latest files by asking git to “pull” the repository. Select the Git tab, which is located in the row of tabs which includes the Environment tab. Click the Pull button with a downward pointing arrow. A window will open showing the files which have been pulled from the repository. Close the Git pull window.
Open the Files tab. The going-further folder should contain the file picture-naming-preproc.csv.
Create a script named tables.R in the rminr-data folder (the folder above going-further). Add the comments and code to this script as you work through each section of the worksheet.
We’ll start by producing a correlation matrix. A correlation matrix shows correlations between all combinations of a set of variables, which is often required in research reports. We’ll demonstrate an easy way to produce correlation matrices, with APA styling, in a format that can be read by Microsoft Word or LibreOffice Writer. A similar approach can be used to produce other common table types.
We’ll generate a correlation matrix using the attitude dataset, which is included with R. These data are the percentage of favourable attitudes given by employees, in relation to seven questions regarding their department (you can find out a bit more about these data by typing ?attitude). Here are the first few rows of the data frame:
rating | complaints | privileges | learning | raises | critical | advance |
---|---|---|---|---|---|---|
43 | 51 | 30 | 39 | 61 | 92 | 45 |
63 | 64 | 51 | 54 | 63 | 73 | 47 |
71 | 70 | 68 | 69 | 76 | 86 | 48 |
61 | 63 | 45 | 47 | 54 | 84 | 35 |
81 | 78 | 56 | 66 | 71 | 83 | 47 |
We’ll use the apaTables
package to generate the
correlation matrix.
Enter these comments and commands into your script, and run them:
# Better tables
# Clear the environment
rm(list = ls())
# Load 'apaTables' package
library(apaTables)
# Create an APA correlation matrix from the 'attitude' dataset, into file 'table1.doc'
apa.cor.table(attitude, filename = "table1.doc", table.number = 1)
Table 1
Means, standard deviations, and correlations with confidence intervals
Variable | M | SD | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|---|
1. rating | 64.63 | 12.17 | | | | | | |
2. complaints | 66.60 | 13.31 | .83** [.66, .91] | | | | | |
3. privileges | 53.13 | 12.24 | .43* [.08, .68] | .56** [.25, .76] | | | | |
4. learning | 56.37 | 11.74 | .62** [.34, .80] | .60** [.30, .79] | .49** [.16, .72] | | | |
5. raises | 64.63 | 10.40 | .59** [.29, .78] | .67** [.41, .83] | .45* [.10, .69] | .64** [.36, .81] | | |
6. critical | 74.77 | 9.89 | .16 [-.22, .49] | .19 [-.19, .51] | .15 [-.22, .48] | .12 [-.25, .46] | .38* [.02, .65] | |
7. advance | 42.93 | 10.29 | .16 [-.22, .49] | .22 [-.15, .54] | .34 [-.02, .63] | .53** [.21, .75] | .57** [.27, .77] | .28 [-.09, .58] |
Note. M and SD are used to represent mean and standard deviation, respectively.
Values in square brackets indicate the 95% confidence interval.
The confidence interval is a plausible range of population correlations
that could have caused the sample correlation (Cumming, 2014).
* indicates p < .05. ** indicates p < .01.
Explanation of commands:
We load the apaTables package.
The function to generate a correlation matrix is apa.cor.table(). We pass the attitude data frame as the first argument, and use filename to specify that the output should be saved in the file table1.doc. The table.number argument sets the number in the table heading output, in this case “Table 1”. If you omit this argument, the text will be “Table XX”.
Explanation of output:
Export table1.doc from RStudio and open it using a word processor.
The first thing to notice is that the styling (spacing, use of italics, horizontal lines, positioning of captions and footnotes etc.) complies with the APA guidelines for tables.
The table number and caption are above the table itself - you will need to edit the caption by hand to make it more meaningful, for example “Means, standard deviations, and correlations with confidence intervals, for the attitude measures of Study 1”.
The Variable
column contains a number and the column
name for the seven attitude variables. The next two columns show the
mean and standard deviation for each variable. The remaining columns use
the numbers from items in the Variable
column as headings,
indicating that they refer to the same variable. The cells show the
correlation between the column variables and each of the variables in
the rows. Cells are left empty where a variable would otherwise be
correlated with itself. The 95% confidence
interval for the correlation is shown in square brackets.
For example, the correlation between rating
and
complaints
in this sample is .83. The confidence interval
indicates that the population value is likely to be between .66 and
.91.
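To check a single cell of the matrix by hand, base R's cor.test() reports the same kind of estimate and 95% confidence interval. The following is a minimal sketch for the rating and complaints pair; the object name check is ours, and we are assuming that apaTables computes its intervals in a compatible way.

```r
# Pearson correlation between two columns of the built-in 'attitude' data
check <- cor.test(attitude$rating, attitude$complaints)

# Sample correlation, rounded to two decimal places
round(unname(check$estimate), 2)

# 95% confidence interval for the population correlation
round(as.numeric(check$conf.int), 2)
```

The rounded values can then be compared with the corresponding cell in the exported table.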
Evidence for the correlation is calculated using traditional statistics, rather than the Bayes factors described in the Relationships, part 2 worksheet. One asterisk (*) indicates p < .05. Two asterisks (**) signify p < .01. These calculations assumed a two-tailed test; one-tailed tests for correlations are explained in the More on relationships, part 2 worksheet. Also recall that p-values are widely misinterpreted, so it would be better to edit this part of the table by hand to reflect Bayes Factors you have already calculated.
We suggest using * for BF > 3, ** for BF > 10, o for BF < 0.33, and oo for BF < 0.1. Change the text at the bottom of the table accordingly.
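If you have several Bayes factors to mark up, the suggested thresholds can be written as a small function. This is only a sketch: bf_marker is a hypothetical helper name of our own, not part of apaTables, and the Bayes factors below are made up for illustration.

```r
# Hypothetical helper: convert Bayes factors into the suggested table markers
bf_marker <- function(bf) {
  ifelse(bf > 10, "**",
  ifelse(bf > 3, "*",
  ifelse(bf < 0.1, "oo",
  ifelse(bf < 0.33, "o", ""))))
}

# Made-up Bayes factors, purely for illustration
example_bf <- c(12, 5, 1, 0.2, 0.05)
bf_marker(example_bf)
```

The stronger thresholds (BF > 10 and BF < 0.1) are tested before the weaker ones, so each value receives only its strongest applicable marker.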
For this exercise, we’ll load some data from a study which measured aspects of participants’ personality.
Enter these comments and commands into your script, and run them:
# Exercise 1
# Load tidyverse
library(tidyverse)
# Load data into 'big5'
big5 <- read_csv('case-studies/jon-may/big5_total.csv')
The first few rows show that the scale used measured the ‘big 5’ personality factors: openness to experience, conscientiousness, extraversion, agreeableness and neuroticism (OCEAN).
subj | openness | conscientiousness | extraversion | agreeableness | neuroticism |
---|---|---|---|---|---|
1 | 29 | 28 | 14 | 36 | 20 |
2 | 22 | 22 | 28 | 28 | 26 |
3 | 33 | 33 | 21 | 37 | 25 |
4 | 17 | 34 | 14 | 39 | 13 |
5 | 27 | 27 | 30 | 40 | 25 |
Create a correlation matrix for the five personality factors. Number the table as “Table 2”, and save the results in table2.doc.
Your table should look like this in RStudio:
Table 2
Means, standard deviations, and correlations with confidence intervals
Variable | M | SD | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|---|
1. openness | 23.15 | 6.78 | | | | |
2. conscientiousness | 25.10 | 7.23 | .15 [-.14, .42] | | | |
3. extraversion | 21.50 | 7.86 | .27 [-.02, .51] | -.01 [-.29, .28] | | |
4. agreeableness | 33.54 | 4.55 | .27 [-.01, .52] | .20 [-.09, .46] | .43** [.17, .64] | |
5. neuroticism | 16.00 | 7.41 | .34* [.06, .57] | .28 [-.00, .52] | .13 [-.16, .40] | .07 [-.22, .34] |
Note. M and SD are used to represent mean and standard deviation, respectively.
Values in square brackets indicate the 95% confidence interval.
The confidence interval is a plausible range of population correlations
that could have caused the sample correlation (Cumming, 2014).
* indicates p < .05. ** indicates p < .01.
…and it should be APA formatted in the file table2.doc.
Copy the R code you used for this exercise, including the comments, into PsycEL.
As with graphs, there is often an element of design involved in
presenting tabular data in a format most useful for your reader.
Packages like apaTables
are useful for producing APA tables
where there is a standard way to present data. However, you often need a
table which is customised to present your data in the most useful
format. The cost of custom tables is that the content requires a little
more preprocessing, and styling the table according to APA standards
will require some hand-formatting in your word processor.
We’ll demonstrate this process by producing a table of descriptive statistics. The data we’ll use come from an experiment which evaluated children’s language development using the Words in Game (WinG) test. WinG consists of a set of picture cards which are used in four tests: noun comprehension, noun production, predicate comprehension, and predicate production. The Italian and English versions of the WinG cards use different pictures to depict the associated words. The experiment tested whether English-speaking children aged approximately 30 months produce similar responses for the two sets of cards. We would like to produce a single table, containing descriptive statistics for all four tests.
We start by loading the data; enter this comment and command into your script, and run it:
# Load data into 'wing_preproc'
wing_preproc <- read_csv('going-further/picture-naming-preproc.csv')
The first few rows of wing_preproc
look like this:
subj | gender | cards | nc | np | pc | pp | cdi_u | cdi_s | related_nc | related_np | related_pc | related_pp |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | female | english | 12 | 4 | NA | NA | 62 | 38 | 0 | 5 | NA | NA |
2 | male | italian | 18 | 12 | 17 | 9 | 60 | 59 | 0 | 2 | 0 | 3 |
3 | female | english | 18 | 13 | 17 | 9 | 97 | 85 | 0 | 3 | 0 | 0 |
4 | male | italian | 17 | 11 | 15 | 12 | 82 | 45 | 0 | 4 | 0 | 2 |
5 | female | english | 17 | 15 | 15 | 10 | 66 | 66 | 0 | 2 | 0 | 0 |
6 | male | italian | 18 | 11 | 15 | 7 | 47 | 32 | 0 | 2 | 0 | 1 |
Our test scores are currently in wide format (lots of columns, few rows), but R generally requires data to be in long format (lots of rows, few columns). This means we first have to make the data frame longer, so we can calculate summary statistics.
Enter this comment and these commands into your script, and run them:
# Convert from wide to long format; select relevant columns; record in 'task_by_subj'
task_by_subj <- wing_preproc %>%
  pivot_longer(cols = c(nc, np, pc, pp),
               names_to = 'task',
               values_to = 'correct') %>%
  select(subj, gender, cards, task, correct)
Explanation of command:
In the Within-subject differences worksheet, you learned how to use pivot_wider() to widen long data frames. The pivot_longer() command does the reverse – it lengthens wide data frames.
cols = c(nc, np, pc, pp) selects the columns we want to pivot. Each value in these columns is added to a row in a new column called correct (values_to = 'correct'). In the same row, a new column task is set to the name of the column which the value came from (names_to = 'task'). All of the values in the other columns are duplicated for each row. We select just the columns we want for our table of descriptive statistics.
The first few rows of task_by_subj
look like this:
subj | gender | cards | task | correct |
---|---|---|---|---|
1 | female | english | nc | 12 |
1 | female | english | np | 4 |
1 | female | english | pc | NA |
1 | female | english | pp | NA |
2 | male | italian | nc | 18 |
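To see the lengthening step in isolation, here is a minimal sketch on a toy data frame. The object names toy_wide and toy_long and all of the scores are made up for illustration; only the pivot_longer() call mirrors the command used on wing_preproc.

```r
library(tidyr)

# Toy wide data: one row per participant, one column per task (made-up scores)
toy_wide <- tibble::tibble(
  subj = c(1, 2),
  cards = c("english", "italian"),
  nc = c(12, 18),
  np = c(4, 12)
)

# Lengthen: one row per participant per task
toy_long <- pivot_longer(toy_wide, cols = c(nc, np),
                         names_to = "task", values_to = "correct")
toy_long
```

Two participants with two task columns become four rows, and the subj and cards values are repeated on each row that came from the same participant.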
Now we can calculate some summary statistics, using commands that we’ve already used in previous worksheets.
Enter this comment and these commands into your script, and run them:
# Create table of descriptive statistics
descript <- task_by_subj %>%
  group_by(task, gender) %>%
  summarise(mean = mean(correct, na.rm = TRUE), sd = sd(correct, na.rm = TRUE))
Explanation of commands:
We’ve come across group_by before; here we use it to group the data by two variables at the same time, task and gender, giving us eight groups overall.
We’ve also come across summarise before, including the use of na.rm = TRUE to deal with missing data.
Our data now looks like this:
task | gender | mean | sd |
---|---|---|---|
nc | female | 17.64 | 2.203 |
nc | male | 16.29 | 4.231 |
np | female | 12.09 | 3.239 |
np | male | 10.43 | 3.952 |
pc | female | 16.2 | 1.814 |
pc | male | 14.83 | 1.602 |
pp | female | 8.4 | 1.955 |
pp | male | 8.5 | 2.345 |
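The effect of grouping by two variables, and of na.rm = TRUE, can be tried out on a toy data frame. This is a sketch with made-up scores; the names toy_long and toy_summary are ours. The .groups = "drop" argument is an optional addition that is not used in the worksheet command; it asks summarise for an ungrouped result.

```r
library(dplyr)

# Toy long data with one missing score (made-up values)
toy_long <- tibble(
  task = c("nc", "nc", "nc", "np", "np", "np"),
  gender = c("female", "female", "male", "female", "female", "male"),
  correct = c(12, 18, 17, 4, NA, 11)
)

# Without na.rm = TRUE, a single NA makes the whole mean NA
mean(c(4, NA))

# One row per task-gender combination, ignoring missing scores
toy_summary <- toy_long %>%
  group_by(task, gender) %>%
  summarise(mean = mean(correct, na.rm = TRUE), .groups = "drop")
toy_summary
```

With two tasks and two genders the toy summary has four rows, in the same way that four tasks and two genders give eight rows in descript.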
The descript data frame contains just the numbers we want to include in our report - the means and standard deviations for each of the eight groups. However, the row labels (np, etc.) are not particularly clear, so we replace them with something more human readable.
Enter these comments and commands into your script, and run them:
# Define task names, for each task code
task_names <- c(
  nc = 'Noun Comprehension',
  np = 'Noun Production',
  pc = 'Predicate Comprehension',
  pp = 'Predicate Production'
)
# Recode task codes into task names
descript$task <- descript$task %>% recode(!!!task_names)
Explanation of commands: We’re using the recode command that we’ve previously used in the cleaning up questionnaire data worksheet:
We start by telling R what each of the codes, nc etc., means. So, for example, nc = 'Noun Comprehension'. We combine the four ‘translations’ together into task_names using c() (short for ‘concatenate’, i.e. put things together).
We then take the task column of the descript data frame (descript$task) and pipe (%>%) it to recode, where it uses task_names to do the recoding. We write (<-) that result back into descript$task.
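The !!! in front of task_names splices the named vector into recode as separate arguments. As a rough illustration on a toy vector (the names codes, spliced and by_hand are ours), the spliced call gives the same result as writing each pair out by hand:

```r
library(dplyr)

# Two of the task codes, and a toy vector to recode
task_names <- c(nc = "Noun Comprehension", np = "Noun Production")
codes <- c("np", "nc", "np")

# '!!!' splices the named vector into separate 'code = name' arguments...
spliced <- recode(codes, !!!task_names)

# ...so it is equivalent to writing each pair out by hand
by_hand <- recode(codes, nc = "Noun Comprehension", np = "Noun Production")

identical(spliced, by_hand)
```

Keeping the translations in one named vector means they only need to be edited in one place if a label changes.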
Our table now looks like this:
task | gender | mean | sd |
---|---|---|---|
Noun Comprehension | female | 17.64 | 2.203 |
Noun Comprehension | male | 16.29 | 4.231 |
Noun Production | female | 12.09 | 3.239 |
Noun Production | male | 10.43 | 3.952 |
Predicate Comprehension | female | 16.2 | 1.814 |
Predicate Comprehension | male | 14.83 | 1.602 |
Predicate Production | female | 8.4 | 1.955 |
Predicate Production | male | 8.5 | 2.345 |
Our table is now clear and easy to read. We could include it in a report without much further effort, and the reader would be able to easily see what we wanted to show them. However, it is not quite in the format that psychologists are most familiar with (which is APA format). In APA format, the table would look more like this:
Task | Female (M) | Female (SD) | Male (M) | Male (SD) |
---|---|---|---|---|
Noun Comprehension | 17.64 | 2.2 | 16.29 | 4.23 |
Noun Production | 12.09 | 3.24 | 10.43 | 3.95 |
Predicate Comprehension | 16.2 | 1.81 | 14.83 | 1.6 |
Predicate Production | 8.4 | 1.96 | 8.5 | 2.35 |
In other words, it would be wider: more columns and fewer rows.
We can widen the table, using the pivot_wider
command we
have previously used in the within-subject
differences worksheet.
Enter this comment and these commands into your script, and run them:
# Widen table
descript_table <- descript %>%
  pivot_wider(names_from = gender, values_from = c(mean, sd))
Our table now has the same format as an APA table…
task | mean_female | mean_male | sd_female | sd_male |
---|---|---|---|---|
Noun Comprehension | 17.64 | 16.29 | 2.203 | 4.231 |
Noun Production | 12.09 | 10.43 | 3.239 | 3.952 |
Predicate Comprehension | 16.2 | 14.83 | 1.814 | 1.602 |
Predicate Production | 8.4 | 8.5 | 1.955 | 2.345 |
…but the columns are in a different order. APA format dictates that means should be placed next to their associated standard deviations in a table (APA format is weirdly specific). Fortunately, we can rearrange columns using the select command that we’ve come across before.
Enter this comment and command into your script, and run it:
# Re-order columns
descript_table <- descript_table %>% select(task, mean_female, sd_female, mean_male, sd_male)
task | mean_female | sd_female | mean_male | sd_male |
---|---|---|---|---|
Noun Comprehension | 17.64 | 2.203 | 16.29 | 4.231 |
Noun Production | 12.09 | 3.239 | 10.43 | 3.952 |
Predicate Comprehension | 16.2 | 1.814 | 14.83 | 1.602 |
Predicate Production | 8.4 | 1.955 | 8.5 | 2.345 |
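The widening and re-ordering steps can be followed on a toy summary table. This is a sketch with made-up values; toy_descript and toy_wide are names of our own. It shows how pivot_wider builds the new column names when there are two value columns, which is why the select step is needed.

```r
library(dplyr)
library(tidyr)

# Toy summary in long format (made-up values)
toy_descript <- tibble(
  task = c("A", "A", "B", "B"),
  gender = c("female", "male", "female", "male"),
  mean = c(10, 11, 12, 13),
  sd = c(1, 2, 3, 4)
)

# New column names are built as <value column>_<gender>
toy_wide <- toy_descript %>%
  pivot_wider(names_from = gender, values_from = c(mean, sd))
colnames(toy_wide)

# Put each mean next to its standard deviation
toy_wide %>% select(task, mean_female, sd_female, mean_male, sd_male)
```

All of the mean columns are created before the sd columns, so the means and standard deviations for one gender are not adjacent until select re-orders them.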
Finally, we can replace the column names with something a bit more
human readable, using the colnames
function.
Enter this comment and command into your script, and run it:
# Rename columns
colnames(descript_table) <- c("Task", "Female (M)", "Female (SD)", "Male (M)", "Male (SD)")
Task | Female (M) | Female (SD) | Male (M) | Male (SD) |
---|---|---|---|---|
Noun Comprehension | 17.64 | 2.203 | 16.29 | 4.231 |
Noun Production | 12.09 | 3.239 | 10.43 | 3.952 |
Predicate Comprehension | 16.2 | 1.814 | 14.83 | 1.602 |
Predicate Production | 8.4 | 1.955 | 8.5 | 2.345 |
Note that it would arguably be clearer to write “mean” rather than “M”, but it’s another quirk of APA style that we write “M” to stand for mean.
There are a number of different ways to get a table in R into your
word processor. We’re going to use the kableExtra
package,
because it’s really flexible, so it’s capable of producing almost any
table you might need. We’re only going to use it in the most basic way
here; for some other examples of what it can do, see the kableExtra website.
To get a version of descript_table
that you can
cut-and-paste into your word processor, enter these comments and
commands into your script, and run them:
# Load 'kableExtra' package
library(kableExtra)
# Output wordprocessor-friendly table
descript_table %>% kable(digits = 2) %>% kable_styling()
Explanation of commands:
library(kableExtra) loads the kableExtra package.
We pipe descript_table to kable(). The digits = 2 part ensures that numbers are rounded to two decimal places.
We pipe the output of kable() into kable_styling(). This command prints the table to the Viewer window in RStudio.
Explanation of output:
Try copying the table into your word processor now. In the Viewer pane, select all of the rows and columns in the table, then right-click and select Copy. Open your word processor and select Paste. (For this to work on a Mac, you will need to be working with RStudio in Chrome rather than Safari.)
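The kable() function itself comes from the knitr package, which kableExtra builds on. As a rough sketch of what digits = 2 does, the example below produces a toy table as plain text rather than sending it to the Viewer. The format = "pipe" choice, the object names and the values are ours, not part of the worksheet.

```r
library(knitr)

# Toy table with more decimal places than we want to report (made-up values)
toy_table <- data.frame(Task = c("A", "B"), M = c(17.638, 12.091))

# Round numeric columns to two decimal places and return a plain-text table
toy_text <- kable(toy_table, digits = 2, format = "pipe")
toy_text
```

Only the displayed values are rounded; toy_table still holds the unrounded numbers.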
Starting with the data in task_by_subj
, generate a table
of descriptive statistics showing task accuracy for the Italian and
English cards. It should look like this:
Task | English (M) | English (SD) | Italian (M) | Italian (SD) |
---|---|---|---|---|
Noun Comprehension | 17.89 | 2.47 | 16.33 | 3.61 |
Noun Production | 11.89 | 3.59 | 11.00 | 3.61 |
Predicate Comprehension | 15.12 | 1.64 | 16.25 | 1.91 |
Predicate Production | 8.75 | 1.04 | 8.12 | 2.75 |
Copy the R code you used for this exercise, including the comments, into PsycEL.
You can avoid copy-pasting tables (and all other analyses) by writing
your reports using R Markdown
instead of a word processor.
R Markdown
is a language for writing documents which
include R
code. The code is run, and the output is included
in the document. R Markdown
can be used to produce
different types of document (e.g. reports, presentations, web pages), in
various formats (e.g. Microsoft Word, PDF, HTML). The Research Methods in R worksheets are written using R Markdown, and although we don’t teach it in these materials, there are other courses which make it easy to learn.
This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.