Better tables

Introduction
Getting started
Creating a correlation matrix
Exercise 1
Creating a custom table of descriptive statistics
APA-format tables
Copying into your word processor
Exercise 2
R Markdown

Introduction

It can be helpful to present data in tables, rather than text, especially when you need to refer to the same data in different parts of a report. Although tables can be produced manually using a word processor, generating them directly from your data ensures they are up-to-date, and reduces copy-paste errors. This worksheet explains how to use R to produce some of the types of table used to report psychological research.

Getting started

To prepare for this worksheet:

Open the rminr-data project we used previously.
If you don’t see a folder named going-further, it means you created your project before the data required for this worksheet was added to the rminr-data git repository. You can get the latest files by asking git to “pull” the repository. Select the Git tab, which is located in the row of tabs which includes the Environment tab. Click the Pull button with a downward pointing arrow. A window will open showing the files which have been pulled from the repository. Close the Git pull window.
Open the Files tab. The going-further folder should contain the file picture-naming-preproc.csv.
Create a script named tables.R in the rminr-data folder (the folder above going-further). Add the comments and code to this script as you work through each section of the worksheet.

Creating a correlation matrix

We’ll start by producing a correlation matrix. A correlation matrix shows correlations between all combinations of a set of variables, which is often required in research reports. We’ll demonstrate an easy way to produce correlation matrices, with APA styling, in a format that can be read by Microsoft Word or LibreOffice Writer. A similar approach can be used to produce other common table types.

We’ll generate a correlation matrix using the attitude dataset, which is included with R. These data are the percentage of favourable attitudes given by employees, in relation to seven questions regarding their department (you can find out a bit more about these data by typing ?attitude). Here are the first few rows of the data frame:

rating	complaints	privileges	learning	raises	critical	advance
43	51	30	39	61	92	45
63	64	51	54	63	73	47
71	70	68	69	76	86	48
61	63	45	47	54	84	35
81	78	56	66	71	83	47
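If you would like to reproduce this preview yourself, the following is a minimal sketch. It assumes only base R, where attitude is available without loading a package; the name preview is illustrative and not part of the worksheet script.

```r
# 'attitude' ships with base R (in the 'datasets' package),
# so no library() call is needed to use it.
# 'preview' is an illustrative name, not part of tables.R.
preview <- head(attitude, 5)  # first five rows
print(preview)

# Number of rows and columns, and the column names, of the full data frame
dim(attitude)
names(attitude)
```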

We’ll use the apaTables package to generate the correlation matrix.

Enter these comments and commands into your script, and run them:

# Better tables
# Clear the environment
rm(list = ls()) 
# Load 'apaTables' package
library(apaTables)
# Create an APA correlation matrix from the 'attitude' dataset, into file 'table1.doc'
apa.cor.table(attitude, filename = "table1.doc", table.number = 1)



Table 1 

Means, standard deviations, and correlations with confidence intervals
 

  Variable      M     SD    1           2           3           4           5          6          
  1. rating     64.63 12.17                                                                       
                                                                                                  
  2. complaints 66.60 13.31 .83**                                                                 
                            [.66, .91]                                                            
                                                                                                  
  3. privileges 53.13 12.24 .43*        .56**                                                     
                            [.08, .68]  [.25, .76]                                                
                                                                                                  
  4. learning   56.37 11.74 .62**       .60**       .49**                                         
                            [.34, .80]  [.30, .79]  [.16, .72]                                    
                                                                                                  
  5. raises     64.63 10.40 .59**       .67**       .45*        .64**                             
                            [.29, .78]  [.41, .83]  [.10, .69]  [.36, .81]                        
                                                                                                  
  6. critical   74.77 9.89  .16         .19         .15         .12         .38*                  
                            [-.22, .49] [-.19, .51] [-.22, .48] [-.25, .46] [.02, .65]            
                                                                                                  
  7. advance    42.93 10.29 .16         .22         .34         .53**       .57**      .28        
                            [-.22, .49] [-.15, .54] [-.02, .63] [.21, .75]  [.27, .77] [-.09, .58]
                                                                                                  

Note. M and SD are used to represent mean and standard deviation, respectively.
Values in square brackets indicate the 95% confidence interval.
The confidence interval is a plausible range of population correlations 
that could have caused the sample correlation (Cumming, 2014).
 * indicates p < .05. ** indicates p < .01.

Explanation of commands:

We load the apaTables package. The function to generate a correlation matrix is apa.cor.table(). We pass the attitude data frame as the first argument, and use filename to specify that the output should be saved in the file table1.doc. The table.number argument sets the number in the table heading output, in this case “Table 1”. If you omit this argument, the text will be “Table XX”.

Explanation of output:

Export table1.doc from RStudio and open it using a word processor.

The first thing to notice is that the styling (spacing, use of italics, horizontal lines, positioning of captions and footnotes etc.) complies with the APA guidelines for tables.

The table number and caption are above the table itself. You will need to edit the caption by hand to make it more meaningful, for example “Means, standard deviations, and correlations with confidence intervals, for the attitude measures of Study 1”.

The Variable column contains a number and the column name for the seven attitude variables. The next two columns show the mean and standard deviation for each variable. The remaining columns use the numbers from items in the Variable column as headings, indicating that they refer to the same variable. The cells show the correlation between the column variables and each of the variables in the rows. Cells are left empty where a variable would otherwise be correlated with itself. The 95% confidence interval for the correlation is shown in square brackets.

For example, the correlation between rating and complaints in this sample is .83. The confidence interval indicates that the population value is likely to be between .66 and .91.
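As a rough check on where numbers like these come from, a single correlation can be computed with base R's cor.test(). This is only a sketch for one pair of variables, not a description of how apaTables works internally; the name ct is illustrative.

```r
# Pearson correlation between two of the attitude variables
# ('ct' is an illustrative name, not part of tables.R)
ct <- cor.test(attitude$rating, attitude$complaints)

# Sample correlation, rounded to two decimal places
round(unname(ct$estimate), 2)

# 95% confidence interval (lower bound, upper bound)
round(ct$conf.int, 2)

# Two-tailed p-value
ct$p.value
```

Comparing these values with the corresponding cell of Table 1 is a quick way to convince yourself that you are reading the table correctly.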

Evidence for the correlation is calculated using traditional statistics, rather than the Bayes factors described in the Relationships, part 2 worksheet. One asterisk (*) indicates p < .05. Two asterisks (**) indicate p < .01. These calculations assume a two-tailed test; one-tailed tests for correlations are explained in the More on relationships, part 2 worksheet. Also recall that p-values are widely misinterpreted, so it would be better to edit this part of the table by hand to reflect Bayes factors you have already calculated. We suggest using * for BF > 3, ** for BF > 10, o for BF < 0.33, and oo for BF < 0.1. Change the text at the bottom of the table accordingly.
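If you have several Bayes factors to mark up, a small helper function can apply the suggested thresholds consistently. The sketch below is hypothetical: the function name bf_marker, the choice of an empty string for inconclusive values, and the example Bayes factors are all assumptions made for illustration.

```r
# Hypothetical helper: map one Bayes factor to the marker suggested above.
# The function name and the "" returned for inconclusive values are assumptions.
bf_marker <- function(bf) {
  if (bf > 10) {
    "**"
  } else if (bf > 3) {
    "*"
  } else if (bf < 0.1) {
    "oo"
  } else if (bf < 0.33) {
    "o"
  } else {
    ""  # between 0.33 and 3: no marker
  }
}

# Apply the helper to some made-up Bayes factors
example_bfs <- c(25, 4.2, 1.1, 0.2, 0.05)
markers <- sapply(example_bfs, bf_marker)
print(markers)
```

The stricter thresholds are tested first, so a Bayes factor above 10 receives ** rather than *.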

Exercise 1

For this exercise, we’ll load some data from a study which measured aspects of participants’ personality.

Enter these comments and commands into your script, and run them:

# Exercise 1
# Load tidyverse
library(tidyverse)
# Load data into 'big5'
big5 <- read_csv('case-studies/jon-may/big5_total.csv')

The first few rows show that the scale used measured the ‘big 5’ personality factors: openness to experience, conscientiousness, extraversion, agreeableness and neuroticism (OCEAN).

subj	openness	conscientiousness	extraversion	agreeableness	neuroticism
1	29	28	14	36	20
2	22	22	28	28	26
3	33	33	21	37	25
4	17	34	14	39	13
5	27	27	30	40	25

Create a correlation matrix for the five personality factors. Number the table as “Table 2”, and save the results in table2.doc. Your table should look like this in RStudio:



Table 2 

Means, standard deviations, and correlations with confidence intervals
 

  Variable             M     SD   1           2           3           4          
  1. openness          23.15 6.78                                                
                                                                                 
  2. conscientiousness 25.10 7.23 .15                                            
                                  [-.14, .42]                                    
                                                                                 
  3. extraversion      21.50 7.86 .27         -.01                               
                                  [-.02, .51] [-.29, .28]                        
                                                                                 
  4. agreeableness     33.54 4.55 .27         .20         .43**                  
                                  [-.01, .52] [-.09, .46] [.17, .64]             
                                                                                 
  5. neuroticism       16.00 7.41 .34*        .28         .13         .07        
                                  [.06, .57]  [-.00, .52] [-.16, .40] [-.22, .34]
                                                                                 

Note. M and SD are used to represent mean and standard deviation, respectively.
Values in square brackets indicate the 95% confidence interval.
The confidence interval is a plausible range of population correlations 
that could have caused the sample correlation (Cumming, 2014).
 * indicates p < .05. ** indicates p < .01.

…and it should be APA formatted in the file table2.doc.

Copy the R code you used for this exercise, including the comments, into PsycEL.

Creating a custom table of descriptive statistics

As with graphs, there is often an element of design involved in presenting tabular data in a format most useful for your reader. Packages like apaTables are useful for producing APA tables where there is a standard way to present data. However, you often need a table which is customised to present your data in the most useful format. The cost of custom tables is that the content requires a little more preprocessing, and styling the table according to APA standards will require some hand-formatting in your word processor.

We’ll demonstrate this process by producing a table of descriptive statistics. The data we’ll use come from an experiment which evaluated children’s language development using the Words in Game (WinG) test. WinG consists of a set of picture cards which are used in four tests: noun comprehension, noun production, predicate comprehension, and predicate production. The Italian and English versions of the WinG cards use different pictures to depict the associated words. The experiment tested whether English-speaking children, aged approximately 30 months, produce similar responses for the two sets of cards. We would like to produce a single table, containing descriptive statistics for all four tests.

We start by loading the data; enter this comment and command into your script, and run it:

# Load data into 'wing_preproc'
wing_preproc <- read_csv('going-further/picture-naming-preproc.csv')

The first few rows of wing_preproc look like this:

subj	gender	cards	nc	np	pc	pp	cdi_u	cdi_s	related_np	related_pc	related_pp
1	female	english	12	4	NA	NA	62	38	5	NA	NA
2	male	italian	18	12	17	9	60	59	2	0	3
3	female	english	18	13	17	9	97	85	3	0	0
4	male	italian	17	11	15	12	82	45	4	0	2
5	female	english	17	15	15	10	66	66	2	0	0
6	male	italian	18	11	15	7	47	32	2	0	1

Table of descriptives

Our test scores are currently in wide format (lots of columns, few rows), but R generally requires data to be in long format (lots of rows, few columns). This means we first have to make the data frame longer, so we can calculate summary statistics.

Enter this comment and these commands into your script, and run them:

# Convert from wide to long format; select relevant columns; record in 'task_by_subj'
task_by_subj <- wing_preproc %>%
  pivot_longer(cols = c(nc, np, pc, pp),
               names_to = 'task',
               values_to = 'correct') %>%
  select(subj, gender, cards, task, correct)

Explanation of command:

In the Within-subject differences worksheet, you learned how to use pivot_wider() to widen long data frames. The pivot_longer() command does the reverse – it lengthens wide data frames. cols = c(nc, np, pc, pp) selects the columns we want to pivot. Each value in these columns is added to a row in a new column called correct (values_to = 'correct'). In the same row, a new column task is set to the name of the column which the value came from (names_to = 'task'). All of the values in the other columns are duplicated for each row. We select just the columns we want for our table of descriptive statistics.

The first few rows of task_by_subj look like this:

subj	gender	cards	task	correct
1	female	english	nc	12
1	female	english	np	4
1	female	english	pc	NA
1	female	english	pp	NA
2	male	italian	nc	18
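To see the reshaping on a smaller scale, here is a minimal sketch using a made-up data frame with two participants and two tasks. The names toy_wide and toy_long, and the scores themselves, are invented for illustration and are not part of the WinG data.

```r
library(tidyverse)

# Made-up scores for two participants (not the WinG data)
toy_wide <- tibble(
  subj = c(1, 2),
  nc   = c(12, 18),
  np   = c(4, 12)
)

# Each 'nc' and 'np' value becomes its own row;
# 'subj' is repeated for every row that came from the same participant
toy_long <- toy_wide %>%
  pivot_longer(cols = c(nc, np),
               names_to = 'task',
               values_to = 'correct')

print(toy_long)
```

Two rows and two score columns become four rows, one for each combination of participant and task.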

Now we can calculate some summary statistics, using commands that we’ve already used in previous worksheets.

Enter this comment and these commands into your script, and run them:

# Create table of descriptive statistics
descript <- task_by_subj %>%
  group_by(task, gender) %>%
  summarise(mean = mean(correct, na.rm = TRUE), sd = sd(correct, na.rm = TRUE))

Explanation of commands:

We’ve come across group_by before. Here we use it to group the data by two variables at the same time, task and gender, giving us eight groups overall.
We’ve also come across summarise before, including the use of na.rm = TRUE to deal with missing data.

Our data now looks like this:

task	gender	mean	sd
nc	female	17.64	2.203
nc	male	16.29	4.231
np	female	12.09	3.239
np	male	10.43	3.952
pc	female	16.2	1.814
pc	male	14.83	1.602
pp	female	8.4	1.955
pp	male	8.5	2.345
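The sketch below repeats the same grouping and summarising steps on a small made-up data frame, so you can check the arithmetic by hand. The data, the names toy_scores and toy_descript, the extra n column, and the .groups = 'drop' argument are additions for illustration and do not appear in the worksheet script.

```r
library(tidyverse)

# Made-up long-format scores (not the WinG data); one score is missing
toy_scores <- tibble(
  task    = c('nc', 'nc', 'nc', 'nc', 'nc', 'np', 'np', 'np', 'np'),
  gender  = c('female', 'female', 'male', 'male', 'male',
              'female', 'female', 'male', 'male'),
  correct = c(18, 16, 17, 15, NA, 12, 10, 11, 9)
)

toy_descript <- toy_scores %>%
  group_by(task, gender) %>%
  summarise(mean = mean(correct, na.rm = TRUE),  # ignore the missing score
            sd = sd(correct, na.rm = TRUE),
            n = sum(!is.na(correct)),            # scores actually used
            .groups = 'drop')                    # return an ungrouped data frame

print(toy_descript)
```

Counting the non-missing scores alongside the mean makes it easier to spot groups where na.rm = TRUE has silently dropped data.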

Meaningful labels

The descript data frame contains just the numbers we want to include in our report - the means and standard deviations for each of the eight groups. However, the row labels (np, etc.) are not particularly clear, so we replace them with something more human readable.

Enter these comments and commands into your script, and run them:

# Define task names, for each task code
task_names <- c(
  nc = 'Noun Comprehension',
  np = 'Noun Production',
  pc = 'Predicate Comprehension',
  pp = 'Predicate Production'  
)
# Recode task codes into task names
descript$task <- descript$task %>% recode(!!!task_names)

Explanation of commands: We’re using the recode command that we’ve previously used in the cleaning up questionnaire data worksheet:

We start by telling R what each of the codes, nc etc., mean. So, for example nc = 'Noun Comprehension'. We combine the four ‘translations’ together into task_names using c() (short for ‘concatenate’, i.e. put things together).
We then take the task column of the descript data frame (descript$task) and pipe (%>%) it to recode, which uses task_names to do the recoding. We write (<-) that result back into descript$task.

Our table now looks like this:

task	gender	mean	sd
Noun Comprehension	female	17.64	2.203
Noun Comprehension	male	16.29	4.231
Noun Production	female	12.09	3.239
Noun Production	male	10.43	3.952
Predicate Comprehension	female	16.2	1.814
Predicate Comprehension	male	14.83	1.602
Predicate Production	female	8.4	1.955
Predicate Production	male	8.5	2.345
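The !!! in front of task_names can look cryptic. As a minimal sketch of what it does, the made-up example below recodes a short vector of colour codes; colour_names, codes and colour_labels are illustrative names, not part of the worksheet data.

```r
library(tidyverse)

# Named vector: the names are the old codes, the values are the new labels
colour_names <- c(r = 'Red', g = 'Green', b = 'Blue')

# Made-up codes to translate
codes <- c('g', 'r', 'b', 'g')

# !!! unpacks the named vector into separate arguments, so this call
# is equivalent to recode(codes, r = 'Red', g = 'Green', b = 'Blue')
colour_labels <- codes %>% recode(!!!colour_names)
print(colour_labels)
```

Keeping the translations in one named vector means you only have to edit them in one place if a label changes.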

APA-format tables

Our table is now clear and easy to read. We could include it in a report without much further effort, and the reader would be able to easily see what we wanted to show them. However, it is not quite in the format that psychologists are most familiar with (which is APA format). In APA format, the table would look more like this:

Task	Female (M)	Female (SD)	Male (M)	Male (SD)
Noun Comprehension	17.64	2.20	16.29	4.23
Noun Production	12.09	3.24	10.43	3.95
Predicate Comprehension	16.20	1.81	14.83	1.60
Predicate Production	8.40	1.96	8.50	2.35

In other words, it would be wider: more columns and fewer rows.

We can widen the table, using the pivot_wider command we have previously used in the within-subject differences worksheet.

Enter this comment and these commands into your script, and run them:

# Widen table
descript_table <- descript %>%
  pivot_wider(names_from = gender, values_from = c(mean, sd))

Our table now has the same format as an APA table…

task	mean_female	mean_male	sd_female	sd_male
Noun Comprehension	17.64	16.29	2.203	4.231
Noun Production	12.09	10.43	3.239	3.952
Predicate Comprehension	16.2	14.83	1.814	1.602
Predicate Production	8.4	8.5	1.955	2.345

…but the columns are in a different order. APA format dictates that means should be placed next to their associated standard deviations in a table (APA format is weirdly specific). Fortunately, we can rearrange columns using the select command that we’ve come across before.

Enter this comment and command into your script, and run it:

# Re-order columns
descript_table <- descript_table %>% select(task, mean_female, sd_female, mean_male, sd_male)

task	mean_female	sd_female	mean_male	sd_male
Noun Comprehension	17.64	2.203	16.29	4.231
Noun Production	12.09	3.239	10.43	3.952
Predicate Comprehension	16.2	1.814	14.83	1.602
Predicate Production	8.4	1.955	8.5	2.345

Finally, we can replace the column names with something a bit more human readable, using the colnames function.

Enter this comment and command into your script, and run it:

# Rename columns
colnames(descript_table) <- c("Task", "Female (M)", "Female (SD)", "Male (M)", "Male (SD)")

Task	Female (M)	Female (SD)	Male (M)	Male (SD)
Noun Comprehension	17.64	2.203	16.29	4.231
Noun Production	12.09	3.239	10.43	3.952
Predicate Comprehension	16.2	1.814	14.83	1.602
Predicate Production	8.4	1.955	8.5	2.345

Note that it would arguably be clearer to write “mean” rather than “M”, but it’s another quirk of APA style that we write “M” to stand for mean.
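The three steps in this section (widen, re-order, rename) can also be chained together. The following sketch runs them on a small made-up table so that the intermediate column names are easy to follow; toy_descript, toy_table and the values in them are invented for illustration.

```r
library(tidyverse)

# Made-up summary statistics in long format (not the WinG results)
toy_descript <- tibble(
  task   = c('A', 'A', 'B', 'B'),
  gender = c('female', 'male', 'female', 'male'),
  mean   = c(10, 12, 20, 22),
  sd     = c(1, 2, 3, 4)
)

toy_table <- toy_descript %>%
  # One row per task; the new columns are named <statistic>_<gender>
  pivot_wider(names_from = gender, values_from = c(mean, sd)) %>%
  # Put each mean next to its standard deviation
  select(task, mean_female, sd_female, mean_male, sd_male)

# Replace the column names with reader-friendly headings
colnames(toy_table) <- c('Task', 'Female (M)', 'Female (SD)', 'Male (M)', 'Male (SD)')
print(toy_table)
```

Because colnames() assigns names by position, it is safest to re-order the columns first and rename them last, as here.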

Copying into your word processor

There are a number of different ways to get a table in R into your word processor. We’re going to use the kableExtra package, because it’s really flexible, so it’s capable of producing almost any table you might need. We’re only going to use it in the most basic way here; for some other examples of what it can do, see the kableExtra website.

To get a version of descript_table that you can cut-and-paste into your word processor, enter these comments and commands into your script, and run them:

# Load 'kableExtra' package
library(kableExtra)
# Output wordprocessor-friendly table
descript_table %>% kable(digits = 2) %>%  kable_styling()

Explanation of commands:

library(kableExtra) loads the kableExtra package.
We pipe our data into kable(). The digits=2 part ensures that every number is reported to two decimal places.
We then pipe kable() into kable_styling(). This command prints the table to the Viewer window in RStudio.

Explanation of output:

Try copying the table into your word processor now. In the Viewer pane, select all of the rows and columns in the table, then right-click and select Copy. Open your word processor and select Paste. (For this to work on a Mac, you will need to be working with RStudio in Chrome rather than Safari.)
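To see the effect of the digits argument in isolation, here is a minimal sketch using kable() from the knitr package, which kableExtra builds on. It assumes knitr is installed; the data frame toy_table and its values are made up for illustration, and format = 'html' is specified only so that the result can be printed as text in the console.

```r
library(knitr)

# Made-up table with more decimal places than we want to report
toy_table <- data.frame(
  Task = c('A', 'B'),
  M    = c(17.638, 16.2),
  SD   = c(2.2034, 1.8141)
)

# Round every numeric column to two decimal places
toy_kable <- kable(toy_table, digits = 2, format = 'html')

# Printing shows the HTML that RStudio would render in the Viewer pane
print(toy_kable)
```

Rounding at this final step, rather than earlier in the script, keeps full precision in the data frame in case you need it for further calculations.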

Exercise 2

Starting with the data in task_by_subj, generate a table of descriptive statistics showing task accuracy for the Italian and English cards. It should look like this:

Task	English (M)	English (SD)	Italian (M)	Italian (SD)
Noun Comprehension	17.89	2.47	16.33	3.61
Noun Production	11.89	3.59	11.00	3.61
Predicate Comprehension	15.12	1.64	16.25	1.91
Predicate Production	8.75	1.04	8.12	2.75

Copy the R code you used for this exercise, including the comments, into PsycEL.

R Markdown

You can avoid copy-pasting tables (and all other analyses) by writing your reports using R Markdown instead of a word processor. R Markdown is a language for writing documents which include R code. The code is run, and the output is included in the document. R Markdown can be used to produce different types of document (e.g. reports, presentations, web pages), in various formats (e.g. Microsoft Word, PDF, HTML). The Research Methods in R worksheets are written using R Markdown, and although we don’t teach it in these materials, there are other courses which make it easy to learn.

This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.

Paul Sharpe, Andy Wills, Allegra Cattani
