Research Methods in R
2023 Edition.
Research Methods in R is a set of guides on how to use R as your central research methods tool. The guides are written by various authors, and curated by Andy Wills. The target audience is psychology undergraduate students. Research Methods in R is released under a Creative Commons licence, so you are free to reuse these materials and adapt them as you wish, as long as you attribute them to their authors, and as long as your modifications also carry a Creative Commons licence. The materials come with absolutely no warranty of any kind.
Note to teachers: These materials have been tested against R version 4.2.1 (released 23rd June 2022), and the most recent version of packages that were available on MRAN on the 1st June 2022.
List of guides
Introductory guides
Start with ONE of these three options:
- Long, easy-going introduction
- Somewhat shorter introduction
- Even shorter introduction
Intermediate guides
Next, go through BOTH of these guides.
Advanced guides
General resources
- Why R? Discussion of the advantages of R over other software packages.
- Pedagogy. Discussion of the philosophy of teaching and learning underlying these materials (mainly aimed at teachers).
- Who is using R? Partial list of psychology degree programmes around the world that use R.
- Other resources. A list of other Creative Commons resources about using R.
- Calculating your module mark. How to calculate a final module mark from your component marks, using R.
- Dealing with common errors. List of commonly-encountered errors and how to solve them.
Absolute Beginners’ Guide to R
A series of worksheets on using R for data analysis in psychology. No previous knowledge of R, or of psychology, is assumed.
Part 1
- Introduction to RStudio. A basic introduction to the software.
- Exploring data. Means, medians, and histograms.
- More on tibbles. Deeper explanation of ‘tibbles’ in R.
- Means and medians. Some slides on the difference between a mean and a median.
- Group differences. Means and standard deviations, by group. Filtering data. Effect size.
- Evidence. Introduction to p values. Traditional between-subjects t-test. Bayesian between-subjects t-test.
- More on t-tests. Further information on traditional t-tests, and confidence intervals.
- More on Bayes Factors. A more detailed discussion of Bayes Factors.
- Analyzing your project data. Analysing your own data.
- Entering data by hand. Entering data into a spreadsheet. Saving data into your RStudio project.
Part 2
- Inter-rater reliability. Percentage agreement. Cohen’s kappa.
- More on Cohen’s kappa. A discussion of some potentially surprising outputs from a Cohen’s kappa calculation.
- Relationships. Frequency and contingency tables. Mosaic plots. Traditional chi-square test. Bayesian test.
- More on relationships. Extension material on chi-square calculations, including issues surrounding ordered variables (e.g. age), the interpretation of large contingency tables, and a further explanation of the output of the Bayesian chi-square test.
- Sample characteristics. How to calculate summary information about your sample, such as number of participants or gender balance, from your data file.
- Relationships, part 2. Density plots. Scatter plots. Correlation co-efficient. Bayesian and traditional tests.
- More on relationships, part 2. Spearman’s correlation, Kendall’s tau, one-tailed tests, confidence intervals, plus a deeper look at the output of the Bayesian correlation test.
- Making reports with R. How to insert an RStudio graph into your word processor document (e.g. Word). Links to RMarkdown as an alternative.
Putting R to work
These are mainly further practice in the skills learned in Absolute Beginners’. Where the exercises contain completely new skills, these are shown in bold. Where the exercises extend a skill you’ve already been taught, these are shown in italics. The exercises become somewhat more difficult as you go down the list.
If you are a current undergraduate student at Plymouth University, you should complete the accompanying Psych:EL (Psychology: Experiential Learning) activity first, in order to generate your own set of data. If you’re not, you can download sample data files here.
- Autobiographical memory. Entering data by hand, histograms.
- Face recognition. Means, filtering data, and a bar graph.
- Spatial navigation. More on bar graphs.
- Response compatibility. Means, filtering data, standard deviations, and density plots.
- Visual illusions. Filtering data, means, violin plot, Bayesian t-test.
- Facial attractiveness. Means, standard deviations, inter-quartile range, and density plots.
- Police lineup. Contingency table, mosaic plot, Bayesian contingency test, means, density plot, Bayesian t-test.
- Risk taking. Means, combining data frames, filtering data, and density plots.
- Animal Welfare. Percentage agreement, Cohen’s kappa, contingency tables, bar charts.
- Creativity and the environment. Preprocessing, means, density plots, effect size, Bayesian t-test.
- Political psychology. Means, filtering data, summarizing data, density plots, effect size, Bayesian t-test, traditional t-test.
A Very Brief Guide to R
The Absolute Beginners’ Guide to R and Putting R to Work provide, between them, about 20 hours of introductory material. For those in a hurry, the Very Brief Guide to R covers the most critical material from those two courses in about four hours.
- Using RStudio: Brief introduction to the software.
- Exploring data: Loading data, calculating means.
- Group differences: Grouping, density plots, filtering.
- Evidence, part 1: Bayesian and traditional t-tests.
- Evidence, part 2: Bayesian and traditional correlation, scatterplot.
Research Methods in Practice (Quantitative section)
These are intermediate-level materials. They are maintained by Ben Whalley on a separate site, but have been designed to fit in here in this sequence of materials. Only the quantitative section of Ben’s site contains information concerning the usage of R.
- Research Methods in Practice: Data handling, fitting lines (scatterplot with best fit line), converting Likert scales from text to numbers, reverse scoring scale items, multiple regression.
Intermediate Guide to R
These are intermediate-level materials. They provide analysis methods for conducting realistic, high-quality studies in psychology. They are aimed at a second-year undergraduate audience.
- Revision: A quick recap of key information covered in earlier courses.
- Statistical power: How to calculate the statistical power of experiments.
- More on statistical power: A deeper discussion on statistical power, including: (1) relation between statistical power and the replication crisis, (2) better standards for statistical power, (3) how to improve effect size, (4) estimating effect size from previous work.
- Data preprocessing: Getting data from lab-based (OpenSesame) experiments into a format closer to something you can actually analyse, in five steps: loading, selecting, filtering, summarising, and combining. Also covers combining data frames, renaming columns.
- More on preprocessing: A slightly more advanced worksheet, covering adding columns to a data frame, and subsetting strings.
- Within-subject differences: Data preprocessing (pivoting and mutating). One-factor within-subject Bayesian ANOVA. Pairwise comparisons, multiple comparisons.
- More on Bayes Factors. A more detailed discussion of Bayes Factors.
- Understanding interactions: Learn what an interaction is, and learn how to do line plots at the same time.
- Factorial differences: Two-factor Bayesian ANOVA (one within, one between), plus advice on: pairwise comparisons, better graphs, reporting Bayesian ANOVA, and ordinal (i.e. ordered) independent variables.
Going further with R
These are slightly more advanced materials, aimed at a final-year undergraduate psychology audience.
- Data management
  - Data management: Anonymity and privacy, good and bad file types, creating and sharing a private GitHub repository, adding a repository to RStudio, adding files to GitHub using RStudio, modifying and updating files, git log as your logbook, branching, recovering an earlier version of a file.
- Preprocessing
  - Data preprocessing for experiments: De-duplicating data, excluding participants, log transform.
  - Data preprocessing for scales: Handling missing data, calculating scale scores, tidying survey data.
- Descriptive statistics
  - Better tables: correlation matrix, custom table of descriptive statistics.
  - Better graphs: publication-quality graphs showing both central tendency and variability (or uncertainty) of your data, including: line plots, distribution plots (density, violin, half-violin), box plots, and confidence intervals. Suggested plots for one- and two-factor designs, within-subject, between-subject, or mixed designs, and with ordered and unordered variables. Discussion of common bad plots to avoid (bar plots; confusions over confidence intervals). Pairs plot for correlational designs.
  - Analysing scales: Cronbach’s alpha.
- Bayesian inferential statistics
  - Estimate sample size with Bayes Factors: An introduction and manual to Bayesian Power Calculations.
  - One-sample Bayesian t-test: Comparing a single-group sample of data against a population mean.
  - More on Bayesian ANOVA: More on two-factor Bayesian ANOVA.
  - More on regression: Multiple regression with more than two predictors, hierarchical regression, evidence for individual predictors.
- Traditional inferential statistics
  - Traditional ANOVA: p-value-based approach to ANOVA.
  - Traditional non-parametric tests: Mann-Whitney U, Kruskal-Wallis H.
Case studies
These are full preprocessing and analysis pipelines, mainly based on final-year undergraduate psychology projects.
- The effects of negative mental imagery on self-esteem: preprocessing, Cronbach’s alpha, Bayesian ANOVA.
- The Perruchet Effect: Downloading from OSF, de-duplicating data, excluding participants, line graphs, baseline correction of neuroscience data, functions, loops, merging data frames, list of participant numbers, log transforms, recoding data, Bayesian linear regression for within-subjects designs.
- Children’s language development: preprocessing, Bayesian t-test, tables of descriptive statistics, correlations, half-violin plot, Wilcoxon test.
R for Pros
These worksheets go beyond what is taught in previous sections of RMINR. They are aimed at high-achieving undergraduates, as well as postgraduate students and professional researchers. They assume familiarity with material up to and including Going Further with R.
- Bayesian ANOVA for Pros: doing two-factor within-subjects Bayesian ANOVA better.
Source code
These teaching materials were generated using a combination of Markdown and RMarkdown. The full source code is available on GitHub.
Licence
This material is distributed under a Creative Commons licence (CC-BY-SA 4.0).
Parts of this material have been adapted from these other Creative Commons materials:
- May, J. (2018). Getting Results with R.
- Whalley, B. (2018). Just Enough R.
- Wills, A. (2015). R for Experimental Psychologists.
Acknowledgements
Thanks to the following people for their feedback and advice on these materials:
Jackie Andrade, Eleanor Andrade May, Martyn Atkins, Patric Bach, Alison Bacon, Dale Barr, Nadège Bault, Chris Berry, Allegra Cattani, Laura Charlton, Lisa DeBruine, Charlotte Edmunds, Emily Filewood, Giorgio Ganis, Phil Gee, Michaela Gummerum, Yaniv Hanoch, Cathryn Harries, Jessica Hart, Sophie Homer, Courtney Hooton, Angus Inkster, Jasmin Jones, Peter Jones, Laith Kahn, Gokcek Kul, Helen Lloyd, Chris Longmore, Jon May, Anthony Mee, Chris Mitchell, Millie Monks, Karol Nedza, Alyson Norman, Charlie Reynolds, Matt Roser, Paul Sharpe, Alastair Smith, Julian Stander, Sylvia Terbeck, Michael Verde, Clare Walsh, Ben Whalley.