This document covers some of the reasons we use R in this course. It’s not “required reading”, but take a look if you’re interested.
R is a piece of software for handling data. It’s the one used on this course, but it’s not the only option available. Others include Excel, Jamovi, JASP, MATLAB, Stata and, perhaps the most talked-about alternative, SPSS.
Students prefer R. In a recent study, undergraduate psychology students at Glasgow University were given a choice between R and SPSS, having experienced both. Two-thirds of the students chose R. Those who chose R did better in the final assessments and showed lower stats anxiety. R is being used to teach Plymouth University undergraduates (and visiting Year 10 students) across a range of different courses. Read more.
Data science is a graduate skill in high demand, and using R is a key skill in that market. In contrast, demand for SPSS skills has been declining dramatically for a decade. At SPSS’s current rate of decline, it’ll be gone by the time you graduate. Read more at r4stats and at loveR.
R is free. You don’t need to pay anything to download or use it, and never will. In contrast, once you leave university, SPSS would cost you or your employer around £1,000 to £3,400 per person per year.
Every analysis you can think of is already available in R, thanks to over 18,000 free packages. As new analyses are developed, they become available in R first. In 2013, SPSS realised it couldn’t keep up with R, and admitted defeat.
Real data analysis is mainly preprocessing – scientists spend around 80% of their analysis time getting the data into a format where they can apply statistical tests. R is fantastically good at preprocessing. Our course focusses on realistic data analysis, making R the perfect tool for the job.
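To give a flavour of what preprocessing looks like in practice, here is a minimal sketch in base R. The data frame, its column names (subj, congr, incongr) and the 2000 ms cut-off are all made up for illustration; they are not taken from the course materials, which may use different packages and data.

```r
# Hypothetical raw data: one row per participant, one column per condition,
# holding reaction times in milliseconds. All names and values are invented.
raw <- data.frame(
  subj    = c(1, 2, 3, 4),
  congr   = c(510, 500, NA, 520),
  incongr = c(600, 580, 620, 9999)
)

# Step 1: reshape from wide (one column per condition) to long
# (one row per participant per condition).
long <- reshape(
  raw,
  direction = "long",
  varying   = c("congr", "incongr"),
  v.names   = "rt",
  timevar   = "condition",
  times     = c("congr", "incongr"),
  idvar     = "subj"
)

# Step 2: drop missing values and implausibly slow responses
# (the 2000 ms threshold is an arbitrary assumption for this example).
clean <- subset(long, !is.na(rt) & rt < 2000)

# Step 3: summarise as the mean reaction time per condition.
summary_rt <- aggregate(rt ~ condition, data = clean, FUN = mean)
print(summary_rt)
```

Only the last step is "statistics" in the usual sense. The reshaping and cleaning steps before it are the kind of work that takes up most of the time in a real analysis.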
The alternatives to R for real data analysis are either kludgy, error-prone and poorly reproducible (e.g. preprocessing in Excel, followed by statistics in SPSS), or more niche in the graduate jobs market (e.g. MATLAB). Excel in particular is famously error-prone: for example, 1 in 5 experiments in genetics have been screwed up by Excel, and the case for the UK government’s policy of financial austerity was based on an Excel screwup.
R’s use of scripts means that, if you have done the analysis completely in R, you already have a full, reproducible record of your analysis path. Anyone with an internet connection can download R, and reproduce your analysis using your script. Making your analyses reproducible is an essential skill in many areas of research.
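As a rough illustration of a script acting as a record of the analysis, here is a short sketch in base R. The file name analysis.R, the simulated data and the choice of a t-test are assumptions made for this example only. In a real project the simulated data would be replaced by a step that loads your data file.

```r
# analysis.R: a hypothetical, self-contained analysis script.
# Running it from top to bottom repeats every step of the analysis.

# Fix the random seed so the simulated data are identical on every run.
set.seed(42)

# Simulated scores for two groups (a stand-in for loading a real data file).
scores <- data.frame(
  group = rep(c("control", "treatment"), each = 20),
  score = c(rnorm(20, mean = 50, sd = 10), rnorm(20, mean = 55, sd = 10))
)

# The statistical test itself: an independent-samples t-test.
result <- t.test(score ~ group, data = scores)
print(result)

# Record the R version and loaded packages alongside the results.
print(sessionInfo())
```

Anyone who runs this file gets the same data, the same test and the same output, and the call to sessionInfo() notes which version of R produced them.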
R is “free as in freedom” because all the source code is available to everyone (it’s “open source”). Some reasons this is important:
All software has bugs; making the source code available means it’s more likely that these bugs are found and fixed. In contrast, no one outside of IBM can look at the source code for SPSS, and it’s entirely up to IBM whether they fix, or tell you about, the bugs it has.
All software is eventually abandoned by the people who wrote it (if for no other reason than their death). Open source software only dies if no one in the world cares enough about it to maintain it. In contrast, closed-source software (e.g. SPSS) dies as soon as the current owners decide to kill it.
You can use R without having to install it, e.g. RStudio Plymouth.
Jamovi and JASP are free software packages for statistical analysis that are written using R, and hence have some similarities to R. They also have some similarities to SPSS in the sort of user interface they provide. Although some people find “point and click” interfaces appealing, such systems are substantively limited when it comes to data preprocessing. They also do not encourage a reproducible (script-based) approach to open science, and they obscure the process of analysis from the user, which tends to reduce understanding.
Jamovi and JASP are also relatively new projects, with a low bus factor (they depend on a small number of key developers), while R has been freely available for more than 20 years, is supported by an international core team of 20 developers, and is also supported by major tech companies including Microsoft and Posit (who provide the RStudio interface).
In the summer of 2022, JASP changed their underlying algorithm for Bayesian ANOVA, leading to the latest version sometimes producing substantially different answers to older versions (e.g. a change in which of two main effects is reported as having substantial Bayesian evidence). Although the change was well-motivated, it was announced with little fanfare, and it seems likely that the average user of JASP would not have been aware of the underlying change. This kind of non-obvious change is hard to avoid when the process of analysis is obscured from the user by the use of a point and click interface, and causes problems for the reproducibility of analyses over time.
This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.