Example experiment checklist
Below is an example evaluation of Study 1 of Mueller & Oppenheimer (2014) against the good experiment checklist:
- 1. What is the hypothesis?
Taking notes on paper leads to better learning than taking notes on a laptop.
This is because students can type faster than they write, and this speed encourages verbatim note taking, which is less effective than summarising.
Score for clarity (0-2): 2
- 2. Is the hypothesis directional?
Yes
Score (no: 0, yes: 1): 1
- 3. Is the study employing Strong Inference?
Not really. The authors present no credible theory that would predict the opposite outcome.
Score (no: 0, yes: 1): 0
- 4. What are the two conditions?
- Note taking by hand
- Note taking on laptop
Score for clarity (0-2): 2
- 5. What is intended to vary between the conditions?
Medium of note taking
Score for clarity (0-2): 2
- 6. What else is varying between conditions?
- Number of others in the room (one or none), seemingly uncontrolled between conditions.
- Laptop notes were scored in their original form, but hand-written notes were transcribed (by whom?) before scoring.
- Number of words written was larger for laptops.
- Laptop notes were more verbatim.
- The students were directed to the room by the experimenter. The experimenter would presumably be aware of the hypothesis and the condition the participant was in. Initial instructions given could have affected performance.
- 7. Which if any of these are potentially a confounding variable, given the hypothesis?
#1 - If unbalanced between conditions, could have caused difference.
#2 - Although transcription is necessary for blind scoring, the transcription process itself may not have been done blind to the hypothesis (this is not stated).
#5 - Possible failure of blinding, which could have led to effect.
Score for level of control (0-2): 1
- 8. How many people per condition?
Not stated. We’re told there were 67 people in total, but not how they were split between conditions.
- 9. What is measured?
Response to ‘factual’ and ‘conceptual’ questions.
- 10. How is this scored?
By the author and a second rater.
- 11. Have they controlled for pre-existing differences? (Detection, matching, large-group randomization)
Not stated. We might guess randomization. Sample size is rather small for that to be effective.
Score for level of control (0-2): 1
- 12. Is there a substantial issue with attrition?
Very few people were removed from the sample, so presumably attrition was close to zero in both groups.
Score for attrition (0-2; 0 = serious issues, 2 = no issues): 2
- 13. Have they controlled for participant effects? (e.g. by blinding)
No. Participants knew which condition they were in, and the authors did not ask them which method they thought would be better.
Control for participant effects (0-2): 0
- 14. Have they controlled for experimenter effects? (e.g. blind scoring, pre-registration)
Scoring was blind, but transcription of the hand-written notes was not, and experimenter influence through the initial instructions is also possible. The analysis plan was not pre-registered.
Score for experimenter effects (0-2): 1
TOTAL SCORE: 12 (out of 18).
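
As a quick check on the arithmetic, the total can be recomputed from the individual item scores. The following is a minimal sketch in Python. The short item labels and the function name `tally` are shorthand invented for this illustration and are not part of the checklist. The maximum for each item is taken from the range given next to its score above. Items 6, 8, 9 and 10 carry no score and are left out.

```python
# Scores copied from the checklist above as (points awarded, maximum points).
# The labels are hypothetical shorthand for this sketch, not checklist wording.
scores = {
    "1. hypothesis clarity": (2, 2),
    "2. directional hypothesis": (1, 1),
    "3. strong inference": (0, 1),
    "4. clarity of conditions": (2, 2),
    "5. clarity of intended variation": (2, 2),
    "7. control of confounds": (1, 2),
    "11. control of pre-existing differences": (1, 2),
    "12. attrition": (2, 2),
    "13. participant effects": (0, 2),
    "14. experimenter effects": (1, 2),
}


def tally(item_scores):
    """Return (total awarded, maximum possible) across all scored items."""
    total = sum(awarded for awarded, _ in item_scores.values())
    maximum = sum(max_points for _, max_points in item_scores.values())
    return total, maximum


total, maximum = tally(scores)
print(f"TOTAL SCORE: {total} (out of {maximum})")
```

Keeping the awarded and maximum points together for each item makes it easy to re-score a different study by editing only the first number in each pair.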