Writing Effective Multiple Choice Questions
I recently began my workshop, Assessments 101: More Than Multiple Choice, with the caveat that I don’t hate multiple-choice questions (MCQs). For exams, I still regularly use MCQs alongside a variety of other formats.
I think MCQs get a bad rap. They’re a tool, and like all tools, there is a skill to using them effectively. There are strategies for writing good MCQ items, and there are ways to evaluate whether your items are effective. Using these strategies can make you vastly more comfortable with using – and defending the use of – MCQs on your exams.
Writing Good Stems
1. To get at application and analysis, use vignette-style questions.
2. Have students interpret novel material that is referenced over several questions.
3. Consider principles of Universal Design for Learning (UDL) to write questions that are accessible to all learners.
Writing Good Distractors
1. Write statements that are true but don’t answer the question.
2. Write statements that might seem right to the student but are incorrect.
3. If you use a pair of concepts in the options, make it two pairs.
4. Try to make all options the same length.
5. Avoid using “All of the above” and “None of the above.”
6. Avoid using “A, B and D”-style options.
Overall Design of MCQ Section
1. Align exam questions with course learning outcomes.
2. Distribute correct answers evenly among As, Bs, Cs, and Ds.
3. Use 3–4 options total.¹

¹ Rodriguez (2005) recommends 3 options. When I went from 5 options to 4 options, my exam averages did not change.
Use Item Analysis to Improve Exam Items
Item analysis is a statistical procedure for evaluating the validity and reliability of MCQ exams. These statistics are often available through the exam-scanning technology available to instructors, and they are also available via Moodle for instructors who use the quiz function.
Question Difficulty is the proportion of students who answered the question correctly – the harder the question, the fewer students get it right. The optimal difficulty is a p(correct) of 0.5–0.75. I start to look hard at questions with p(correct) < 0.5, and I am highly likely to eliminate questions with p(correct) < 0.3. Keep in mind that if students are guessing at random among four options, p(correct) should be about 0.25.
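If you want to compute difficulty yourself rather than rely on a scanner or Moodle report, it is just the column mean of a 0/1 response matrix. A minimal sketch, using made-up data:

```python
# Hypothetical 0/1 response matrix: rows = students, columns = exam items
# (1 = correct, 0 = incorrect). Real data would come from your scanner export.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
]

def item_difficulty(responses):
    """p(correct) for each item: the proportion of students answering it correctly."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_students
            for j in range(n_items)]

print(item_difficulty(responses))  # [0.75, 0.75, 0.25, 0.75]
```

Here item 3 (p = 0.25) would be flagged for review, since students guessing at random among four options would score about the same.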
The Discrimination Index (DI) is the correlation between an exam item and overall exam performance. The idea is that good questions should be answered correctly more often by strong students than by weak students, so a good question should correlate positively with overall exam performance. A DI of 0.1 is acceptable, but above 0.3 is quite good. A negative DI tends to indicate that something about the question is misleading, such that strong students are over-interpreting the question while weak students are taking it at face value.
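The DI can be sketched as a Pearson correlation between the 0/1 item score and the total score. Reports differ in the details; this sketch uses the “corrected” item-total correlation (excluding the item itself from the total), which is one common variant. The data are hypothetical:

```python
# Hypothetical response matrix: rows = students, ordered strongest to weakest.
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

def pearson(x, y):
    """Plain Pearson correlation coefficient, no external libraries."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def discrimination_index(responses, item):
    """Correlate the 0/1 item score with each student's score on the
    rest of the exam (corrected item-total correlation)."""
    item_scores = [row[item] for row in responses]
    rest_totals = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_totals)

print(round(discrimination_index(responses, 0), 3))  # 0.707
```

In this toy data, item 0 is answered correctly only by the two strongest students, so it discriminates well.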
TABLE: Guides for Interpreting Question Difficulty & Discrimination Index Together – difficulty bands (low: p(correct) > 0.8; medium: p(correct) 0.5–0.8; high: p(correct) < 0.4) crossed with DI bands (low: 0–0.15; high: 0.15–0.5+).
Item analysis reports often also include a Reliability Coefficient. The reliability coefficient is essentially a correlation coefficient that indicates how well all of the items correlate with one another. The higher the reliability coefficient, the more likely it is that your exam is targeting a single construct (i.e., knowledge of a particular discipline). The reliability coefficient should be at least 0.6; above 0.7 is good for a classroom test. Standardized tests like the SAT have reliability coefficients around 0.9.
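For right/wrong items, the reliability coefficient most reports compute is Cronbach’s alpha (equivalent to KR-20 for 0/1 scoring): alpha = k/(k−1) × (1 − sum of item variances / variance of total scores). A minimal sketch with hypothetical data:

```python
# Hypothetical 0/1 response matrix: rows = students, columns = items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

def variance(xs):
    """Population variance (consistent divisor cancels in the alpha ratio)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(responses):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    return (k / (k - 1)) * (1 - sum(item_vars) / variance(totals))

print(round(cronbach_alpha(responses), 2))  # 0.4
```

An alpha this low (0.4) on a real exam would suggest the items are not hanging together as a single construct; with only four toy items and four students, though, the number is illustrative only.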
A Final Note on MCQ Exam Sections
UDL encourages multiple means of expression. There is no one magic exam question format or assignment design that will allow all students to perfectly demonstrate their level of achievement with course learning outcomes. Variability and options allow students to find a method of expression that effectively demonstrates their level of competence. I would encourage you to give students who struggle with MCQs another avenue to demonstrate competence on exams. Usually my MCQ section correlates strongly with the other sections of my exam, but occasionally I have a student who suffers on one question format or another. I think it’s important that those students have the opportunity to earn a grade that reflects their competence with the course material, rather than their competence with MCQs.