Year: 2016 | Volume: 2 | Issue: 4 | Page: 201-204
Item analysis of multiple choice questions: Assessing an assessment tool in medical students
Chandrika Rao, HL Kishan Prasad, K Sajitha, Harish Permi, Jayaprakash Shetty
Department of Pathology, K. S. Hegde Medical Academy, Deralakatte, Mangalore, Karnataka, India
Date of Web Publication: 2-Sep-2016
Dr. Chandrika Rao
Department of Pathology, K. S. Hegde Medical Academy, Deralakatte, Mangalore - 575 018, Karnataka
Source of Support: None, Conflict of Interest: None
Aim: Assessment is a very important component of the medical course curriculum. Item analysis is the process of collecting, summarizing, and using information from students' responses to assess the quality of multiple-choice questions (MCQs). Difficulty index (P) and discrimination index (D) are the parameters used to evaluate the standard of MCQs. The aim of the study was to assess the quality of MCQs.
Materials and Methods: The study was conducted in the Department of Pathology. One hundred and twenty 2nd year MBBS students took an MCQ test comprising 40 questions. There was no negative marking; evaluation was done out of 40 marks, and a 50% score was the passing mark. Postvalidation of the paper was done by item analysis. Each item was analyzed for difficulty index, discrimination index, and distractor effectiveness. The relationship between the difficulty index and the discrimination index across items was determined by Pearson correlation analysis using SPSS 20.0.
Results: The difficulty index of 34 (85%) items was in the acceptable range (P = 30–70%), 2 (5%) items were too easy (P > 70%), and 4 (10%) items were too difficult (P < 30%). The discrimination index of 24 (60%) items was excellent (D > 0.4), 4 (10%) items were good (D = 0.3–0.39), 6 (15%) items were acceptable (D = 0.2–0.29), and 6 (15%) items were poor (D = 0–0.19). The 40 items had a total of 120 distractors. Amongst these, 6 (5%) were nonfunctional distractors and 114 (95%) were functional distractors. The discrimination index exhibited a positive correlation with the difficulty index (r = 0.563, P = 0.010, significant at 0.01 level [two-tailed]). The maximum discrimination (D = 0.5–0.6) was observed in the acceptable range (P = 30–70%).
Conclusion: In this study, the majority of items fulfilled the criteria of acceptable difficulty and good discrimination. Moderately easy/difficult items had the maximal discriminative ability. Very difficult items displayed poor discrimination, but the very easy items had a high discrimination index, indicating faulty items or incorrect keys. The results of this study should initiate a change in the way MCQ test items are selected for any examination, and there should be a proper assessment strategy as part of curriculum development.
Keywords: Difficulty index, discrimination index, distractor effectiveness, item analysis, multiple choice questions
How to cite this article:
Rao C, Kishan Prasad H L, Sajitha K, Permi H, Shetty J. Item analysis of multiple choice questions: Assessing an assessment tool in medical students. Int J Educ Psychol Res 2016;2:201-4
How to cite this URL:
Rao C, Kishan Prasad H L, Sajitha K, Permi H, Shetty J. Item analysis of multiple choice questions: Assessing an assessment tool in medical students. Int J Educ Psychol Res [serial online] 2016 [cited 2018 Feb 25];2:201-4. Available from: http://www.ijeprjournal.org/text.asp?2016/2/4/201/189670
Introduction
Multiple choice questions (MCQs) are frequently used to assess students in different educational streams for objectivity and wide reach of coverage in less time. MCQs are used mostly for comprehensive assessment at the end of academic sessions and provide feedback to the teachers on their educational actions. Designing MCQ is a complex and time-consuming process in a multidisciplinary, integrated curriculum. MCQ needs to be tested for the standard or quality. Item analysis examines the student responses to individual test items (MCQ) to assess the quality of those items and test as a whole. Item analysis assesses the assessment tool for the benefit of both student and teacher.
The aim of the study was to analyze the quality of MCQs, to investigate the relationship of items having good difficulty and discrimination indices with their distractor efficiency, and to find the correlation between difficulty index (P) and discrimination index (D).
Materials and Methods
The study was conducted in the Department of Pathology as part of the assessment. A total of 120 second year MBBS students took an MCQ test comprising 40 questions, each with a single best response. There was no negative marking, and the time allotted was half an hour. Prevalidation of the paper was done by the head of the department. The evaluation was done out of 40 marks, and a 50% score was the passing mark.
Postvalidation of the paper was done by item analysis. The scores of all the students were arranged in order of merit. The upper one-third of students were considered high achievers and the lower one-third low achievers. Each item was analyzed for:
- Difficulty index (P), or P value, using the formula $P = \frac{H + L}{N} \times 100$, where
H = Number of students answering the item correctly in the high achieving group
L = Number of students answering the item correctly in the low achieving group
N = Total number of students in the two groups (including nonresponders)
- Discrimination index (D), or d value, using the formula $D = \frac{2(H - L)}{N}$, where the symbols H, L, and N represent the same values as mentioned above
- Distractor effectiveness (DE) or functionality.
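As an illustration only, the grouping step and the two index formulas can be sketched in Python. The sketch reads the formulas as P = (H + L)/N × 100 and D = 2(H − L)/N. The function names and the example counts (H = 30, L = 10) are hypothetical and are not taken from the study data; with 120 students, each one-third group has 40 students, so N = 80.

```python
def split_groups(scores):
    """Return (high, low): student indices in the top and bottom thirds by score."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    third = len(scores) // 3
    if third == 0:
        return [], []
    return ranked[:third], ranked[-third:]


def difficulty_index(h, l, n):
    """P = (H + L) / N * 100, expressed as a percentage."""
    return (h + l) / n * 100


def discrimination_index(h, l, n):
    """D = 2 * (H - L) / N."""
    return 2 * (h - l) / n


# Hypothetical item: 30 of 40 high achievers and 10 of 40 low achievers
# answered correctly (illustrative numbers, not from the study).
p = difficulty_index(h=30, l=10, n=80)
d = discrimination_index(h=30, l=10, n=80)
print(f"P = {p:.1f}%, D = {d:.2f}")
```

For this hypothetical item the sketch gives P = 50.0% and D = 0.50. Students tied on the boundary score are assigned by their original order, which is an arbitrary choice made for this sketch.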
Difficulty index (P) was interpreted as:
- P < 30%: Difficult
- P = 30–70%: Acceptable
- P > 70%: Easy
Discrimination index (D) was interpreted as:
- D negative: Defective item/wrong key
- D = 0–0.19: Poor discrimination
- D = 0.2–0.29: Acceptable discrimination
- D = 0.3–0.39: Good discrimination
- D > 0.4: Excellent discrimination
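The cut-offs above can be expressed as two small lookup functions. This is a minimal sketch with hypothetical function names. The listed bands leave small gaps (for example between 0.39 and 0.4), so the sketch assumes each band runs up to the start of the next one and treats D of exactly 0.4 as excellent; that boundary handling is an assumption, not something the text specifies.

```python
def classify_difficulty(p):
    """Map a difficulty index P (in percent) to its category."""
    if p < 30:
        return "difficult"
    if p <= 70:
        return "acceptable"
    return "easy"


def classify_discrimination(d):
    """Map a discrimination index D to its category.

    Assumption: bands are contiguous, so 0.195 counts as poor and
    exactly 0.4 counts as excellent.
    """
    if d < 0:
        return "defective item/wrong key"
    if d < 0.2:
        return "poor"
    if d < 0.3:
        return "acceptable"
    if d < 0.4:
        return "good"
    return "excellent"


# Hypothetical item with P = 50% and D = 0.5.
print(classify_difficulty(50.0), classify_discrimination(0.5))
```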
An item contains a stem and four options, including one correct option (key) and three incorrect alternatives (distractors). A nonfunctional distractor (NFD) is an option other than the key that is selected by <5% of students; a functional or effective distractor is an option selected by 5% or more of students. On the basis of the number of NFDs in an item, DE ranges from 0% to 100%. If an item contains three, two, one, or no NFDs, then DE is 0%, 33.3%, 66.6%, or 100%, respectively.
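A short sketch of the NFD rule and the resulting DE for a single item follows. The function name, the option labels, and the response counts are hypothetical. The sketch assumes the 5% threshold is taken over all students who sat the test, which is passed in as `n_students`.

```python
def distractor_efficiency(option_counts, key, n_students, threshold=0.05):
    """Return (DE in percent, list of nonfunctional distractors) for one item.

    option_counts maps each option label to the number of students who chose it.
    A distractor chosen by fewer than `threshold` of students is nonfunctional.
    """
    distractors = {opt: c for opt, c in option_counts.items() if opt != key}
    if not distractors or n_students <= 0:
        raise ValueError("need at least one distractor and one student")
    nonfunctional = [opt for opt, c in distractors.items()
                     if c / n_students < threshold]
    functional = len(distractors) - len(nonfunctional)
    return functional / len(distractors) * 100, nonfunctional


# Hypothetical item answered by 120 students; "A" is the key.
counts = {"A": 60, "B": 40, "C": 15, "D": 5}
de, nfds = distractor_efficiency(counts, key="A", n_students=120)
print(f"DE = {de:.1f}%, NFDs = {nfds}")
```

Here option D is chosen by 5 of 120 students (about 4.2%), so it is the only NFD and two of the three distractors remain functional.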
The data are reported as percentages and as mean ± standard deviation (SD) of n items. The relationship between the difficulty index and discrimination index values for all items was determined by Pearson correlation analysis using SPSS 20.0 (IBM, Armonk, NY, United States of America). P < 0.05 was considered to indicate statistical significance.
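The study ran these statistics in SPSS. For readers without SPSS, the same summary quantities can be approximated with the Python standard library. The sketch below is not the authors' procedure; the function names and the four data points are hypothetical and only show the shape of the calculation. The significance test for r is not included.

```python
import math
import statistics


def mean_sd(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return statistics.fmean(values), statistics.stdev(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-item indices (not the study data).
p_values = [25.0, 40.0, 55.0, 70.0]
d_values = [0.10, 0.30, 0.45, 0.50]
mean_p, sd_p = mean_sd(p_values)
r = pearson_r(p_values, d_values)
print(f"P = {mean_p:.2f} ± {sd_p:.2f}, r = {r:.3f}")
```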
Results
A total of 120 students took the test consisting of 40 MCQs. As seen in [Table 1], the mean difficulty index (P) was 50.16 ± 16.15, while the mean discrimination index (D) was 0.34 ± 0.17. The distributions of the difficulty indices (range 23.7–75.0) and discrimination indices (range 0–0.66) across all 40 MCQ items were analyzed.
Table 1: Comparison of difficulty index, discrimination index, and distractor effectiveness of the MCQ items
A total of 40 items had 120 distractors. Amongst these, 6 (5%) were NFDs and 114 (95%) were functional distractors. The mean distractor efficiency was 89.99 ± 24.42%, with values ranging from 0% to 100% [Table 1].
[Figure 1] shows that, of the 40 items, 5% were easy (P > 70%), 10% were difficult (P < 30%), and the remaining 85% were within the acceptable range (P = 30–70%).
The discrimination indices for the 40 items showed that 15% of the items had poor discrimination power (D = 0–0.19) and 60% exhibited excellent discrimination (D > 0.4). Of the remaining 25%, 15% of the items were in the acceptable range (D = 0.2–0.29) and 10% showed good discrimination (D = 0.3–0.39) [Figure 2].
The discrimination index correlated positively with the difficulty index (r = 0.563, P = 0.010, significant at 0.01 level [two-tailed]). The maximum discrimination (D = 0.5–0.6) was observed in the acceptable range (P = 30–70%).
Discussion
The effective measurement of knowledge acquired is an important component of medical education. MCQs form a useful assessment tool for measuring factual recall and, if carefully constructed, can test higher-order thinking skills, which are very important for a medical graduate. The method of assessment should be regularly evaluated. Developing an appropriate assessment strategy is a key part of curriculum development. It is important to evaluate MCQ items to see how effective they are in assessing the knowledge of students.
Postexamination analysis of the MCQs helps to assess the quality of individual test items and of the test as a whole. Poor items can be modified or removed from the store of questions.
Previous studies have reported mean difficulty indices of 39.4 ± 21.4% and 52.53 ± 20.59. Karelia et al. showed mean ± SD values ranging between 47.17 ± 19.79 and 58.8 ± 19.33 in a study conducted over a period of 5 years. They also showed 61% of items in the acceptable range (P = 30–70%), 24% easy items (P > 70%), and 15% difficult items (P < 30%). Another study, by Patel and Mahajan, showed 80% of items in the acceptable range. Our findings corresponded with these studies, with a mean difficulty index of 50.16 ± 16.15 (range 23.7–75.0). The P value of 34 (85%) items was in the acceptable range, 2 (5%) items were easy, and 4 (10%) items were difficult.
The higher the difficulty index, the lower the difficulty of the question. The difficulty index and discrimination index are reciprocally related. Questions with a high P value are considered to be good discriminators. The value of the discrimination index normally ranges between 0 and 1. Any discrimination index of 0.2 or higher is acceptable, and such a test item would be able to differentiate between weak and good students. In the present study, 85% of items had a discrimination index of 0.2 or more, and 60% of items had a discrimination index above 0.4, indicating that these MCQ items were excellent test items for differentiating between poor and good performers. There were no items with a negative discrimination index. Some studies have reported a negative discrimination index in 20% of items. Items with a negative discrimination index decrease the validity of the test and should be removed from the collection of questions.
Earlier studies have revealed 29% of items with a discrimination index >0.4, 46% with a discrimination index between 0.2 and 0.39, and 21% with a discrimination index <0.19. A positive correlation between the difficulty and discrimination indices was noted in the present study. The same observation was reported by Pande et al. (2013) and Sim and Rasiah (2006). Mitra et al. (2009) showed that the discrimination index correlated poorly with the difficulty index (r = −0.325). This negative correlation signifies that with increasing difficulty index values there was a decrease in the discrimination index, indicating that low performers were more likely to get the correct answer. In the present study, moderately easy/difficult items (acceptable range) had the maximal discriminative ability. Very difficult items displayed poor discrimination, but the very easy items had a high discrimination index, indicating faulty items or incorrect keys.
The mean distractor efficiency of items in our study was 89.99%. The number of NFDs also affects the discrimination power of an item. It is seen that reducing the number of distractors from four to three decreases the difficulty index while increasing the discrimination index and reliability. Hingorjo and Jaleel observed that items having one NFD had excellent discrimination ability (D = 0.427) as compared to items with all four functioning distractors (D = 0.351). This compares well with other studies favoring better discrimination by three distractors as compared to four.
It was also observed that items having a good difficulty index (P = 30–70%) and good/excellent discrimination (D > 0.24), considered to be ideal questions, had a DE of 85.15%, which is close to that of items having one NFD.
Conclusion
Item analysis is a simple yet valuable procedure performed after the examination that provides information regarding the reliability and validity of an item/test by calculating the difficulty index, discrimination index, distractor efficiency, and their interrelationship. An ideal item (MCQ) is one that has an average difficulty index between 31% and 60%, high discrimination (D > 0.25), and maximum distractor efficiency (100%) with three functional distractors. Items analyzed in the study were neither too easy nor too difficult (mean difficulty index = 50.16%), and the overall discrimination index was 0.34, which is acceptable. In this study, the majority of items fulfilled the criteria of acceptable difficulty and good discrimination. Easy items with a poor discrimination index will be reviewed and reconstructed. The results of this study should initiate a change in the way MCQ test items are selected for any examination, and there should be a proper assessment strategy as part of curriculum development. Many more analyses of this kind should be carried out after each examination to identify areas of potential weakness in one-best-answer MCQ tests and to improve the standard of assessment.
Financial support and sponsorship
None.
Conflicts of interest
There are no conflicts of interest.
References
Mehta G, Mokhasi V. Item analysis of multiple choice questions – An assessment of the assessment tool. Int J Health Sci Res 2014;4:197-202.
Tejinder S, Piyush G, Daljit S. Principles of Medical Education. 3rd ed. New Delhi: Jaypee Brothers Medical Publishers (P) Ltd; 2009. p. 70-7.
Ananthakrishna N. The item analysis. In: Anathakrishnan N, Sethukumaran KR, Kumar S, editors. Medical education principles and practice. ch. 20. Pondicherry, India: Alumni Association of National Teacher Training centre, JIPMER; 2000. p. 131-7.
Pande SS, Pande SR, Parate VR, Nikam AP, Agrekar SH. Correlation between difficulty and discrimination indices of MCQ's in formative exam in physiology. South East Asian J Med Educ 2013;7:45-50.
Karelia BN, Pillai A, Vegada BN. The levels of difficulty and discrimination indices and relationship between them in four-response type multiple choice questions of pharmacology summative tests of year II M.B.B.S students. IeJSME 2013;7:41-6.
Patel KA, Mahajan NR. Itemized analysis of questions of multiple choice question (MCQ) exam. Int J Sci Res 2013;2:279-80.
Carroll RG. Evaluation of vignette-type examination items for testing medical physiology. Am J Physiol 1993;264:S11-5.
Gajjar S, Sharma R, Kumar P, Rana M. Item and test analysis to identify quality multiple choice questions (MCQs) from an Assessment of medical students of Ahmedabad, Gujarat. Indian J Community Med 2014;39:17-20.
Sim SM, Rasiah RI. Relationship between item difficulty and discrimination indices in true/false-type multiple choice questions of a para-clinical multidisciplinary paper. Ann Acad Med Singapore 2006;35:67-71.
Mitra NK, Nagaraja HS, Ponnudurai G, Judson JP. The levels of difficulty and discrimination indices in type A multiple choice questions of pre-clinical semester 1 multidisciplinary summative tests. IeJSME 2009;3:2-7.
Hingorjo MR, Jaleel F. Analysis of one-best MCQs: The difficulty index, discrimination index and distractor efficiency. J Pak Med Assoc 2012;62:142-7.
[Figure 1], [Figure 2]