SPECIAL ARTICLE

The American Board of Internal Medicine Recertification Examination: Process and Results

JOHN A. MESKAUSKAS, M.S., and GEORGE D. WEBSTER, M.D., F.A.C.P., Philadelphia, Pennsylvania

From the American Board of Internal Medicine, Philadelphia, Pennsylvania.
Annals of Internal Medicine 82:577-581, 1975 (Volume 82, Number 4, April 1975)

On 26 October 1974, 3356 diplomates of the American Board of Internal Medicine (ABIM) took a 1-day written examination for recertification consisting of multiple-choice, matching, and true-false questions derived from the American College of Physicians' Medical Knowledge Self-Assessment Program III and the ABIM Certifying Examination pool. The passing score was set by using a normative standard applied to a reference group of internists practicing general internal medicine who had had 2 or more years of residency training completed between the years 1949 and 1958. The passing score represented approximately 63% correct answers. The failure rate for the total number of examinees was 4.3%. Mean score of examinees showed an inverse relation with age but relatively slight differences when analyzed according to the degree of subspecialization, practice setting, hospital affiliation, or size of patient community.

ON 26 OCTOBER 1974, over 3000 diplomates of the American Board of Internal Medicine (ABIM) voluntarily took an examination for recertification in their specialty. This examination, the first recertification examination to be given by a specialty board, was developed in response to the growing awareness of the need for continued accountability to the public of the medical profession's competence. It presages the announced intention of most of the other 21 specialty boards to develop some method of periodic reevaluation of their diplomates. This paper reviews the process and reports the development, administration, and scores of the examination.

History

In 1969, in response to a recommendation of its Long-Range Planning Committee, the ABIM adopted a resolution favoring the concept of recertification of its diplomates, and a committee was formed to study the methods by which this might be accomplished*. Among the first recommendations were that recertification should be voluntary and educational, and that no one should lose his primary certification as a result of the examination. In 1970, the Board of Regents of the American College of Physicians (ACP) adopted a similar resolution, which specified that the ABIM should be the recertifying body.

Early in the deliberations it was agreed that recertification should be an accolade of continued clinical competence. Ideally, evaluation of such competence should include assessment of the internist's clinical performance. The committee reviewed in detail the present methods for such assessment, including peer review, chart audit, oral examinations, and computer examinations, but concluded that for none of these were validity, reliability, and feasibility well enough established to make them usable within the next few years. Recognizing that a necessary component of continued clinical competence is adequate knowledge, particularly of recent important advances, the committee decided that the ABIM's first recertification examination should be a written one and closely linked to the already-planned ACP Medical Knowledge Self-Assessment Program III (MKSAP III).

* Committee on Recertification: Dr. James Hammarsten and Dr. Robert Petersdorf (Chairmen); Dr. Franklin Epstein, Dr. Edmund Flink, Dr. W. Lester Henry, Jr., Dr. Wallace Jensen, and Dr. Richard Reitemeier. Consultants: Dr. William Daines, Dr. James Fries, and Dr. William Harless.

The Process

The following plan was adopted.

1. The MKSAP Committee of the American College of Physicians† developed a syllabus of the important advances in general internal medicine within the past few years. The syllabus was made available to subscribers in January 1974.

2. In July 1974, subscribers received a set of 720 multiple-choice questions developed by nine test committees appointed by the College. These questions pertained to the information contained in the syllabus and its references. Subscribers had until 1 October to return their answer sheets for the MKSAP questions.

3. On 26 October 1974, the Recertification Examination was administered by the ABIM in 86 centers across the country under proctored conditions. This examination was based on the syllabus.

4. In November 1974, the answer sheets for the self-assessment questions, together with the correct answers and references, were returned to the subscribers to MKSAP III. Those who indicated their desire to be scored in comparison with other subscribers received a printout of their percentile rank in the nine organ-system or disease areas covered in the syllabus and the self-assessment test.

5. In February 1975, those physicians who had taken the Recertification Examination received notification of their pass-fail status and data on their score in the nine organ-system categories.

6. The physicians who successfully passed the Recertification Examination are receiving a certificate that attests to their "Continued Scholarship in Internal Medicine." In addition, a notation that they are recertified will appear in their listing in the next edition of the Directory of Medical Specialists, published under the auspices of the American Board of Medical Specialties. The results of those physicians who took the examination and were unsuccessful will not be released to anyone other than the individual candidate. No hospital, society, or organization will know who took the examination and failed it.

† MKSAP Committee: Dr. Nicholas P. Christy, Dr. Martin Goldberg, Dr. James W. Hollingsworth, Dr. Calvin Kay, Dr. Thomas Killip, Dr. Albert I. Mendeloff, Dr. Robert Petersdorf (Chairman), Dr. Anthony V. Pisciotta, Dr. Theodore Rodman, Dr. Philip D. Swanson, and Dr. Marvin Turck.

Who Took the Examination

Only internists certified by the ABIM in 1968 or before were eligible for the examination. Although the number of internists who were eligible is not known precisely, it was estimated to be about 15 000. Although the examination was offered to all diplomates, the Board recognized that many internists who had pursued careers leading to subspecialty certification might prefer to wait to be recertified in their subspecialty. Initial registration reached levels of approximately 4300, but for various reasons a number of registrants withdrew, and 3356 took the examination. One third of the 3356 indicated that over 50% of their practice was related to subspecialty medicine.

Approximately one third of the physicians who took the examination were internists who had 2 or more years of residency training in university training programs, one third had at least 2 years of residency or fellowship, or both, in university programs, and one third had less than 2 years of their training in such programs. Only two persons of the total number had all their formal postgraduate training in other than university programs.

Of the 71 internists over the age of 65 who took the examination, the oldest was 77 years.

The Examination

The Recertification Examination consisted of 274 multiple-choice type questions (498 scoreable units) selected by the ABIM's Committee on Recertification. The questions were selected from the MKSAP III pool of questions (64%) and from the ABIM's pool of questions used in the 1972-74 Certifying Examinations (36%). Two criteria were used to select questions: [1] the subject matter was relevant to the practice of general internal medicine and [2] facts required to answer the question were contained in the MKSAP syllabus or were considered to be core knowledge. Some MKSAP III questions had to be modified to conform to the question formats used by the ABIM in its other examinations. The types of questions used were multiple-choice, matching, and multiple-true-false. The distribution of questions among the nine organ-system areas was approximately equal.

The Examination Instrument

On receipt of the answer sheets from the testing centers, a provisional test analysis was computed on a sample of the total group to discover errors in the answer key or questions that were misunderstood by the examinees. Three such questions were found, all of the true-false type, and were eliminated from scoring. Then the revised answer key was applied to the entire examinee group. This procedure assured the examinees and the Board that decisions would not be made based on questions containing apparent flaws.

The multiple-choice and matching and the true-false scores were both converted to standard scores so that the mean of the converted scores was set equal to 500 and the standard deviation to 100. This was done to eliminate differences in mean score and variability due to the different question formats. (True-false questions tend to be "easier" and the distribution tends to be more compressed than multiple-choice and matching question scores (1).) The correlation coefficient between the two scores, however, was high (0.82). The scores were combined on an equal-weight basis to yield a composite standard score. Thus, a person at the mean of the distribution on multiple-choice and matching questions and the true-false questions would achieve a score of 500 on each of the two standard scores and the total-test composite score.

The results of the final test analysis are shown in Table 1. The total number of scoreable units in the examination was 495, and the average examinee answered 79% of these questions correctly. The reliability of the examination, based on the total test composite score, was 0.97, a value that is seldom exceeded because the theoretical maximum for a perfect test is 1.00. This result was gratifying, because reliability can be interpreted as an index of the degree to which examinees would be rank-ordered the same way on repeated testings under identical conditions. The calculation and interpretation of reliability coefficients vary somewhat according to the method used. The formula (K-R20) used here was developed by Kuder and Richardson (2).

Similar information was calculated for multiple-choice and matching versus true-false questions and according to subspecialty area. These data are shown in Table 1. Because each of these subtests is shorter than the entire test, the reliability coefficients decrease accordingly. The subtest reliabilities would not be considered optimal for use as stand-alone tests, but they are sufficient for the reporting of score profiles to examinees.

The average biserial correlation coefficient (r) (3) between performance of persons on each item and the test as a whole was 0.35. Compared with the usual finding for most examinations in the medical area, this is a high order of relation, and it is important because of its implications. High item-test correlations manifest themselves in a high degree of consistency of classification of persons: if two questions both have high biserial r's, individuals who answer one correctly are much more likely to answer the other correctly also. This suggests that the test has a high degree of consistency in measuring the breadth of knowledge it is designed to measure.

Table 1. Examination Analysis

Examination Units                           Number of Scoreable Units*   Mean P†   Reliability
Entire test                                 495                          0.79      0.97
Subtests
  Multiple-choice and matching questions    218                          0.77      0.95
  Multiple true-false questions             277                          0.81      0.93
  Cardiovascular questions                   73                          0.79      0.84
  Endocrinology questions                    50                          0.80      0.80
  Gastroenterology questions                 55                          0.81      0.81
  Hematology questions                       53                          0.79      0.76
  Infectious disease questions               51                          0.83      0.77
  Nephrology questions                       59                          0.78      0.80
  Neurology questions                        41                          0.72      0.71
  Pulmonary questions                        54                          0.80      0.77
  Rheumatology questions                     59                          0.81      0.74

* A scoreable unit consists of a response to a question from which a score can be obtained. For multiple-choice questions, this is one response per question. For true-false questions, which appear in sets of five true-false alternatives, each of the true-false alternatives counts as a scoreable unit.
† The P value, or difficulty index, of an item is the proportion of the group that answered the item correctly. The mean P is the average of such values. It can be interpreted as a percentage if the decimal point is ignored.

Examinee Performance

The performance of examinees, grouped in various ways, is presented in Tables 2 and 3. These data were gathered through a questionnaire mailed to all registrants before the examination. This was a voluntary procedure, but over 85% of ABIM diplomates taking the examination responded. Questions relating to age, training, and practice were asked to develop a description of the physicians who took the examination.

The data in Table 2 show the number of persons in each of the categories, the mean score, and the standard deviation of these scores, a measure of variability of the score distribution. In the distributions based on age, there was an inverse relation between age and examination performance. This phenomenon has been observed a number of times previously (4-6). At the same time, the variability of the scores, as measured by the standard deviation, substantially increased with age. Thus, the physicians who performed best in the 65-and-older age group were performing at the level of the best of the younger-than-40 age group; however, the majority achieved lower scores.

The data by the year of completion of residency are closely related to the data by age, but, of course, are not equivalent. Here the same pattern is seen: persons who completed their graduate training most recently achieved higher scores than those who completed their training more remotely.

Data were also collected on the total number of years of residency training and graduate training, including fellowships and other specialized training. Physicians who took clearly atypical patterns of training, that is, only 1 year, or 4 or more years of residency, or 6 or more years of total graduate training, performed at a lower level than physicians who took the more typical training pattern. This should not be interpreted to suggest that the length of graduate training is unrelated to performance; rather, it was thought to be related to the fact that over the 38-year history of the ABIM, a wide variety of patterns of graduate training have been approved.

A very small number of foreign medical graduates took this examination, but those who did were very well prepared.

Table 2. Summary of Performance*

Classification                              Number        Mean   Standard Deviation
By age in years (85% response)
  Younger than 40                            207          550     69.2
  40-44                                      743          537     72.1
  45-49                                      735          520     79.6
  50-54                                      614          498     90.5
  55-59                                      324          471     97.3
  60-64                                      173          439    110.7
  65 and older                                71          379    114.6
By the year of residency completed (data available for 82%)
  1964 or later                              364          552     65.7
  1959-63                                    834          529     75.0
  1954-58                                    635          514     84.4
  1949-53                                    591          492     91.8
  1944-48                                    202          451     95.4
  1943 or earlier                            134          433    122.6
By medical school location (85% response)
  U.S./Canada/U.K.                          2779          508     92.2
  Foreign                                     80          528     76.3
By number of years of training in general internal medicine (83% response)
  1                                          149          508     95.7
  2                                          [illegible]  515     87.2
  3                                          987          513     83.9
  4 or more                                  199          462    109.4
By the number of years of total graduate training (86% response)
  1                                          107          508    101.2
  2                                          316          521     85.9
  3                                          711          517     86.5
  4                                         1446          511     87.8
  5                                          220          482     99.1
  6 or more                                   75          432    109.9
Total group                                 3355          500     95.9

* Mean and standard deviation based on the composite standard score.

Data were collected on the performance of diplomates grouped according to characteristics of their professional settings. These data are in Table 3. The performance of physicians grouped by their primary practice setting (Question 1) was remarkably similar. Differences between solo and group practitioners, whether in small or large groups, were minimal. The difference in mean scores between the solo practitioners and those in large group practice is statistically significant (P < 0.001) but less than had been predicted. Similarly, the physicians in government and academic settings performed comparably to the others.

The majority of examinees identified themselves as practicing primarily general internal medicine rather than a subspecialty. The performances of general internists and subspecialists are similar in mean score and standard deviation. Several hypotheses could explain this unexpected result, including the widely held opinion that most subspecialists practice some general internal medicine; definite conclusions must await further analysis of the data.

Examinees who indicated a subspecialty interest were tabulated according to their indicated field: the data are presented for interest only. Differences in mean scores may relate to the breadth of content of the subspecialty area in general internal medicine or to self-selection of the examinees from the total subspecialty population.

There was no difference in performance among the groups who identified themselves as practicing in various types of hospitals. This finding is counter to the presumption that a university hospital environment, with its many educational opportunities, would produce superior performance by its physicians on the examination. Also of interest is the finding that the size of the population from which the examinee's patients were drawn was not related to performance. Physicians from rural areas appeared to be at no disadvantage compared with those from urban centers.

Table 3. Examinee Performance Related to Responses to Questions on Practice Characteristics

Question                                              Number   Mean   Standard Deviation
For the last 5 years, which of the following best describes the setting of more than half of your professional activity? (84% response)
  Solo/private practice                                981     493     95.8
  Private group of 2 to 10                             837     514     90.6
  Private hospital or clinic of 11 or more             396     526     79.8
  Military or governmental practice (VA, PHS, etc.)    156     513     95.6
  Full-time academic                                   302     519     86.1
  Administrative                                        60     462    114.5
How would you characterize your practice mainly (more than 50%)? (84% response)
  General internal medicine                           1861     505     93.7
  Subspecialty medicine                                953     513     88.1
Do you have a subspecialty interest (with or without certification)? If so, in which area? (67% indicated an area of subspecialty interest)
  Allergy and immunology                                54     481    114.4
  Cardiology                                           936     502     95.6
  Endocrinology and metabolism                         264     526     82.1
  Gastroenterology                                     253     494     94.6
  Hematology                                           182     514     88.4
  Infectious disease                                    75     526     97.5
  Nephrology                                           106     545     79.2
  Medical oncology                                      49     504     88.1
  Nuclear medicine                                      31     523     75.1
  Pulmonary disease                                    203     512     95.5
  Rheumatology                                          94     518     93.7
Which of the following best describes the hospital where you see the most patients?
  City or county                                       268     508     96.6
  Community                                           1941     507     92.0
  Federal (military, VA, etc.)                         191     506     97.5
  University hospital                                  403     514     87.6
  University affiliated                                938     504     95.7
  Nonaffiliated                                        815     509     92.3
Which of the following best describes the geographic area of residence of most of your patients? (83% response)
  City of less than 100 000                            922     507     91.1
  City of between 100 000 and 1 million                979     503     91.9
  Metropolitan area of 1 million or more               901     514     93.2

The mean scores of ABIM recertification candidates on the 120 questions drawn from the MKSAP III pool were compared to the scores of all MKSAP III subscribers who indicated they had taken the self-assessment test without aids. The mean raw score of the ABIM recertification group was 94.8 and that of MKSAP subscribers was 85.6, with standard deviations of 14.95 and 14.41, respectively. This result, clearly a higher performance for the ABIM recertification group, could be interpreted as evidence that the group taking the Recertification Examination was either more highly motivated or a superior group to those taking MKSAP III.

A similar analysis was carried out on questions drawn from the ABIM Certifying Examination pool. Performance on these questions by recertification candidates was compared to the scores achieved on these questions by the reference groups for the Certifying Examination in 1972-74*. On the 99 questions (146 scoreable units) common to the Recertification and 1972-74 Certifying Examinations, the mean raw score was 121 for the Certifying Examination reference group and for the younger-than-40 recertification group and 112.0 for the total recertification group. The standard deviations were 8.5, 8.9, and 13.3, respectively. It is clear that the performance of candidates for the Certifying Examination and of the younger age group for recertification was practically identical. These data can be interpreted as indicating that the performance of the recertification group was of a high standard, as shown by comparing it with that of physicians who had recently completed their training and who had a high rate of success on the Certifying Examination.

* The reference group for the Certifying Examination consists of all those examinees who received their M.D. from a U.S., Canadian, or U.K. medical school, are taking the Certifying Examination for the first time, and are taking the examination at the end of their third year of training in general internal medicine.

The Standard Setting Process

An attempt was made to establish an absolute standard (as compared with a normative or "grading on the curve" standard). The Committee on Recertification graded the examination according to a method first proposed by Nedelsky (7), but it was found that the range of standards determined by individual committee members was extremely wide and without a clear consensus. This result reflects the variable backgrounds and philosophies of the persons scoring the various items and emphasizes the difficulties in trying to set an absolute minimum pass-level score for any broad examination.
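
The scoring arithmetic described under The Examination Instrument (conversion of each question-format score to a standard score with mean 500 and standard deviation 100, an equal-weight composite, and a K-R20 reliability coefficient) can be illustrated with a short sketch. This is not the Board's actual procedure: the function names are hypothetical, the composite is assumed to be a simple average of the two standard scores, and the population standard deviation is assumed throughout.

```python
from statistics import mean, pstdev


def to_standard_scores(raw, target_mean=500.0, target_sd=100.0):
    """Linearly rescale raw scores so the group has the target mean and SD."""
    m = mean(raw)
    s = pstdev(raw)  # assumption: population SD of the examinee group
    if s == 0:
        raise ValueError("all raw scores are identical; SD is zero")
    return [target_mean + target_sd * (x - m) / s for x in raw]


def composite(scores_a, scores_b):
    """Equal-weight composite, assumed here to be the simple average."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


def kr20(item_matrix):
    """Kuder-Richardson formula 20.

    item_matrix holds one row per examinee; each entry is 1 (correct)
    or 0 (incorrect) for one scoreable unit.
    """
    n_people = len(item_matrix)
    n_items = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    var_total = pstdev(totals) ** 2
    if n_items < 2 or var_total == 0:
        raise ValueError("need at least 2 items and non-zero score variance")
    # Sum of p * (1 - p) over items, where p is the item's difficulty index.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_matrix) / n_people
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / var_total)
```

Under the same simple-average assumption, two standard scores that each have a standard deviation of 100 and correlate at 0.82 would give a composite standard deviation of 100 x sqrt((1 + 0.82) / 2), or about 95.4, which is close to the 95.9 reported for the total group in Table 2.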