Computerized Adaptive Testing

What is Computerized Adaptive Testing?

Imagine a 1,000-item mathematics test with items ranging in difficulty from basic arithmetic through advanced calculus. Now consider two individuals, a fourth-grader and a graduate student in mathematics. Most questions will be uninformative for both individuals (too difficult for the first and too easy for the second). A better approach would be to begin by administering an item of intermediate difficulty, and based on the response scored as correct or incorrect, select the next item at a level of difficulty either lower or higher. This process would continue until the uncertainty in the estimated ability is smaller than a predefined threshold. This process is called computerized adaptive testing (CAT). 
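
The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm of any particular product: it assumes a unidimensional two-parameter logistic (2PL) model, a standard normal prior, maximum-information item selection, and a simulated item bank. The names run_cat, eap, se_target, and max_items, and all parameter values, are hypothetical.

```python
import math
import random

def prob_endorse(theta, a, b):
    """2PL probability of a response scored 1 (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information contributed by a 2PL item at ability theta."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

GRID = [-4.0 + 0.1 * k for k in range(81)]  # evaluation points for theta

def eap(responses, bank):
    """Posterior mean and SD of theta given (item index, score) pairs."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior
        for item, x in responses:
            p = prob_endorse(t, *bank[item])
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, se_target=0.3, max_items=50):
    """Administer items until the posterior SD drops to se_target or below."""
    responses = []
    theta, se = 0.0, 1.0  # start at the prior mean, i.e. intermediate difficulty
    while se > se_target and len(responses) < min(max_items, len(bank)):
        used = {item for item, _ in responses}
        # Pick the unused item that is most informative at the current estimate.
        nxt = max((i for i in range(len(bank)) if i not in used),
                  key=lambda i: item_information(theta, *bank[i]))
        responses.append((nxt, answer(nxt)))
        theta, se = eap(responses, bank)  # rescore after every response
    return theta, se, len(responses)

# Simulated bank and examinee; every value here is made up for illustration.
rng = random.Random(0)
bank = [(rng.uniform(1.0, 2.5), rng.uniform(-3.0, 3.0)) for _ in range(300)]
true_theta = 1.2

def simulated_answer(i):
    return 1 if rng.random() < prob_endorse(true_theta, *bank[i]) else 0

theta_hat, se, n_used = run_cat(bank, simulated_answer)
print(f"estimate={theta_hat:.2f}  posterior SD={se:.2f}  items used={n_used}")
```

The stopping rule is the key design choice: the test ends when the uncertainty target is met, so the number of items differs from one examinee to the next. The max_items cap is an added safeguard for examinees the bank cannot measure precisely.
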
We no longer need to rely on classical test theory, in which all patients are asked the same questions regardless of their severity of illness, and all responses are weighted equally, regardless of the severity of the symptom, and then summed into a total score. The future of mental health measurement depends on questions drawn from large item banks that have been calibrated using modern psychometric methods and can be administered using computerized adaptive testing.

Modern item response theory (IRT) can accommodate multidimensional constructs such as mental health disorders. Through computerized adaptive testing based on multidimensional item response theory (MIRT), we can specify a level of precision, then administer a small, statistically optimal set of items targeted to each patient’s underlying level of severity at that particular time. 

MIRT-based computerized adaptive tests are now available for dimensional measurement in the CAT-MH® for adults and K-CAT® for youth.


Characteristics of Computerized Adaptive Testing

Several characteristics distinguish computerized adaptive testing from conventional tests, including:


To create a CAT, either the entire item bank or subsets of the item bank must be previously administered to a group of individuals, and item discrimination parameters and item difficulty/severity parameters must be estimated.


Because items are selected based on an examinee’s previous answers, items must be scored as they are administered. The next item to be administered is then based on how the examinee answered all previously administered items.


Because the purpose of test administration is to obtain a test score for the examinee, the procedure for adaptive testing requires not only that items be scored as they are administered, but also that a test score of some type be determined at multiple points during the process.


In contrast to a conventional psychological test, the number of test items is not fixed in an adaptive test.
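
To make the role of the calibrated parameters concrete, the following is a sketch under a simplifying assumption: the unidimensional two-parameter logistic (2PL) model, rather than the multidimensional models discussed elsewhere on this page. The symbols are introduced here for illustration: \theta is the examinee's underlying level, a_j and b_j are the discrimination and difficulty/severity parameters of item j, and A is the set of items administered so far.

```latex
P_j(\theta) = \frac{1}{1 + \exp\{-a_j(\theta - b_j)\}}, \qquad
I_j(\theta) = a_j^2 \, P_j(\theta)\,\bigl[1 - P_j(\theta)\bigr], \qquad
\mathrm{SE}(\hat\theta) \approx \frac{1}{\sqrt{\sum_{j \in A} I_j(\hat\theta)}}
```

Under this model an item is most informative when its difficulty b_j is close to \theta, which is why each item is chosen using the current estimate. The approximate standard error shrinks as information accumulates, so a test that stops at a target standard error will naturally vary in length.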

Computerized Adaptive Testing for Mental Health

Traditional mental health measurement has been based on classical test theory, in which a patient’s impairment level is estimated by a total score, which requires that the same items be administered to all respondents. These items are weighted equally, so that the response to the question ‘I am sad’ carries the same weight as the response to the question ‘I feel that those around me would be better off if I were dead.’ In an effort to decrease patient burden, mental health instruments are often restricted to a small number of symptom items. For a patient with a given level of depressive severity, only a few of the items will be discriminating. As a consequence, computerized adaptive testing is immediately applicable to mental health measurement.
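
The limitation of an equally weighted total score can be illustrated with a small sketch. It assumes a unidimensional 2PL model with a standard normal prior, and the three (discrimination, severity) pairs are hypothetical values chosen for illustration, not parameters from any calibrated item bank. Two response patterns with the same total score receive different model-based severity estimates.

```python
import math

def prob_endorse(theta, a, b):
    """2PL probability of endorsing an item (a: discrimination, b: severity)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_score(pattern, items):
    """Posterior mean of theta under a standard normal prior."""
    grid = [-4.0 + 0.1 * k for k in range(81)]
    num = den = 0.0
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior
        for x, (a, b) in zip(pattern, items):
            p = prob_endorse(t, a, b)
            w *= p if x else 1.0 - p
        num += t * w
        den += w
    return num / den

# Hypothetical parameters: two mild, weakly discriminating items and one
# severe, highly discriminating item.
items = [(1.0, -1.0), (1.2, 0.0), (2.5, 2.0)]

mild = (1, 1, 0)    # endorses only the two mild items
severe = (1, 0, 1)  # endorses the mildest item and the severe item

print("total scores:", sum(mild), sum(severe))
print("model-based estimates:", eap_score(mild, items), eap_score(severe, items))
```

Both patterns sum to 2, yet the pattern that includes the severe, highly discriminating item yields the higher estimate. In the 2PL model the discrimination parameter determines how much weight a response carries, while the severity parameter determines where on the continuum the item is informative.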

Continuing with depression as an example, a depression inventory can be administered adaptively, such that an individual responds only to items that are most informative for assessing his or her level of depression. The net result is that a small, optimal number of items is administered to the individual without loss (and frequently with gains) of measurement precision. The CAT-DI is a dimensional measure that produces continuous severity scores of depression based on symptomatology experienced. Using an average of 12 questions, the computerized adaptive test maintains a correlation close to r = 0.95 with the entire bank of 389 depression items. This module takes an average of 66 seconds to complete.

To study the validity of our CAT-Depression Inventory (CAT-DI), 292 consecutive subjects received a full clinician-based DSM-IV diagnostic interview and the live CAT-DI (Gibbons et al., 2012). The box-and-whisker plot below displays the distributions of CAT-DI scores for patients with minor depression, patients with MDD, and those not meeting criteria for depression. There is a clear linear progression between CAT-DI depression severity scores and the diagnostic categories from the Structured Clinical Interview for the DSM. Statistically significant differences were found between none and minor (p < 0.00001), none and MDD (p < 0.00001), and minor and MDD (p < 0.00001), with corresponding effect sizes of 1.271, 1.952, and 0.724 SD units, respectively.

The paradigm shift is from traditional measurement, which fixes the number of items administered and allows measurement uncertainty to vary, to MIRT-based computerized adaptive testing, which fixes measurement uncertainty and allows the number of items to vary. The result is a dramatic reduction in the number of items needed to measure mental health constructs, together with increased precision of measurement. The information obtained in only 66 seconds during administration of the CAT-DI would take hours to obtain using traditional fixed-length tests and clinician DSM interviews.

Beyond the academic appeal of building a better and more efficient system of measurement, computerized adaptive testing for mental health constructs is important for our nation’s public health. In contrast to traditional fixed tests, adaptive tests can be repeatedly administered to the same patient over time without response set bias because the questions adapt to the changing level of severity. For the clinician, computerized adaptive testing provides a feedback loop that informs the treatment process by providing real-time outcomes measurement. For organizations, computerized adaptive testing provides the foundation for a performance-based behavioral health system and can detect those previously unidentified patients in primary care who are in need of behavioral health care and would otherwise be among the highest consumers of physical health-care resources. From a technological standpoint, these methods can be delivered globally through the Internet and therefore do not require the patient to be in a clinic or doctor’s office to be tested; rather, secure testing can be performed anywhere using any Internet-capable device. The testing results can be interfaced to an EMR and/or easily maintained in clinical portals that are accessible by clinicians from any Internet-capable device.

This is the future of mental health measurement.


Recommended Resource

For a more detailed discussion of fundamental and advanced topics in item response theory (IRT) written by pioneers in the field, check out Item Response Theory by R. Darrell Bock and Robert D. Gibbons. Item Response Theory will also benefit researchers interested in patient-reported outcomes in health research.