Boosting medical diagnostics by pooling independent judgments (2016)

Abstract

Collective intelligence refers to the ability of groups to outperform individual decision makers when solving complex cognitive problems. Despite its potential to revolutionize decision making in a wide range of domains, including medical, economic, and political decision making, at present, little is known about the conditions underlying collective intelligence in real-world contexts. We here focus on two key areas of medical diagnostics, breast and skin cancer detection. Using a simulation study that draws on large real-world datasets, involving more than 140 doctors making more than 20,000 diagnoses, we investigate when combining the independent judgments of multiple doctors outperforms the best doctor in a group. We find that similarity in diagnostic accuracy is a key condition for collective intelligence: Aggregating the independent judgments of doctors outperforms the best doctor in a group whenever the diagnostic accuracy of doctors is relatively similar, but not when doctors' diagnostic accuracy differs too much. This intriguingly simple result is highly robust and holds across different group sizes, performance levels of the best doctor, and collective intelligence rules. The enabling role of similarity, in turn, is explained by its systematic effects on the number of correct and incorrect decisions of the best doctor that are overruled by the collective. By identifying a key factor underlying collective intelligence in two important real-world contexts, our findings pave the way for innovative and more effective approaches to complex real-world decision making, and to the scientific analyses of those approaches.

collective intelligence | groups | medical diagnostics | dermatology | mammography

Collective intelligence, that is, the ability of groups to outperform individual decision makers when solving complex cognitive problems, is a powerful approach for boosting decision accuracy (1–7).
However, despite its potential to boost accuracy in a wide range of contexts, including lie detection, political forecasting, investment decisions, and medical decision making (8–14), little is known about the conditions that underlie the emergence of collective intelligence in real-world domains. Which features of decision makers and decision contexts favor the emergence of collective intelligence? Which decision-making rules permit this potential to be harnessed? We here provide answers to these important questions in the domain of medical diagnostics. Our work builds on recent findings on combining decisions, a research paradigm known as "two heads better than one" (15–20). In their seminal study, Bahrami et al. (15) showed that two individuals who were permitted to communicate freely while engaging in a visual perception task achieved better results than the better of the two did alone. Koriat (17) subsequently demonstrated that this collective intelligence effect also emerges in the absence of communication when the "maximum-confidence slating algorithm" (hereafter called the confidence rule) is used, that is, when the decision of the more confident dyad member is adopted. Importantly, in both studies, combining decisions led to better outcomes only when both individuals had similar levels of discrimination ability, suggesting that similarity in the discrimination ability of group members is a crucial factor in predicting whether groups can outperform their best member. At present, however, it is unclear whether these findings can help us understand the emergence of collective intelligence in real-world decision-making contexts, where stakes are high and decisions are made by experts with a long history of training. We address this issue in the domain of medical diagnostics. In the United States alone, an estimated 200,000 patients die each year from preventable medical errors (21), a large proportion of which are diagnostic errors (22, 23).
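The confidence rule described above can be stated in a few lines. The Python sketch below generalizes it from dyads to groups of any size (on each case, the decision of the most confident member is adopted), compares the result with the group's best member, and counts the best member's correct and incorrect decisions that the collective overrules, the quantities the abstract links to the effect of similarity. The function names, the tie-breaking rule, and the use of proportion correct as the accuracy measure are assumptions of this sketch, not details taken from the study; the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def confidence_rule(decisions: Sequence[int], confidences: Sequence[float]) -> int:
    """Return the group decision for one case under the confidence rule.

    decisions[m] is member m's diagnosis (1 = cancerous, 0 = non-cancerous);
    confidences[m] is that member's confidence in it. The most confident
    member's decision is adopted. Tie handling is an assumption of this sketch:
    majority vote among the tied members, and a "cancerous" call if still tied.
    """
    top = max(confidences)
    tied = [d for d, c in zip(decisions, confidences) if c == top]
    positives = sum(tied)
    if 2 * positives < len(tied):
        return 0
    return 1  # majority of tied members said cancerous, or an exact tie


def evaluate_group(
    decisions: List[List[int]],
    confidences: List[List[float]],
    truth: List[int],
) -> Tuple[float, float, int, int]:
    """Compare the confidence rule with the group's best member.

    decisions[m][k] and confidences[m][k] refer to member m on case k;
    truth[k] is the correct diagnosis. Accuracy is the proportion of correct
    diagnoses (a simplification; other performance measures are possible).
    """
    n_members, n_cases = len(decisions), len(truth)
    accuracy = [
        sum(decisions[m][k] == truth[k] for k in range(n_cases)) / n_cases
        for m in range(n_members)
    ]
    best = max(range(n_members), key=lambda m: accuracy[m])  # first best on ties
    group = [
        confidence_rule(
            [decisions[m][k] for m in range(n_members)],
            [confidences[m][k] for m in range(n_members)],
        )
        for k in range(n_cases)
    ]
    group_accuracy = sum(g == t for g, t in zip(group, truth)) / n_cases
    # Decisions of the best member that the collective overruled:
    overruled_correct = sum(
        decisions[best][k] == truth[k] and group[k] != truth[k] for k in range(n_cases)
    )
    overruled_incorrect = sum(
        decisions[best][k] != truth[k] and group[k] == truth[k] for k in range(n_cases)
    )
    return accuracy[best], group_accuracy, overruled_correct, overruled_incorrect


# Hypothetical toy data: 3 doctors, 4 cases (not taken from the study).
truth = [1, 0, 1, 0]
decisions = [[1, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 0]]
confidences = [[0.9, 0.6, 0.7, 0.5], [0.5, 0.8, 0.6, 0.9], [0.6, 0.7, 0.9, 0.4]]
print(evaluate_group(decisions, confidences, truth))  # (0.75, 0.75, 1, 1)
```

Because each decision is binary, the group's number of correct diagnoses equals the best member's count minus `overruled_correct` plus `overruled_incorrect`, so in this sketch the collective beats its best member exactly when it overrules more of that member's errors than correct calls.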
Reducing the frequency of diagnostic errors is thus a major step toward improving health care (24, 25). Previous research on collective intelligence in medical diagnostics has yielded conflicting results: Some studies have found that group decision making boosts diagnostic accuracy (9, 12, 26, 27), whereas others have found null or even detrimental effects (28, 29). We here investigated whether similarity in doctors' diagnostic accuracy explains whether combining the independent decisions of multiple doctors improves or impairs diagnostic accuracy. We examined this question in two medical domains in which diagnostic errors are rife: breast and skin cancer diagnostics (30, 31). Within each domain, our approach was to use a simulation study that draws on previously published datasets in which a large number of medical experts had independently diagnosed the same medical cases. For all cases, the correct diagnosis (i.e., cancerous or non-cancerous) was available. In particular, the breast cancer dataset on which we drew comprises 16,813 diagnoses and subjective confidence estimates made by 101 radiologists on 182 mammograms (32), with a mean individual sensitivity ± SD = 0.766 ± 0.112 and specificity = 0.665 ± 0.113 (SI Appendix, Fig. S1); the skin cancer
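Sensitivity is the share of cancerous cases a doctor correctly flags, and specificity is the share of non-cancerous cases a doctor correctly clears. The minimal Python sketch below shows how per-doctor values and their mean ± SD across doctors could be computed. The function names, the 1/0 coding, and the use of the sample SD are assumptions for illustration, not the authors' analysis code, and the toy data are hypothetical. Because 16,813 diagnoses is fewer than 101 × 182 = 18,382, not every radiologist appears to have rated every mammogram, so each doctor is paired with the truth only for the cases that doctor rated.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    diagnoses: Sequence[int], truth: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity and specificity of one doctor (1 = cancerous, 0 = non-cancerous).

    Both sequences cover only the cases this doctor actually rated.
    Raises ZeroDivisionError if no cancerous (or no non-cancerous) case is included.
    """
    pairs = list(zip(diagnoses, truth))
    tp = sum(d == 1 and t == 1 for d, t in pairs)  # cancers correctly flagged
    fn = sum(d == 0 and t == 1 for d, t in pairs)  # cancers missed
    tn = sum(d == 0 and t == 0 for d, t in pairs)  # non-cancerous cases correctly cleared
    fp = sum(d == 1 and t == 0 for d, t in pairs)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)


def summarize(
    doctors: List[Tuple[Sequence[int], Sequence[int]]]
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Mean and sample SD of sensitivity and of specificity across doctors.

    doctors holds one (diagnoses, truth) pair per doctor; stdev needs at least two.
    """
    sens, spec = zip(*(sensitivity_specificity(d, t) for d, t in doctors))
    return (mean(sens), stdev(sens)), (mean(spec), stdev(spec))


# Hypothetical toy data: two doctors rating the same four cases.
truth = [1, 1, 0, 0]
doctors = [([1, 0, 0, 0], truth), ([1, 1, 1, 0], truth)]
print(sensitivity_specificity(*doctors[0]))  # (0.5, 1.0)
print(summarize(doctors))
```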

Bibliographic entry

Kurvers, R. H. J. M., Herzog, S. M., Hertwig, R., Krause, J., Carney, P. A., Bogart, A., Argenziano, G., Zalaudek, I., & Wolf, M. (2016). Boosting medical diagnostics by pooling independent judgments. Proceedings of the National Academy of Sciences of the United States of America, 113, 8777-8782. doi:10.1073/pnas.1601827113

Miscellaneous

Publication year 2016
Document type: Article
Publication status: Published
External URL: http://dx.doi.org/10.1073/pnas.1601827113
