Efficiently measuring recognition performance with sparse data (2005)

Abstract

We examine methods for measuring performance in signal-detection-like tasks when each participant provides only a few observations. Monte Carlo simulations demonstrate that standard statistical techniques applied to a d' analysis can lead to large numbers of Type I errors (incorrectly rejecting a hypothesis of no difference). Various statistical methods were compared in terms of their Type I and Type II error (incorrectly accepting a hypothesis of no difference) rates. Our conclusions are the same whether these two types of errors are weighted equally or Type I errors are weighted more heavily. The most promising method is to combine an aggregate d' measure with a percentile bootstrap confidence interval, a computer-intensive nonparametric method of statistical inference. Researchers who prefer statistical techniques more commonly used in psychology, such as a repeated measures t test, should use gamma (Goodman & Kruskal, 1954), since it performs slightly better than or nearly as well as d'. In general, when repeated measures t tests are used, gamma is more conservative than d': It makes more Type II errors, but its Type I error rate tends to be much closer to the traditional .05 alpha level. It is somewhat surprising that gamma performs as well as it does, given that the simulations that generated the hypothetical data conformed completely to the d' model. Analyses in which H − FA (hits minus false alarms) was used had the highest Type I error rates. Detailed simulation results can be downloaded from www.psychonomic.org/archive/Schooler-BRM-2004.zip.
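Illustrative note: the recommended approach pools hits and false alarms across participants into an aggregate d' and tests condition differences with a percentile bootstrap confidence interval. The Python sketch below shows one way such a procedure could look; the choice to resample participants, the 0.5 correction for extreme proportions, and all function names are assumptions made for illustration, not the exact procedure reported in the paper.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def aggregate_dprime(hits, false_alarms, n_old, n_new):
    # d' from counts pooled across participants, with a 0.5/1.0
    # correction (an assumption here) to avoid infinite z-scores
    # when a pooled rate is exactly 0 or 1.
    h = (hits.sum() + 0.5) / (n_old.sum() + 1.0)
    fa = (false_alarms.sum() + 0.5) / (n_new.sum() + 1.0)
    return norm.ppf(h) - norm.ppf(fa)

def bootstrap_dprime_diff(cond_a, cond_b, n_boot=10_000, alpha=0.05):
    # Percentile bootstrap CI for the difference in aggregate d'
    # between two conditions; participants are resampled with
    # replacement on each iteration.
    n = len(cond_a)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)  # bootstrap sample of participants
        a, b = cond_a[idx], cond_b[idx]
        d_a = aggregate_dprime(a[:, 0], a[:, 1], a[:, 2], a[:, 3])
        d_b = aggregate_dprime(b[:, 0], b[:, 1], b[:, 2], b[:, 3])
        diffs[i] = d_a - d_b
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi  # "no difference" is rejected if the CI excludes 0

# Toy sparse data: one row per participant,
# columns = [hits, false alarms, old trials, new trials].
cond_a = np.array([[3, 1, 4, 4], [4, 2, 4, 4], [2, 1, 4, 4], [3, 0, 4, 4]])
cond_b = np.array([[2, 2, 4, 4], [3, 2, 4, 4], [2, 1, 4, 4], [2, 2, 4, 4]])
print(bootstrap_dprime_diff(cond_a, cond_b, n_boot=2000))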

Bibliographic entry

Schooler, L. J., & Shiffrin, R. M. (2005). Efficiently measuring recognition performance with sparse data. Behavior Research Methods, 37, 3-10.

Miscellaneous

Publication year: 2005
Document type: Article
Publication status: Published
External URL:
Categories: Statistical Inference
Keywords:
