Constructing Test Collections using Multi-armed Bandits and Active Learning
Rahman, Md Mustafizur
While test collections provide the cornerstone of system-based evaluation in information retrieval, human relevance judging has become prohibitively expensive as collections have grown ever larger. Consequently, intelligently deciding which documents to judge has become increasingly important. We propose a two-phase approach to intelligent judging across topics that does not require document rankings from a shared task. In the first phase, we dynamically select the next topic to judge via a multi-armed bandit method. In the second phase, we employ active learning to select which document to judge next for that topic. Experiments on three TREC collections (varying scarcity of relevant documents) achieve τ ≥ 0.90 correlation for P@10 ranking and find 90% of the relevant documents at 48% of the original budget. To support reproducibility and follow-on work, we have shared our code online. © 2019 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
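The two-phase loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say which bandit method or which learner is used, so the sketch assumes an epsilon-greedy bandit for topic selection and a simple term-weight relevance-feedback scorer as a stand-in for active learning. All names (`judge_collection`, `pools`, `oracle`, `epsilon`) are hypothetical.

```python
import random
from collections import Counter


def score(doc_terms, weights):
    """Sum of learned term weights; unseen terms contribute 0."""
    return sum(weights[term] for term in doc_terms)


def judge_collection(pools, oracle, budget, epsilon=0.1, seed=0):
    """Spend `budget` judgments across topics.

    pools:  {topic: {doc_id: set_of_terms}}  (hypothetical input format)
    oracle: callable (topic, doc_id) -> bool, standing in for a human judge
    """
    rng = random.Random(seed)
    remaining = {t: dict(docs) for t, docs in pools.items()}
    weights = {t: Counter() for t in pools}   # per-topic term weights
    stats = {t: [0, 0] for t in pools}        # [relevant found, judged]
    judgments = {}

    for _ in range(budget):
        active = sorted(t for t in remaining if remaining[t])
        if not active:
            break  # every pool is exhausted

        # Phase 1 (assumed epsilon-greedy): explore a random topic with
        # probability epsilon, otherwise exploit the topic with the highest
        # observed rate of relevant documents. Unjudged topics get an
        # optimistic rate of 1.0 so that each is tried.
        if rng.random() < epsilon:
            topic = rng.choice(active)
        else:
            topic = max(
                active,
                key=lambda t: stats[t][0] / stats[t][1] if stats[t][1] else 1.0,
            )

        # Phase 2 (stand-in for active learning): judge the unjudged
        # document that the current per-topic model scores highest.
        candidates = remaining[topic]
        doc = max(sorted(candidates), key=lambda d: score(candidates[d], weights[topic]))
        terms = candidates.pop(doc)
        relevant = oracle(topic, doc)
        judgments[(topic, doc)] = relevant

        # Update the bandit statistics and the per-topic model.
        stats[topic][0] += int(relevant)
        stats[topic][1] += 1
        for term in terms:
            weights[topic][term] += 1 if relevant else -1

    return judgments


# Toy data, invented for illustration only.
pools = {
    "t1": {"d1": {"bandit", "topic"}, "d2": {"bandit", "arm"}, "d3": {"cooking"}},
    "t2": {"d4": {"recipe"}, "d5": {"weather"}},
}
qrels = {("t1", "d1"), ("t1", "d2")}


def oracle(topic, doc_id):
    return (topic, doc_id) in qrels
```

Calling `judge_collection(pools, oracle, budget=3, epsilon=0.0)` on the toy data spends the whole budget on the topic that keeps yielding relevant documents. A real system would presumably retrain a classifier after each judgment rather than adjust term counts.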