Obtaining diverse, high-quality labeled data for training efficient classifiers remains a practical challenge. Crowdsourcing, which employs multiple weak labelers, is a popular way to address this issue. However, crowd labelers often introduce noise and inaccuracies and may possess limited domain knowledge. In this paper, we propose CLA-RA, a novel framework that optimizes the labeling process by determining what to label next and assigning each task to the most suitable annotator. Our technique aims to optimize classifier performance by leveraging the collective wisdom of multiple annotators while limiting the influence of error-prone annotations. The key contributions of our work are an annotator-disagreement-based instance selection mechanism, which identifies the noise present in an instance's annotations, and an instance-dependent annotator confidence model, which identifies the annotator most likely to label a given instance correctly. These methods, combined with a similarity-based annotator inference method, improve classifier accuracy while reducing annotation effort. Experimental results on 9 datasets demonstrate significant improvements over state-of-the-art multi-annotator active learning methods, highlighting the effectiveness of our approach in obtaining high-quality labeled data for training classifiers with minimal labeling cost and error.
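To make the selection and assignment ideas concrete, the sketch below illustrates one plausible reading of the abstract: instances are ranked by annotator disagreement (scored here as the entropy of the label votes collected so far), and the selected instance is routed to the annotator with the highest estimated per-instance confidence. The function names, the entropy heuristic, and the stand-in confidence callables are assumptions made for illustration, not CLA-RA's actual formulation.

```python
# Minimal sketch (not the authors' CLA-RA implementation):
# disagreement-based instance selection plus confidence-based annotator assignment.
import numpy as np

def vote_entropy(votes):
    """Entropy of the label votes collected so far for one instance."""
    _, counts = np.unique(votes, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def select_instance(candidate_votes):
    """Pick the unlabeled instance whose current annotations disagree most."""
    scores = [vote_entropy(v) for v in candidate_votes]
    return int(np.argmax(scores))

def select_annotator(instance_features, annotator_models):
    """Pick the annotator with the highest predicted confidence on this instance.

    `annotator_models` maps an annotator id to a callable returning an estimated
    probability of labeling `instance_features` correctly; in the paper this
    instance-dependent confidence is learned, here it is a stand-in.
    """
    return max(annotator_models, key=lambda a: annotator_models[a](instance_features))

# Toy usage: three partially labeled instances, two annotators.
votes_per_instance = [["cat", "cat"], ["cat", "dog"], ["dog"]]
idx = select_instance(votes_per_instance)            # -> 1 (most disagreement)
annotators = {"A": lambda x: 0.7, "B": lambda x: 0.9}
best = select_annotator(None, annotators)             # -> "B"
print(idx, best)
```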