Considerations for data acquisition and modeling strategies: Mitosis detection in computational pathology


Preparing data for machine learning tasks in health and life science applications requires decisions that affect cost, model properties, and performance. In this work, we study the implications of data collection strategies, focusing on a case study of mitosis detection. Specifically, we investigate the use of expert and crowd-sourced labelers, the impact of aggregated vs. single labels, and the framing of the problem as either classification or object detection. Our results demonstrate the value of crowd-sourced labels, the importance of uncertainty quantification, and the utility of negative samples.

Zongliang (Jerry) Ji
PhD student, University of Toronto

My research interests include machine learning for healthcare and computational biology.