FuseMedML: a framework for accelerated discovery in machine learning based biomedicine
Abstract
Machine Learning is at the forefront of scientific progress in Healthcare and Medicine. To accelerate scientific discovery, it is important to have tools that make each iteration of progress collaborative, reproducible, reusable and easy to build upon without “reinventing the wheel” for each task. FuseMedML, or $\textit{fuse}$, is a Python framework designed for accelerated Machine Learning (ML) based discovery in the medical domain. It is highly flexible and designed for easy collaboration and code reuse. Flexibility is enabled by a generic data object design in which data is kept in a nested (hierarchical) Python dictionary (NDict), which allows information from multiple modalities to be processed and fused efficiently. Functional components let the user specify input and output keys, which are read from and written to the nested dictionary. Easy code reuse is enabled through key components implemented as standalone packages under the main $\textit{fuse}$ repo using the same design principles. These include $\textit{fuse.data}$ - a flexible data processing pipeline, $\textit{fuse.dl}$ - reusable Deep Learning (DL) model architecture components and loss functions, and $\textit{fuse.eval}$ - a library for evaluating ML models.
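
To make the nested-dictionary design concrete, the following is a minimal plain-Python sketch of the idea, not the $\textit{fuse}$ API. The helpers `get_nested`, `set_nested` and `make_op`, the dot-separated key syntax, and the sample keys are all hypothetical names introduced here for illustration. Each component is configured only with the key it reads and the key it writes, so components can be chained without knowing about one another.

```python
from typing import Any, Callable, Dict


def get_nested(d: Dict[str, Any], key: str) -> Any:
    """Read a value using a dot-separated key (hypothetical key syntax)."""
    node: Any = d
    for part in key.split("."):
        node = node[part]
    return node


def set_nested(d: Dict[str, Any], key: str, value: Any) -> None:
    """Write a value using a dot-separated key, creating levels as needed."""
    parts = key.split(".")
    node = d
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


def make_op(fn: Callable[[Any], Any], key_in: str, key_out: str):
    """Wrap a plain function as a component that reads key_in and writes key_out."""
    def op(sample: Dict[str, Any]) -> Dict[str, Any]:
        set_nested(sample, key_out, fn(get_nested(sample, key_in)))
        return sample
    return op


# One sample holding two modalities (illustrative values only).
sample = {"data": {"input": {"age": 64, "img": [0, 128, 255]}}}

# Components declare only their input and output keys.
normalize = make_op(lambda px: [p / 255 for p in px],
                    "data.input.img", "data.input.img_norm")
is_senior = make_op(lambda age: age >= 65,
                    "data.input.age", "data.features.senior")

for op in (normalize, is_senior):
    sample = op(sample)
```

Because the keys are passed in as configuration, the same function can be reused on a different dataset simply by pointing it at different keys.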