Henrik Nolte, Miriam Rateike, et al.
FAccT 2025
Automated machine learning (AutoML) solutions can bridge the gap between new computational advances and their real-world applications by enabling experimental scientists to build their own custom models. Here, we consider the design of such a tool for developing peptide bioactivity predictors. We analyse different design choices concerning data acquisition and negative class definition, homology partitioning for the construction of independent evaluation sets, the use of protein language models as a general sequence featurisation method, and model selection and hyperparameter optimisation. Finally, we integrate the conclusions drawn from this study into AutoPeptideML, an end-to-end, user-friendly application that lets experimental researchers build such predictors themselves while facilitating compliance with community guidelines.
Source code, documentation, and data can be found in the project GitHub repository: https://github.com/IBM/AutoPeptideML.
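Homology partitioning keeps near-duplicate sequences from appearing on both sides of a train/test split. The following is a minimal sketch of one common cluster-then-split strategy, not AutoPeptideML's implementation. Several assumptions apply. First, the `difflib` ratio is a crude stand-in for a real alignment-based sequence identity, such as one computed by MMseqs2 or CD-HIT. Second, the function names (`identity`, `homology_clusters`, `homology_split`) and the default threshold and test fraction are hypothetical illustrations. The sketch first clusters sequences by single linkage at a similarity threshold, then assigns whole clusters to either the training or the test set.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]. Assumption: stands in for a real
    alignment-based identity (e.g. from MMseqs2 or CD-HIT)."""
    a, b = sorted((a, b))  # fix argument order so the score is symmetric
    return SequenceMatcher(None, a, b).ratio()


def homology_clusters(seqs: list[str], threshold: float) -> list[list[int]]:
    """Single-linkage clustering via union-find: any two sequences with
    similarity >= threshold end up in the same cluster."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if identity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups: dict[int, list[int]] = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def homology_split(
    seqs: list[str], threshold: float = 0.5, test_fraction: float = 0.2
) -> tuple[list[int], list[int]]:
    """Move whole clusters (smallest first) into the test set until it
    holds at least test_fraction of the sequences; the rest go to train."""
    clusters = sorted(homology_clusters(seqs, threshold), key=len)
    target = test_fraction * len(seqs)
    train: list[int] = []
    test: list[int] = []
    for cluster in clusters:
        (test if len(test) < target else train).extend(cluster)
    return sorted(train), sorted(test)


peptides = [
    "GLFDIVKKVVGALGSL", "GLFDIVKKVVGAIGSL",  # near-duplicate pair
    "KWKLFKKIEKVGQNIR", "KWKLFKKIPKVGQNIR",  # near-duplicate pair
    "ACDEFGHIK",
]
train_idx, test_idx = homology_split(peptides, threshold=0.5, test_fraction=0.2)
```

Because similar sequences are merged before any cluster is assigned, every train/test pair falls below the threshold by construction. The cost is that the test fraction is only approximate, since a whole cluster moves at once. Filling the test set with the smallest clusters first is just one design choice. Other strategies, such as random cluster assignment, trade split balance against how "out-of-distribution" the test set becomes.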