Matteo Manica, Loic Kwate Dassi, et al.
ISGC 2022
Trial-and-error approaches in chemistry generate abundant unsuccessful experiments, yet the potential of these so-called negative results remains largely underutilized. Here, we demonstrate that information from negative chemical reactions can be leveraged to improve reactivity-prediction models, offering advantages in scenarios where successful data are scarce. We extend the tuning of language models with reinforcement learning to the chemistry domain, training a transformer model for chemical reaction prediction. Our approach is evaluated using both a rigorously controlled dataset and a realistic high-throughput dataset comprising extensive reaction screenings across diverse catalyst sets and experimental conditions. The model achieves state-of-the-art performance by leveraging information from as few as 20 positive data points in the controlled dataset, supported by a negative dataset at least 40 times larger. Consistent results on both datasets demonstrate that, with an appropriate optimization strategy and the inclusion of unsuccessful experimental data, models can be effectively trained even when successful reactions are underrepresented.
Yves Gaetan Nana Teukam, Federico Zipoli, et al.
Briefings in Bioinformatics
Alain Vaucher, Philippe Schwaller, et al.
AMLD EPFL 2022
Jannis Born, Matteo Manica, et al.
Machine Learning: Science and Tech.