Learning to Optimize Molecules with a Chemical Language Model
Abstract
Transformer-based chemical language models (CLMs), trained on large, general-purpose datasets of molecular strings, have recently emerged as a powerful tool for modeling various structure-property relations and for proposing novel candidate molecules. In this work, we harness a recent generative CLM, namely GP-MoLFormer, to propose small molecules with more desirable properties. Specifically, we present a parameter-efficient fine-tuning method for unconstrained property optimization that uses property-ordered molecular pairs as input; we call this new approach pair-tuning. Our results show that GP-MoLFormer outperforms existing baselines at generating diverse molecules with the desired properties across three popular property optimization tasks, namely drug-likeness, penalized logP, and dopamine type 2 receptor (DRD2) activity. These results demonstrate the general utility of pair-tuning together with a generative CLM for a variety of molecular optimization tasks.
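To make the pair-tuning input format concrete, the following is a minimal sketch (not the authors' released code) of how property-ordered molecular pairs might be constructed from scored SMILES strings; the pairing scheme, the `min_gap` threshold, and all names are illustrative assumptions.

```python
# Hypothetical construction of property-ordered (source, target) pairs
# for pair-tuning a generative chemical language model.
from itertools import combinations

def make_pairs(molecules, min_gap=0.1):
    """Build (source, target) SMILES pairs ordered by property value.

    molecules: list of (smiles, property_value) tuples.
    min_gap:   assumed minimum property improvement required for a
               pair to be kept as a training example.
    """
    pairs = []
    for mol_a, mol_b in combinations(molecules, 2):
        # Order each pair so the target has the higher property value.
        lo, hi = sorted([mol_a, mol_b], key=lambda m: m[1])
        if hi[1] - lo[1] >= min_gap:
            pairs.append((lo[0], hi[0]))  # worse molecule -> better molecule
    return pairs

# Toy SMILES with toy property scores (e.g., a drug-likeness value).
data = [("CCO", 0.41), ("c1ccccc1O", 0.55), ("CC(=O)Nc1ccc(O)cc1", 0.78)]
for src, tgt in make_pairs(data):
    print(f"{src} -> {tgt}")
```

Each resulting pair maps a lower-property molecule to a higher-property one, so a fine-tuned model can learn to transform an input string toward improved property values without explicit structural constraints.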