Attempting Activity Modulation by Graph-based Small-Molecule Generative Modeling: Analysis on Training and Seed Bias and Implication of AI-driven Drug Discovery
Abstract
Generative modeling technologies offer unprecedented potential for accelerating drug discovery. Thanks to efficient latent-space representations, data-driven deep learning approaches have enabled the effective generation of new molecules with desired properties and much higher diversity, while minimizing possible human bias. However, the influence of factors such as training data and seed bias on design quality remains to be scrutinized. In this study, we examine the impact of seed and training bias on the output of our graph-based generative model, which is built on an activity-conditioned variational autoencoder (VAE) architecture. Leveraging a large labeled data set for the dopamine D2 receptor, we show that the model excels at producing molecules with the desired conditioned activities and favorable unconditioned physical properties. We implement an activity-swapping method that allows the activity of a molecular seed to be activated, deactivated, or retained. For a more objective evaluation, we deployed independent deep learning classifiers trained on a subset disjoint from the training data of the generative model. This evaluation verified that the innate molecular traits relevant to target activity were effectively transferred to the generated molecules, demonstrating our model's capability for activity modulation. Overall, we uncover relationships between noise, molecular seeds, and training set selection across a range of latent-space sampling procedures, providing important insights for practical AI-driven molecule generation.