Self-supervised learning (SSL) has great potential for molecular representation learning, given the complexity of molecular graphs, the large amounts of unlabelled data available, the considerable cost of obtaining labels experimentally, and hence the often small labelled training datasets. The importance of the topic is reflected in the variety of paradigms and architectures that have been investigated recently. Yet the differences in performance often seem minor and are barely understood to date. In this paper, we study SSL based on persistent homology (PH), a mathematical tool for modeling topological features of data that persist across multiple scales. PH has several unique properties that particularly suit SSL, naturally offering: different views of the data, stability in terms of distance preservation, and the opportunity to flexibly incorporate domain knowledge. We propose (1) an autoencoder, which shows the general representational power of PH, and (2) a contrastive-learning-based loss, which can be flexibly applied on top of existing SSL approaches. We rigorously evaluate our approach for molecular property prediction and demonstrate its particular features: after SSL, the representations offer considerably more predictive power than the baselines across different probing tasks; our loss increases baseline performance, sometimes substantially; and we obtain consistent, substantial improvements on very small datasets, a common scenario in practice.
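To make the second contribution concrete, below is a minimal, hypothetical sketch (in PyTorch) of a contrastive loss in which persistent-homology fingerprints define the positive pairs. The function name, the vectorization of the persistence diagrams, and the nearest-neighbor positive selection are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch, not the paper's exact loss: an InfoNCE-style objective
# in which the positive pair for each molecule is the molecule whose
# persistent-homology (PH) fingerprint is closest to its own.
import torch
import torch.nn.functional as F

def ph_contrastive_loss(embeddings: torch.Tensor,
                        ph_fingerprints: torch.Tensor,
                        temperature: float = 0.1) -> torch.Tensor:
    # embeddings: (N, d) encoder outputs for a batch of molecules.
    # ph_fingerprints: (N, k) vectorized persistence diagrams
    # (e.g., persistence images); assumed precomputed.
    z = F.normalize(embeddings, dim=1)
    logits = z @ z.t() / temperature          # pairwise embedding similarities

    # Positive for each anchor: its nearest neighbor in PH-fingerprint space.
    ph_dist = torch.cdist(ph_fingerprints, ph_fingerprints)
    ph_dist.fill_diagonal_(float("inf"))      # a molecule is not its own positive
    positives = ph_dist.argmin(dim=1)

    # Exclude self-similarity from the softmax, then treat the positive
    # index as the target class (standard InfoNCE formulation).
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(self_mask, float("-inf"))
    return F.cross_entropy(logits, positives)

# Example usage with random tensors standing in for real encoder outputs
# and PH fingerprints:
if __name__ == "__main__":
    emb = torch.randn(32, 128)
    ph = torch.randn(32, 64)
    print(ph_contrastive_loss(emb, ph).item())
```

In this sketch, the positive for each anchor is determined by topological similarity rather than by data augmentation, which is one way such a loss could be layered on top of an existing SSL pipeline without changing the encoder.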