MAP: Multi-Human-Value Alignment Palette
Xinran Wang, Qi Le, et al.
ICLR 2025
We present SMI-TED (SMILES Transformer Encoder-Decoder), a large-scale foundation model for materials and chemistry, trained with self-supervised learning on 91 million SMILES samples (4 billion molecular tokens) from PubChem. Its encoder-decoder architecture supports a wide range of complex tasks, including the prediction of quantum chemical properties and reaction yields. We offer two model variants, with 289M and 8 × 289M parameters, respectively, to accommodate different use cases. The model achieves state-of-the-art results on multiple benchmark datasets, demonstrating its versatility and effectiveness. Notably, its latent space exhibits compositionality and separability, properties that are essential for higher-level reasoning and few-shot learning. To facilitate further research and applications, we release the model weights on HuggingFace and the source code on GitHub.
Eduardo Almeida Soares, Dmitry Zubarev, et al.
ICLR 2025
Shawn Tan, Songlin Yang, et al.
ICLR 2025