Representing surfactants by foundation models
Abstract
This work presents an approach to predicting surfactant phase diagrams with the SMI-TED foundation model, a pre-trained encoder-decoder architecture based on SMILES representations. The methodology combines molecular representations with environmental variables, namely composition (wt%) and temperature (°C), to improve predictive performance. For phase diagram prediction, the latent space of SMI-TED was extended with these thermodynamic parameters. Experimental results demonstrate accurate predictions for dominant phases such as liquid, ice and aqueous phases, with predicted phase boundaries closely aligned with experimental data. However, the model is less reliable in boundary and transition regions, particularly for minority phases such as lamellar, cubic and solid surfactant phases. These findings highlight the potential of integrating molecular and thermodynamic data within foundation models for predictive materials science, and they identify opportunities for improvement through enhanced data representation, thermodynamic constraints, and uncertainty quantification.
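As an illustration of what extending the latent space with thermodynamic parameters could look like, the following minimal sketch appends composition and temperature to a molecular embedding before it is passed to a downstream phase classifier. It is not the implementation used in this work: the function name `build_features`, the embedding size of 8, the random placeholder vector standing in for an SMI-TED latent vector, and the division by 100 used for scaling are all assumptions made for the example.

```python
import numpy as np


def build_features(embedding, composition_wt, temperature_c):
    """Append composition (wt%) and temperature (deg C) to a molecular embedding.

    Hypothetical helper: the name and the scaling below are illustrative choices.
    """
    embedding = np.asarray(embedding, dtype=float)
    # Divide by 100 so both scalars are of order 1 (assumed scaling).
    conditions = np.array([composition_wt / 100.0, temperature_c / 100.0])
    # The extended vector is the embedding followed by the two conditions.
    return np.concatenate([embedding, conditions])


# Placeholder for a latent vector; in practice it would come from the encoder.
rng = np.random.default_rng(0)
embedding = rng.normal(size=8)

features = build_features(embedding, composition_wt=40.0, temperature_c=25.0)
print(features.shape)  # (10,)
```

Each point of a phase diagram (one surfactant, one composition, one temperature) then corresponds to one such vector, and a classifier trained on these vectors assigns a phase label to it.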