Workshop paper

Dynamic Fusion for a Multimodal Foundation Model for Materials

Abstract

Recent advances in AI and machine learning have transformed applications in materials science. This rapid progress has produced several large-scale foundation models trained on data spanning multiple modalities and domains. Multimodal learning and fusion approaches aim to combine representations from different modalities to obtain richer insights than unimodal approaches. However, traditional multimodal fusion techniques fail to adjust modality importance dynamically and often yield suboptimal performance due to redundancy or missing modalities. In this work, we propose a Dynamic Multimodal Fusion approach in which a learnable gating mechanism assigns importance weights to the different modalities dynamically, ensuring that complementary modalities contribute meaningfully. Our preliminary evaluations on the MoleculeNet benchmark demonstrate that the proposed method improves multimodal fusion efficiency, enhances robustness to missing data, and yields superior performance on downstream property-prediction tasks.
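To make the gating idea concrete, below is a minimal sketch of a learnable gated fusion layer, assuming each modality has already been encoded into a fixed-size embedding; the class and parameter names (GatedFusion, embed_dim, num_modalities) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedFusion(nn.Module):
    """Assigns a dynamic importance weight to each modality embedding and fuses them."""

    def __init__(self, embed_dim: int, num_modalities: int):
        super().__init__()
        # One scalar gate logit per modality, conditioned on that modality's embedding.
        self.gate = nn.Linear(embed_dim, 1)
        self.num_modalities = num_modalities

    def forward(self, embeddings: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, num_modalities, embed_dim)
        # mask:       (batch, num_modalities), 1 if the modality is present, 0 if missing
        logits = self.gate(embeddings).squeeze(-1)              # (batch, num_modalities)
        logits = logits.masked_fill(mask == 0, float("-inf"))   # missing modalities get zero weight
        weights = F.softmax(logits, dim=-1)                     # dynamic importance weights
        fused = (weights.unsqueeze(-1) * embeddings).sum(dim=1) # weighted sum over modalities
        return fused


# Hypothetical usage: fuse two modality embeddings (e.g., text and graph) before a
# property-prediction head.
fusion = GatedFusion(embed_dim=256, num_modalities=2)
x = torch.randn(8, 2, 256)               # batch of 8, 2 modalities, 256-dim embeddings
mask = torch.ones(8, 2, dtype=torch.long) # both modalities present for every sample
fused = fusion(x, mask)                   # shape (8, 256)
```

Masking the gate logits before the softmax is one simple way to realize the robustness-to-missing-modalities behavior described above: absent modalities receive zero weight and the remaining weights renormalize automatically.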

Related