Benedikt Blumenstiel, Johannes Jakubik, et al.
NeurIPS 2023
Accurate multi-modal document retrieval is crucial for Retrieval-Augmented Generation (RAG), yet existing benchmarks do not fully capture real-world challenges with their current design. We introduce REAL-MM-RAG, an automatically generated benchmark designed to address four key properties essential for real-world retrieval: (i) multi-modal documents, (ii) enhanced difficulty, (iii) Realistic-RAG queries and (iv) accurate labeling. Additionally, we propose a multi-difficulty-level scheme based on query rephrasing to evaluate models’ semantic understanding beyond keyword matching. Our benchmark reveals significant model weaknesses, particularly in handling table-heavy documents and robustness to query rephrasing. To mitigate these shortcomings, we curate a rephrased training set and introduce a new finance-focused, table-heavy dataset. Fine-tuning on these datasets enables models to achieve state-of-the-art retrieval performance on REAL-MM-RAG benchmark. Our work offers a better way to evaluate and improve retrieval in multi-modal RAG systems while also providing training data and models that address current limitations. Our benchmark is available at this project page.
Ankush Gupta, Aniya Aggarwal, et al.
ACL 2025
Guangnan Ye, Dong Liu, et al.
ICCV 2013
Ritwik Kumar, Arunava Banerjee, et al.
IEEE TPAMI