Dzung Phan, Vinicius Lima
INFORMS 2023
In a multimodal human-machine conversation, user inputs are often abbreviated or imprecise. Sometimes, merely fusing multimodal inputs together cannot derive a complete understanding. To address these inadequacies, we are building a semantics-based multimodal interpretation framework called MIND (Multimodal Interpretation for Natural Dialog). The unique feature of MIND is the use of a variety of contexts (e.g., domain context and conversation context) to enhance multimodal fusion. In this paper we present a semantically rich modeling scheme and a context-based approach that enable MIND to gain a full understanding of user inputs, including ambiguous and incomplete ones.
Dzung Phan, Vinicius Lima
INFORMS 2023
Jehanzeb Mirza, Leonid Karlinsky, et al.
NeurIPS 2023
Hagen Soltau, Lidia Mangu, et al.
ASRU 2011
Liya Fan, Fa Zhang, et al.
JPDC