Poster #15 - Jiayi Xin
Interpretable Multimodal Interaction-aware Mixture-of-Experts
Jiayi Xin, BS, PhD Student, University of Pennsylvania, PA, USA; Sukwon Yun, MS, PhD Student, University of North Carolina at Chapel Hill, NC, USA; Jie Peng, BS, PhD Student, University of Science and Technology of China, Anhui, China; Inyoung Choi, BS, PhD Student, University of Pennsylvania, PA, USA; Jenna L. Ballard, BS, PhD Student, University of Pennsylvania, PA, USA; Tianlong Chen, PhD, University of North Carolina at Chapel Hill, NC, USA; Qi Long, PhD, University of Pennsylvania, PA, USA
Modern artificial intelligence systems frequently draw on multiple data sources, such as medical images, laboratory results, and electronic health records, to support critical decisions in domains like healthcare. However, existing fusion methods tend to operate as "black boxes," producing a single output without revealing how different pieces of information interact or which modalities drive the final prediction. To this end, we propose I2MoE (Interpretable Multimodal Interaction-aware Mixture-of-Experts), an end-to-end MoE framework designed to enhance modality fusion by explicitly modeling diverse multimodal interactions and providing interpretation at both the local and global levels. First, I2MoE uses distinct interaction experts with weakly supervised interaction losses to learn multimodal interactions in a data-driven way. Second, I2MoE deploys a reweighting model that assigns an importance score to the output of each interaction expert, offering sample-level and dataset-level interpretation. We evaluate I2MoE on both biomedical and general-purpose multimodal benchmarks. Our results demonstrate that I2MoE can be seamlessly combined with different fusion backbones, consistently improves predictive performance, and makes the fusion process transparent, helping researchers and practitioners understand and trust AI-driven decisions.
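
As a rough illustration of the idea, the sketch below wires a handful of interaction experts and a reweighting network on top of concatenated modality embeddings; the per-sample expert weights act as local interpretations and can be averaged across a dataset for a global view. The module names, MLP experts, and dimensions are assumptions for illustration only, and the weakly supervised interaction losses from the poster are not shown.

```python
# Minimal sketch of an interaction-aware MoE with a reweighting model (PyTorch).
# All names and architectural choices here are illustrative assumptions,
# not the authors' I2MoE implementation.
import torch
import torch.nn as nn


class InteractionExpert(nn.Module):
    """One expert intended to capture a particular multimodal interaction."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, out_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class I2MoESketch(nn.Module):
    """Fuse modality embeddings with several interaction experts plus a
    reweighting model that scores each expert per sample."""

    def __init__(self, modality_dims, num_experts=4, hidden_dim=64, out_dim=2):
        super().__init__()
        in_dim = sum(modality_dims)  # simple concatenation of modality embeddings
        self.experts = nn.ModuleList(
            [InteractionExpert(in_dim, hidden_dim, out_dim) for _ in range(num_experts)]
        )
        # Reweighting model: per-sample importance scores over the experts.
        self.reweighter = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, num_experts)
        )

    def forward(self, modalities):
        x = torch.cat(modalities, dim=-1)                              # (batch, sum(dims))
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, out)
        weights = torch.softmax(self.reweighter(x), dim=-1)            # (batch, E)
        # Weighted combination of expert outputs; the weights themselves are the
        # sample-level interpretation, and averaging them over a dataset gives a
        # dataset-level view of which interaction types matter.
        logits = (weights.unsqueeze(-1) * expert_out).sum(dim=1)
        return logits, weights


# Usage with two toy modalities (e.g., imaging and lab features).
model = I2MoESketch(modality_dims=[32, 16])
img, labs = torch.randn(8, 32), torch.randn(8, 16)
logits, weights = model([img, labs])  # weights: (8, 4) per-sample expert scores
```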

