MULTIMODAL MACHINE LEARNING COMBINING CLINICAL TEXTS AND FACIAL IMAGES IMPROVE DIAGNOSIS OF RARE ...

vitod1
Oct 4, 2023

Da Wu¹, Jingye Yang¹, Kai Wang¹
¹Children's Hospital of Philadelphia, Philadelphia, PA, USA.

Poster # 28

Background: Rare diseases are individually rare but collectively common. More than 7,000 rare diseases are known, and together they affect about 1 in 10 people (roughly 30 million people) in the United States. Many of these disorders present distinctive dysmorphic facial features that provide valuable clues for recognizing a syndrome. Prior efforts have focused on using CNN-based image models to predict rare genetic disorders. In many cases, however, facial images alone do not carry enough information for a precise diagnosis: traits such as sleep disturbances, impaired balance, and intellectual disability cannot be captured from a facial photograph. Recent advances in multimodal machine learning (MML), enabled by the Transformer architecture, make it possible to combine different data modalities and improve predictive performance. Building on these advances, we propose GestaltMML, a method based on ViLT (Vision-and-Language Transformer), a Transformer-based multimodal model, which integrates frontal facial images, clinical text, and patient demographic information to predict rare genetic disorders. As a byproduct of our feature-importance analysis, we also propose GestaltGPT, a GPT-based model that predicts rare genetic disorders from demographic information and clinical text alone.

Methods: We fine-tuned the ViLT model on the GestaltMatcher Database (GMDB). This database contains 7,459 medical images of rare disorders, predominantly facial photographs, obtained from publications and from clinic patients who gave appropriate consent. GMDB also includes text data consisting of Human Phenotype Ontology (HPO) terms and patient demographic information. For GestaltGPT, we concatenated the GestaltMML training text with the corresponding disease name and used the result to fine-tune the recently released Falcon-7B model.
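The abstract does not say how a patient's demographics and HPO terms are turned into text. The Python sketch below shows one possible serialization under stated assumptions: the `PatientRecord` fields, the "Sex / Age / Phenotypes" template, and the "Diagnosis:" separator are all hypothetical and are not taken from GestaltMML, GestaltGPT, or the GMDB schema. The same clinical text serves as the text half of an (image, text) pair for a ViLT-style classifier and, with the disease name appended, as a training string for a GestaltGPT-style language model.

```python
from dataclasses import dataclass, field


# Hypothetical record layout: GMDB pairs images with HPO terms and demographics,
# but these field names are illustrative assumptions, not the GMDB schema.
@dataclass
class PatientRecord:
    image_path: str
    sex: str
    age_years: float
    hpo_terms: list[str] = field(default_factory=list)
    disorder: str = ""


def clinical_text(record: PatientRecord) -> str:
    """Serialize demographics and HPO terms into one string (assumed template)."""
    phenotypes = ", ".join(record.hpo_terms) if record.hpo_terms else "none recorded"
    return (
        f"Sex: {record.sex}. Age: {record.age_years:g} years. "
        f"Phenotypes: {phenotypes}."
    )


def vilt_example(record: PatientRecord) -> tuple[str, str, str]:
    """(image, text, label) triple for a ViLT-style image+text classifier."""
    return record.image_path, clinical_text(record), record.disorder


def gestaltgpt_example(record: PatientRecord) -> str:
    """Text-only fine-tuning string: clinical text followed by the disease name."""
    # The "Diagnosis:" separator is a hypothetical choice for illustration.
    return f"{clinical_text(record)} Diagnosis: {record.disorder}"


# Invented example record, not real patient data.
rec = PatientRecord(
    image_path="images/patient_001.jpg",
    sex="female",
    age_years=5,
    hpo_terms=["Long palpebral fissure", "Intellectual disability"],
    disorder="Kabuki syndrome",
)
print(gestaltgpt_example(rec))
```

Routing both models through one serialization function means the multimodal and text-only models see identical phenotype descriptions, which makes any comparison between them easier to interpret.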
Results: By fine-tuning the model on the GMDB dataset, we achieved an accuracy of about 80% in predicting relatively frequent disorders (those with more than 6 patients available), outperforming all current image-only models. We expect accuracy to improve further as we incorporate additional training data, such as image-text pairs from the PubMed Central article dataset.

Conclusion: Our study demonstrates the potential of Transformer-based multimodal learning models to predict rare genetic disorders by integrating demographic information, phenotypic text data, and facial images. Going forward, we aim to extend these models to additional data modalities, such as video recordings, for the early detection of rare disorders. We believe our multimodal biomedical artificial intelligence (AI) approach can be generalized to other biomedical challenges wherever multiple data modalities are available.
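The reported accuracy is restricted to disorders with more than 6 patients. A minimal sketch of that kind of subset evaluation follows, assuming top-1 accuracy and assuming the patient count is taken over all available labels; the abstract specifies neither, and the function names are hypothetical.

```python
from collections import Counter


def frequent_disorders(all_labels: list[str], more_than: int = 6) -> set[str]:
    """Disorders with more than `more_than` patients (threshold from the abstract)."""
    counts = Counter(all_labels)
    return {disorder for disorder, n in counts.items() if n > more_than}


def subset_accuracy(y_true: list[str], y_pred: list[str], keep: set[str]) -> float:
    """Top-1 accuracy restricted to test cases whose true label is in `keep`."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t in keep]
    if not pairs:
        raise ValueError("no test cases fall in the selected subset")
    return sum(t == p for t, p in pairs) / len(pairs)


# Toy labels for illustration only; these are not GMDB results.
all_labels = ["A"] * 7 + ["B"] * 3 + ["C"] * 8
keep = frequent_disorders(all_labels)         # {"A", "C"}
y_true = ["A", "A", "B", "C", "C"]
y_pred = ["A", "C", "B", "C", "A"]
print(subset_accuracy(y_true, y_pred, keep))  # 0.5
```

In the toy run, the correctly predicted B case does not count, because B has only 3 patients; the accuracy is computed over the four A and C cases, two of which are correct.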

LIGHTNING TALK - 2023 MidAtlantic Bioinformatics Conference
