Maryam Daniali, Peter D. Galer, David Lewis-Smith, Shridhar Parthasarathy, Edward Kim, Dario D. Salvucci, Jeffrey M. Miller, Scott Haag, Ingo Helbig
Poster # 83
Phenotype representation is a crucial task in artificial intelligence in medicine, as it allows efficient analysis of large-scale clinical data for downstream tasks such as similarity analysis and disease trajectory prediction. However, existing methods for measuring phenotypic similarity have limitations in capturing the complex relationships between phenotypes. In this work, we present a novel approach to phenotype representation that incorporates phenotypic frequencies derived from >53 million full-text health care notes from >1.5 million individuals. We use the Human Phenotype Ontology (HPO) as a standardized vocabulary of clinical phenotypic terms and their relationships. We apply graph embedding techniques to map the HPO terms into a low-dimensional vector space, where phenotypes with similar roles and contexts are closer together. We further propose a frequency-based weighting mechanism that enhances the graph embedding by prioritizing rare phenotypes over common ones, as we hypothesize that rare phenotypes are more informative for disease diagnosis. We evaluate our proposed method by comparing it with existing techniques for measuring phenotypic similarity, such as the Resnik score [Resnik, 1995] and HPO2Vec [Shen et al., 2019]. Our results demonstrate that frequency-based phenotype embeddings capture phenotypic similarities better than current computational models and exhibit a high degree of agreement with domain experts' judgment. Furthermore, our proposed method enables efficient representation of phenotypes for downstream tasks that require deep phenotyping in a dynamic setting, such as patient similarity analysis. It does so by providing a scalable and robust framework for phenotype representation that reduces computation costs by orders of magnitude. For future work, we are interested in using phenotype embeddings as input to measure patients' similarities over time for individuals with a specific gene variant.
We are also interested in building a shared gene-phenotype embedding space that could help identify correlations, suggest causal relationships, and support new gene discovery.
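As an illustration of the frequency-based baseline mentioned above, the following minimal Python sketch computes information content from term frequencies and the Resnik similarity of two terms as the information content of their most informative common ancestor. The ontology, term names, and frequencies are hypothetical toy values, not real HPO terms or data from this work, and the sketch is not the weighting mechanism proposed here.

```python
import math

# Toy ontology: child -> list of parents (hypothetical terms, not real HPO IDs).
PARENTS = {
    "root": [],
    "neuro": ["root"],
    "seizure": ["neuro"],
    "focal_seizure": ["seizure"],
    "generalized_seizure": ["seizure"],
    "hypotonia": ["neuro"],
}

# Hypothetical fraction of individuals annotated with each term
# or any of its descendants (assumed values for illustration only).
FREQ = {
    "root": 1.0,
    "neuro": 0.5,
    "seizure": 0.2,
    "focal_seizure": 0.05,
    "generalized_seizure": 0.1,
    "hypotonia": 0.25,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Rarer terms carry more information: IC = -log2(frequency)."""
    return -math.log2(FREQ[term])


def resnik(a, b):
    """Information content of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(information_content(t) for t in common)
```

In this toy setting, `resnik("focal_seizure", "generalized_seizure")` is larger than `resnik("focal_seizure", "hypotonia")`, because the first pair shares the rarer ancestor `seizure` while the second pair shares only `neuro`.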