Benchmark study of similarity measures from query phenotypic abnormalities to diseases based on the
Updated: Sep 29, 2022
Yu Hu, PhD (1), Joe Chan, MS (1), Kai Wang, PhD (1,2)
1: Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
2: Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Advances in sequencing technologies have made it possible to use genotype data in clinical diagnosis. However, it remains challenging for clinicians to interpret sequencing results and make correct judgments based on them. In recent years, phenotype-based diagnosis has improved with the establishment of the Human Phenotype Ontology (HPO) and the enrichment of phenotype-disease annotations. Here we simulated HPO terms for groups of patients based on 30 complex diseases. We evaluated the performance of disease prediction using 5 different measures of similarity between the HPO terms and hereditary diseases, and showed that all of them consistently achieved high accuracy (>95%) in ranking the underlying disease among the top 3 candidate diseases. The Resnik measure ranked the underlying disease in the top 3 for 98.16% of the simulated dataset without noise and 96.32% of the simulated dataset with noise. The second-best measure, Jiang-Conrath, ranked the underlying disease in the top 3 for 97.37% of the simulated dataset without noise and 95.85% of the simulated dataset with noise. We also found in the simulation study that all 5 similarity measures provide accurate patient clustering. Our results not only demonstrate the feasibility of phenotype-based diagnosis using existing measures of similarity between HPO terms and hereditary diseases, but also highlight the bioinformatics improvements needed for the future development of EHR-based patient clustering tools in clinical settings.
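To make the idea of ranking diseases by ontology-based similarity concrete, the following is a minimal sketch of a Resnik-style score, not the pipeline used in this study. It assumes a tiny hypothetical ontology and hypothetical disease annotations (the term names, `PARENTS`, `DISEASES`, and all function names are illustrative, not HPO identifiers), estimates information content from annotation frequency, and aggregates term-level similarity with a best-match average, which is one common choice and an assumption here.

```python
import math

# Hypothetical toy ontology (child -> parents); not real HPO terms.
PARENTS = {
    "root": set(),
    "neuro": {"root"},
    "cardio": {"root"},
    "seizure": {"neuro"},
    "ataxia": {"neuro"},
    "arrhythmia": {"cardio"},
}

# Hypothetical disease annotations; illustrative only.
DISEASES = {
    "disease_A": {"seizure", "ataxia"},
    "disease_B": {"arrhythmia"},
    "disease_C": {"seizure", "arrhythmia"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t), where n_t counts diseases annotated with t
    or any descendant of t, and N is the number of diseases."""
    counts = {t: 0 for t in PARENTS}
    for terms in DISEASES.values():
        covered = set()
        for t in terms:
            covered |= ancestors(t)
        for t in covered:
            counts[t] += 1
    n = len(DISEASES)
    return {t: math.log(n / c) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(a, b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(IC.get(t, 0.0) for t in common)


def score(query_terms, disease_terms):
    """Best-match average (an assumed aggregation): for each query term,
    take its best Resnik match in the disease, then average."""
    best = [max(resnik(q, d) for d in disease_terms) for q in query_terms]
    return sum(best) / len(best)


def rank(query_terms):
    """Return (disease, score) pairs sorted from most to least similar."""
    scored = [(name, score(query_terms, terms)) for name, terms in DISEASES.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, s in rank({"seizure", "ataxia"}):
        print(f"{name}: {s:.3f}")
```

A top-3 evaluation in this setting would check whether the simulated patient's true disease appears among the first three entries returned by `rank`. Other measures, such as Jiang-Conrath, could be substituted for `resnik` without changing the ranking logic.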