  • vitod1

Large language models for phenotype concept recognition from clinical notes on genetic diseases

Jingye Yang, PhD, UPenn & CHOP; Cong Liu, PhD, Columbia University; Wendy Deng, UPenn; Da Wu, PhD, UPenn; Chunhua Weng, PhD, Columbia University; Yunyun Zhou, PhD, Fox Chase Cancer Center; Kai Wang, PhD, UPenn & CHOP

Poster # 50

Despite advances in natural language processing (NLP) for identifying Human Phenotype Ontology (HPO) terms in clinical notes, current methods have limitations. Built predominantly on heuristics or rule sets, these tools sometimes fail to capture nuanced clinical phenotypes that are not well represented in the HPO. To bridge this gap, this poster introduces and assesses two novel transformer-based models: PhenoBCBERT, which builds on Bio+Clinical BERT, and PhenoGPT, which can be adapted to a variety of GPT models, both open- and closed-source. A comparative analysis against the recent PhenoTagger will be presented to showcase our models' superior ability to detect phenotype concepts, including uncharacterized ones. We will present detailed evaluations of our models on biomedical texts to show their potential for recognizing new phenotype data. We will also compare the strengths of BERT and GPT models in phenotype tagging, considering architecture, efficiency, accuracy, and data protection. Our findings underscore the promise of PhenoBCBERT and PhenoGPT for automating phenotype term detection and for opening new perspectives on genetic diseases.

LIGHTNING TALK - 2023 MidAtlantic Bioinformatics Conference
