
Poster #21 - T M Rubaith Bashar

  • vitod24
  • Oct 20

Relationship Classification in SNOMED CT Using Domain-Specific BERT Models


T M Rubaith Bashar¹, James Geller²
¹ PhD Student, Department of Data Science, New Jersey Institute of Technology
² PhD, Professor and Chair, Department of Data Science, New Jersey Institute of Technology


SNOMED CT is an important, large, and constantly evolving medical ontology organized around many types of relationships among medical concepts. Keeping these relationships correct is onerous and time-consuming. We used five domain-specific BERT models (BioBERT, PubMedBERT, ClinicalBERT, BioLinkBERT, and SapBERT) to classify the relationship between pairs of SNOMED CT concepts. Each model embeds the two concept names, and a lightweight neural network trained on the concatenated embeddings classifies the relationship between the pair.

For our study, we created a dataset from SNOMED CT (US Edition, Sept 2025) covering 101 relationship types. We used a stratified split into train (80%), validation (10%), and test (10%) sets, resulting in over 1 million pairs for training and around 130 thousand each for validation and testing.

Each of the five encoders captures different aspects of biomedical or clinical text. To support prediction across more than 100 relationship types, we placed a classification head on top of each encoder: a lightweight artificial neural network (ANN) with two hidden layers and ReLU activations, trained with the Adam optimizer. This ANN acts as a translator, converting BERT embeddings into class-level predictions.

Among the five models, BioLinkBERT achieved the highest performance, with an accuracy of 0.89, macro-F1 of 0.81, and weighted-F1 of 0.89. PubMedBERT followed closely (accuracy 0.89, macro-F1 0.80). SapBERT performed moderately well (accuracy 0.87, macro-F1 0.79), while ClinicalBERT (accuracy 0.86, macro-F1 0.78) and BioBERT (accuracy 0.86, macro-F1 0.76) were competitive but comparatively lower. Overall, BioLinkBERT and PubMedBERT consistently outperformed the other models on both the validation and test sets.
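The results report both macro-F1 and weighted-F1, and the two can differ noticeably when some relationship types are much rarer than others (the abstract does not state the class distribution, so that is an assumption here). The sketch below computes both metrics in plain Python on a made-up toy example. The labels, counts, and function names are illustrative only and are not taken from the study's data or code.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by how often each class truly occurs."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)

# Toy data (hypothetical): one frequent and one rare relationship type.
y_true = ["Is a"] * 8 + ["Finding site"] * 2
y_pred = ["Is a"] * 8 + ["Is a", "Finding site"]  # one rare-class example missed

print(f"macro-F1:    {macro_f1(y_true, y_pred):.2f}")
print(f"weighted-F1: {weighted_f1(y_true, y_pred):.2f}")
```

In this toy case a single error on the rare class pulls macro-F1 well below weighted-F1, because macro-F1 gives the rare class the same weight as the frequent one.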
Our work represents one of the first evaluations of multiple biomedical BERT models across more than 100 SNOMED CT relationship types, offering valuable insights into their comparative strengths for ontology-based relation classification.
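
The classification head described in the abstract (concatenate the two concept embeddings, pass them through two ReLU hidden layers, and predict one of 101 relationship types) can be sketched as a forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are random stand-ins for BERT outputs, the weights are untrained, and the embedding size, hidden width, and all function names are hypothetical. The abstract does not give the layer widths or the pooling strategy, and training with Adam is omitted. Plain Python is used so the sketch runs without dependencies; a real head would normally be written in a deep learning framework.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(rng, n_in, n_out):
    """Random (untrained) weights and zero biases for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def classify_pair(source_emb, target_emb, layers):
    """Concatenate two concept embeddings and run them through the head."""
    x = source_emb + target_emb  # list concatenation: length 2 * EMB_DIM
    (w1, b1), (w2, b2), (w3, b3) = layers
    h = relu(linear(x, w1, b1))        # hidden layer 1
    h = relu(linear(h, w2, b2))        # hidden layer 2
    return softmax(linear(h, w3, b3))  # probabilities over relationship types

EMB_DIM = 8        # toy size (assumption); BERT-base encoders output 768 dims
HIDDEN = 16        # assumption; the abstract does not state layer widths
NUM_CLASSES = 101  # relationship types, from the abstract

rng = random.Random(0)
layers = [
    init_layer(rng, 2 * EMB_DIM, HIDDEN),
    init_layer(rng, HIDDEN, HIDDEN),
    init_layer(rng, HIDDEN, NUM_CLASSES),
]

# Random stand-ins for the encoder embeddings of the two concept names.
source_emb = [rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)]
target_emb = [rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)]

probs = classify_pair(source_emb, target_emb, layers)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
print("predicted class index:", predicted_class)
```

Because the head only sees fixed-length vectors, the same structure can sit on top of any of the five encoders; only the embedding size would need to match the encoder's output.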

 
 
 
