Poster #21 - T M Rubaith Bashar
- vitod24
- Oct 20
- 2 min read
Relationship Classification in SNOMED CT Using Domain-Specific BERT Models
T M Rubaith Bashar¹, James Geller²
¹ PhD Student, Department of Data Science, New Jersey Institute of Technology
² PhD, Professor and Chair, Department of Data Science, New Jersey Institute of Technology
SNOMED CT is a large, important, and constantly evolving medical ontology organized around a great many relationships among its medical concepts. Keeping these relationships correct is onerous and time-consuming. We used multiple domain-specific BERT models to classify the relationships between SNOMED CT concept pairs: BioBERT, PubMedBERT, ClinicalBERT, BioLinkBERT, and SapBERT embed the two concept names, and a lightweight neural network trained on the concatenated embeddings classifies the relationship between the pair.

For our study, we created a dataset from SNOMED CT (US Edition, September 2025) covering 101 relationship types. The data was split with stratification into training (80%), validation (10%), and test (10%) sets, yielding over 1 million pairs for training and around 130 thousand each for validation and testing.

We focused on the five biomedical BERT variants listed above; each encoder captures different aspects of biomedical or clinical text. To support prediction across more than 100 relationship types, we placed a lightweight artificial neural network (ANN) with two ReLU-activated hidden layers, trained with the Adam optimizer, as a classification head on top of each encoder. This ANN acts as a translator, converting the BERT embeddings into class-level predictions.

Among the five biomedical BERT models, BioLinkBERT achieved the highest performance, with an accuracy of 0.89, macro-F1 of 0.81, and weighted-F1 of 0.89. PubMedBERT followed closely, with an accuracy of 0.89 and macro-F1 of 0.80. SapBERT performed moderately well (accuracy 0.87, macro-F1 0.79), while BioBERT (accuracy 0.86, macro-F1 0.76) and ClinicalBERT (accuracy 0.86, macro-F1 0.78) produced competitive but comparatively lower results. Overall, BioLinkBERT and PubMedBERT consistently outperformed the other models on both the validation and test sets.

Our work represents one of the first evaluations of multiple biomedical BERT models across more than 100 SNOMED CT relationship types, offering valuable insights into their comparative strengths for ontology-based relation classification.
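To make the pipeline concrete, here is a minimal PyTorch sketch of one encoder-plus-classifier setup: each concept name in a pair is embedded by a frozen biomedical BERT encoder, the two embeddings are concatenated, and a two-hidden-layer ReLU network trained with Adam scores the 101 relationship types. The Hugging Face checkpoint name, the mean-pooling strategy, the 512/256 hidden-layer widths, and the learning rate are illustrative assumptions, not details reported in the poster.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# Assumed checkpoint; any of the five biomedical BERT variants could be swapped in here.
MODEL_NAME = "michiyasunaga/BioLinkBERT-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()  # the encoder stays frozen; only the classification head is trained

@torch.no_grad()
def embed(concept_names):
    """Mean-pool the last hidden states to get one vector per concept name."""
    batch = tokenizer(concept_names, padding=True, truncation=True, return_tensors="pt")
    out = encoder(**batch).last_hidden_state              # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
    return (out * mask).sum(1) / mask.sum(1)               # (B, H)

class RelationHead(nn.Module):
    """Lightweight ANN: two ReLU hidden layers over the concatenated pair embedding."""
    def __init__(self, hidden_size=768, num_classes=101):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * hidden_size, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, source_emb, target_emb):
        return self.net(torch.cat([source_emb, target_emb], dim=-1))

head = RelationHead()
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a toy concept pair.
src = embed(["Myocardial infarction"])
tgt = embed(["Myocardium structure"])
labels = torch.tensor([0])  # index of the gold relationship type (e.g. "Finding site") in the label vocabulary
loss = criterion(head(src, tgt), labels)
loss.backward()
optimizer.step()
```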

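The stratified 80/10/10 split can be reproduced with scikit-learn's train_test_split by holding out 20% and then splitting that portion in half; the toy concept pairs and label ids below are placeholders standing in for the full SNOMED CT release data.

```python
from sklearn.model_selection import train_test_split

# Toy stand-in for the SNOMED CT relationship data; in the study the pairs
# come from the US Edition (Sept 2025) release and cover 101 relationship types.
pairs = [("Myocardial infarction", "Myocardium structure")] * 50 + \
        [("Appendectomy", "Appendix structure")] * 50
labels = [0] * 50 + [1] * 50  # relationship-type ids (e.g. Finding site, Procedure site)

# 80/10/10 stratified split: hold out 20%, then split it half-and-half.
train_p, rest_p, train_y, rest_y = train_test_split(
    pairs, labels, test_size=0.20, stratify=labels, random_state=42)
val_p, test_p, val_y, test_y = train_test_split(
    rest_p, rest_y, test_size=0.50, stratify=rest_y, random_state=42)

print(len(train_p), len(val_p), len(test_p))  # 80 10 10
```

Stratifying on the relationship-type labels keeps all classes represented proportionally in every split, which matters when some of the 101 relationship types are rare.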
