Poster #44 - Ellyse Lai

  • vitod24
  • Oct 20

Harnessing Longitudinal Claims Data to Reveal Predictive Signals for Clinical Event Forecasting


Lai, E., Chen, J., & Dubrawski, A.

Ellyse Lai, Student/Research Intern, CMU Auton Lab
Jieshi Chen, Researcher, CMU Auton Lab
Artur Dubrawski, Professor, CMU Auton Lab


Introduction: Large administrative health datasets are a powerful resource for public health surveillance, but their utility is often limited by recording complexities such as changes in diagnostic coding systems. These changes create artificial discontinuities that can confound traditional time-series analysis. We present a data harmonization and feature engineering pipeline that overcomes these challenges and produces consistent, temporally stable signals for predictive modeling. We target the severe medical outcome of amputation to explore a possible correlation between drug abuse and amputation.

Methods: We used nine years of non-public California patient discharge claims data, spanning the transition from ICD-9 to ICD-10 diagnosis coding. The data include diagnoses, discharge dates, and demographic details. We developed a code mapping to harmonize diagnosis codes and flag key clinical concepts, then aggregated the data into a time series of claim counts. We created 182-day rolling-window features to capture trends and to ensure there were no artificial breaks in the data. We then tested the predictive validity of these features by training an XGBoost model to forecast future amputation events.

Results: Our data harmonization produced continuous, stable time-series features that were free from artifacts of the coding system change. The resulting features demonstrated predictive power: models forecast population-level amputation trends 182 days into the future with high predictive utility (R² > 0.80). Stimulant misuse emerged as a primary leading indicator, validating that our data processing uncovered clinically relevant signals.

Conclusion: This work suggests that aggregated, complex, longitudinal healthcare data can support robust predictive modeling. By focusing on the creation of temporally consistent features, we can effectively track and forecast significant clinical events for public health surveillance. Future work using this rich dataset will examine other possible effects of the significant increase in drug abuse, at both the population and individual levels.
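The pipeline described in the Methods (harmonize codes, aggregate claim counts, build 182-day rolling-window features) can be sketched roughly as follows. This is a minimal illustration in standard-library Python, not the authors' implementation. The code prefixes, the concept names, the function names, and the rule of choosing the coding system from the discharge date are all assumptions made for the example; the abstract does not publish its actual mapping.

```python
from collections import Counter
from datetime import date, timedelta

# Illustrative prefixes only (codes assumed stored without dots).
# This is NOT the mapping used in the poster.
CONCEPT_PREFIXES = {
    ("icd9", "3044"): "stimulant",   # amphetamine-type dependence
    ("icd9", "3057"): "stimulant",   # amphetamine-type abuse
    ("icd10", "F15"): "stimulant",   # other stimulant related disorders
    ("icd9", "V497"): "amputation",  # lower limb amputation status
    ("icd10", "Z89"): "amputation",  # acquired absence of limb
}
ICD10_START = date(2015, 10, 1)  # US switch to ICD-10-CM
WINDOW = 182                     # rolling window length in days


def concept_for(code, discharged):
    """Map a raw diagnosis code to a harmonized concept, or None."""
    # Assumption: the coding system is inferred from the discharge date.
    version = "icd10" if discharged >= ICD10_START else "icd9"
    for (ver, prefix), concept in CONCEPT_PREFIXES.items():
        if ver == version and code.startswith(prefix):
            return concept
    return None


def daily_counts(claims):
    """Count claims per (concept, day); claims are (date, code) pairs."""
    counts = Counter()
    for discharged, code in claims:
        concept = concept_for(code, discharged)
        if concept is not None:
            counts[(concept, discharged)] += 1
    return counts


def rolling_sum(counts, concept, end, window=WINDOW):
    """Sum counts over the `window` days ending at `end` (inclusive)."""
    return sum(counts[(concept, end - timedelta(days=k))]
               for k in range(window))


def training_row(counts, day, horizon=WINDOW):
    """Pair a trailing stimulant feature with a future amputation target."""
    feature = rolling_sum(counts, "stimulant", day)
    target = rolling_sum(counts, "amputation", day + timedelta(days=horizon))
    return feature, target
```

Because both coding systems collapse into the same concept label before counting, the rolling sums pass through the October 2015 transition without a break. Rows built this way, one per day, could then be passed to a regressor such as the XGBoost model mentioned in the abstract; that step is omitted here.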
