Poster #90 - Junyoung Kim

  • vitod24
  • Oct 20

Evaluating LLM Agents for Insurance Coverage Workflow Automation

Translational Bioinformatics


Junyoung Kim, MA, Department of Pediatrics, Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Youssef Mokssit, MS, Department of Pediatrics, Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Mengshu Nie, MA, Department of Pediatrics, Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Cong Liu, PhD, Department of Pediatrics, Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA


Genetic testing is essential for diagnosing rare diseases, guiding therapies, and assessing patient risk, yet insurance policies remain difficult to navigate. Policies vary widely by payer and state, use complex terminology, and are frequently updated, which creates administrative burden and leads to frequent claim rejections. While Large Language Models (LLMs) have transformed biomedical research and clinical decision-making, their application to insurance workflows is still emerging. Web-enabled LLM agents could retrieve real-time information and automate form completion, streamlining access to coverage. In this study, we evaluated these agents against three core objectives.
First, we assessed how accurately the agents retrieve relevant information and policy documents: identifying the in-network insurance payers associated with a selected vendor (GeneDx) and retrieving the corresponding policy documents. GPT-4o-web-preview achieved a recall of 44.1% for in-network payers, compared to 2.6% with Perplexity.
Second, we evaluated the ability of LLM agents to apply insurance policy criteria by answering nine standardized coverage questions spanning age requirements, medical necessity, and CPT codes. Using 789 curated policies across 106 synthetic patient cases for four representative genetic tests (WES, WGS, BRCA1/2, CMA) and three major payers (BCBS Federal Employee Program, Cigna, UnitedHealthcare), Internal-QA (RAG-based) agents achieved policy match rates of up to 39.6% with OpenAI compared to 34.0% with Perplexity, and Q&A accuracies of 71.5% with OpenAI versus 61.3% with Perplexity.
Lastly, we examined automated completion of the pre-authorization form, using Connecticut Medicaid as the test case. We evaluated submission validity, field-level accuracy, and feedback effectiveness under a multi-agent setup. A baseline agent achieved 80.9% field-level accuracy, whereas introducing an "LLM-as-denier" critique agent reduced performance by 61.1%.
This work represents a foundational effort to scale insurance policy reasoning and administrative automation in genomic and genetic services using LLM agents, and it helps advance the role of LLMs in clinical practice.
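
Two of the metrics reported above, payer recall and field-level accuracy, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the case- and whitespace-insensitive matching rule, and all example values are assumptions made only for illustration.

```python
def recall(retrieved, reference):
    """Fraction of reference items that also appear in the retrieved items."""
    reference = set(reference)
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(reference & set(retrieved)) / len(reference)


def field_accuracy(predicted, gold):
    """Fraction of gold form fields whose predicted value matches.

    Assumed matching rule: exact match after trimming whitespace and
    ignoring case. A field missing from `predicted` counts as wrong.
    """
    if not gold:
        raise ValueError("gold form must not be empty")

    def norm(value):
        return str(value).strip().lower()

    correct = sum(
        1
        for field, value in gold.items()
        if field in predicted and norm(predicted[field]) == norm(value)
    )
    return correct / len(gold)


# Hypothetical payer lists: the agent finds 2 of 4 in-network payers.
payer_recall = recall(
    retrieved=["Payer A", "Payer B", "Payer X"],
    reference=["Payer A", "Payer B", "Payer C", "Payer D"],
)

# Hypothetical pre-authorization form: 2 of 4 fields are filled correctly.
form_accuracy = field_accuracy(
    predicted={"name": " Jane Doe", "cpt": "81415", "dob": "2010-01-01"},
    gold={"name": "jane doe", "cpt": "81415", "dob": "2010-01-02", "state": "CT"},
)

print(f"payer recall: {payer_recall:.1%}")
print(f"field-level accuracy: {form_accuracy:.1%}")
```

Under these definitions, an incorrect payer returned by the agent does not lower recall, and a form field the agent leaves blank counts against field-level accuracy.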

 
 
 
