Poster #90 - Junyoung Kim
- vitod24
- Oct 20
- 2 min read
Evaluating LLM Agents for Insurance Coverage Workflow Automation (Translational Bioinformatics)
Junyoung Kim, MA, Department of Pediatrics, Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Youssef Mokssit, MS, Department of Pediatrics, Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Mengshu Nie, MA, Department of Pediatrics, Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Cong Liu, PhD, Department of Pediatrics, Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
Genetic testing is essential for diagnosing rare diseases, guiding therapies, and assessing patient risk, yet insurance policies remain difficult to navigate. Policies vary widely by payer and state, use complex terminology, and are frequently updated, creating administrative burden and frequent claim rejections. While Large Language Models (LLMs) have transformed biomedical research and clinical decision-making, their application to insurance workflows is still emerging. Web-enabled LLM agents offer the potential to retrieve real-time information and automate form completion, streamlining access to coverage. In this study, we evaluated these agents against three core objectives.

First, we assessed the accuracy of retrieving relevant information and policy documents, including identifying in-network insurance payers associated with a selected vendor (GeneDx) and retrieving their corresponding policy documents. GPT-4o-web-preview achieved a recall of 44.1% for in-network payers, compared to 2.6% with Perplexity (a recall sketch appears below).

Second, we evaluated the ability of LLM agents to apply insurance policy criteria by answering nine standardized coverage questions spanning age requirements, medical necessity, and CPT codes. Using 789 curated policies across 106 synthetic patient cases covering four representative genetic tests (WES, WGS, BRCA1/2, CMA) and three major payers (BCBS Federal Employee Program, Cigna, UnitedHealthcare), Internal-QA (RAG-based) agents achieved policy match rates of up to 39.6% with OpenAI versus 34.0% with Perplexity, and QnA accuracies of 71.5% with OpenAI versus 61.3% with Perplexity (a RAG sketch appears below).

Lastly, we examined automated completion of pre-authorization forms, using Connecticut Medicaid as the test case. We evaluated submission validity, field-level accuracy, and feedback effectiveness under a multi-agent setup (also sketched below). A baseline agent achieved 80.9% field-level accuracy, whereas introducing an "LLM-as-denier" critique agent reduced performance by 61.1%.

This work represents a foundational effort to scale insurance policy reasoning and administrative automation in genomic and genetic services using LLM agents, and contributes to advancing the role of LLMs in clinical practice.
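To make the first evaluation concrete, here is a minimal sketch of how recall over in-network payers can be computed. The payer names and the `normalize` helper are illustrative assumptions; the abstract does not specify the gold-standard payer list or the name-matching rules actually used.

```python
# Minimal sketch: recall of in-network payers retrieved by a web-enabled agent.
# Payer names below are illustrative, not the study's gold standard.

def normalize(name: str) -> str:
    """Crude name normalization so trivial formatting differences don't hurt recall."""
    return name.lower().replace(".", "").strip()

gold_payers = {"Cigna", "UnitedHealthcare", "BCBS Federal Employee Program", "Aetna"}
agent_answer = {"Cigna", "UnitedHealthcare", "Humana"}  # payers the agent named

gold = {normalize(p) for p in gold_payers}
found = {normalize(p) for p in agent_answer}

recall = len(gold & found) / len(gold)
print(f"In-network payer recall: {recall:.1%}")  # 2/4 -> 50.0% in this toy example
```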
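For the second evaluation, the sketch below shows the general shape of a RAG-style Internal-QA agent: retrieve the most relevant policy passages, then answer a standardized coverage question strictly from them. The lexical retriever, the `llm()` stub, and the question wordings are placeholder assumptions, not the study's actual stack (which was built on OpenAI and Perplexity models).

```python
# Minimal sketch of a RAG-style "Internal-QA" agent over insurance policies.
from dataclasses import dataclass

COVERAGE_QUESTIONS = [  # illustrative subset of the nine standardized questions
    "What is the minimum patient age required for coverage?",
    "What medical-necessity criteria must be documented?",
    "Which CPT codes does this policy cover?",
]

@dataclass
class PolicyChunk:
    policy_id: str
    text: str

def retrieve(question: str, chunks: list[PolicyChunk], k: int = 3) -> list[PolicyChunk]:
    """Toy lexical retriever: rank chunks by word overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(chunks, key=lambda c: -len(q_words & set(c.text.lower().split())))[:k]

def llm(prompt: str) -> str:
    """Stub for the underlying model call (OpenAI or Perplexity in the study)."""
    return "[model answer here]"

def answer_question(question: str, chunks: list[PolicyChunk]) -> str:
    context = "\n\n".join(c.text for c in retrieve(question, chunks))
    prompt = (
        "Answer strictly from the policy excerpts below.\n\n"
        f"Policy excerpts:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)

chunks = [PolicyChunk("WES-001", "Whole exome sequencing is covered for patients "
                                 "under 18 with suspected genetic disorders. CPT 81415.")]
for q in COVERAGE_QUESTIONS:
    print(q, "->", answer_question(q, chunks))
```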
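Finally, the third evaluation's multi-agent setup can be sketched as a bounded critique/revision loop: a form-filler agent drafts field values, an "LLM-as-denier" agent flags fields it would deny the claim over, and the filler revises. All field names, the agent stubs, and the accuracy metric below are hypothetical simplifications of the study's pipeline.

```python
# Minimal sketch of the multi-agent pre-authorization setup.

def filler_agent(case: dict, feedback: list | None = None) -> dict:
    """Stub: draft (or revise) pre-authorization form fields from the patient case."""
    form = {"patient_name": case["name"], "dob": case["dob"], "cpt_code": case["cpt"]}
    # A real agent would incorporate the denier's feedback here when revising.
    return form

def denier_agent(form: dict) -> list:
    """Stub critique agent: return the fields it would deny the claim over."""
    return [f for f, v in form.items() if not v]  # toy rule: deny empty fields

def field_accuracy(form: dict, gold: dict) -> float:
    """Share of form fields that exactly match the gold-standard submission."""
    return sum(form.get(f) == v for f, v in gold.items()) / len(gold)

case = {"name": "Jane Doe", "dob": "2015-03-02", "cpt": "81415"}
gold = {"patient_name": "Jane Doe", "dob": "2015-03-02", "cpt_code": "81415"}

form = filler_agent(case)
for _ in range(2):                      # bounded critique/revision loop
    objections = denier_agent(form)
    if not objections:
        break
    form = filler_agent(case, feedback=objections)

print(f"Field-level accuracy: {field_accuracy(form, gold):.1%}")
```

One design note this sketch surfaces: the critique loop only helps if the denier's objections are reliable, which is consistent with the study's finding that adding the "LLM-as-denier" agent hurt rather than helped field-level accuracy.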

