E2EGAT: Interpretable graph attention networks for pathological stage prediction of prostate cancer
Wenkang Zhan, PhD Student, Temple University. Chen Song, PhD Student, Temple University. Supratim Das, Student, Indian Institute of Science Education and Research - Pune. Timothy R. Rebbeck, Harvard T.H. Chan School of Public Health, Dana Farber Cancer Institute. Xinghua Shi, PhD, Temple University.
Poster # 78
Prostate cancer is one of the deadliest cancers worldwide. Accurate prediction of pathological stage from gene expression supports clinical assessment and treatment. However, there are two major barriers to pathological stage prediction using gene expression data. The first is that complex gene-gene interactions have yet to be fully analyzed, because identifying interactions through biological procedures is time-consuming and prohibitively expensive. The second is the limited transparency and trustworthiness of machine learning models, whose prediction mechanisms are expected to match biomedical workflows. Thus, there is an increasing need for efficient and feasible computational methods that capture the internal interactions among genes for pathological stage prediction while providing interpretations of the prediction results.
In this study, we propose an interpretable graph neural network (E2EGAT) that can be trained in an end-to-end manner to identify genes whose expression profiles are important for predicting pathological stages of prostate cancer. First, to construct graph representations of gene expression, we measure the co-effect of any pair of genes as the dot product of their expressions and develop an adaptive threshold to build the adjacency matrix, in which strongly interacting genes are connected by edges. Second, on the constructed graphs, we introduce a multi-head attention mechanism into E2EGAT so that gene-gene interactions can be captured. Third, a co-training strategy optimizes the graph construction and the prediction of pathological stages jointly, in an end-to-end manner. In this way, E2EGAT can construct graph representations of gene expression that improve downstream pathological stage prediction. Finally, to ensure the transparency of E2EGAT, novel interpretive strategies are employed to help identify marker genes and their contributions to the prediction of pathological stages in prostate cancer.
Results on The Cancer Genome Atlas (TCGA) data show that E2EGAT achieves state-of-the-art performance in accuracy, precision, recall, and F1-score compared with alternative methods for predicting pathological stages of prostate cancer. In addition, using integrated gradient saliency, the attention mechanism, and perturbation techniques, E2EGAT identifies marker genes driving the pathological stage of prostate cancer, whose expressions are found to be associated with the cell growth of prostate cancer.
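The graph-construction step described above (pairwise dot products of gene expressions followed by thresholding) can be illustrated with a minimal NumPy sketch. This is not the E2EGAT implementation: the function name `build_gene_graph`, the per-gene standardization, and the threshold rule (mean plus `k` standard deviations of the off-diagonal scores) are assumptions made for illustration, and the threshold here is a fixed data-dependent statistic rather than one learned through co-training.

```python
import numpy as np


def build_gene_graph(expr, k=1.0):
    """Build a 0/1 gene-gene adjacency matrix from expression data.

    expr: array of shape (n_samples, n_genes).
    k: hypothetical sensitivity parameter for the threshold (an assumption).
    """
    # Standardize each gene so dot products are comparable across genes.
    z = (expr - expr.mean(axis=0)) / (expr.std(axis=0) + 1e-8)
    # Pairwise dot products between gene expression vectors,
    # scaled by the number of samples; absolute value keeps
    # both positive and negative co-expression.
    scores = np.abs(z.T @ z) / expr.shape[0]
    np.fill_diagonal(scores, 0.0)  # no self-loops
    # Assumed threshold rule: mean + k * std of the off-diagonal scores.
    off_diag = scores[~np.eye(scores.shape[0], dtype=bool)]
    threshold = off_diag.mean() + k * off_diag.std()
    adjacency = (scores > threshold).astype(np.int8)
    return adjacency, threshold


# Synthetic example: genes 0 and 1 share a common signal, the rest are noise.
rng = np.random.default_rng(0)
base = rng.normal(size=50)
expr = rng.normal(size=(50, 6))
expr[:, 0] = base + 0.05 * rng.normal(size=50)
expr[:, 1] = base + 0.05 * rng.normal(size=50)

adjacency, threshold = build_gene_graph(expr)
print(adjacency)
```

In this synthetic example the two genes that share a signal end up connected, while most noise genes remain isolated. In an end-to-end setting the hard threshold would need a differentiable counterpart so that graph construction and stage prediction can be optimized together.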