Manju Anandakrishnan1, Karen Ross2, Chuming Chen1, Vijay Shanker1, Julie Cowart1, Cathy H. Wu1,2
1 Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
2 Georgetown University Medical Center, Washington, D.C., USA
Poster # 9
Aberrant protein kinase regulation leading to abnormal substrate phosphorylation is associated with several human diseases. Despite the promise of therapies targeting kinases, many human kinases remain understudied. Most existing computational tools for predicting phosphorylation cover fewer than 50% of known human kinases. They rely on local feature selection based on protein sequences, motifs, domains, structures, and/or functions, and fail to consider heterogeneous relationships among proteins. We present KSFinder, a tool that predicts kinase-substrate links by capturing the inherent associations of proteins in a network comprising 85% of the known human kinases, and we use KSFinder substrate predictions to postulate the potential roles of understudied kinases. KSFinder learns the semantic relationships in a phosphoproteome knowledge graph and represents the proteins as low-dimensional vectors. A binary classifier is trained on the embedded vectors to discern kinase-substrate links. KSFinder uses a negative-generation strategy that eliminates biases in entity representation and combines data from experimentally validated non-interacting protein pairs, proteins in different subcellular locations, and random sampling. We assess KSFinder on four different datasets and compare its performance with other state-of-the-art prediction models. Using KSFinder, we predict substrates of 68 "dark" kinases considered understudied by the Illuminating the Druggable Genome program, and we search the literature for evidence of the predictions using our text-mining tool, RLIMS-P, and manual curation. In a case study, we performed functional enrichment analysis of two dark kinases, HIPK3 and CAMKK1, using their predicted substrates. KSFinder shows improved performance over other kinase-substrate prediction models and generalizes across the different datasets. We identified literature evidence for 17 novel predictions of KSFinder.
The enriched terms for HIPK3 substrates include regulation of the extracellular matrix and epigenetic gene expression, while those for CAMKK1 substrates include lipid storage regulation and glucose homeostasis. Overall, we demonstrate KSFinder's utility in identifying potential targets and understanding the biological roles of understudied kinases.
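The negative-generation idea described in the abstract, drawing negatives from validated non-interacting pairs, proteins in different subcellular locations, and random sampling, can be illustrated with a minimal sketch. This is not KSFinder's implementation: the function name `build_negatives`, its parameters, the source labels, and the placeholder proteins (`K1`, `S1`, ...) are all assumptions made for illustration, and the step that balances entity representation is not shown.

```python
import random


def build_negatives(positives, non_interacting, locations,
                    kinases, substrates, n_random, seed=0):
    """Assemble negative kinase-substrate pairs from three sources.

    Hypothetical helper for illustration only. `locations` maps a protein
    name to a set of subcellular location labels. Returns a dict mapping
    each negative (kinase, substrate) pair to the source it came from.
    """
    rng = random.Random(seed)
    positives = set(positives)
    negatives = {}  # pair -> source label; the first source to claim a pair wins

    # Source 1: experimentally validated non-interacting pairs.
    for pair in non_interacting:
        if pair not in positives:
            negatives.setdefault(pair, "non_interacting")

    # Source 2: pairs whose proteins share no annotated subcellular location.
    for k in kinases:
        for s in substrates:
            if k == s or (k, s) in positives:
                continue
            loc_k, loc_s = locations.get(k), locations.get(s)
            if loc_k and loc_s and loc_k.isdisjoint(loc_s):
                negatives.setdefault((k, s), "different_location")

    # Source 3: random sample of the remaining unlabeled pairs.
    candidates = sorted(
        (k, s) for k in kinases for s in substrates
        if k != s and (k, s) not in positives and (k, s) not in negatives
    )
    for pair in rng.sample(candidates, min(n_random, len(candidates))):
        negatives[pair] = "random"
    return negatives


# Toy data with placeholder names; no real biology is implied.
kinases = ["K1", "K2"]
substrates = ["S1", "S2", "S3"]
positives = [("K1", "S1"), ("K2", "S2")]
non_interacting = [("K1", "S2")]
locations = {
    "K1": {"nucleus"}, "K2": {"cytoplasm"},
    "S1": {"nucleus"}, "S2": {"cytoplasm"}, "S3": {"nucleus"},
}

negatives = build_negatives(positives, non_interacting, locations,
                            kinases, substrates, n_random=1)
for pair, source in sorted(negatives.items()):
    print(pair, source)
```

Recording the source of each negative makes it possible to check how much each strategy contributes to the training set; in a full pipeline, the positive and negative pairs would then be mapped to their embedding vectors before training the binary classifier.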