Essential gene prediction in Drosophila melanogaster using machine learning approaches based on sequence and functional features.

Comput Struct Biotechnol J

Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany.

Published: March 2020

Genes are termed essential if their loss of function compromises viability or results in a profound loss of fitness. On the genome scale, these genes can be determined experimentally with RNAi or knockout screens, but this is very resource intensive. Computational methods for essential gene prediction can overcome this drawback, particularly when intrinsic features (e.g. from the protein sequence) as well as extrinsic features (e.g. from transcription profiles) are considered. In this work, we employed machine learning to predict essential genes in Drosophila melanogaster. A total of 27,340 features were generated from a wide variety of sources comprising nucleotide and protein sequences, gene networks, protein-protein interactions, evolutionary conservation and functional annotations. Using cross-validation, the best model for D. melanogaster achieved a ROC-AUC of 0.90, a PR-AUC of 0.30 and an F1 score of 0.34. Our approach considerably outperformed a benchmark method that used only features derived from the protein sequences (P < 0.001). Investigating which features contributed to this success, we found that all feature categories were involved, most prominently network topological, functional and sequence-based features. To evaluate our approach, we applied the same workflow to essential gene prediction in human and achieved a ROC-AUC of 0.97, a PR-AUC of 0.73 and an F1 score of 0.64. In summary, this study shows that a carefully assembled feature set covering a broad range of intrinsic and extrinsic gene and protein features enables machine learning models to predict the essentiality of genes in an organism well.
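The three reported metrics respond differently to class imbalance, which helps explain how a ROC-AUC near 0.90 can sit alongside a much lower PR-AUC and F1 score when essential genes are rare. The following is a minimal sketch in plain Python, not the authors' pipeline: the toy labels and scores are invented for illustration, PR-AUC is approximated by average precision, and the 0.5 decision threshold is an assumption.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive
    gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive.
    This is a common step-wise estimate of PR-AUC; tied scores are
    not treated specially in this sketch."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank
    return total / n_pos


def f1_score(labels, scores, threshold=0.5):
    """F1 of the hard predictions obtained by thresholding the scores."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical toy data: 20 genes, only one of them essential (label 1).
# Scores fall from 1.00 to 0.05; the essential gene is ranked third.
scores = [(20 - i) / 20 for i in range(20)]
labels = [0, 0, 1] + [0] * 17

print(f"ROC-AUC {roc_auc(labels, scores):.3f}")            # ROC-AUC 0.895
print(f"PR-AUC  {average_precision(labels, scores):.3f}")  # PR-AUC  0.333
print(f"F1      {f1_score(labels, scores):.3f}")           # F1      0.167
```

In this toy case the single essential gene outranks 17 of 19 non-essential genes, so ROC-AUC is high, yet two false positives ranked above it are enough to pull average precision down to one third.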

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7096750
DOI: http://dx.doi.org/10.1016/j.csbj.2020.02.022
