A high-quality genome annotation greatly facilitates successful cell line engineering. Standard draft genome annotation pipelines are based largely on de novo gene prediction, homology, and RNA-Seq data. However, draft annotations can suffer from incorrect predictions of translated sequence, inaccurate splice isoforms, and missing genes. Here, we generated a draft annotation for the newly assembled Chinese hamster genome and used RNA-Seq, proteomics, and Ribo-Seq to experimentally annotate the genome. We identified 3529 new proteins compared to the hamster RefSeq protein annotation and 2256 novel translational events (e.g., alternative splices, mutations, and novel splices). Finally, we used this pipeline to identify the source of translated retroviruses contaminating recombinant products from Chinese hamster ovary (CHO) cell lines, including 119 type-C retroviruses, thus enabling future efforts to eliminate retroviruses and reduce the costs of retroviral particle clearance. In summary, the improved annotation provides a more accurate resource for CHO cell line engineering by facilitating the interpretation of omics data, the definition of cellular pathways, and the engineering of complex phenotypes.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6571120
DOI: http://dx.doi.org/10.1021/acs.jproteome.8b00935

Similar Publications

Long-read sequencing has emerged as a transformative technology in recent years, offering significant potential for the molecular diagnosis of unresolved genetic disorders. Despite its promise, the comprehensive detection and clinical annotation of genomic variants remain intricate and technically demanding. We present SUMMER, an integrated and structured workflow specifically designed to process raw Nanopore sequencing reads.

Graphical Model Selection to Infer the Partial Correlation Network of Allelic Effects in Genomic Prediction With an Application in Dairy Cattle.

J Anim Breed Genet

January 2025

Departamento de Ciencias Agrícolas y Pecuarias, Universidad Francisco de Paula Santander, Cúcuta, Colombia.

We addressed genomic prediction accounting for partial correlation of marker effects, which entails the estimation of the partial correlation network/graph (PCN) and the precision matrix of an unobservable m-dimensional random variable. To this end, we developed a set of statistical models and methods by extending the canonical model selection problem in Gaussian concentration, and directed acyclic graph models. Our frequentist formulations combined existing methods with the EM algorithm and were termed Glasso-EM, Concord-EM and CSCS-EM, whereas our Bayesian formulations corresponded to hierarchical models termed Bayes G-Sel and Bayes DAG-Sel.

MjCyc: Rediscovering the pathway-genome landscape of the first sequenced archaeon, Methanocaldococcus jannaschii.

iScience

January 2025

Biological Computation & Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica, Greece.

The genome of Methanocaldococcus jannaschii (M. jannaschii) DSM 2661 was the first archaeal genome to be sequenced, in 1996. Subsequent sequence-based annotation cycles led to its first metabolic reconstruction in 2005. Leveraging new experimental results and function assignments, we have now re-annotated M. jannaschii, creating an updated resource with novel information and testable predictions in a pathway-genome database available at BioCyc.

Backfat thickness (BFT) and feed conversion ratio (FCR) are important commercial traits in the pig industry. With the increasing demand for human health and meat production, identifying functional genomic regions and genes associated with these commercial traits is critical for enhancing production efficiency. In this research, we conducted a genome-wide association study (GWAS) on a Landrace population comprising 4,295 individuals with chip data for BFT and FCR.

Background And Objective: Accurate identification and prioritization of driver-mutations in cancer is critical for effective patient management. Despite the presence of numerous bioinformatic algorithms for estimating mutation pathogenicity, there is significant variation in their assessments. This inconsistency is evident even for well-established cancer driver mutations.
