Background: The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle, approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl, and we expect the number to continue to grow in the future.
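
As a rough illustration of the quadratic growth mentioned above, the sketch below counts unordered species pairs. It assumes every species is compared once against every other, which is a simplification made here for the arithmetic and not a description of how Ensembl actually schedules its analyses. The function name `pairwise_comparisons` is hypothetical.

```python
def pairwise_comparisons(n_species: int) -> int:
    """Number of unordered species pairs, n * (n - 1) / 2.

    Assumption (not from the source): one comparison per pair of species.
    """
    return n_species * (n_species - 1) // 2


if __name__ == "__main__":
    # Doubling the species count roughly quadruples the number of pairs.
    for n in (50, 100):
        print(n, pairwise_comparisons(n))
```

Under this assumption, 50 species give 1225 pairs and 100 species give 4950, so doubling the species count roughly quadruples the work.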

Results: We present eHive, a new fault-tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network-distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system, a MySQL database serves as the central blackboard, and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole-genome alignments, (2) multiple whole-genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real-case scenarios.
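
The blackboard pattern described above can be sketched in a few lines. This is a minimal illustration, not eHive's implementation: it uses Python and an in-memory SQLite database as stand-ins for the Perl agent and the MySQL blackboard, and the table layout, status values, retry limit and function names (`make_blackboard`, `claim_job`, `run_worker`) are all assumptions introduced here.

```python
import sqlite3


def make_blackboard() -> sqlite3.Connection:
    """Create the shared 'blackboard' (hypothetical schema, SQLite stand-in)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE job (
               job_id   INTEGER PRIMARY KEY AUTOINCREMENT,
               analysis TEXT NOT NULL,
               input    TEXT NOT NULL,
               status   TEXT NOT NULL DEFAULT 'READY',
               retries  INTEGER NOT NULL DEFAULT 0)"""
    )
    # A dataflow rule says: when a job of from_analysis finishes,
    # create a job of to_analysis fed with its output.
    db.execute("CREATE TABLE dataflow_rule (from_analysis TEXT, to_analysis TEXT)")
    return db


def claim_job(db):
    """Claim one READY job; the guarded UPDATE avoids double-claiming."""
    while True:
        row = db.execute(
            "SELECT job_id, analysis, input FROM job "
            "WHERE status = 'READY' ORDER BY job_id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = db.execute(
            "UPDATE job SET status = 'CLAIMED' "
            "WHERE job_id = ? AND status = 'READY'",
            (row[0],),
        )
        db.commit()
        if cur.rowcount == 1:
            return row
        # Another agent claimed it first; look for the next READY job.


def run_worker(db, runnables, max_retries=3):
    """Autonomous agent loop: poll the blackboard until no work is left."""
    done = 0
    while (job := claim_job(db)) is not None:
        job_id, analysis, job_input = job
        try:
            output = runnables[analysis](job_input)
        except Exception:
            # Fault tolerance: requeue the job, or mark it FAILED after
            # max_retries attempts (right-hand sides see the old row values).
            db.execute(
                "UPDATE job SET retries = retries + 1, "
                "status = CASE WHEN retries + 1 >= ? THEN 'FAILED' "
                "ELSE 'READY' END WHERE job_id = ?",
                (max_retries, job_id),
            )
            db.commit()
            continue
        db.execute("UPDATE job SET status = 'DONE' WHERE job_id = ?", (job_id,))
        rules = db.execute(
            "SELECT to_analysis FROM dataflow_rule WHERE from_analysis = ?",
            (analysis,),
        ).fetchall()
        for (next_analysis,) in rules:
            db.execute(
                "INSERT INTO job (analysis, input) VALUES (?, ?)",
                (next_analysis, output),
            )
        db.commit()
        done += 1
    return done
```

In this sketch, all coordination happens through the database: workers never talk to each other, so any number of them could in principle poll the same blackboard, and a crashed or failing job is simply picked up again on a later poll. A production system on MySQL would need proper locking and a way to detect workers that die while holding a claimed job, which this sketch omits.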

Conclusions: eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2885371
DOI: http://dx.doi.org/10.1186/1471-2105-11-240


