Publications by authors named "Eoghan D Harrington"

Regulation of transcript structure generates transcript diversity and plays an important role in human disease. The advent of long-read sequencing technologies offers the opportunity to study the role of genetic variation in transcript structure. In this Article, we present a large human long-read RNA-seq dataset using the Oxford Nanopore Technologies platform from 88 samples from Genotype-Tissue Expression (GTEx) tissues and cell lines, complementing the GTEx resource.

View Article and Find Full Text PDF

Background: The circum-basmati group of cultivated Asian rice (Oryza sativa) contains many iconic varieties and is widespread in the Indian subcontinent. Despite its economic and cultural importance, a high-quality reference genome is currently lacking, and the group's evolutionary history is not fully resolved. To address these gaps, we use long-read nanopore sequencing and assemble the genomes of two circum-basmati rice varieties.

View Article and Find Full Text PDF

The MinION sequencer has made in situ sequencing feasible in remote locations. Following our initial demonstration of its high performance off planet with Earth-prepared samples, we developed and tested an end-to-end, sample-to-sequencer process that could be conducted entirely aboard the International Space Station (ISS). Initial experiments demonstrated the process with a microbial mock community standard.

View Article and Find Full Text PDF

Segmented filamentous bacteria (SFB) are host-specific intestinal symbionts that comprise a distinct clade within the Clostridiaceae, designated Candidatus Arthromitus. SFB display a unique life cycle within the host, involving differentiation into multiple cell types. The latter include filaments that attach intimately to intestinal epithelial cells, and from which "holdfasts" and spores develop.

View Article and Find Full Text PDF

Summary: Recent advances in single-cell manipulation technology, whole genome amplification and high-throughput sequencing have now made it possible to sequence the genome of an individual cell. The bioinformatic analysis of these genomes, however, is far more complicated than the analysis of those generated using traditional, culture-based methods. In order to simplify this analysis, we have developed SmashCell (Simple Metagenomics Analysis SHell-for sequences from single Cells).

View Article and Find Full Text PDF

Summary: SmashCommunity is a stand-alone metagenomic annotation and analysis pipeline suitable for data from Sanger and 454 sequencing technologies. It supports state-of-the-art software for essential metagenomic tasks such as assembly and gene prediction. It provides tools to estimate the quantitative phylogenetic and functional compositions of metagenomes, to compare compositions of multiple metagenomes and to produce intuitive visual representations of such analyses.

View Article and Find Full Text PDF

Cis-acting short sequence motifs play important roles in alternative splicing. It is now possible to identify such sequence motifs as conserved sequence patterns in genome sequence alignments. Here, we report the systematic search for motifs in the neighboring introns of alternatively spliced exons by using comparative analysis of mammalian genome alignments.

View Article and Find Full Text PDF

Unlabelled: Sircah is a flexible tool for the detection, analysis and visualization of alternative transcripts. It takes as input gene models or spliced alignments and creates a database of alternative transcription events: alternative transcription initiation and polyadenylation, alternative 3' and 5' splice-site usage, skipped exons and retained introns. The results can be visualized in a variety of ways, allowing the creation of publication quality images.

View Article and Find Full Text PDF

Background: Across the fully sequenced microbial genomes there are thousands of examples of overlapping genes. Many of these are only a few nucleotides long and are thought to function by permitting the coordinated regulation of gene expression. However, there should also be selective pressure against long overlaps, as the existence of overlapping reading frames increases the risk of deleterious mutations.

View Article and Find Full Text PDF

Background: Environments and their organic content are generally not static and isolated, but in a constant state of exchange and interaction with each other. Through physical or biological processes, organisms, especially microbes, may be transferred between environments whose characteristics may be quite different. The transferred microbes may not survive in their new environment, but their DNA will be deposited.

View Article and Find Full Text PDF

Continuing improvements in DNA sequencing technologies are providing us with vast amounts of genomic data from an ever-widening range of organisms. The resulting challenge for bioinformatics is to interpret this deluge of data and place it back into its biological context. Biological networks provide a conceptual framework with which we can describe part of this context, namely the different interactions that occur between the molecular components of a cell.

View Article and Find Full Text PDF

Given that the number of protein functions on earth is finite, the rapid expansion of biological knowledge and the concomitant exponential increase in the number of protein sequences should, at some point, enable the estimation of the limits of protein function space. The functional coverage of protein sequences can be investigated using computational methods, especially given the massive amount of data being generated by large-scale environmental sequencing (metagenomics). In completely sequenced genomes, the fraction of proteins to which at least some functional features can be assigned has recently risen to as much as approximately 85%.

View Article and Find Full Text PDF

We suggest an annotation strategy for genes encoded by retroviruses and transposable elements (RETRA genes) based on a set of marker protein domains. Usually RETRA genes are masked in vertebrate genomes prior to the application of automated gene prediction pipelines under the assumption that they provide no selective advantage to the host. Yet, we show that about 1000 genes in four vertebrate gene sets analyzed contain at least one RETRA gene marker domain.

View Article and Find Full Text PDF

Orthologous genes that maintain a single-copy status in a broad range of species may indicate a selection against gene duplication. If this is the case, then duplicates of such genes that do survive may have escaped the dosage control by rapid and sizable changes in their function. To test this hypothesis and to develop a strategy for the identification of novel gene functions, we have analyzed 22 primate-specific intrachromosomal duplications of genes with a single-copy ortholog in all other completely sequenced metazoans.

View Article and Find Full Text PDF