Publications by Istrail S | LitMetric

Publications by authors named "Istrail S"

Michael Waterman's Contributions to Computational Biology and Bioinformatics.

Pavel Pevzner Martin Vingron Christian Reidys Fengzhu Sun Sorin Istrail

J Comput Biol

July 2022

On the occasion of Dr. Michael Waterman's 80th birthday, we review his major contributions to the field of computational biology and bioinformatics, including the famous Smith-Waterman algorithm for sequence alignment, the probability and statistics theory related to sequence alignment, algorithms for sequence assembly, the Lander-Waterman model for genome physical mapping, combinatorics and predictions of ribonucleic acid structures, word counting statistics in molecular sequences, alignment-free sequence comparison, and algorithms for haplotype block partition and tagSNP selection related to the International HapMap Project. His books for graduate students, and those geared toward undergraduate students, played key roles in computational biology and bioinformatics education.

Proteinarium: Multi-sample protein-protein interaction analysis and visualization tool.

David Armanious Jessica Schuster George A Tollefson Anthony Agudelo Andrew T DeWan

Genomics

November 2020

We posit that the likely architecture of complex diseases is that subgroups of patients share variants in genes in specific networks, which are sufficient to give rise to a shared phenotype. We developed Proteinarium, a multi-sample protein-protein interaction (PPI) tool, to identify clusters of patients with shared gene networks. Proteinarium converts user-defined seed genes to protein symbols and maps them onto the STRING interactome.

Combinatorial and statistical prediction of gene expression from haplotype sequence.

Berk A Alpay Pinar Demetci Sorin Istrail Derek Aguiar

Bioinformatics

July 2020

Motivation: Genome-wide association studies (GWAS) have discovered thousands of significant genetic effects on disease phenotypes. By considering gene expression as the intermediary between genotype and disease phenotype, expression quantitative trait loci studies have interpreted many of these variants by their regulatory effects on gene expression. However, there remains a considerable gap between genotype-to-gene expression association and genotype-to-gene expression prediction.

Preface Special Issue: RECOMB 2018.

J Comput Biol

March 2020

Eric Davidson's Regulatory Genome for Computer Science: Causality, Logic, and Proof Principles of the Genomic cis-Regulatory Code.

J Comput Biol

July 2019

How Does the Regulatory Genome Work?

Sorin Istrail Isabelle S Peter

J Comput Biol

July 2019

Global Comparison of Drug Resistance Mutations After First-Line Antiretroviral Therapy Across Human Immunodeficiency Virus-1 Subtypes.

Austin Huang Joseph W Hogan Xi Luo Allison DeLong Shanmugam Saravanan

Open Forum Infect Dis

April 2016

Background. Human immunodeficiency virus (HIV)-1 drug resistance mutations (DRMs) often accompany treatment failure. Although subtype differences are widely studied, DRM comparisons between subtypes either focus on specific geographic regions or include populations with heterogeneous treatments.

Eric Davidson: Master of the universe.

Dev Biol

April 2016

Transcriptome of American oysters, Crassostrea virginica, in response to bacterial challenge: insights into potential mechanisms of disease resistance.

Ian C McDowell Chamilani Nikapitiya Derek Aguiar Christopher E Lane Sorin Istrail

PLoS One

March 2016

The American oyster Crassostrea virginica, an ecologically and economically important estuarine organism, can suffer high mortalities in areas in the Northeast United States due to Roseovarius Oyster Disease (ROD), caused by the gram-negative bacterial pathogen Roseovarius crassostreae. The goals of this research were to provide insights into: 1) the responses of American oysters to R. crassostreae, and 2) potential mechanisms of resistance or susceptibility to ROD.

Tumor haplotype assembly algorithms for cancer genomics.

Derek Aguiar Wendy S W Wong Sorin Istrail

Pac Symp Biocomput

August 2014

The growing availability of inexpensive high-throughput sequence data is enabling researchers to sequence tumor populations within a single individual at high coverage. However, cancer genome sequence evolution and mutational phenomena such as driver mutations and gene fusions are difficult to investigate without first reconstructing tumor haplotype sequences. Haplotype assembly of single-individual tumor populations is an exceedingly difficult task, complicated by tumor haplotype heterogeneity, tumor or normal cell sequence contamination, polyploidy, and complex patterns of variation.

Pathway-based analysis of genomic variation data.

Nir Atias Sorin Istrail Roded Sharan

Curr Opin Genet Dev

December 2013

A holy grail of genetics is to decipher the mapping from genotype to phenotype. Recent advances in sequencing technologies allow the efficient genotyping of thousands of individuals carrying a particular phenotype in an effort to reveal its genetic determinants. However, the interpretation of these data entails tackling significant statistical and computational problems that stem from the complexity of human phenotypes and the huge genotypic search space.

Intellectual disability is associated with increased runs of homozygosity in simplex autism.

Ece D Gamsiz Emma W Viscidi Abbie M Frederick Shailender Nagpal Stephan J Sanders

Am J Hum Genet

July 2013

Intellectual disability (ID), often attributed to autosomal-recessive mutations, occurs in 40% of autism spectrum disorders (ASDs). For this reason, we conducted a genome-wide analysis of runs of homozygosity (ROH) in simplex ASD-affected families consisting of a proband diagnosed with ASD and at least one unaffected sibling. In these families, probands with an IQ ≤ 70 show more ROH than their unaffected siblings, whereas probands with an IQ > 70 do not show this excess.

Haplotype assembly in polyploid genomes and identical by descent shared tracts.

Derek Aguiar Sorin Istrail

Bioinformatics

July 2013

Motivation: Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing high-throughput sequencing data must scale favorably in terms of both accuracy and computational efficiency. Furthermore, current models and methodologies for haplotype assembly (i) do not consider individuals sharing haplotypes jointly, which reduces the size and accuracy of assembled haplotypes, and (ii) are unable to model genomes having more than two sets of homologous chromosomes (polyploidy).

A quantitative reference transcriptome for Nematostella vectensis early embryonic development: a pipeline for de novo assembly in emerging model systems.

Sarah Tulin Derek Aguiar Sorin Istrail Joel Smith

Evodevo

May 2014

Background: The de novo assembly of transcriptomes from short shotgun sequences raises challenges due to random and non-random sequencing biases and inherent transcript complexity. We sought to define a pipeline for de novo transcriptome assembly to aid researchers working with emerging model systems where well-annotated genome assemblies are not available as a reference. To detail this experimental and computational method, we used early embryos of the sea anemone, Nematostella vectensis, an emerging model system for studies of animal body plan evolution.

Pathway-based genetic analysis of preterm birth.

Alper Uzun Andrew T Dewan Sorin Istrail James F Padbury

Genomics

March 2013

The rate of preterm birth in the United States is now 12%. Multiple genes, gene networks, and variants have been associated with this disease. Using a custom database for preterm birth (dbPTB) with a refined set of genes extensively curated from the literature and biological databases, we analyzed a GWAS of preterm birth with complete genotype data on nearly 2000 preterm and term mothers.

QColors: an algorithm for conservative viral quasispecies reconstruction from short and non-contiguous next generation sequencing reads.

Austin Huang Rami Kantor Allison DeLong Leeann Schreier Sorin Istrail

In Silico Biol

May 2013

Next-generation sequencing technologies have recently been applied to characterize mutational spectra of the heterogeneous population of viral genotypes (known as a quasispecies) within HIV-infected patients. Such information is clinically relevant because minority genetic subpopulations of HIV within patients enable viral escape from selection pressures such as the immune response and antiretroviral therapy. However, methods for quasispecies sequence reconstruction from next-generation sequencing reads are not yet widely used and remain an emerging area of research.

Global analysis of sequence diversity within HIV-1 subtypes across geographic regions.

Austin Huang Joseph W Hogan Sorin Istrail Allison Delong David A Katzenstein

Future Virol

May 2012

AIMS: HIV-1 sequence diversity can affect host immune responses and phenotypic characteristics such as antiretroviral drug resistance. Current HIV-1 sequence diversity classification uses phylogeny-based methods to identify subtypes and recombinants, which may overlook distinct subpopulations within subtypes. While local epidemic studies have characterized sequence-level clustering within subtypes using phylogeny, the identification of new genotype-phenotype associations is based on mutational correlations at individual sequence positions.

HapCompass: a fast cycle basis algorithm for accurate haplotype assembly of sequence data.

Derek Aguiar Sorin Istrail

J Comput Biol

June 2012

Genome assembly methods produce haplotype phase ambiguous assemblies due to limitations in current sequencing technologies. Determining the haplotype phase of an individual is computationally challenging and experimentally expensive. However, haplotype phase information is crucial in many bioinformatics workflows such as genetic association studies and genomic imputation.

DELISHUS: an efficient and exact algorithm for genome-wide detection of deletion polymorphism in autism.

Derek Aguiar Bjarni V Halldórsson Eric M Morrow Sorin Istrail

Bioinformatics

June 2012

Motivation: The understanding of the genetic determinants of complex disease is undergoing a paradigm shift. Genetic heterogeneity of rare mutations with deleterious effects is more commonly being viewed as a major component of disease. Autism is an excellent example where research is active in identifying matches between the phenotypic and genomic heterogeneities.

dbPTB: a database for preterm birth.

Alper Uzun Alyse Laliberte Jeremy Parker Caroline Andrew Emily Winterrowd

Database (Oxford)

May 2012

Genome-wide association studies (GWAS) query the entire genome in a hypothesis-free, unbiased manner. Since they have the potential for identifying novel genetic variants, they have become a very popular approach to the investigation of complex diseases. Nonetheless, since the success of the GWAS approach varies widely, the identification of genetic variants for complex diseases remains a difficult problem.

The Clark phaseable sample size problem: long-range phasing and loss of heterozygosity in GWAS.

Bjarni V Halldórsson Derek Aguiar Ryan Tarpine Sorin Istrail

J Comput Biol

March 2011

A phase transition is taking place today. The amount of data generated by genome resequencing technologies is so large that in some cases it is now less expensive to repeat the experiment than to store the information generated by the experiment. In the next few years, it is quite possible that millions of Americans will have been genotyped.

Haplotype phasing by multi-assembly of shared haplotypes: phase-dependent interactions between rare variants.

Bjarni V Halldórsson Derek Aguiar Sorin Istrail

Pac Symp Biocomput

November 2013

In this paper we propose algorithmic strategies, Lander-Waterman-like statistical estimates, and genome-wide software for haplotype phasing by multi-assembly of shared haplotypes. Specifically, we consider four types of results, which together provide a comprehensive workflow for GWAS data sets: (1) statistics of multi-assembly of shared haplotypes; (2) graph-theoretic algorithms for haplotype assembly based on conflict graphs of sequencing reads; (3) inference of pedigree structure through haplotype sharing via tract-finding algorithms; and (4) multi-assembly of shared haplotypes of cases, controls, and trios. The input to the workflows we consider can be any combination of: (A) genotype data, (B) next-generation sequencing (NGS) data, and (C) pedigree information.

Practical computational methods for regulatory genomics: a cisGRN-Lexicon and cisGRN-browser for gene regulatory networks.

Sorin Istrail Ryan Tarpine Kyle Schutter Derek Aguiar

Methods Mol Biol

December 2010

The CYRENE Project focuses on the study of cis-regulatory genomics and gene regulatory networks (GRN) and has three components: a cisGRN-Lexicon, a cisGRN-Browser, and the Virtual Sea Urchin software system. The project has been carried out in collaboration with Eric Davidson and is deeply inspired by his experimental work in genomic regulatory systems and gene regulatory networks. The current CYRENE cisGRN-Lexicon contains the regulatory architecture of 200 transcription factor-encoding genes and 100 other regulatory genes in eight species: human, mouse, fruit fly, sea urchin, nematode, rat, chicken, and zebrafish, with higher priority on the first five species.

The imperfect ancestral recombination graph reconstruction problem: upper bounds for recombination and homoplasy.

Fumei Lam Ryan Tarpine Sorin Istrail

J Comput Biol

June 2010

One of the central problems in computational biology is the reconstruction of evolutionary histories. While models incorporating recombination and homoplasy have been studied separately, a missing component in the theory is a robust and flexible unifying model which incorporates both of these major biological events shaping genetic diversity. In this article, we introduce the first such unifying model and develop algorithms to find the optimal ancestral recombination graph incorporating recombinations and homoplasy events.

Functional cis-regulatory genomics for systems biology.

Jongmin Nam Ping Dong Ryan Tarpine Sorin Istrail Eric H Davidson

Proc Natl Acad Sci U S A

February 2010

Gene expression is controlled by interactions between trans-regulatory factors and cis-regulatory DNA sequences, and these interactions constitute the essential functional linkages of gene regulatory networks (GRNs). Validation of GRN models requires experimental cis-regulatory tests of predicted linkages to authenticate their identities and proposed functions. However, cis-regulatory analysis is, at present, at a severe bottleneck in genomic systems biology because of the demanding experimental methodologies currently in use for discovering cis-regulatory modules (CRMs) in the genome and for measuring their activities.
