Publications by authors named "Ross Lippert"

The evaluation of electrostatic energy for a set of point charges in a periodic lattice is a computationally expensive part of molecular dynamics simulations (and other applications) because of the long-range nature of the Coulomb interaction. A standard approach is to decompose the Coulomb potential into a near part, typically evaluated by direct summation up to a cutoff radius, and a far part, typically evaluated in Fourier space. In practice, all decomposition approaches involve approximations (such as cutting off the near-part direct sum), but it may be possible to find new decompositions with improved trade-offs between accuracy and performance.
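
As a concrete, deliberately generic illustration of such a near/far split, the sketch below uses the classic Ewald-style erfc/erf decomposition of 1/r; the splitting parameter `alpha` and cutoff `r_cut` are illustrative choices, not values or methods from the article.

```python
# A minimal sketch of the standard near/far splitting of the Coulomb kernel:
# 1/r = erfc(alpha*r)/r  (short-range, truncated at r_cut)
#     + erf(alpha*r)/r   (smooth long-range, usually summed in Fourier space).
# Parameter values are illustrative only.
import math

def near_part(r, alpha, r_cut):
    """Short-range term: decays rapidly and is truncated at r_cut."""
    if r >= r_cut:
        return 0.0
    return math.erfc(alpha * r) / r

def far_part(r, alpha):
    """Long-range term: smooth in real space, typically handled in Fourier space."""
    return math.erf(alpha * r) / r

# Sanity check: inside the cutoff the two parts reconstruct 1/r exactly.
alpha, r_cut = 0.3, 10.0
for r in (0.5, 1.0, 2.0, 5.0):
    total = near_part(r, alpha, r_cut) + far_part(r, alpha)
    print(f"r={r:4.1f}  near+far={total:.6f}  1/r={1.0 / r:.6f}")
```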

In molecular dynamics simulations, control over temperature and pressure is typically achieved by augmenting the original system with additional dynamical variables to create a thermostat and a barostat, respectively. These variables generally evolve on timescales much longer than those of particle motion, but typical integrator implementations update the additional variables along with the particle positions and momenta at each time step. We present a framework that replaces the traditional integration procedure with separate barostat, thermostat, and Newtonian particle motion updates, allowing thermostat and barostat updates to be applied infrequently.
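
A minimal sketch of the kind of update schedule described, assuming generic, user-supplied `thermostat` and `barostat` callables: Newtonian velocity-Verlet steps run every time step, while temperature and pressure control are applied only every few steps. The placeholder controls below (naive velocity and box rescaling) are not the article's scheme.

```python
# Integration loop with thermostat/barostat updates decoupled from the
# per-step particle update and applied only at fixed intervals.
import numpy as np

def velocity_verlet_step(x, v, m, forces, dt):
    """One plain Newtonian (NVE) step."""
    a = forces(x) / m
    x = x + v * dt + 0.5 * a * dt**2
    v = v + 0.5 * (a + forces(x) / m) * dt
    return x, v

def run(x, v, m, forces, dt, n_steps,
        thermostat=None, barostat=None,
        thermo_interval=10, baro_interval=50):
    for step in range(1, n_steps + 1):
        x, v = velocity_verlet_step(x, v, m, forces, dt)
        if thermostat and step % thermo_interval == 0:
            v = thermostat(v)          # infrequent temperature control
        if barostat and step % baro_interval == 0:
            x = barostat(x)            # infrequent pressure control
    return x, v

# Toy usage: harmonic forces, crude velocity-rescaling "thermostat".
rng = np.random.default_rng(1)
x0 = rng.normal(size=(64, 3))
v0 = rng.normal(size=(64, 3))
forces = lambda x: -x                                  # harmonic well
rescale = lambda v: v * np.sqrt(1.0 / np.mean(v**2))   # push <v^2> toward 1
xf, vf = run(x0, v0, m=1.0, forces=forces, dt=0.01, n_steps=1000,
             thermostat=rescale, thermo_interval=10)
print("final <v^2>:", np.mean(vf**2))
```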

Since the behavior of biomolecules can be sensitive to temperature, the ability to accurately calculate and control the temperature in molecular dynamics (MD) simulations is important. However, standard analysis of equilibrium MD simulations (even constant-energy simulations with negligible long-term energy drift) often yields different calculated temperatures for different motions, in apparent violation of the statistical mechanical principle of equipartition of energy. Although such analysis provides a valuable warning that other simulation artifacts may exist, it leaves the actual value of the temperature uncertain.
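
The standard kinetic temperature referred to here follows from equipartition, T = 2 KE / (k_B N_dof). The sketch below evaluates it separately for two groups of degrees of freedom, which is the kind of per-motion comparison that can disagree; it uses reduced units with k_B = 1 and arbitrary example groups, and is not the analysis developed in the article.

```python
# Equipartition-based kinetic temperature, computed per group of atoms.
import numpy as np

def kinetic_temperature(masses, velocities, k_b=1.0):
    """masses: shape (N,); velocities: shape (N, 3)."""
    ke = 0.5 * np.sum(masses[:, None] * velocities**2)
    n_dof = 3 * len(masses)  # ignoring constraints and COM removal for simplicity
    return 2.0 * ke / (k_b * n_dof)

rng = np.random.default_rng(0)
masses = np.ones(1000)
velocities = rng.normal(scale=1.0, size=(1000, 3))  # ~unit temperature
print("group A T:", kinetic_temperature(masses[:100], velocities[:100]))
print("group B T:", kinetic_temperature(masses[100:], velocities[100:]))
```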

Algorithms for exact string matching have substantial application in computational biology. Time-efficient data structures that support a variety of exact string matching queries, such as the suffix tree and the suffix array, have been applied to such problems. As sequence databases grow, space-efficient approaches to exact matching are becoming increasingly important.
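
For orientation, the sketch below shows one such structure: a naively built suffix array with binary-search lookup of all occurrences of a pattern. It illustrates the kind of exact-match query involved, not the space-efficient approach the article is concerned with (naive construction is O(n^2 log n)).

```python
# Naive suffix array construction plus binary-search pattern lookup.
def build_suffix_array(text):
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """Return sorted start positions of `pattern` in `text`."""
    k = len(pattern)

    def bound(strict):
        # strict=False: first suffix whose k-prefix >= pattern
        # strict=True : first suffix whose k-prefix >  pattern
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[sa[mid]:sa[mid] + k]
            if prefix < pattern or (strict and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    start, stop = bound(strict=False), bound(strict=True)
    return sorted(sa[start:stop])

text = "ACGTACGTGACG"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "ACG"))   # -> [0, 4, 9]
```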

Recent sequencing of the human and other mammalian genomes has made it necessary to align them in order to identify and characterize their commonalities and differences. Programs that align whole genomes generally use a seed-and-extend technique: they start from exact or near-exact matches, select a reliable subset of these (called anchors), and then fill in the remaining portions between the anchors using a combination of local and global alignment algorithms. So far, however, the choices of parameters for anchor selection have been primarily heuristic. We present a statistical framework and practical methods for selecting a set of matches that is both sensitive and specific and can constitute a reliable set of anchors for a one-to-one mapping of two genomes, from which a whole-genome alignment can be built.
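
As a toy illustration of the anchor-selection step only, the sketch below chains exact matches into a colinear (non-crossing) subset with a longest-increasing-subsequence-style dynamic program. This shows the general idea of picking a mutually consistent anchor set; it is not the statistical selection framework the article proposes.

```python
# Colinear chaining of exact matches (a_pos, b_pos, length) by total length.
def chain_anchors(matches):
    """Return a maximal-weight colinear subset of the given matches."""
    matches = sorted(matches)                 # sort by position in genome A
    n = len(matches)
    best = [m[2] for m in matches]            # best chain score ending at i
    prev = [-1] * n
    for i in range(n):
        for j in range(i):
            a_ok = matches[j][0] + matches[j][2] <= matches[i][0]
            b_ok = matches[j][1] + matches[j][2] <= matches[i][1]
            if a_ok and b_ok and best[j] + matches[i][2] > best[i]:
                best[i] = best[j] + matches[i][2]
                prev[i] = j
    i = max(range(n), key=lambda k: best[k])  # trace back the best chain
    chain = []
    while i != -1:
        chain.append(matches[i])
        i = prev[i]
    return chain[::-1]

matches = [(0, 0, 20), (30, 35, 15), (25, 80, 10), (60, 60, 25)]
print(chain_anchors(matches))   # drops the crossing/inconsistent match
```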

The starting point for any alignment of mammalian genomes is the computation of exact matches satisfying various criteria. Time-efficient, O(n), data structures for this computation, such as the suffix tree, require O(n log n) space, several times the space of the genomes themselves. Thus, any reasonable whole-genome comparison project finds itself requiring tens of gigabytes of RAM to maintain time efficiency.
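
A back-of-envelope version of that memory point, using rough, commonly quoted per-character costs (the constants below are textbook-style assumptions, not figures from the article):

```python
# Rough memory estimates for indexing a mammalian-scale genome.
GENOME_BP = 3_000_000_000                       # ~3 Gbp

packed_text_gb  = GENOME_BP * 2 / 8 / 1e9       # 2 bits per base, packed
suffix_array_gb = GENOME_BP * 4 / 1e9           # one 32-bit index per position
suffix_tree_gb  = GENOME_BP * 20 / 1e9          # ~20 bytes/char, a common estimate

print(f"packed sequence : {packed_text_gb:5.2f} GB")
print(f"suffix array    : {suffix_array_gb:5.2f} GB")
print(f"suffix tree     : {suffix_tree_gb:5.2f} GB")
```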

The extent and patterns of linkage disequilibrium (LD) determine the feasibility of association studies to map genes that underlie complex traits. Here we present a comparison of the patterns of LD across four major human populations (African-American, Caucasian, Chinese, and Japanese) with a high-resolution single-nucleotide polymorphism (SNP) map covering almost the entire length of chromosomes 6, 21, and 22. We constructed metric LD maps formulated such that the units measure the extent of useful LD for association mapping.
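
The metric LD maps in this work are model-based; purely as background, the sketch below computes the standard pairwise LD quantities D' and r^2 from two-locus haplotype frequencies, which are the raw ingredients such analyses start from. The example frequencies are invented.

```python
# Pairwise LD (D' and r^2) from haplotype and allele frequencies.
def pairwise_ld(p_ab, p_a, p_b):
    """p_ab: frequency of the haplotype carrying allele A and allele B;
    p_a, p_b: marginal allele frequencies at the two SNPs."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2

print(pairwise_ld(p_ab=0.45, p_a=0.6, p_b=0.5))   # -> (0.75, 0.375)
```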

Nearly one in eight US women will develop breast cancer in their lifetime. Most breast cancer is not associated with a hereditary syndrome, occurs in postmenopausal women, and is estrogen- and progesterone-receptor positive. Estrogen exposure is an epidemiologic risk factor for breast cancer, and estrogen is a potent mammary mitogen.

It is widely hoped that the study of sequence variation in the human genome will provide a means of elucidating the genetic component of complex diseases and variable drug responses. A major stumbling block to the successful design and execution of genome-wide disease association studies using single-nucleotide polymorphisms (SNPs) and linkage disequilibrium is the enormous number of SNPs in the human genome. This results in unacceptably high costs for exhaustive genotyping and presents a challenging problem of statistical inference.

We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.

When comparing two sequences, a natural approach is to count the number of k-letter words the two sequences have in common. No positional information is used in the count, but it has the virtue that the comparison time is linear in the sequence length. For this reason, this statistic, D2, and certain transformations of D2 are used for EST sequence database searches.
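
A minimal computation of D2 as just described, i.e. the inner product of the two sequences' k-word count vectors (the number of matching k-word occurrence pairs). The choice of k and the example sequences are arbitrary.

```python
# D2 statistic: sum over words w of count_A(w) * count_B(w).
from collections import Counter

def kword_counts(seq, k):
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2(seq_a, seq_b, k):
    ca, cb = kword_counts(seq_a, k), kword_counts(seq_b, k)
    return sum(ca[w] * cb[w] for w in ca if w in cb)

print(d2("ACGTACGT", "TACGTTAC", k=3))   # -> 6
```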

With the consensus human genome sequenced and many other sequencing projects at varying stages of completion, greater attention is being paid to the genetic differences among individuals and the ability of those differences to predict phenotypes. A significant obstacle to such work is the difficulty and expense of determining haplotypes (sets of variants genetically linked because of their proximity on the genome) for large numbers of individuals for use in association studies. This paper presents some algorithmic considerations in a new approach to haplotype determination: inferring haplotypes from localised polymorphism data gathered from short genome 'fragments'.
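
A toy sketch of the core combinatorial step in fragment-based haplotyping: partition fragments (partial SNP allele observations) into two groups so that fragments within a group agree wherever they overlap, then merge each group into a haplotype. This greedy, error-free-data version is only for illustration and is not the algorithm developed in the paper.

```python
# Greedy bipartition of SNP fragments into two consistent haplotypes.
def compatible(frag, hap):
    """True if `frag` agrees with `hap` at every SNP they share."""
    return all(hap.get(pos) in (None, allele) for pos, allele in frag.items())

def assemble_haplotypes(fragments):
    hap0, hap1 = {}, {}
    for frag in fragments:
        target = hap0 if compatible(frag, hap0) else hap1
        target.update(frag)
    return hap0, hap1

# Fragments: {SNP index: allele}, e.g. from short sequenced genome pieces.
fragments = [
    {0: 'A', 1: 'C'},
    {1: 'C', 2: 'G'},
    {0: 'G', 1: 'T'},
    {2: 'T', 3: 'A'},
]
print(assemble_haplotypes(fragments))
```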
