Publications by authors named "Daniel Ashlock"

Background: The clustering of immune repertoire data is challenging due to the computational cost associated with a very large number of pairwise sequence comparisons. To overcome this limitation, we developed Anchor Clustering, an unsupervised clustering method designed to identify similar sequences from millions of antigen receptor gene sequences. First, a Point Packing algorithm is used to identify a set of maximally spaced anchor sequences.
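The anchor-selection idea can be illustrated with a minimal sketch. This is not the published Anchor Clustering implementation: the greedy rule, the Hamming distance, the `min_dist` threshold, and the function names `pick_anchors` and `assign_to_anchors` are all assumptions made for illustration.

```python
from typing import Callable, List


def hamming(a: str, b: str) -> int:
    """Count mismatched positions; assumes equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pick_anchors(seqs: List[str], min_dist: int,
                 dist: Callable[[str, str], int] = hamming) -> List[str]:
    """Greedy packing (assumed rule): keep a sequence as an anchor only if
    it is at least min_dist away from every anchor chosen so far."""
    anchors: List[str] = []
    for s in seqs:
        if all(dist(s, a) >= min_dist for a in anchors):
            anchors.append(s)
    return anchors


def assign_to_anchors(seqs: List[str], anchors: List[str],
                      dist: Callable[[str, str], int] = hamming) -> List[int]:
    """Label each sequence with the index of its nearest anchor
    (ties go to the earliest anchor)."""
    return [min(range(len(anchors)), key=lambda i: dist(s, anchors[i]))
            for s in seqs]
```

Each sequence is compared only against the anchors rather than against every other sequence, which is how an anchor-based scheme can avoid the full set of pairwise comparisons.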

Intrinsically disordered proteins (IDPs) are proteins that lack a stable 3D structure but maintain a biological function. It has been frequently suggested that IDPs are difficult to align because they tend to have fewer conserved residues compared to ordered proteins, but to our knowledge this has never been directly tested. To compare the alignments of ordered proteins to IDPs, their multiple sequence alignments (MSAs) were assessed using two different methods.

Social dilemma games are studied to gain insight into why humans cooperate with other unrelated people. The canonical game has cooperation and defection as the two strategies. Cooperation benefits the group, but a self-interested player can always do better by defecting.

Background: Detection of central nodes in asymmetrically directed biological networks depends on centrality metrics quantifying individual nodes' importance in a network. In topological analyses of metabolic networks, centrality metrics have mostly been applied to metabolite-centric graphs. However, centrality metrics, including those that do not depend on high connectivity, are largely unexplored for directed reaction-centric graphs.

There have been longstanding concerns about the stability of hierarchical clustering. A suggested explanation for this instability is the presence of "rogue taxa", i.e.

Dehydrins, plant proteins that are upregulated during dehydration stress conditions, have modular sequences that can contain three conserved motifs (the Y-, S-, and K-segments). The presence and order of these motifs are used to classify dehydrins into one of five architectures: Kn, SKn, KnS, YnKn, and YnSKn, where the subscript n describes the number of copies of that motif. In this study, an architectural and phylogenetic analysis was performed on 426 dehydrin sequences that were identified in 53 angiosperm and 3 gymnosperm genomes.
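The classification rule described above can be sketched as follows. The input format (a string of motif letters in N-to-C order, such as `"YYSKK"`), the function name `classify_dehydrin`, and the choice to allow repeated S-segments are assumptions for illustration; motif detection itself is not shown.

```python
import re
from typing import Optional

# One pattern per architecture named in the abstract. Each pattern must
# match the whole motif string, so the order of motifs matters.
_ARCHITECTURES = [
    ("Kn", re.compile(r"K+")),
    ("SKn", re.compile(r"S+K+")),
    ("KnS", re.compile(r"K+S+")),
    ("YnKn", re.compile(r"Y+K+")),
    ("YnSKn", re.compile(r"Y+S+K+")),
]


def classify_dehydrin(motifs: str) -> Optional[str]:
    """Return the architecture name for an ordered motif string,
    or None if the order fits none of the five architectures."""
    for name, pattern in _ARCHITECTURES:
        if pattern.fullmatch(motifs):
            return name
    return None
```

For example, `classify_dehydrin("YYSKK")` returns `"YnSKn"`, while a motif order such as `"KSK"` falls outside the five architectures and returns `None`.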

Starch is a water-insoluble polyglucan synthesized inside the plastid stroma within plant cells, serving a crucial role in the carbon budget of the whole plant by acting as a short-term and long-term store of energy. The highly complex, hierarchical structure of the starch granule arises from the actions of a large suite of enzyme activities, in addition to physicochemical self-assembly mechanisms. This review outlines current knowledge of the starch biosynthetic pathway operating in plant cells in relation to the micro- and macro-structures of the starch granule.

Graphs can be used as contact networks in models of epidemic spread. Most research seeks to extract the properties of an extant graph, derived from questionnaires or other sources of contact information. The inverse problem of searching the space of graphs for those that exhibit specific properties has received little attention and is the focus of this study.

DNA fragment assembly, an NP-hard problem, is one of the major steps in DNA sequencing. Multiple strategies have been used for this problem, including greedy graph-based algorithms, de Bruijn graphs, and the overlap-layout-consensus approach. This study focuses on the overlap-layout-consensus approach.

This paper examines the use of evolutionary algorithms in the development of antibiotic regimens given to production animals. A model is constructed that combines the lifespan of the animal and the bacteria living in the animal's gastrointestinal tract from the early finishing stage until the animal reaches market weight. This model is used as the fitness evaluation for a set of graph-based evolutionary algorithms to assess the impact of diversity control on the evolving antibiotic regimens.

The explosion of available sequence data necessitates the development of sophisticated machine learning tools with which to analyze them. This study introduces a sequence-learning technology called side effect machines. It also trains the side effect machines with a model of evolution that simulates the evolution of a ring species.

Background: The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs.

DNA error correcting codes over the edit metric consist of embeddable markers for sequencing projects that are tolerant of sequencing errors. When a genetic library has multiple sources for its sequences, use of embedded markers permits tracking of sequence origin. This study compares different methods for synthesizing DNA error correcting codes.
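To make the edit metric concrete, here is a minimal sketch of the standard Levenshtein distance (insertions, deletions, and substitutions each cost 1) and of the minimum pairwise distance of a candidate code. The function names and example codewords are hypothetical, and the sketch does not reproduce any of the synthesis methods compared in the study.

```python
from itertools import combinations
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def min_code_distance(code: Sequence[str]) -> int:
    """Smallest edit distance between any two distinct codewords."""
    return min(edit_distance(a, b) for a, b in combinations(code, 2))
```

As with any metric, a code whose minimum distance is d can correct up to floor((d - 1) / 2) errors by decoding a read to its nearest codeword.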

Ring species are a biological complex that theoretically forms when an ancestral population extends its range around a geographic barrier and, despite low-level gene flow, differentiates until reproductive isolation exists when terminal populations come into secondary contact. Due to their rarity in nature, little is known about the biological factors that promote the formation of ring species. We use evolutionary algorithms operating on two simple computational problems (SAW and K-max) to study the process of speciation under the conditions which may yield ring species.

Background: Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance.

The maize (Zea mays) spikelet consists of two florets, each of which contains three developmentally synchronized anthers. Morphologically, the anthers in the upper and lower florets proceed through apparently similar developmental programs. To test for global differences in gene expression and to identify genes that are coordinately regulated during maize anther development, RNA samples isolated from upper and lower floret anthers at six developmental stages were hybridized to cDNA microarrays.

As an ancient segmental tetraploid, the maize (Zea mays L.) genome contains large numbers of paralogs that are expected to have diverged by a minimum of 10% over time. Nearly identical paralogs (NIPs) are defined as paralogous genes that exhibit ≥ 98% identity.

A new genetic map of maize, ISU-IBM Map4, that integrates 2029 existing markers with 1329 new indel polymorphism (IDP) markers has been developed using intermated recombinant inbred lines (IRILs) from the intermated B73xMo17 (IBM) population. The website http://magi.plantgenomics.

Recent sequencing efforts have targeted the gene-rich regions of the maize (Zea mays L.) genome. We report the release of an improved assembly of maize assembled genomic islands (MAGIs).

Five ab initio programs (FGENESH, GeneMark.hmm, GENSCAN, GlimmerR and Grail) were evaluated for their accuracy in predicting maize genes. Two of these programs, GeneMark.

In recent years, access to complete genomic sequences, coupled with rapidly accumulating data related to RNA and protein expression patterns, has made it possible to determine comprehensively how genes contribute to complex phenotypes. However, for major crop plants, publicly available, standard platforms for parallel expression analysis have been limited. We report the conception and design of the new publicly available, 22K Barley1 GeneChip probe array, a model for plants without a fully sequenced genome.

Because the bulk of the maize (Zea mays L.) genome consists of repetitive sequences, sequencing efforts are being targeted to its 'gene-rich' fraction. Traditional assembly programs are inadequate for this approach because they are optimized for a uniform sampling of the genome and inherently lack the ability to differentiate highly similar paralogs.

To enhance gene discovery, expressed sequence tag (EST) projects often make use of cDNA libraries produced using diverse mixtures of mRNAs. As such, expression data are lost because the origins of the resulting ESTs cannot be determined. Alternatively, multiple libraries can be prepared, each from a more restricted source of mRNAs.

Even in the absence of excisional loss of the associated Mu transposons, some Mu-induced mutant alleles of maize can lose their capacity to condition a mutant phenotype. Three of five Mu-derived rf2a alleles are susceptible to such Mu suppression. The suppressible rf2a-m9437 allele has a novel Mu transposon insertion (Mu10) in its 5' untranslated region (UTR).
