Publications by authors named "Von Bing Yap"

Purpose: Invasion of carcinoma cells into surrounding tissue affects breast cancer staging, influences choice of treatment, and impacts patient outcome. KIF21A is a member of the kinesin superfamily that has been well studied in congenital extraocular muscle fibrosis. However, its biological relevance in breast cancer is unknown.

Why some individuals seek social engagement while others shy away has profound implications for normal and pathological human behavior. Evidence suggests that oxytocin (OT), the paramount human social hormone, and CD38, which governs OT release, contribute to individual differences in social skills, ranging from intense social involvement to the extreme avoidance that characterizes autism. To explore the neurochemical underpinnings of sociality, CD38 expression in peripheral blood leukocytes (PBL) was measured in Han Chinese undergraduates.

Estimation of natural selection on protein-coding sequences is a key comparative genomics approach for de novo prediction of lineage-specific adaptations. Selective pressure is measured on a per-gene basis by comparing the rate of nonsynonymous substitutions to the rate of synonymous substitutions. All published codon substitution models have been time-reversible and thus assume that sequence composition does not change over time.

Mutation processes differ between types of point mutation, genomic locations, cells, and biological species. For some point mutations, specific neighboring bases are known to be mechanistically influential. Beyond these cases, numerous questions remain unresolved, including: what are the sequence motifs that affect point mutations? How large are the motifs? Are they strand symmetric? And, do they vary between samples? We present new log-linear models that allow explicit examination of these questions, along with sequence logo style visualization to enable identifying specific motifs.

The genetic distance between biological sequences is a fundamental quantity in molecular evolution. It pertains to questions of rates of evolution, existence of a molecular clock, and phylogenetic inference. Under the class of continuous-time substitution models, the distance is commonly defined as the expected number of substitutions at any site in the sequence.
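
As a minimal sketch of what "expected number of substitutions per site" means in practice, the Python below computes the distance under the Jukes-Cantor (JC69) model, where the distance d follows from the observed proportion of differing sites p as d = -(3/4) ln(1 - 4p/3). The abstract does not name a specific model, so the choice of JC69 and the function names `p_distance` and `jc69_distance` are assumptions made purely for illustration.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping gaps and ambiguous characters."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(seq_a, seq_b):
    """Expected substitutions per site under JC69 (an illustrative assumption)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance exceeds the raw proportion of differences because it accounts for multiple substitutions at the same site.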

Introduction: While overexpression of syndecan-1 has been associated with aggressive breast cancer in the Caucasian population, the expression pattern of syndecan-1 in Asian women remains unclear. Triple-positive breast carcinoma, in particular, is a unique subtype that has not been extensively studied. We aimed to evaluate the role of syndecan-1 as a potential biomarker and prognostic factor for triple-positive breast carcinoma in Asian women.

Continuous-time Markov processes are often used to model the complex natural phenomenon of sequence evolution. To make the process of sequence evolution tractable, simplifying assumptions are often made about the sequence properties and the underlying process. The validity of one such assumption, time-homogeneity, has never been explored.

The most general context-dependent Markov substitution process, where each substitution event involves only one site and substitution rates depend on the whole sequence, is presented for the first time. The focus is on circular DNA sequences, where the problem of specifying the behaviour of the first and last sites in a linear sequence does not arise. Important special cases include (1) the established models where each site behaves independently, (2) models which are increasingly applied to non-coding DNA, where each site depends on only the immediate neighbouring sites, and (3) models where each site depends on two closest neighbours on both sides, such as the codon models.

For a reversible finite-state continuous-time Markov chain containing similar states, the computation of the transition matrix can be expressed quite elegantly in terms of the transition matrix of an associated lumped Markov chain. This result is immensely useful for obtaining explicit transition matrices for many DNA substitution models, without diagonalizing a matrix or solving a differential equation. Furthermore, the technique works for the analogous problem in the discrete-time DNA substitution models.

We suggest the categorical analysis of variance (CATANOVA) as a means of quantifying the proportion of total genetic variation attributed to different sources of variation. This method potentially challenges researchers to rethink conclusions derived from the well-known analysis of molecular variance (AMOVA). The CATANOVA framework allows explicit definition, and estimation, of two measures of genetic differentiation.

Background: Quantitative trait loci analysis assumes that the trait is normally distributed. In reality, this is often not observed and one strategy is to transform the trait. However, it is not clear how much normality is required and which transformation works best in association studies.
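
The excerpt does not say which transformations were compared. As one hedged illustration of what transforming a trait toward normality can look like, the sketch below applies a rank-based inverse normal transform using only the Python standard library. The function name `rank_inverse_normal` and the `offset` parameter are hypothetical, and ties are broken by input order rather than averaged.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map each value to the standard normal quantile of its rank.

    Illustrative only: ties are ordered by position, not given mean ranks.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    # Indices of the values in ascending order.
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    transformed = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # (rank - offset) / n lies strictly between 0 and 1 for offset = 0.5.
        transformed[idx] = std_normal.inv_cdf((rank - offset) / n)
    return transformed
```

The output preserves the ordering of the trait values while forcing their marginal distribution to be approximately normal, which is one reason such transforms are a common option when a trait is skewed.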

Analysis of natural selection is key to understanding many core biological processes, including the emergence of competition, cooperation, and complexity, and has important applications in the targeted development of vaccines. Selection is hard to observe directly but can be inferred from molecular sequence variation. For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (omega) distinguishes neutrally evolving sequences (omega = 1) from those subjected to purifying (omega < 1) or positive Darwinian (omega > 1) selection.
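
The three regimes described above can be sketched as a small classifier. This is an illustration only: the function names and the `tol` band around omega = 1 are hypothetical, and real analyses decide between regimes with statistical tests on fitted codon models rather than a fixed tolerance.

```python
def omega(dn, ds):
    """Ratio of the nonsynonymous rate (dn) to the synonymous rate (ds)."""
    if ds <= 0:
        raise ValueError("synonymous rate must be positive")
    return dn / ds


def selection_regime(w, tol=0.05):
    """Label an omega value; `tol` is an arbitrary illustrative threshold."""
    if abs(w - 1.0) <= tol:
        return "neutral"
    if w < 1.0:
        return "purifying"
    return "positive"


# Hypothetical rates, chosen only to exercise each branch.
for dn, ds in [(0.02, 0.10), (0.10, 0.10), (0.30, 0.10)]:
    print(dn, ds, selection_regime(omega(dn, ds)))
```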

Background: Continuous-time Markov models allow flexible, parametrically succinct descriptions of sequence divergence. Non-reversible forms of these models are more biologically realistic but are challenging to develop. The instantaneous rate matrices defined for these models are typically transformed into substitution probability matrices using a matrix exponentiation algorithm that employs eigendecomposition, but this algorithm has characteristic vulnerabilities that lead to significant errors when a rate matrix possesses certain 'pathological' properties.
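
To make the step from a rate matrix Q to a substitution probability matrix P(t) = exp(Qt) concrete, here is a minimal pure-Python sketch. It deliberately does not use eigendecomposition: it uses a truncated Taylor series with scaling and squaring, a different and simpler technique, and it is not the algorithm from the paper. The function names and the default `squarings` and `terms` values are assumptions for illustration.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def expm_taylor(q, t, squarings=10, terms=20):
    """Approximate exp(q * t) by Taylor series plus repeated squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes a^k / k! after this step.
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [
            [result[i][j] + term[i][j] for j in range(n)] for i in range(n)
        ]
    for _ in range(squarings):
        # Undo the scaling: exp(A) = (exp(A / 2^s))^(2^s).
        result = mat_mul(result, result)
    return result


def jc69_rate_matrix(alpha):
    """Jukes-Cantor rate matrix: every off-diagonal rate equals alpha."""
    return [
        [-3.0 * alpha if i == j else alpha for j in range(4)]
        for i in range(4)
    ]


p = expm_taylor(jc69_rate_matrix(1.0 / 3.0), 0.5)
```

For the Jukes-Cantor matrix the result can be checked against the known closed form, where each diagonal entry equals 1/4 + (3/4) exp(-4 alpha t) and every row sums to one.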

Background: Neighboring nucleotides exert a striking influence on mutation, with the hypermutability of CpG dinucleotides in many genomes being an exemplar. Among the approaches employed to measure the relative importance of sequence neighbors on molecular evolution have been continuous-time Markov process models for substitutions that treat sequences as a series of independent tuples. The most widely used examples are the codon substitution models.

Background: The nucleotide substitution rate matrix is a key parameter of molecular evolution. Several methods for inferring this parameter have been proposed, with different mathematical bases. These methods include counting sequence differences and taking the log of the resulting probability matrices, methods based on Markov triples, and maximum likelihood methods that infer the substitution probabilities that lead to the most likely model of evolution.

Background: We compared two methods of rooting a phylogenetic tree, one based on a stationary substitution process and the other on a nonstationary one. These methods do not require an outgroup.

Methods: Given a multiple alignment and an unrooted tree, the maximum likelihood estimates of branch lengths and substitution parameters for each associated rooted tree are found; rooted trees are compared using their likelihood values.

Major histocompatibility complex class I molecules present peptides of 8-10 residues to CD8+ T cells. We used 19 predicted proteomes to determine the influence of CD8+ T cell immune surveillance on protein evolution in humans and microbial pathogens by predicting immunopeptidomes, i.e.

We present a whole-genome comparative analysis of the human, mouse, and rat genomes to describe the average substitution patterns of four genomic regions (ancient repeats, rodent-specific DNA, exons, and conserved coding and noncoding regions) and to identify rodent evolutionary hotspots. In all types of regions, except the rodent-specific DNA, the rat branch is slightly longer than the mouse branch. Moreover, the mouse-rat distance is longer in the rodent-specific DNA than in the ancient repeats.

The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome.

We studied the substitution patterns in 7661 well-conserved human-mouse alignments corresponding to the intergenic regions of human chromosome 22. Alignments with a high average GC content tend to have a higher human GC content than mouse GC content, indicating a lack of stationarity. Segmenting the alignments into four groups by GC content and fitting the general reversible substitution model (REV) separately gave significantly better fits than the overall fit, and the levels of fit are close to those expected under an REV model.
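
The GC-content comparison behind the stationarity argument can be sketched as follows. The function name `gc_content` and the toy aligned sequences are hypothetical and are not data from the study; gaps and ambiguous characters are simply skipped.

```python
def gc_content(seq):
    """Fraction of G or C among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


# Toy aligned pair, invented for illustration only.
human = "GGCGCCATGC-GCA"
mouse = "GGCACTATGCAGTA"

difference = gc_content(human) - gc_content(mouse)
print(f"human minus mouse GC content: {difference:.3f}")
```

Under a stationary process the two sequences would be expected to have similar base composition, so a systematic difference of this kind across many alignments is what points to a lack of stationarity.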
