Publications by authors named "Henry C M Leung"

Mutation study for high-risk breast and ovarian cancer (HBOC) has been extensively studied in patients of different ethnicities. Here we compared the germline mutation rate and mutation spectrum of patients ( = 4341) with benign breast diseases or breast cancers, with and without other risk factors. Three cohorts of Chinese patients were recruited.

View Article and Find Full Text PDF

Structural variation (SV) is a major cause of genetic disorders. In this paper, we show that low-depth (specifically, 4×) whole-genome sequencing using a single Oxford Nanopore MinION flow cell suffices to support sensitive detection of SV, particularly pathogenic SV for supporting clinical diagnosis. When using 4× ONT WGS data, existing SV calling software often fails to detect pathogenic SV, especially in the form of long deletion, terminal deletion, duplication, and unbalanced translocation.

View Article and Find Full Text PDF

Objective: We designed and tested a Nanopore sequencing panel for direct tuberculosis drug resistance profiling. The panel targeted 10 resistance-associated loci. We assessed the feasibility of amplifying and sequencing these loci from 23 clinical specimens with low bacillary burden.

View Article and Find Full Text PDF

Single-molecule sequencing technologies produce much longer reads compared with next-generation sequencing, greatly improving the contiguity of de novo assembly of genomes. However, the relatively high error rates in long reads make it challenging to obtain high-quality assemblies. A computationally intensive consensus step is needed to resolve the discrepancies in the reads.

View Article and Find Full Text PDF

Background: Flavonoids in Chinese Medicine have been proven in animal studies that could aid in osteogenesis and bone formation. However, there is no consented mechanism for how these phytochemicals action on the bone-forming osteoblasts, and henceforth the prediction model of chemical screening for this specific biochemical function has not been established. The purpose of this study was to develop a novel selection and effective approach of flavonoids on the prediction of bone-forming ability via osteoblastic voltage-gated calcium (CaV) activation and inhibition using molecular modelling technique.

View Article and Find Full Text PDF

Background: Because of the short read length of high throughput sequencing data, assembly errors are introduced in genome assembly, which may have adverse impact to the downstream data analysis. Several tools have been developed to eliminate these errors by either 1) comparing the assembled sequences with some similar reference genome, or 2) analyzing paired-end reads aligned to the assembled sequences and determining inconsistent features alone mis-assembled sequences. However, the former approach cannot distinguish real structural variations between the target genome and the reference genome while the latter approach could have many false positive detections (correctly assembled sequence being considered as mis-assembled sequence).

View Article and Find Full Text PDF

Predicting drug-target interaction using computational approaches is an important step in drug discovery and repositioning. To predict whether there will be an interaction between a drug and a target, most existing methods identify similar drugs and targets in the database. The prediction is then made based on the known interactions of these drugs and targets.

View Article and Find Full Text PDF

Metatranscriptomic analysis provides information on how a microbial community reacts to environmental changes. Using next-generation sequencing (NGS) technology, biologists can study the microbe community by sampling short reads from a mixture of mRNAs (metatranscriptomic data). As most microbial genome sequences are unknown, it would seem that de novo assembly of the mRNAs is needed.

View Article and Find Full Text PDF

Since the read lengths of high throughput sequencing (HTS) technologies are short, de novo assembly which plays significant roles in many applications remains a great challenge. Most of the state-of-the-art approaches base on de Bruijn graph strategy and overlap-layout strategy. However, these approaches which depend on k-mers or read overlaps do not fully utilize information of paired-end and single-end reads when resolving branches.

View Article and Find Full Text PDF

Sequence assembling is an important step for bioinformatics study. With the help of next generation sequencing (NGS) technology, high throughput DNA fragment (reads) can be randomly sampled from DNA or RNA molecular sequence. However, as the positions of reads being sampled are unknown, assembling process is required for combining overlapped reads to reconstruct the original DNA or RNA sequence.

View Article and Find Full Text PDF

High-throughput next-generation sequencing technology provides a great opportunity for analyzing metatranscriptomic data. However, the reads produced by these technologies are short and an assembling step is required to combine the short reads into longer contigs. As there are many repeat patterns in mRNAs from different genomes and the abundance ratio of mRNAs in a sample varies a lot, existing assemblers for genomic data, transcriptomic data, and metagenomic data do not work on metatranscriptomic data and produce chimeric contigs, that is, incorrect contigs formed by merging multiple mRNA sequences.

View Article and Find Full Text PDF

Motivation: RNA sequencing based on next-generation sequencing technology is effective for analyzing transcriptomes. Like de novo genome assembly, de novo transcriptome assembly does not rely on any reference genome or additional annotation information, but is more difficult. In particular, isoforms can have very uneven expression levels (e.

View Article and Find Full Text PDF

Motivation: Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads do not perform well on (i) samples with low-abundance species or (ii) samples (even with high abundance) when there are many extremely low-abundance species. These two problems are common for real metagenomic datasets.

View Article and Find Full Text PDF

Motivation: Next-generation sequencing allows us to sequence reads from a microbial environment using single-cell sequencing or metagenomic sequencing technologies. However, both technologies suffer from the problem that sequencing depth of different regions of a genome or genomes from different species are highly uneven. Most existing genome assemblers usually have an assumption that sequencing depths are even.

View Article and Find Full Text PDF

Next-generation sequencing (NGS) technologies allow the sequencing of microbial communities directly from the environment without prior culturing. The output of environmental DNA sequencing consists of many reads from genomes of different unknown species, making the clustering together reads from the same (or similar) species (also known as binning) a crucial step. The difficulties of the binning problem are due to the following four factors: (1) the lack of reference genomes; (2) uneven abundance ratio of species; (3) short NGS reads; and (4) a large number of species (can be more than a hundred).

View Article and Find Full Text PDF

Motivation: Next-generation sequencing techniques allow us to generate reads from a microbial environment in order to analyze the microbial community. However, assembling of a set of mixed reads from different species to form contigs is a bottleneck of metagenomic research. Although there are many assemblers for assembling reads from a single genome, there are no assemblers for assembling reads in metagenomic data without reference genome sequences.

View Article and Find Full Text PDF

Motivation: With the rapid development of next-generation sequencing techniques, metagenomics, also known as environmental genomics, has emerged as an exciting research area that enables us to analyze the microbial environment in which we live. An important step for metagenomic data analysis is the identification and taxonomic characterization of DNA fragments (reads or contigs) resulting from sequencing a sample of mixed species. This step is referred to as 'binning'.

View Article and Find Full Text PDF

Background: H.C. Rumke coined the term 'praecox-feeling' to denote a specific unease experienced by the clinician reflecting the 'impossibility of empathy' and 'lack of exchange of affect' that has been reported to occur early on when examining schizophrenia patients.

View Article and Find Full Text PDF

Background: DNA assembling is the problem of determining the nucleotide sequence of a genome from its substrings, called reads. In the experiments, there may be some errors on the reads which affect the performance of the DNA assembly algorithms. Existing algorithms, e.

View Article and Find Full Text PDF

Unlabelled: Protein complexes play a critical role in many biological processes. Identifying the component proteins in a protein complex is an important step in understanding the complex as well as the related biological activities. This paper addresses the problem of predicting protein complexes from the protein-protein interaction (PPI) network of one species using a computational approach.

View Article and Find Full Text PDF

Most motif discovery algorithms from DNA sequences require the motif's length as input. Styczynski et al. introduced the Extended (l,d)-Motif Problem (EMP) where the motif's length is not an input parameter.

View Article and Find Full Text PDF

The problem of discovering novel motifs of binding sites is important to the understanding of gene regulatory networks. Motifs are generally represented by matrices (position weight matrix (PWM) or position specific scoring matrix (PSSM) or strings. However, these representations cannot model biological binding sites well because they fail to capture nucleotide interdependence.

View Article and Find Full Text PDF

We introduce a new motif-discovery algorithm, DIMDom, which exploits two additional kinds of information not commonly exploited: (a) the characteristic pattern of binding site classes, where class is determined based on biological information about transcription factor domains and (b) posterior probabilities of these classes. We compared the performance of DIMDom with MEME on all the transcription factors of Drosophila with at least one known binding site in the TRANSFAC database and found that DOMDom outperformed MEME with 2.5 times the number of successes and 1.

View Article and Find Full Text PDF

Finding motif pairs from a set of protein sequences based on the protein-protein interaction data is a challenging computational problem. Existing effective approaches usually rely on additional information such as some prior knowledge on protein groupings based on protein domains. In reality, this kind of knowledge is not always available.

View Article and Find Full Text PDF

Motivation: Finding common patterns, motifs, from a set of promoter regions of coregulated genes is an important problem in molecular biology. Most existing motif-finding algorithms consider a set of sequences bound by the transcription factor as the only input. However, we can get better results by considering sequences that are not bound by the transcription factor as an additional input.

View Article and Find Full Text PDF