Publications by Henry C M Leung | LitMetric

Publications by authors named "Henry C M Leung"

Mutation Spectrum Comparison between Benign Breast Lesion Cohort, Unselected Cancer Cohort and High-Risk Breast Cancer Cohort.

Ava Kwong Cecilia Y S Ho Henry C M Leung Amy W S Leung Chun-Hang Au

Cancers (Basel)

September 2024

Mutations in high-risk breast and ovarian cancer (HBOC) have been extensively studied in patients of different ethnicities. Here we compared the germline mutation rate and mutation spectrum of patients (n = 4341) with benign breast diseases or breast cancers, with and without other risk factors. Three cohorts of Chinese patients were recruited.

Detecting structural variations with precise breakpoints using low-depth WGS data from a single Oxford Nanopore MinION flowcell.

Henry C M Leung Huijing Yu Yifan Zhang Wing Sze Leung Ivan F M Lo

Sci Rep

March 2022

Structural variation (SV) is a major cause of genetic disorders. In this paper, we show that low-depth (specifically, 4×) whole-genome sequencing using a single Oxford Nanopore MinION flow cell suffices to support sensitive detection of SV, particularly pathogenic SV for supporting clinical diagnosis. When using 4× ONT WGS data, existing SV calling software often fails to detect pathogenic SV, especially in the form of long deletion, terminal deletion, duplication, and unbalanced translocation.

Rapid and economical drug resistance profiling with Nanopore MinION for clinical specimens with low bacillary burden of Mycobacterium tuberculosis.

Wai Sing Chan Chun Hang Au Yvonne Chung Henry Chi Ming Leung Dona N Ho Tsun Leung Chan

BMC Res Notes

September 2020

Objective: We designed and tested a Nanopore sequencing panel for direct tuberculosis drug resistance profiling. The panel targeted 10 resistance-associated loci. We assessed the feasibility of amplifying and sequencing these loci from 23 clinical specimens with low bacillary burden.

CONNET: Accurate Genome Consensus in Assembling Nanopore Sequencing Data via Deep Learning.

Yifan Zhang Chi-Man Liu Henry C M Leung Ruibang Luo Tak-Wah Lam

iScience

May 2020

Single-molecule sequencing technologies produce much longer reads compared with next-generation sequencing, greatly improving the contiguity of de novo assembly of genomes. However, the relatively high error rates in long reads make it challenging to obtain high-quality assemblies. A computationally intensive consensus step is needed to resolve the discrepancies in the reads.

Predictive QSAR model confirms flavonoids in Chinese medicine can activate voltage-gated calcium (CaV) channel in osteogenesis.

Ki Chan Henry Chi Ming Leung James Kit-Hon Tsoi

Chin Med

March 2020

Background: Animal studies have shown that flavonoids in Chinese medicine could aid osteogenesis and bone formation. However, there is no consensus on the mechanism by which these phytochemicals act on bone-forming osteoblasts, and hence no prediction model for chemical screening of this specific biochemical function has been established. The purpose of this study was to develop a novel and effective approach for selecting flavonoids and predicting their bone-forming ability via osteoblastic voltage-gated calcium (CaV) activation and inhibition using molecular modelling techniques.

misFinder: identify mis-assemblies in an unbiased manner using reference and paired-end reads.

Xiao Zhu Henry C M Leung Rongjie Wang Francis Y L Chin Siu Ming Yiu

BMC Bioinformatics

November 2015

Background: Because of the short read length of high-throughput sequencing data, assembly errors are introduced in genome assembly, which may have an adverse impact on downstream data analysis. Several tools have been developed to eliminate these errors by either 1) comparing the assembled sequences with a similar reference genome, or 2) analyzing paired-end reads aligned to the assembled sequences and determining inconsistent features along mis-assembled sequences. However, the former approach cannot distinguish real structural variations between the target genome and the reference genome, while the latter approach could produce many false positive detections (correctly assembled sequences being considered mis-assembled).

Predicting drug-target interaction for new drugs using enhanced similarity measures and super-target clustering.

Jian-Yu Shi Siu-Ming Yiu Yiming Li Henry C M Leung Francis Y L Chin

Methods

July 2015

Predicting drug-target interaction using computational approaches is an important step in drug discovery and repositioning. To predict whether there will be an interaction between a drug and a target, most existing methods identify similar drugs and targets in the database. The prediction is then made based on the known interactions of these drugs and targets.

IDBA-MTP: A Hybrid Metatranscriptomic Assembler Based on Protein Information.

Henry C M Leung Siu-Ming Yiu Francis Y L Chin

J Comput Biol

May 2015

Metatranscriptomic analysis provides information on how a microbial community reacts to environmental changes. Using next-generation sequencing (NGS) technology, biologists can study the microbe community by sampling short reads from a mixture of mRNAs (metatranscriptomic data). As most microbial genome sequences are unknown, it would seem that de novo assembly of the mRNAs is needed.

PERGA: a paired-end read guided de novo assembler for extending contigs using SVM and look ahead approach.

Xiao Zhu Henry C M Leung Francis Y L Chin Siu Ming Yiu Guangri Quan

PLoS One

January 2016

Since the read lengths of high throughput sequencing (HTS) technologies are short, de novo assembly, which plays a significant role in many applications, remains a great challenge. Most state-of-the-art approaches are based on the de Bruijn graph strategy and the overlap-layout strategy. However, these approaches, which depend on k-mers or read overlaps, do not fully utilize the information in paired-end and single-end reads when resolving branches.

Sequence assembly using next generation sequencing data--challenges and solutions.

Francis Y L Chin Henry C M Leung S M Yiu

Sci China Life Sci

November 2014

Sequence assembly is an important step in bioinformatics studies. With the help of next generation sequencing (NGS) technology, DNA fragments (reads) can be randomly sampled at high throughput from DNA or RNA molecular sequences. However, as the positions of the sampled reads are unknown, an assembly process is required to combine overlapping reads and reconstruct the original DNA or RNA sequence.

IDBA-MT: de novo assembler for metatranscriptomic data generated from next-generation sequencing technology.

Henry C M Leung Siu-Ming Yiu John Parkinson Francis Y L Chin

J Comput Biol

July 2013

High-throughput next-generation sequencing technology provides a great opportunity for analyzing metatranscriptomic data. However, the reads produced by these technologies are short and an assembling step is required to combine the short reads into longer contigs. As there are many repeat patterns in mRNAs from different genomes and the abundance ratio of mRNAs in a sample varies a lot, existing assemblers for genomic data, transcriptomic data, and metagenomic data do not work on metatranscriptomic data and produce chimeric contigs, that is, incorrect contigs formed by merging multiple mRNA sequences.

IDBA-tran: a more robust de novo de Bruijn graph assembler for transcriptomes with uneven expression levels.

Yu Peng Henry C M Leung Siu-Ming Yiu Ming-Ju Lv Xin-Guang Zhu

Bioinformatics

July 2013

Motivation: RNA sequencing based on next-generation sequencing technology is effective for analyzing transcriptomes. Like de novo genome assembly, de novo transcriptome assembly does not rely on any reference genome or additional annotation information, but is more difficult. In particular, isoforms can have very uneven expression levels (e.

MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample.

Yi Wang Henry C M Leung S M Yiu Francis Y L Chin

Bioinformatics

September 2012

Motivation: Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads do not perform well on (i) samples with low-abundance species or (ii) samples (even with high abundance) when there are many extremely low-abundance species. These two problems are common for real metagenomic datasets.

IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth.

Yu Peng Henry C M Leung S M Yiu Francis Y L Chin

Bioinformatics

June 2012

Motivation: Next-generation sequencing allows us to sequence reads from a microbial environment using single-cell sequencing or metagenomic sequencing technologies. However, both technologies suffer from the problem that the sequencing depths of different regions of a genome, or of genomes from different species, are highly uneven. Most existing genome assemblers assume that sequencing depths are even.

MetaCluster 4.0: a novel binning algorithm for NGS reads and huge number of species.

Yi Wang Henry C M Leung S M Yiu Francis Y L Chin

J Comput Biol

February 2012

Next-generation sequencing (NGS) technologies allow the sequencing of microbial communities directly from the environment without prior culturing. The output of environmental DNA sequencing consists of many reads from genomes of different unknown species, making the clustering of reads from the same (or similar) species (also known as binning) a crucial step. The difficulties of the binning problem are due to the following four factors: (1) the lack of reference genomes; (2) uneven abundance ratio of species; (3) short NGS reads; and (4) a large number of species (can be more than a hundred).

Meta-IDBA: a de Novo assembler for metagenomic data.

Yu Peng Henry C M Leung S M Yiu Francis Y L Chin

Bioinformatics

July 2011

Motivation: Next-generation sequencing techniques allow us to generate reads from a microbial environment in order to analyze the microbial community. However, assembling of a set of mixed reads from different species to form contigs is a bottleneck of metagenomic research. Although there are many assemblers for assembling reads from a single genome, there are no assemblers for assembling reads in metagenomic data without reference genome sequences.

A robust and accurate binning algorithm for metagenomic sequences with arbitrary species abundance ratio.

Henry C M Leung S M Yiu Bin Yang Yu Peng Yi Wang

Bioinformatics

June 2011

Motivation: With the rapid development of next-generation sequencing techniques, metagenomics, also known as environmental genomics, has emerged as an exciting research area that enables us to analyze the microbial environment in which we live. An important step for metagenomic data analysis is the identification and taxonomic characterization of DNA fragments (reads or contigs) resulting from sequencing a sample of mixed species. This step is referred to as 'binning'.

Diagnosis of schizophrenia: reliability of an operationalized approach to 'praecox-feeling'.

Gabor S Ungvari Yu-Tao Xiang Yu Hong Henry C M Leung Helen F K Chiu

Psychopathology

November 2010

Background: H.C. Rumke coined the term 'praecox-feeling' to denote a specific unease experienced by the clinician reflecting the 'impossibility of empathy' and 'lack of exchange of affect' that has been reported to occur early on when examining schizophrenia patients.

Finding optimal threshold for correction error reads in DNA assembling.

Francis Y L Chin Henry C M Leung Wei-Lin Li Siu-Ming Yiu

BMC Bioinformatics

January 2009

Background: DNA assembling is the problem of determining the nucleotide sequence of a genome from its substrings, called reads. In the experiments, there may be some errors on the reads which affect the performance of the DNA assembly algorithms. Existing algorithms, e.

Predicting protein complexes from PPI data: a core-attachment approach.

Henry C M Leung Qian Xiang S M Yiu Francis Y L Chin

J Comput Biol

February 2009

Unlabelled: Protein complexes play a critical role in many biological processes. Identifying the component proteins in a protein complex is an important step in understanding the complex as well as the related biological activities. This paper addresses the problem of predicting protein complexes from the protein-protein interaction (PPI) network of one species using a computational approach.

An efficient motif discovery algorithm with unknown motif length and number of binding sites.

Henry C M Leung Francis Y L Chin

Int J Data Min Bioinform

May 2008

Most motif discovery algorithms from DNA sequences require the motif's length as input. Styczynski et al. introduced the Extended (l,d)-Motif Problem (EMP) where the motif's length is not an input parameter.

DNA motif representation with nucleotide dependency.

Francis Chin Henry C M Leung

IEEE/ACM Trans Comput Biol Bioinform

May 2008

The problem of discovering novel motifs of binding sites is important to the understanding of gene regulatory networks. Motifs are generally represented by matrices (position weight matrix (PWM) or position specific scoring matrix (PSSM)) or strings. However, these representations cannot model biological binding sites well because they fail to capture nucleotide interdependence.

Discovering motifs with transcription factor domain knowledge.

Henry C M Leung Francis Y L Chin Bethany M Y Chan

Pac Symp Biocomput

December 2007

We introduce a new motif-discovery algorithm, DIMDom, which exploits two additional kinds of information not commonly exploited: (a) the characteristic pattern of binding site classes, where class is determined based on biological information about transcription factor domains and (b) posterior probabilities of these classes. We compared the performance of DIMDom with MEME on all the transcription factors of Drosophila with at least one known binding site in the TRANSFAC database and found that DIMDom outperformed MEME with 2.5 times the number of successes and 1.

Finding linear motif pairs from protein interaction networks: a probabilistic approach.

Henry C M Leung M H Siu S M Yiu Francis Y L Chin Ken W K Sung

Comput Syst Bioinformatics Conf

December 2007

Finding motif pairs from a set of protein sequences based on the protein-protein interaction data is a challenging computational problem. Existing effective approaches usually rely on additional information such as some prior knowledge on protein groupings based on protein domains. In reality, this kind of knowledge is not always available.

Finding motifs from all sequences with and without binding sites.

Henry C M Leung Francis Y L Chin

Bioinformatics

September 2006

Motivation: Finding common patterns, motifs, from a set of promoter regions of coregulated genes is an important problem in molecular biology. Most existing motif-finding algorithms consider a set of sequences bound by the transcription factor as the only input. However, we can get better results by considering sequences that are not bound by the transcription factor as an additional input.
