Publications by Kuan-Hao Chao

Publications by authors named "Kuan-Hao Chao"

Splam: a deep-learning-based splice site predictor that improves spliced alignments.

Kuan-Hao Chao, Alan Mao, Steven L Salzberg, Mihaela Pertea

Genome Biol

September 2024

The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. We describe Splam, a novel method for predicting splice junctions in DNA using deep residual convolutional neural networks. Unlike previous models, Splam looks at a 400-base-pair window flanking each splice site, reflecting the biological splicing process that relies primarily on signals within this window.
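The abstract describes Splam's input as a 400-base-pair window flanking each splice site. As a rough illustration of what such an input looks like, the Python sketch below cuts a fixed window around each end of a candidate intron and one-hot encodes it for a convolutional network. It assumes the 400 bp are split evenly (200 bp on each side of the site), that the donor and acceptor windows are simply concatenated, and that positions beyond the sequence ends are padded with N. The names `flank_window`, `junction_input`, and `one_hot` are hypothetical, and this is not Splam's own code.

```python
# Hypothetical helpers, not part of Splam. Assumes a 400-bp window split evenly
# around each splice site and N-padding beyond the ends of the sequence.

BASES = "ACGT"


def flank_window(seq: str, site: int, width: int = 400, pad: str = "N") -> str:
    """Return `width` bases centred on the 0-based boundary position `site`."""
    if width % 2:
        raise ValueError("width must be even")
    if not 0 <= site <= len(seq):
        raise ValueError("site lies outside the sequence")
    half = width // 2
    start, end = site - half, site + half
    left_pad = pad * max(0, -start)            # window runs off the left end
    right_pad = pad * max(0, end - len(seq))   # window runs off the right end
    return left_pad + seq[max(0, start):min(len(seq), end)] + right_pad


def junction_input(seq: str, donor: int, acceptor: int, width: int = 400) -> str:
    """Concatenate the donor-side and acceptor-side windows of one junction."""
    return flank_window(seq, donor, width) + flank_window(seq, acceptor, width)


def one_hot(seq: str) -> list[list[int]]:
    """Encode A/C/G/T as 4-element indicator vectors; other bases (e.g. N) become all zeros."""
    return [[int(base == b) for b in BASES] for base in seq.upper()]


genome = "ACGT" * 300                          # toy 1,200-bp sequence
x = junction_input(genome, donor=250, acceptor=900)
print(len(x), len(one_hot(x)))                 # 800 800
```

Keeping every junction at a fixed input length makes batching straightforward, and encoding the N padding as all zeros lets junctions near sequence ends be scored instead of dropped.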

Combining DNA and protein alignments to improve genome annotation with LiftOn.

Kuan-Hao Chao, Jakob M Heinz, Celine Hoh, Alan Mao, Alaina Shumate

bioRxiv

May 2024

As the number and variety of assembled genomes continue to grow, the number of annotated genomes is falling behind, particularly for eukaryotes. DNA-based mapping tools help to address this challenge, but they are only able to transfer annotation between closely related species. Here we introduce LiftOn, a homology-based software tool that integrates DNA and protein alignments to enhance the accuracy of genome-scale annotation and to allow mapping between relatively distant species.

EASTR: Identifying and eliminating systematic alignment errors in multi-exon genes.

Ida Shinder, Richard Hu, Hyun Joo Ji, Kuan-Hao Chao, Mihaela Pertea

Nat Commun

November 2023

Accurate alignment of transcribed RNA to reference genomes is a critical step in the analysis of gene expression, which in turn has broad applications in biomedical research and in the basic sciences. We reveal that widely used splice-aware aligners, such as STAR and HISAT2, can introduce erroneous spliced alignments between repeated sequences, leading to the inclusion of falsely spliced transcripts in RNA-seq experiments. In some cases, the 'phantom' introns resulting from these errors make their way into widely used genome annotation databases.
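The failure mode described above arises when the two ends of a putative intron fall in near-identical copies of a repeat, so an aligner can "splice" a read from one copy to the other. One crude way to flag such introns is to compare the sequence around the donor with the sequence around the acceptor. The Python sketch below does this with k-mer Jaccard similarity; the window size, k, threshold, and function names are illustrative assumptions, and this is not EASTR's algorithm.

```python
import random


def kmers(s: str, k: int) -> set[str]:
    """All length-k substrings of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def flank_similarity(seq: str, donor: int, acceptor: int,
                     flank: int = 50, k: int = 15) -> float:
    """Jaccard similarity of k-mer sets around the two ends of an intron.

    Illustrative heuristic only; not EASTR's algorithm.
    """
    a = seq[max(0, donor - flank):donor + flank]
    b = seq[max(0, acceptor - flank):acceptor + flank]
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def is_suspect_intron(seq: str, donor: int, acceptor: int,
                      threshold: float = 0.5) -> bool:
    """Flag an intron whose two ends look like copies of the same repeat."""
    return flank_similarity(seq, donor, acceptor) >= threshold


# Toy genome: one 120-bp repeat occurs twice, separated by unique sequence.
rng = random.Random(1)


def rand_dna(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


repeat = rand_dna(120)
left, middle, right = rand_dna(200), rand_dna(500), rand_dna(200)
genome = left + repeat + middle + repeat + right

# Same offset inside each repeat copy, so the two flanks are identical.
donor = len(left) + 60
acceptor = len(left) + len(repeat) + len(middle) + 60
print(is_suspect_intron(genome, donor, acceptor))          # True
print(is_suspect_intron(genome, 100, len(genome) - 100))   # False
```

Exact k-mer matching keeps the sketch short; a practical filter would also need to tolerate mismatches between diverged repeat copies and consider the reverse complement.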

CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure.

Ales Varabyou, Markus J Sommer, Beril Erdogdu, Ida Shinder, Ilia Minkin, Kuan-Hao Chao

Genome Biol

October 2023

CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes, and 158,377 transcripts, of which 14,863 are protein-coding transcripts not found in other catalogs.

WGT: Tools and algorithms for recognizing, visualizing, and generating Wheeler graphs.

Kuan-Hao Chao, Pei-Wei Chen, Sanjit A Seshia, Ben Langmead

iScience

August 2023

A Wheeler graph represents a collection of strings in a way that is particularly easy to index and query. Such a graph is a practical choice for representing a graph-shaped pangenome, and it is the foundation for current graph-based pangenome indexes. However, there are no practical tools to visualize or to check graphs that may have the Wheeler properties.
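To make the "Wheeler properties" concrete: under the standard definition, an edge-labelled directed graph is Wheeler if its nodes can be ordered so that (0) nodes with in-degree 0 come before all other nodes, (i) an edge with a smaller label always enters a strictly earlier node than an edge with a larger label, and (ii) for two edges with the same label, the edge leaving the earlier node enters a node no later than the other edge's target. The Python sketch below checks a given ordering against these rules by brute force over edge pairs. It does not search for an ordering (deciding whether one exists is NP-complete in general), it is not WGT's implementation, and the function name and graph encoding are assumptions.

```python
from itertools import product

Edge = tuple[str, str, str]  # (source, target, label)


def is_wheeler_order(nodes: list[str], edges: list[Edge], order: list[str]) -> bool:
    """Check whether `order` satisfies the Wheeler properties for this graph.

    Brute force over all edge pairs (O(E^2)); fine for small examples.
    """
    if sorted(order) != sorted(nodes):
        raise ValueError("order must be a permutation of nodes")
    rank = {v: i for i, v in enumerate(order)}

    # (0) Every node with in-degree 0 precedes every node with in-degree > 0.
    has_in = {v for _, v, _ in edges}
    sources = [rank[v] for v in nodes if v not in has_in]
    others = [rank[v] for v in nodes if v in has_in]
    if sources and others and max(sources) > min(others):
        return False

    for (u, v, a), (u2, v2, a2) in product(edges, repeat=2):
        # (i) A smaller label must enter a strictly earlier node.
        if a < a2 and not rank[v] < rank[v2]:
            return False
        # (ii) Same label: an earlier source must not enter a later target.
        if a == a2 and rank[u] < rank[u2] and not rank[v] <= rank[v2]:
            return False
    return True


# Toy graph spelling "a", "b" and "ab" from root node "r".
nodes = ["r", "x", "y", "z"]
edges = [("r", "x", "a"), ("r", "y", "b"), ("x", "z", "b")]
print(is_wheeler_order(nodes, edges, ["r", "x", "y", "z"]))  # True
print(is_wheeler_order(nodes, edges, ["r", "x", "z", "y"]))  # False
```

In this toy graph the valid order r, x, y, z matches the co-lexicographic order of the strings that reach each node ("", "a", "b", "ab"); an ordering of this kind is what allows BWT-style (FM-index) queries to run on a Wheeler graph.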

Splam: a deep-learning-based splice site predictor that improves spliced alignments.

Kuan-Hao Chao, Alan Mao, Steven L Salzberg, Mihaela Pertea

bioRxiv

July 2023

The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. Here we describe Splam, a novel method for predicting splice junctions in DNA based on deep residual convolutional neural networks. Unlike some previous models, Splam looks at a relatively limited window of 400 base pairs flanking each splice site, motivated by the observation that the biological process of splicing relies primarily on signals within this window.

A feature extraction free approach for protein interactome inference from co-elution data.

Yu-Hsin Chen, Kuan-Hao Chao, Jin Yung Wong, Chien-Fu Liu, Jun-Yi Leu

Brief Bioinform

July 2023

Protein complexes are key functional units in cellular processes. High-throughput techniques, such as co-fractionation coupled with mass spectrometry (CF-MS), have advanced protein complex studies by enabling global interactome inference. However, dealing with complex fractionation characteristics to define true interactions is not a simple task, since CF-MS is prone to false positives due to the co-elution of non-interacting proteins by chance.
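The false-positive problem described above is easy to see with the kind of correlation score commonly used to compare elution profiles: any two proteins whose elution peaks happen to fall in the same fractions score highly, whether or not they interact. The Python sketch below computes a Pearson correlation between toy elution profiles. The profiles are made-up numbers, and this conventional baseline is shown only for illustration; it is not the feature-extraction-free method of this paper.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length elution profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-elution signal
    return sxy / math.sqrt(sxx * syy)


# Toy intensities across 10 fractions (made-up numbers).
p1 = [0, 1, 5, 9, 5, 1, 0, 0, 0, 0]
p2 = [0, 2, 6, 8, 4, 1, 0, 0, 0, 0]  # peaks in the same fractions as p1
p3 = [0, 0, 0, 0, 0, 1, 5, 9, 5, 1]  # peaks in later fractions
print(round(pearson(p1, p2), 2))     # 0.98
print(round(pearson(p1, p3), 2))     # -0.48
```

A high score for p1 and p2 says only that the two proteins eluted together; separating true interactors from chance co-elution is the harder problem the paper addresses.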

The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual.

Kuan-Hao Chao, Aleksey V Zimin, Mihaela Pertea, Steven L Salzberg

G3 (Bethesda)

March 2023

We used long-read DNA sequencing to assemble the genome of a Southern Han Chinese male. We organized the sequence into chromosomes and filled in gaps using the recently completed T2T-CHM13 genome as a guide, yielding a gap-free genome, Han1, containing 3,099,707,698 bases. Using the T2T-CHM13 annotation as a reference, we mapped all genes onto the Han1 genome and identified additional gene copies, generating a total of 60,708 putative genes, of which 20,003 are protein-coding.

sangeranalyseR: Simple and Interactive Processing of Sanger Sequencing Data in R.

Kuan-Hao Chao, Kirston Barton, Sarah Palmer, Robert Lanfear

Genome Biol Evol

March 2021

sangeranalyseR is a feature-rich, free, and open-source R package for processing Sanger sequencing data. It allows users to go from loading reads to saving aligned contigs in a few lines of R code by using sensible defaults for most actions. It also provides complete flexibility in determining how individual reads and contigs are processed, both from the R command line and via interactive Shiny applications.

RNASeqR: An R Package for Automated Two-Group RNA-Seq Analysis Workflow.

Kuan-Hao Chao, Yi-Wen Hsiao, Yi-Fang Lee, Chien-Yueh Lee, Liang-Chuan Lai

IEEE/ACM Trans Comput Biol Bioinform

January 2022

RNA-Seq analysis has revolutionized researchers' understanding of the transcriptome in biological research. Assessing the differences in transcriptomic profiles between tissue samples or patient groups enables researchers to explore the underlying biological impact of transcription. RNA-Seq analysis requires multiple processing steps and substantial computational resources.
