Publications by authors named "Jianqiu Michelle Zhang"

Epitranscriptomics is an exciting field that studies the different types of modifications in transcripts, and predicting such modification sites from the transcript sequence is of significant interest. However, the scarcity of positive sites for most modifications poses critical challenges for training robust algorithms. To circumvent this problem, we propose MR-GAN, a generative adversarial network (GAN) based model, which is trained in an unsupervised fashion on entire pre-mRNA sequences to learn a low-dimensional embedding of transcriptomic sequences.

Motivation: Transcription factors (TFs) bind to the promoter region of a gene to control gene expression. Identifying precise TF binding sites (TFBSs) is essential for understanding the detailed mechanisms of TF-mediated gene regulation. However, there is a shortage of computational approaches that can predict TFBSs at single-base-pair resolution.

Background: Identifying disease-correlated features early, before disease progression has caused significant abundance changes in a large number of molecules, is very advantageous to biologists developing biomarkers for early disease diagnosis. Disease-correlated features show relatively small abundance changes at early stages. Finding them in high-throughput data with existing bioinformatic tools is challenging because the technology suffers from limited dynamic range and significant noise.

In the past decades, a few synergistic feature selection algorithms have been published, including Cooperative Index (CI) and K-Top Scoring Pair (k-TSP). These algorithms consider the synergistic behavior of features when they are included in a feature panel. Although these algorithms have shown promising results, a comprehensive and fair comparison with other feature selection algorithms across a large number of microarray datasets, in terms of classification accuracy and computational complexity, is lacking.
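To illustrate what "synergistic" means for a pair-based method, here is a minimal sketch of the single top scoring pair idea that k-TSP builds on. It is not the implementation evaluated in the paper. It assumes the commonly described pair score, the absolute difference between the two classes in how often feature i is smaller than feature j. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def tsp_scores(samples, labels):
    """Score each feature pair (i, j) by
    |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|.

    Assumes binary labels 0/1 and at least one sample per class.
    """
    n_features = len(samples[0])
    by_class = {0: [], 1: []}
    for row, label in zip(samples, labels):
        by_class[label].append(row)

    scores = {}
    for i, j in combinations(range(n_features), 2):
        freqs = []
        for cls in (0, 1):
            rows = by_class[cls]
            # Fraction of samples in this class where feature i < feature j.
            freqs.append(sum(r[i] < r[j] for r in rows) / len(rows))
        scores[(i, j)] = abs(freqs[0] - freqs[1])
    return scores


def top_scoring_pair(samples, labels):
    """Return the feature pair whose relative ordering best separates the classes."""
    scores = tsp_scores(samples, labels)
    return max(scores, key=scores.get)


# Hypothetical toy data: 4 samples, 3 features, two classes.
samples = [
    [1.0, 2.0, 5.0],
    [1.5, 3.0, 4.0],
    [3.0, 1.0, 5.5],
    [4.0, 2.0, 4.5],
]
labels = [0, 0, 1, 1]

scores = tsp_scores(samples, labels)
best = top_scoring_pair(samples, labels)
```

In this toy data, the ordering of features 0 and 1 flips between the classes. The pair is scored on that relative ordering, not on either feature's own value, which is the sense in which the pair is selected jointly.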

Rationale: Without accurate peak linking/alignment, the expression levels of only a small percentage of proteins can be compared across multiple samples in Liquid Chromatography/Mass Spectrometry/Tandem Mass Spectrometry (LC/MS/MS), due to the selective nature of tandem MS peptide identification. This greatly hampers biomedical research that aims to find biomarkers for disease diagnosis and treatment and to understand disease mechanisms. A recent algorithm, PeakLink, has allowed LC/MS peaks without tandem MS identifications to be accurately linked to their corresponding identified peaks across multiple samples collected from different instruments, tissues and labs, greatly enhancing the ability to compare proteins.

In liquid chromatography-mass spectrometry (LC-MS), parts of LC peaks are often corrupted by co-eluting peptides, which increases quantification variance. In this paper, we propose applying accurate LC peak boundary detection to remove the corrupted parts of LC peaks. Accurate LC peak boundary detection is achieved by checking the consistency of intensity patterns within peptide elution time ranges.

Background: Transcriptional regulation by transcription factors (TFs) controls the timing and abundance of mRNA transcription. Due to the limitations of current proteomics technologies, large-scale measurements of protein-level TF activities are usually infeasible, making computational reconstruction of transcriptional regulatory networks a difficult task.

Results: We propose a novel Bayesian non-negative factor model for TF-mediated regulatory networks.
