The protein-peptide interaction plays a pivotal role in fields such as drug development, yet remains underexplored experimentally and challenging to model computationally. Herein, we introduce PepCA, a sequence-based approach for predicting peptide-binding sites on proteins. A primary obstacle in predicting peptide-protein interactions is the difficulty in acquiring precise protein structures, coupled with the uncertainty of polypeptide configurations.
Cancer Epidemiol Biomarkers Prev
May 2023
Background: Early diagnosis is critical to the survival of lung adenocarcinoma patients, but convenient methods for early detection remain inadequate.
Methods: We applied a comprehensive microarray of 130,000 peptides to detect an "autoantibody signature", that is, autoantibodies binding to mimotopes, for early detection of stage 0-I LUAD. Plasma samples were collected from 147 patients with early-stage lung adenocarcinoma (Early-LUAD), 108 patients with benign lung disease (BLD), and 122 normal healthy controls (NHC).
False discovery rate (FDR) control has been a major challenge for large-scale metabolome annotation. Although recent research indicated that the target-decoy strategy could be used to estimate FDR, FDR control remains difficult because a reliable decoy database is hard to obtain, owing to the complex fragmentation mechanisms of metabolites and the ubiquity of isomers. To tackle this problem, we developed a decoy generation method that generates forged spectra from the reference target database by preserving the original reference signals to simulate the presence of isomers of metabolites.
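The target-decoy strategy mentioned above can be illustrated with a minimal Python sketch. It assumes that every annotation carries a single match score where higher is better, and that decoy hits come from searching the same query spectra against a decoy database. The function names, the example scores, and the simple decoy/target ratio are illustrative assumptions, not the decoy generation method developed in this work.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold as (#decoy hits) / (#target hits)."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        # No target hits are accepted, so there is nothing to be false.
        return 0.0
    # Cap at 1.0 because a rate cannot exceed 100%.
    return min(1.0, n_decoy / n_target)


def threshold_for_fdr(target_scores, decoy_scores, max_fdr):
    """Return the lowest target score whose estimated FDR is <= max_fdr."""
    for t in sorted(set(target_scores)):
        if estimate_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None  # No threshold satisfies the requested FDR.


# Hypothetical match scores, for illustration only.
targets = [0.9, 0.8, 0.7, 0.6, 0.5]
decoys = [0.65, 0.4, 0.3]
fdr_at_06 = estimate_fdr(targets, decoys, 0.6)
cutoff = threshold_for_fdr(targets, decoys, 0.1)
```

With these made-up scores, one decoy and four targets pass a threshold of 0.6, so the estimate there is 0.25. The quality of such an estimate depends entirely on how well the decoys mimic false target matches, which is why decoy construction is the hard part.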
Human identification and paternity testing are usually based on the study of STRs depending on their particular characteristics in the forensic investigation. In this paper, we developed a sensitive genotyping system, SiFaSTR 23-plex, which is able to characterize 18 expanded Combined DNA Index System STRs (D3S1358, D5S818, D2S1338, TPOX, CSF1PO, TH01, vWA, D7S820, D21S11, D10S1248, D8S1179, D1S1656, D18S51, D12S391, D19S433, D16S539, D13S317, and FGA), three highly polymorphic STRs among Chinese people (Penta D, Penta E, and D6S1043), one Y-chromosome Indel and amelogenin using a multiplex PCR; the PCR amplified products were analyzed using the Applied Biosystems 3500 Genetic Analyzer. Full genotyping profiles were obtained using only 31.
STRs vary not only in the length of the repeat units and the number of repeats but also in the region with which they conform to an incremental repeat pattern. Massively parallel sequencing (MPS) offers new possibilities in the analysis of STRs since it can simultaneously sequence multiple targets in a single reaction and capture potential internal sequence variations. Here, we sequenced 34 STRs applied in the forensic community of China with a custom-designed panel.
Background: Global disparities in prostate cancer (PCa) incidence highlight the urgent need to identify genomic abnormalities in prostate tumors in different ethnic populations including Asian men.
Objective: To systematically explore the genomic complexity and define disease-driven genetic alterations in PCa.
Design, Setting, And Participants: The study sequenced whole-genome and transcriptome of tumor-benign paired tissues from 65 treatment-naive Chinese PCa patients.
The custom-designed single nucleotide polymorphism (SNP) panel amplified 231 autosomal SNPs in one PCR reaction, which were subsequently sequenced with massively parallel sequencing (MPS) technology on the Ion Torrent personal genome machine (PGM). SNPs were chosen from SNPforID, IISNP, HapMap, dbSNP, and related published literature. Full concordance was obtained between available MPS calling and Sanger sequencing with 9947A and 9948 controls.
Utilizing massively parallel sequencing (MPS) technology for SNP testing in forensic genetics is becoming attractive because of the shortcomings of STR markers, such as their high mutation rates and disadvantages associated with the current PCR-CE method as well as its limitations regarding multiplex capabilities. MPS offers the potential to genotype hundreds to thousands of SNPs from multiple samples in a single experimental run. In this study, we designed a customized SNP panel that includes 273 forensically relevant identity SNPs chosen from SNPforID, IISNP, and the HapMap database as well as previously related studies and evaluated the levels of genotyping precision, sequence coverage, sensitivity and SNP performance using the Ion Torrent PGM.
Introduction: The incidence rate of lung adenocarcinoma (LUAD), the predominant histological subtype of lung cancer, is elevated in Asians, particularly in female nonsmokers. The mutation patterns in LUAD in Asians might be distinct from those in LUAD in whites.
Methods: We profiled 271 resected LUAD tumors (mainly stage I) to characterize the genomic landscape of LUAD in Asians with a focus on female nonsmokers.
SNPs, which are abundant in the human genome and have a lower mutation rate, are attractive for genetic applications such as forensic, anthropological, and evolutionary studies. Universal SNPs showing little allelic frequency variation among populations while remaining highly informative for human identification were obtained from previous studies. However, genotyping tools target only dozens of markers simultaneously, limiting their applications.
The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods.
We performed whole-transcriptome sequencing and whole-genome sequencing on nine pairs of hepatocellular carcinoma (HCC) tumors and matched adjacent tissues to identify RNA editing events. Using an improved bioinformatics pipeline, we identified a mean of 26,982 editing sites in each sample, with a mean of 89.5% canonical A→G edits.
Single-cell sequencing is a powerful tool for delineating clonal relationships and identifying key driver genes for personalized cancer management. Here we performed single-cell sequencing analysis of a case of colon cancer. Population genetics analyses identified two independent clones in the tumor cell population.
Ultra-low coverage sequencing (ULCS) is one of the most promising strategies for sequencing-based clinical applications. These clinical applications, especially prenatal diagnosis, have a strict turn-around-time requirement; therefore, the application of ULCS is restricted by current high-throughput sequencing platforms. Recently, the emergence of rapid sequencing platforms, such as MiSeq and Ion Proton, has brought the ULCS strategy into a new era.
Background: To gain biological insights into lung metastases from hepatocellular carcinoma (HCC), we compared the whole-genome sequencing profiles of primary HCC and paired lung metastases.
Methods: We used whole-genome sequencing at 33X-43X coverage to profile somatic mutations in primary HCC (HBV+) and metachronous lung metastases (> 2 years interval).
Results: In total, 5,027-13,961 and 5,275-12,624 somatic single-nucleotide variants (SNVs) were detected in primary HCC and lung metastases, respectively.
Hepatocellular carcinoma (HCC) is one of the most deadly cancers worldwide and has no effective treatment, yet the molecular basis of hepatocarcinogenesis remains largely unknown. Here we report findings from a whole-genome sequencing (WGS) study of 88 matched HCC tumor/normal pairs, 81 of which are Hepatitis B virus (HBV) positive, seeking to identify genetically altered genes and pathways implicated in HBV-associated HCC. We find beta-catenin to be the most frequently mutated oncogene (15.
Background: The applications of massively parallel sequencing technology to fetal cell-free DNA (cff-DNA) have brought new insight to non-invasive prenatal diagnosis. However, most previous research based on maternal plasma sequencing has been restricted to fetal aneuploidies. To detect specific parentally inherited mutations, invasive approaches to obtain fetal DNA are the current standard in the clinic because of the experimental complexity and resource consumption of previously reported non-invasive approaches.
Proc Natl Acad Sci U S A
January 2013
The genetic diversity of Yersinia pestis, the etiologic agent of plague, is extremely limited because of its recent origin coupled with a slow clock rate. Here we identified 2,326 SNPs from 133 genomes of Y. pestis strains that were isolated in China and elsewhere.
De novo mutation plays an important role in autism spectrum disorders (ASDs). Notably, pathogenic copy number variants (CNVs) are characterized by high mutation rates. We hypothesize that hypermutability is a property of ASD genes and may also include nucleotide-substitution hot spots.
To survey hepatitis B virus (HBV) integration in liver cancer genomes, we conducted massively parallel sequencing of 81 HBV-positive and 7 HBV-negative hepatocellular carcinomas (HCCs) and adjacent normal tissues. We found that HBV integration is observed more frequently in the tumors (86.4%) than in adjacent liver tissues (30.
Tumor heterogeneity presents a challenge for inferring clonal evolution and driver gene identification. Here, we describe a method for analyzing the cancer genome at a single-cell nucleotide level. To perform our analyses, we first devised and validated a high-throughput whole-genome single-cell sequencing method using two lymphoblastoid cell line single cells.
Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated.
Proc Natl Acad Sci U S A
February 2012
Surveying genome-wide coding variation within and among species gives unprecedented power to study the genetics of adaptation, in particular the proportion of amino acid substitutions fixed by positive selection. Additionally, contrasting the autosomes and the X chromosome holds information on the dominance of beneficial (adaptive) and deleterious mutations. Here we capture and sequence the complete exomes of 12 chimpanzees and present the largest set of protein-coding polymorphism to date.
Background: Cancers arise through an evolutionary process in which cell populations are subjected to selection; however, to date, this process in bladder cancer, one of the most common cancers in the world, remains unknown at the single-cell level.
Results: We carried out single-cell exome sequencing of 66 individual tumor cells from a muscle-invasive bladder transitional cell carcinoma (TCC). Analyses of the somatic mutant allele frequency spectrum and clonal structure revealed that the tumor cells were derived from a single ancestral cell, but that subsequent evolution occurred, leading to two distinct tumor cell subpopulations.
Transitional cell carcinoma (TCC) is the most common type of bladder cancer. Here we sequenced the exomes of nine individuals with TCC and screened all the somatically mutated genes in a prevalence set of 88 additional individuals with TCC with different tumor stages and grades. In our study, we discovered a variety of genes previously unknown to be mutated in TCC.