Background: Advanced long-read sequencing technologies, such as those from Oxford Nanopore Technologies and Pacific Biosciences, are finding a wide use in de novo genome sequencing projects. However, long reads typically have higher error rates relative to short reads. If left unaddressed, subsequent genome assemblies may exhibit high base error rates that compromise the reliability of downstream analysis.
View Article and Find Full Text PDFHuman papillomavirus (HPV) integration has been implicated in transforming HPV infection into cancer. To resolve genome dysregulation associated with HPV integration, we performed Oxford Nanopore long-read sequencing on 72 cervical cancer genomes from an Ugandan dataset that was previously characterized using short-read sequencing. We found recurrent structural rearrangement patterns at HPV integration events, which we categorized as: del(etion)-like, dup(lication)-like, translocation, multibreakpoint, or repeat region integrations.
View Article and Find Full Text PDFCannabis plants produce a spectrum of secondary metabolites, encompassing cannabinoids and more than 300 non-cannabinoid compounds. Among these, anthocyanins have important functions in plants and also have well documented health benefits. Anthocyanins are largely responsible for the red/purple color phenotypes in plants.
View Article and Find Full Text PDFAntimicrobial resistance is a critical public health concern, necessitating the exploration of alternative treatments. While antimicrobial peptides (AMPs) show promise, assessing their toxicity using traditional wet lab methods is both time-consuming and costly. We introduce tAMPer, a novel multi-modal deep learning model designed to predict peptide toxicity by integrating the underlying amino acid sequence composition and the three-dimensional structure of peptides.
View Article and Find Full Text PDFAntibiotic resistance is recognized as an imminent and growing global health threat. New antimicrobial drugs are urgently needed due to the decreasing effectiveness of conventional small-molecule antibiotics. Antimicrobial peptides (AMPs), a class of host defense peptides, are emerging as promising candidates to address this need.
View Article and Find Full Text PDFWith the increasing availability of long-read sequencing data, high-quality human genome assemblies, and software for fully characterizing tandem repeats, genome-wide genotyping of tandem repeat loci on a population scale becomes more feasible. Such efforts not only expand our knowledge of the tandem repeat landscape in the human genome but also enhance our ability to differentiate pathogenic tandem repeat mutations from benign polymorphisms. To this end, we analyzed 272 genomes assembled using datasets from three public initiatives that employed different long-read sequencing technologies.
View Article and Find Full Text PDFNucleotide-binding domain and leucine-rich repeat (NLR) immune receptor genes form a major line of defense in plants, acting in both pathogen recognition and resistance machinery activation. NLRs are reported to form large gene clusters in limber pine (Pinus flexilis), but it is unknown how widespread this genomic architecture may be among the extant species of conifers (Pinophyta). We used comparative genomic analyses to assess patterns in the abundance, diversity, and genomic distribution of NLR genes.
View Article and Find Full Text PDFEnabled by the explosion of data and substantial increase in computational power, deep learning has transformed fields such as computer vision and natural language processing (NLP) and it has become a successful method to be applied to many transcriptomic analysis tasks. A core advantage of deep learning is its inherent capability to incorporate feature computation within the machine learning models. This results in a comprehensive and machine-readable representation of sequences, facilitating the downstream classification and clustering tasks.
View Article and Find Full Text PDFComp Biochem Physiol Part D Genomics Proteomics
June 2024
As amphibians undergo thyroid hormone (TH)-dependent metamorphosis from an aquatic tadpole to the terrestrial frog, their innate immune system must adapt to the new environment. Skin is a primary line of defense, yet this organ undergoes extensive remodelling during metamorphosis and how it responds to TH is poorly understood. Temperature modulation, which regulates metamorphic timing, is a unique way to uncover early TH-induced transcriptomic events.
View Article and Find Full Text PDFConifers are long-lived and slow-evolving, thus requiring effective defences against their fast-evolving insect natural enemies. The copy number variation (CNV) of two key acetophenone biosynthesis genes Ugt5/Ugt5b and βglu-1 may provide a plausible mechanism underlying the constitutively variable defence in white spruce (Picea glauca) against its primary defoliator, spruce budworm. This study develops a long-insert sequence capture probe set (Picea_hung_p1.
View Article and Find Full Text PDFMotivation: -mer hashing is a common operation in many foundational bioinformatics problems. However, generic string hashing algorithms are not optimized for this application. Strings in bioinformatics use specific alphabets, a trait leveraged for nucleic acid sequences in earlier work.
View Article and Find Full Text PDFBackground: The mountain pine beetle, Dendroctonus ponderosae, is an irruptive bark beetle that causes extensive mortality to many pine species within the forests of western North America. Driven by climate change and wildfire suppression, a recent mountain pine beetle (MPB) outbreak has spread across more than 18 million hectares, including areas to the east of the Rocky Mountains that comprise populations and species of pines not previously affected. Despite its impacts, there are few tactics available to control MPB populations.
View Article and Find Full Text PDFLong-read sequencing technologies have improved significantly since their emergence. Their read lengths, potentially spanning entire transcripts, is advantageous for reconstructing transcriptomes. Existing long-read transcriptome assembly methods are primarily reference-based and to date, there is little focus on reference-free transcriptome assembly.
View Article and Find Full Text PDFCurrent state-of-the-art de novo long read genome assemblers follow the Overlap-Layout-Consensus paradigm. While read-to-read overlap - its most costly step - was improved in modern long read genome assemblers, these tools still often require excessive RAM when assembling a typical human dataset. Our work departs from this paradigm, foregoing all-vs-all sequence alignments in favor of a dynamic data structure implemented in GoldRush, a de novo long read genome assembly algorithm with linear time complexity.
View Article and Find Full Text PDFMotivation: -mer hashing is a common operation in many foundational bioinformatics problems. However, generic string hashing algorithms are not optimized for this application. Strings in bioinformatics use specific alphabets, a trait leveraged for nucleic acid sequences in earlier work.
View Article and Find Full Text PDFWith the increasing affordability and accessibility of genome sequencing data, de novo genome assembly is an important first step to a wide variety of downstream studies and analyses. Therefore, bioinformatics tools that enable the generation of high-quality genome assemblies in a computationally efficient manner are essential. Recent developments in long-read sequencing technologies have greatly benefited genome assembly work, including scaffolding, by providing long-range evidence that can aid in resolving the challenging repetitive regions of complex genomes.
View Article and Find Full Text PDFBackground: Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, sequencing platform-specific challenges, including high base-call error rate, nonuniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical algorithms. The use of simulated datasets with characteristics that are true to the sequencing platform under evaluation is a cost-effective way to assess the performance of bioinformatics tools with the ground truth in a controlled environment.
View Article and Find Full Text PDFBMC Res Notes
February 2023
Objectives: Antibiotic resistance is a rising global threat to human health and is prompting researchers to seek effective alternatives to conventional antibiotics, which include antimicrobial peptides (AMPs). Recently, we have reported AMPlify, an attentive deep learning model for predicting AMPs in databases of peptide sequences. In our tests, AMPlify outperformed the state-of-the-art.
View Article and Find Full Text PDFAntimicrobial peptides (AMPs) are a diverse class of short, often cationic biological molecules that present promising opportunities in the development of new therapeutics to combat antimicrobial resistance. Newly developed in silico methods offer the ability to rapidly discover numerous novel AMPs with a variety of physiochemical properties. Herein, using the rAMPage AMP discovery pipeline, we bioinformatically identified 51 AMP candidates from amphibia and insect RNA-seq data and present their in-depth characterization.
View Article and Find Full Text PDFWe assembled the 9.8-Gbp genome of western redcedar (WRC; ), an ecologically and economically important conifer species of the Cupressaceae. The genome assembly, derived from a uniquely inbred tree produced through five generations of self-fertilization (selfing), was determined to be 86% complete by BUSCO analysis, one of the most complete genome assemblies for a conifer.
View Article and Find Full Text PDF