Publications by authors named "Inanc Birol"

Background: Advanced long-read sequencing technologies, such as those from Oxford Nanopore Technologies and Pacific Biosciences, are finding wide use in de novo genome sequencing projects. However, long reads typically have higher error rates than short reads. If left unaddressed, these errors can give subsequent genome assemblies high base error rates that compromise the reliability of downstream analysis.
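
Assembly base accuracy is often summarized as a Phred-scaled quality value, QV = -10 log10(error rate). Under this scale, one error per 1,000 bases is QV 30 and one error per 10,000 bases is QV 40. The minimal Python sketch below shows that conversion. The function names are hypothetical, and the code is not taken from the study described above.

```python
import math


def error_rate(errors: int, bases: int) -> float:
    """Fraction of assembled bases that are erroneous (hypothetical helper)."""
    if bases <= 0:
        raise ValueError("bases must be positive")
    if not 0 <= errors <= bases:
        raise ValueError("errors must be between 0 and bases")
    return errors / bases


def phred_qv(rate: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(rate).

    A rate of 0 (no observed errors) has no finite QV, so it is rejected here.
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    return -10 * math.log10(rate)


# Example: 150 errors in 1.5 Mbp is one error per 10,000 bases, i.e. about QV 40.
print(round(phred_qv(error_rate(150, 1_500_000)), 1))
```

A QV computed this way depends on how errors are counted, for example from k-mer comparisons against short reads or from alignments to a reference. The sketch leaves that choice to the caller.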

Human papillomavirus (HPV) integration has been implicated in transforming HPV infection into cancer. To resolve genome dysregulation associated with HPV integration, we performed Oxford Nanopore long-read sequencing on 72 cervical cancer genomes from a Ugandan dataset that was previously characterized using short-read sequencing. We found recurrent structural rearrangement patterns at HPV integration events, which we categorized as: del(etion)-like, dup(lication)-like, translocation, multibreakpoint, or repeat region integrations.

Cannabis plants produce a spectrum of secondary metabolites, encompassing cannabinoids and more than 300 non-cannabinoid compounds. Among these, anthocyanins have important functions in plants and also have well-documented health benefits. Anthocyanins are largely responsible for the red/purple color phenotypes in plants.

Antimicrobial resistance is a critical public health concern, necessitating the exploration of alternative treatments. While antimicrobial peptides (AMPs) show promise, assessing their toxicity using traditional wet lab methods is both time-consuming and costly. We introduce tAMPer, a novel multi-modal deep learning model designed to predict peptide toxicity by integrating the underlying amino acid sequence composition and the three-dimensional structure of peptides.

Antibiotic resistance is recognized as an imminent and growing global health threat. New antimicrobial drugs are urgently needed due to the decreasing effectiveness of conventional small-molecule antibiotics. Antimicrobial peptides (AMPs), a class of host defense peptides, are emerging as promising candidates to address this need.

With the increasing availability of long-read sequencing data, high-quality human genome assemblies, and software for fully characterizing tandem repeats, genome-wide genotyping of tandem repeat loci on a population scale becomes more feasible. Such efforts not only expand our knowledge of the tandem repeat landscape in the human genome but also enhance our ability to differentiate pathogenic tandem repeat mutations from benign polymorphisms. To this end, we analyzed 272 genomes assembled using datasets from three public initiatives that employed different long-read sequencing technologies.

Article Synopsis
  • The authors generated over 427 million long-read sequences and found that longer, more accurate sequences yield better transcript detection, while increased read depth improves quantification.
  • The study suggests that reference-based tools work best for well-annotated genomes and recommends incorporating additional data to better identify rare transcripts, providing a benchmark for improving transcriptome analysis techniques.

Nucleotide-binding domain and leucine-rich repeat (NLR) immune receptor genes form a major line of defense in plants, acting in both pathogen recognition and resistance machinery activation. NLRs are reported to form large gene clusters in limber pine (Pinus flexilis), but it is unknown how widespread this genomic architecture may be among the extant species of conifers (Pinophyta). We used comparative genomic analyses to assess patterns in the abundance, diversity, and genomic distribution of NLR genes.

Enabled by the explosion of data and a substantial increase in computational power, deep learning has transformed fields such as computer vision and natural language processing (NLP), and it has been applied successfully to many transcriptomic analysis tasks. A core advantage of deep learning is its inherent capability to incorporate feature computation within the model itself. This yields a comprehensive, machine-readable representation of sequences that facilitates downstream classification and clustering tasks.

As amphibians undergo thyroid hormone (TH)-dependent metamorphosis from an aquatic tadpole to the terrestrial frog, their innate immune system must adapt to the new environment. Skin is a primary line of defense, yet this organ undergoes extensive remodelling during metamorphosis and how it responds to TH is poorly understood. Temperature modulation, which regulates metamorphic timing, is a unique way to uncover early TH-induced transcriptomic events.

Conifers are long-lived and slow-evolving, thus requiring effective defences against their fast-evolving insect natural enemies. The copy number variation (CNV) of two key acetophenone biosynthesis genes Ugt5/Ugt5b and βglu-1 may provide a plausible mechanism underlying the constitutively variable defence in white spruce (Picea glauca) against its primary defoliator, spruce budworm. This study develops a long-insert sequence capture probe set (Picea_hung_p1.

Motivation: k-mer hashing is a common operation in many foundational bioinformatics problems. However, generic string hashing algorithms are not optimized for this application. Strings in bioinformatics use specific alphabets, a trait that earlier work leveraged for nucleic acid sequences.
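
Nucleotide sequences use a four-letter alphabet, so each base fits in 2 bits. This means a k-mer can be updated in constant time as a window slides along a sequence. The Python sketch below illustrates the idea with a simple rolling 2-bit packing, which acts as an exact (perfect) hash for k-mers over A, C, G and T. It is an assumed, simplified example. It is not the hashing scheme from the publication above or from the earlier work it mentions, and the names `ENCODE` and `rolling_kmer_codes` are hypothetical.

```python
from typing import Iterator, Tuple

# Hypothetical 2-bit code per nucleotide: a 4-letter alphabet needs only 2 bits.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def rolling_kmer_codes(seq: str, k: int) -> Iterator[Tuple[int, int]]:
    """Yield (start position, integer code) for every ACGT-only k-mer in seq.

    Each code packs the k-mer into 2*k bits. Sliding the window by one base
    costs O(1): shift left by 2 bits, OR in the new base, and mask off the
    base that just left the window.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    mask = (1 << (2 * k)) - 1
    code = 0
    valid = 0  # consecutive valid bases ending at the current position
    for i, base in enumerate(seq.upper()):
        b = ENCODE.get(base)
        if b is None:
            # Ambiguous base (e.g. N): no k-mer may span it, so restart.
            code, valid = 0, 0
            continue
        code = ((code << 2) | b) & mask
        valid += 1
        if valid >= k:
            yield i - k + 1, code
```

For example, `list(rolling_kmer_codes("ACGTN", 2))` returns `[(0, 1), (1, 6), (2, 11)]` (AC, CG, GT). Production k-mer hashers typically also handle reverse complements and spread codes evenly over the hash space. This sketch does neither.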

Article Synopsis
  • HPV integration may transform an infection into cancer, and studying its effects has been challenging with traditional sequencing methods.
  • Using long-read sequencing on 63 cervical cancer genomes, researchers identified six types of HPV integration events and discovered a phenomenon called heterologous integration, where 24% of integrants had variable HPV copies.
  • The study also revealed that the methylation status of HPV integrations affects gene expression and the surrounding human epigenome, offering insights into how integrated HPV contributes to cervical cancer development.

Article Synopsis
  • The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium aimed to evaluate long-read sequencing for analyzing transcripts by generating over 427 million sequences from various species.
  • The findings highlighted that longer, more accurate sequences yield better transcript identification, while increased read depth enhances quantification accuracy, particularly in well-annotated genomes.
  • The study serves as a benchmark for transcriptome analysis strategies and suggests using additional data for detecting rare transcripts or employing reference-free methods.

Background: The mountain pine beetle, Dendroctonus ponderosae, is an irruptive bark beetle that causes extensive mortality to many pine species within the forests of western North America. Driven by climate change and wildfire suppression, a recent mountain pine beetle (MPB) outbreak has spread across more than 18 million hectares, including areas to the east of the Rocky Mountains that comprise populations and species of pines not previously affected. Despite its impacts, there are few tactics available to control MPB populations.

Long-read sequencing technologies have improved significantly since their emergence. Their read lengths, which can span entire transcripts, are advantageous for reconstructing transcriptomes. Existing long-read transcriptome assembly methods are primarily reference-based, and to date there has been little focus on reference-free transcriptome assembly.

Current state-of-the-art de novo long-read genome assemblers follow the Overlap-Layout-Consensus paradigm. Although read-to-read overlap, the paradigm's most costly step, has been improved in modern long-read genome assemblers, these tools still often require excessive RAM when assembling a typical human dataset. Our work departs from this paradigm, forgoing all-vs-all sequence alignments in favor of a dynamic data structure implemented in GoldRush, a de novo long-read genome assembly algorithm with linear time complexity.

With the increasing affordability and accessibility of genome sequencing data, de novo genome assembly is an important first step for a wide variety of downstream studies and analyses. Therefore, bioinformatics tools that enable the generation of high-quality genome assemblies in a computationally efficient manner are essential. Recent developments in long-read sequencing technologies have greatly benefited genome assembly work, including scaffolding, by providing long-range evidence that can aid in resolving the challenging repetitive regions of complex genomes.

Background: Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, sequencing platform-specific challenges, including high base-call error rate, nonuniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical algorithms. The use of simulated datasets with characteristics that are true to the sequencing platform under evaluation is a cost-effective way to assess the performance of bioinformatics tools with the ground truth in a controlled environment.

Objectives: Antibiotic resistance is a rising global threat to human health and is prompting researchers to seek effective alternatives to conventional antibiotics; these alternatives include antimicrobial peptides (AMPs). Recently, we reported AMPlify, an attentive deep learning model for predicting AMPs in databases of peptide sequences. In our tests, AMPlify outperformed state-of-the-art methods.

Antimicrobial peptides (AMPs) are a diverse class of short, often cationic biological molecules that present promising opportunities in the development of new therapeutics to combat antimicrobial resistance. Newly developed in silico methods offer the ability to rapidly discover numerous novel AMPs with a variety of physicochemical properties. Herein, using the rAMPage AMP discovery pipeline, we bioinformatically identified 51 AMP candidates from amphibian and insect RNA-seq data and present their in-depth characterization.
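
The sketch below illustrates one such physicochemical property by estimating whether a peptide is cationic from its charged residues. It is a simplified assumption, not part of the rAMPage pipeline. Lysine (K) and arginine (R) count as +1, and aspartate (D) and glutamate (E) count as -1. Histidine, the termini, and pH-dependent partial charges are ignored. The function names and the example sequence are hypothetical.

```python
def approx_net_charge(peptide: str) -> int:
    """Crude net charge near neutral pH: +1 per K or R, -1 per D or E.

    Histidine, terminal groups and pKa-based partial charges are ignored.
    """
    seq = peptide.upper()
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return positive - negative


def is_cationic(peptide: str) -> bool:
    """True when the crude net charge is positive."""
    return approx_net_charge(peptide) > 0


# Hypothetical peptide: four lysines and two glutamates give a net charge of +2.
example = "KKLLKKLLEE"
print(approx_net_charge(example), is_cationic(example))
```

More careful net-charge calculations use residue pKa values at a chosen pH. The integer count here only makes the word "cationic" concrete.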

We assembled the 9.8-Gbp genome of western redcedar (WRC; Thuja plicata), an ecologically and economically important conifer species of the Cupressaceae. The genome assembly, derived from a uniquely inbred tree produced through five generations of self-fertilization (selfing), was determined by BUSCO analysis to be 86% complete, making it one of the most complete conifer genome assemblies.
