Genomes exhibit large regions with segmental copy number variation, many of which include entire genes and are multiallelic. We have developed a computational method GeneToCN that counts the frequencies of gene-specific k-mers in FASTQ files and uses this information to infer copy number of the gene. We validated the copy number predictions for amylase genes (AMY1, AMY2A, AMY2B) using experimental data from digital droplet PCR (ddPCR) on 39 individuals and observed a strong correlation (R = 0.
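The k-mer counting idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the GeneToCN implementation: the function names, the use of medians, the forward-strand-only counting, and the normalisation against a set of assumed single-copy reference k-mers are all assumptions made for the example.

```python
from collections import Counter
from statistics import median


def count_kmers(reads, kmers, k):
    """Count how often each requested k-mer occurs in the reads (forward strand only)."""
    wanted = set(kmers)
    counts = Counter({kmer: 0 for kmer in wanted})
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in wanted:
                counts[kmer] += 1
    return counts


def estimate_copy_number(reads, gene_kmers, reference_kmers, k, ploidy=2):
    """Hypothetical estimator: scale gene k-mer depth by single-copy reference depth."""
    gene_depth = median(count_kmers(reads, gene_kmers, k).values())
    ref_depth = median(count_kmers(reads, reference_kmers, k).values())
    if ref_depth == 0:
        raise ValueError("reference k-mers were not observed in the reads")
    # Assumption: reference k-mers come from regions present in `ploidy` copies.
    return ploidy * gene_depth / ref_depth


# Toy data: the gene k-mers are seen twice as often as the reference k-mers.
toy_reads = ["ACGTAC"] * 4 + ["TTGCAA"] * 2
toy_estimate = estimate_copy_number(
    toy_reads, gene_kmers=["ACGT", "CGTA"], reference_kmers=["TTGC", "TGCA"], k=4
)
```

In practice the reads would be streamed from a FASTQ file and both strands would be counted; the toy list of strings only keeps the sketch self-contained.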
Motivation: Accurate estimation of next-generation sequencing depth of coverage is needed for detecting the copy number of repeated elements in the human genome. The common methods for estimating sequencing depth are based on counting the number of reads mapped to the genome or subgenomic regions. Such methods are sensitive to the mapping quality.
Rhodotorula toruloides is a potential chassis for microbial cell factories, as this yeast can metabolise different substrates into a diverse range of natural products, but the lack of efficient synthetic biology tools hinders its applicability. In this study, the modular, versatile and efficient Golden Gate DNA assembly system (RtGGA) was adapted to the first basidiomycete, the oleaginous yeast R. toruloides. The genome of R. toruloides CCT 0783 was sequenced and used for the GGA design.
Associations between leguminous plants and symbiotic nitrogen-fixing rhizobia are a classic example of mutualism between a eukaryotic host and a specific group of prokaryotic microbes. Although this symbiosis is in part species specific, different rhizobial strains may colonize the same nodule. Some rhizobial strains are commonly known as better competitors than others, but detailed analyses that aim to predict rhizobial competitive abilities based on genomes are still scarce.
In this study, we aimed to characterize the population structure, drug resistance mechanisms, and virulence genes of isolates in Estonia. Sixty-one and 34 isolates were collected between 2012 and 2014 across the country from various sites and sources, including farm animals and poultry (n = 53), humans (n = 12), environment (n = 24), and wild birds (n = 44). Clonal relationships of the strains were determined by whole-genome sequencing and analyzed by multi-locus sequence typing.
KATK is a fast and accurate software tool for calling variants directly from raw next-generation sequencing reads. It uses predefined k-mers to retrieve only the reads of interest from the FASTQ file and calls genotypes by aligning retrieved reads locally. KATK does not use data about known polymorphisms and has NC (no call) as the default genotype.
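The read-retrieval step described above might look something like the sketch below. It is not KATK's code: the function names are hypothetical, the matching is exact, and the subsequent local alignment and genotype calling are left out entirely.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def retrieve_reads(reads, target_kmers, k):
    """Keep only reads that contain a target k-mer on either strand (hypothetical helper)."""
    targets = set(target_kmers)
    targets |= {reverse_complement(kmer) for kmer in target_kmers}
    selected = []
    for read in reads:
        if any(read[i:i + k] in targets for i in range(len(read) - k + 1)):
            selected.append(read)
    return selected


# Toy data: the first read matches directly, the third via the reverse complement.
toy_reads = ["AAACGTTT", "GGGGGGGG", "TTCGTTAA"]
toy_selected = retrieve_reads(toy_reads, target_kmers=["AACG"], k=4)
```

Filtering on a fixed k-mer set keeps the expensive alignment step limited to a small fraction of the reads, which is the design point the abstract emphasises.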
Tribbles homolog 3 (TRIB3) is a pseudokinase involved in intracellular regulatory processes and has been implicated in several diseases. In this article, we report that the human TRIB3 promoter contains a 33-bp variable number tandem repeat (VNTR) and characterize the heterogeneity and function of this genetic element. Analysis of human populations around the world uncovered the existence of alleles ranging from 1 to 5 copies of the repeat, with 2-, 3- and 5-copy alleles being the most common but displaying considerable geographical differences in frequency.
Fast and reliable analytical methods for the identification of plants from metagenomic samples play an important role in identifying the components of complex mixtures of processed biological materials, including food, herbal products, gut contents or environmental samples. Different PCR-based methods that are commonly used for plant identification from metagenomic samples are often inapplicable due to DNA degradation, a low level of successful amplification or a lack of detection power. We introduce a method that combines metagenomic sequencing and an alignment-free k-mer based approach for the identification of plant DNA in processed metagenomic samples.
Chromosomal toxin-antitoxin (TA) systems are widespread genetic elements among bacteria, yet, despite extensive studies in the last decade, their biological importance remains unclear. The ability of TA-encoded toxins to affect stress tolerance when overexpressed supports the hypothesis that TA systems are associated with stress adaptation. However, the deletion of TA genes usually has no effect on stress tolerance, supporting the selfish elements hypothesis.
When nutrients run out, bacteria enter a dormant metabolic state. This low or undetectable metabolic activity helps bacteria to preserve their scant reserves for future needs, yet it also diminishes their ability to scan the environment for new growth-promoting substrates. However, neighboring microbial growth is a reliable indicator of a favorable environment and can thus serve as a cue for exiting dormancy.
This study evaluated the correlation between different carbapenemase detection methods on carbapenem non-susceptible strains from Northern and Eastern Europe; 31 institutions in 9 countries participated in the research project, namely Finland, Estonia, Latvia, Lithuania, Russia (St. Petersburg), Poland, Belarus, Ukraine, and Georgia. During the research program, a total of 5,001 clinical isolates were screened for any carbapenem non-susceptibility by the disk diffusion method, Vitek 2 or Phoenix system following the EUCAST guideline on detection of resistance mechanisms, version 1.
Background: Recently, alignment-free sequence analysis methods have gained popularity in the field of personal genomics. These methods are based on counting frequencies of short k-mer sequences, thus allowing faster and more robust analysis compared to traditional alignment-based methods.
Results: We have created a fast alignment-free method, AluMine, to analyze polymorphic insertions of Alu elements in the human genome.
In this study, we compare the genetic ancestry of individuals from two as yet genetically unstudied cultural traditions in Estonia in the context of available modern and ancient datasets: 15 from the Late Bronze Age stone-cist graves (1200-400 BC) (EstBA) and 6 from the Pre-Roman Iron Age tarand cemeteries (800/500 BC-50 AD) (EstIA). We also included 5 Pre-Roman to Roman Iron Age Ingrian (500 BC-450 AD) (IngIA) and 7 Middle Age Estonian (1200-1600 AD) (EstMA) individuals to build a dataset for studying the demographic history of the northern parts of the Eastern Baltic from the earliest layer of Mesolithic to modern times. Our findings are consistent with EstBA receiving gene flow from regions with strong Western hunter-gatherer (WHG) affinities and EstIA from populations related to modern Siberians.
Plants contain endophytic bacteria, whose communities both influence plant growth and can be an important source of probiotics. Here we used deep sequencing of a 16S rRNA gene fragment and bacterial cultivation to independently characterize the microbiomes of five plant species from divergent taxonomic orders: potato (Solanum tuberosum), carrot (Daucus sativus), beet (Beta vulgaris), neep (Brassica napus spp. napobrassica), and topinambur (Helianthus tuberosus).
Pharmacogenomics aims to tailor pharmacological treatment to each individual by considering associations between genetic polymorphisms and adverse drug effects (ADEs). With technological advances, pharmacogenomic research has evolved from candidate gene analyses to genome-wide association studies. Here, we integrate deep whole-genome sequencing (WGS) information with drug prescription and ADE data from Estonian electronic health record (EHR) databases to evaluate genome- and pharmacome-wide associations on an unprecedented scale.
We have developed an easy-to-use and memory-efficient method called PhenotypeSeeker that (a) identifies phenotype-specific k-mers, (b) generates a k-mer-based statistical model for predicting a given phenotype and (c) predicts the phenotype from the sequencing data of a given bacterial isolate. The method was validated on 167 Klebsiella pneumoniae isolates (virulence), 200 Pseudomonas aeruginosa isolates (ciprofloxacin resistance) and 459 Clostridium difficile isolates (azithromycin resistance). The phenotype prediction models trained from these datasets obtained the F1-measure of 0.
Background: We aimed to identify the main spreading clones, describe the resistance mechanisms associated with carbapenem- and/or multidrug-resistant P. aeruginosa and characterize patients at risk of acquiring these strains in Estonian hospitals.
Methods: Ninety-two non-duplicated carbapenem- and/or multidrug-resistant P.
Background: Plasmids play an important role in the dissemination of antibiotic resistance, making their detection an important task. Using whole genome sequencing (WGS), it is possible to capture both bacterial and plasmid sequence data, but short read lengths make plasmid detection a complex problem.
Results: We developed a tool named PlasmidSeeker that enables the detection of plasmids from bacterial WGS data without read assembly.
Polymerase chain reaction and different barcoding methods commonly used for plant identification from metagenomics samples are based on the amplification of a limited number of pre-selected barcoding regions. These methods are often inapplicable due to DNA degradation, low amplification success or low species discriminative power of selected genomic regions. Here we introduce a method for the rapid identification of plant taxon-specific k-mers that is applicable for the fast detection of plant taxa directly from raw sequencing reads without aligning, mapping or assembling the reads.
Summary: Designing PCR primers for amplifying regions of eukaryotic genomes is a complicated task because the genomes contain a large number of repeat sequences and other regions unsuitable for amplification by PCR. We have developed a novel k-mer based masking method that uses a statistical model to detect and mask failure-prone regions on the DNA template prior to primer design. We implemented the method as a standalone program, primer3_masker, and integrated it into the primer design program Primer3.
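To illustrate the general idea of k-mer based masking, the sketch below lowercases template bases covered by over-represented k-mers. It is a deliberately simplified stand-in: a fixed count threshold (`max_count`) replaces the statistical model mentioned in the abstract, and the function name and the k-mer count table are hypothetical rather than part of primer3_masker.

```python
def mask_template(template, kmer_counts, k, max_count):
    """Lowercase every base covered by a k-mer seen more than max_count times.

    kmer_counts is an assumed mapping from k-mer to its genome-wide frequency.
    """
    masked = list(template)
    for i in range(len(template) - k + 1):
        if kmer_counts.get(template[i:i + k], 0) > max_count:
            for j in range(i, i + k):
                masked[j] = masked[j].lower()
    return "".join(masked)


# Toy data: "ACGT" is treated as a highly repeated k-mer.
toy_masked = mask_template("ACGTACGTTT", {"ACGT": 100}, k=4, max_count=10)
```

Soft-masking with lowercase letters leaves the sequence intact, so a downstream primer design step can decide how strictly to avoid the masked positions.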