Mind the Gap: A Neural Network Framework for Imputing Genotypes in Non-Model Species.

Mol Ecol Resour

Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Copenhagen, Denmark.

Published: January 2025

Reduced representation sequencing (RRS) has proven to be a cost-effective way to sequence subsets of the genome of non-model species in large-scale studies. However, the targeted nature of RRS approaches commonly introduces large amounts of missing data, which reduces statistical power and biases estimates in downstream analyses. Genotype imputation, the statistical inference of genotypes at missing sites across the genome, is a powerful way to overcome the caveats associated with missing data. Genotype imputation typically requires a reference panel of haplotypes; however, such a panel is not always feasible for non-model species. In this issue of Molecular Ecology Resources, Mora-Márquez et al. (2024) develop gtImputation, an unsupervised machine learning imputation tool with an interactive GUI that leverages the underlying structure of the data itself, without the need for a reference panel. They show that their method performs as well as, and in some cases better than, existing haplotype-clustering and unsupervised machine learning algorithms, particularly for sites with low minor allele frequency (MAF) and for data sets with strong underlying population structure. This framework adds to ongoing efforts to extend imputation to non-model species, enabling analyses that require dense sets of markers while keeping sequencing costs low.
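
The core idea of reference-free imputation, filling missing genotype calls by borrowing information from similar individuals in the same data set, can be illustrated with a much simpler technique than the one described above. The sketch below is NOT gtImputation's neural network. It is a plain k-nearest-neighbour imputer, used here as a swapped-in baseline, and it assumes diploid genotypes coded as 0/1/2 alternative-allele counts with None marking a missing call. The function names (minor_allele_frequency, genotype_distance, knn_impute) and the toy matrix are hypothetical.

```python
import math
from typing import List, Optional

# Hypothetical encoding: 0, 1, 2 = alternative-allele count; None = missing call.
Genotype = Optional[int]
Matrix = List[List[Genotype]]


def minor_allele_frequency(matrix: Matrix, site: int) -> float:
    """MAF at one site, computed from observed (non-missing) diploid calls only."""
    calls = [row[site] for row in matrix if row[site] is not None]
    if not calls:
        return float("nan")
    alt_freq = sum(calls) / (2 * len(calls))
    return min(alt_freq, 1.0 - alt_freq)


def genotype_distance(a: List[Genotype], b: List[Genotype]) -> float:
    """Mean absolute genotype difference over sites called in both individuals."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return sum(abs(x - y) for x, y in shared) / len(shared)


def knn_impute(matrix: Matrix, k: int = 2) -> Matrix:
    """Fill each missing call from the k most similar individuals (a kNN baseline,
    not gtImputation's neural network). Only observed calls are used as donors."""
    imputed = [list(row) for row in matrix]  # copy so the input is not mutated
    for i, row in enumerate(matrix):
        for j, call in enumerate(row):
            if call is not None:
                continue
            # Rank other individuals genotyped at site j by similarity to individual i.
            donors = sorted(
                (genotype_distance(row, other), other[j])
                for idx, other in enumerate(matrix)
                if idx != i and other[j] is not None
            )[:k]
            if donors:
                mean_call = sum(c for _, c in donors) / len(donors)
                # Round to the nearest genotype class (Python's round() sends .5 ties to even).
                imputed[i][j] = round(mean_call)
    return imputed


if __name__ == "__main__":
    # Toy data with two clusters; individuals 0 and 5 lack a call at site 3.
    genotypes = [
        [0, 0, 2, None],
        [0, 0, 2, 2],
        [0, 1, 2, 2],
        [2, 2, 0, 0],
        [2, 2, 1, 0],
        [2, 2, 0, None],
    ]
    print(knn_impute(genotypes, k=2))
    # → [[0, 0, 2, 2], [0, 0, 2, 2], [0, 1, 2, 2], [2, 2, 0, 0], [2, 2, 1, 0], [2, 2, 0, 0]]
    print(minor_allele_frequency(genotypes, 3))
    # → 0.5
```

In this toy matrix, individual 0 receives a homozygous-alternative call (2) and individual 5 a homozygous-reference call (0), because each borrows from the nearest members of its own cluster. This loosely mirrors why clear population structure can help reference-free imputation; a real tool would also need to handle low-MAF sites, uncertainty in calls, and far larger matrices.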

DOI: http://dx.doi.org/10.1111/1755-0998.14066

Similar Publications

Single-cell RNA sequencing (scRNA-seq) is widely used in plant biology and is a powerful tool for studying cell identity and differentiation. However, the scarcity of known cell-type marker genes and the divergence of marker expression patterns limit the accuracy of cell-type identification and our capacity to investigate cell-type conservation in many species. To tackle this challenge, we devise a novel computational strategy called Orthologous Marker Gene Groups (OMGs), which can identify cell types in both model and non-model plant species and allows for rapid comparison of cell types across many published single-cell maps.

Genomic microsatellite characterization and development of polymorphic microsatellites in Eospalax baileyi.

Sci Rep

January 2025

Key Laboratory of Grassland Ecosystem (Ministry of Education), Pratacultural College, Gansu Agricultural University, Lanzhou, 730070, China.

Microsatellite markers are cost-effective, rapid, and efficient, and offer great advantages in large-sample kinship analyses and population structure studies. However, microsatellite loci remain severely underdeveloped in non-model organisms. The plateau zokor (Eospalax baileyi) is a key subterranean species of the Tibetan Plateau whose effective management has long been challenging.

Plant microRNAs (miRNAs) and phasiRNAs are small non-coding RNAs with important functions in regulating gene expression at the post-transcriptional level. However, identifying miRNAs, phasiRNAs, and their target genes from large volumes of raw sequencing data requires multiple software tools and command-line operations, which is time-consuming and labor-intensive for non-model plants. We therefore present CsMPDB (miRNAs and phasiRNAs database of Camellia sinensis), an interactive web application with multiple analysis modules for visualizing and exploring miRNAs and phasiRNAs in tea plants, built on 259 sRNA-seq samples and 24 degradome-seq samples from NCBI.

Objective: Extracting DNA is essential in wildlife genetic studies, and numerous methods are available. However, extraction is costly and time-consuming for non-model organisms, including most wildlife species. We therefore optimized a cost-efficient protocol for extracting DNA from white-tailed deer muscle tissue using the DNAdvance kit (Beckman Coulter), a magnetic-bead-based approach.
