The effect of missing data on evolutionary analysis of sequence capture bycatch, with application to an agricultural pest.

Mol Genet Genomics

Research School of Biology, Division of Ecology and Evolution, Australian National University, Canberra, ACT, 2601, Australia.

Published: February 2024

Sequence capture is a genomic technique that selectively enriches target sequences before high-throughput next-generation sequencing, to generate specific sequences of interest. Off-target or 'bycatch' data are often discarded from capture experiments, but can be leveraged to address evolutionary questions under some circumstances. Here, we investigated the effects of missing data on a variety of evolutionary analyses using bycatch from an exon capture experiment on the global pest moth, Helicoverpa armigera. We added > 200 new samples from across Australia in the form of mitogenomes obtained as bycatch from targeted sequence capture, and combined these into an additional, larger dataset totaling > 1000 mitochondrial cytochrome c oxidase subunit I (COI) sequences across the species' global distribution. Using discriminant analysis of principal components and Bayesian coalescent analyses, we showed that mitogenomes assembled from bycatch with up to 75% missing data returned evolutionary inferences consistent with higher-coverage datasets and the broader literature surrounding H. armigera. For example, low-coverage sequences broadly supported the delineation of two H. armigera subspecies and also provided new insights into the potential for geographic turnover among these subspecies. However, we also identified key effects of dataset coverage and composition on our results. Thus, low-coverage bycatch data can offer valuable information for population genetic and phylodynamic analyses, but caution is required to ensure the reduced information does not introduce confounding factors, such as sampling biases, that drive inference.
We encourage more researchers to maximize the potential of the targeted sequence approach by examining evolutionary questions with their off-target bycatch where possible, especially in cases where no previous mitochondrial data exist. However, we recommend stratifying data at different genome coverage thresholds to separate sampling effects from genuine genomic signals, and to understand their implications for evolutionary research.
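The recommendation to stratify data at different coverage thresholds can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy alignment, the set of characters treated as missing (`N`, `?`, `-`), and the 25% and 50% cut-offs are all assumptions made for illustration; only the 75% figure appears in the abstract.

```python
from typing import Dict, List, Sequence

# Assumed symbols for missing or uncalled sites in an aligned sequence.
MISSING = set("N?-")


def missing_fraction(seq: str) -> float:
    """Return the fraction of sites in an aligned sequence that are missing."""
    if not seq:
        return 1.0  # treat an empty sequence as entirely missing
    return sum(ch in MISSING for ch in seq.upper()) / len(seq)


def stratify_by_missing(
    seqs: Dict[str, str],
    thresholds: Sequence[float] = (0.25, 0.5, 0.75),
) -> Dict[float, List[str]]:
    """Build nested datasets: for each threshold, keep the sample IDs whose
    missing fraction is at or below that threshold."""
    return {
        t: sorted(name for name, s in seqs.items() if missing_fraction(s) <= t)
        for t in thresholds
    }


# Hypothetical toy alignment (8 sites per sample).
alignment = {
    "sample_A": "ACGTACGT",  # no missing sites
    "sample_B": "ACNNACGT",  # 2 of 8 sites missing
    "sample_C": "NNNNNNGT",  # 6 of 8 sites missing
    "sample_D": "NNNNNNNN",  # all sites missing
}

strata = stratify_by_missing(alignment)
for threshold, names in strata.items():
    print(f"<= {threshold:.0%} missing: {names}")
```

Running the same downstream analysis on each nested dataset is one way to check whether an inference persists as lower-coverage samples are added, or whether it tracks which samples happen to pass the filter.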

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10881687
DOI: http://dx.doi.org/10.1007/s00438-024-02097-7
