Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
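The construction step described above, combining reads randomly selected from isolate genomes, can be illustrated with a minimal sketch. This is not the authors' procedure: the function names `sample_reads` and `toy_genome`, the fixed read length, and the `read_share` weighting (each genome's assumed share of the total reads) are all hypothetical choices made for illustration. Keeping the source genome and start position with every read is what would later allow an assembly or binning result to be checked against the isolate genome it came from.

```python
import random


def toy_genome(length, rng):
    """Return a random A/C/G/T string standing in for an isolate genome (illustrative only)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def sample_reads(genomes, read_share, n_reads, read_len=100, seed=0):
    """Draw fixed-length reads from several genomes.

    genomes    : dict mapping genome name -> sequence string
    read_share : dict mapping genome name -> relative share of reads (assumed weighting)
    Returns a list of (genome_name, start_position, read_sequence) tuples.
    """
    rng = random.Random(seed)
    names = sorted(genomes)
    for name in names:
        if len(genomes[name]) < read_len:
            raise ValueError(f"genome {name!r} is shorter than read_len")
    weights = [read_share[name] for name in names]

    reads = []
    for _ in range(n_reads):
        # Pick the source genome in proportion to its share of the community.
        name = rng.choices(names, weights=weights, k=1)[0]
        seq = genomes[name]
        # Pick a uniformly random start so the read lies fully inside the genome.
        start = rng.randrange(len(seq) - read_len + 1)
        reads.append((name, start, seq[start:start + read_len]))
    return reads
```

Under these assumptions, data sets of differing complexity could be mimicked by changing only `read_share`: a strongly skewed share gives a community dominated by one genome, while equal shares give an even community in which no genome is sampled deeply.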
DOI: http://dx.doi.org/10.1038/nmeth1043
Sci Data
January 2025
Marine Biotechnology Fish Nutrition and Health Division, Central Marine Fisheries Research Institute, Post Box No 1603 Ernakulam North PO., Kochi, 682018, Kerala, India.
Mussels, particularly Perna viridis, are vital sentinel species for toxicology and biomonitoring in environmental health. This species plays a crucial role in aquaculture and significantly impacts the fisheries sector. Despite the ecological and economic importance of this species, its omics resources are still scarce.
Sci Data
January 2025
Department of Infectious Diseases and Public Health, City University of Hong Kong, Kowloon Tong, Hong Kong.
Black carp (Mylopharyngodon piceus) is one of the "four famous domestic fishes" in China and an important economic fish in freshwater aquaculture. A high-quality genome is essential for advancing future biological research and breeding programs for this species. In this study, we aimed to generate a high-quality chromosome-level genome assembly of black carp using Nanopore and Hi-C technologies.
Sci Rep
January 2025
Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg, Freiburg, Germany.
The characteristics of data produced by omics technologies are pivotal, as they critically influence the feasibility and effectiveness of computational methods applied in downstream analyses, such as data harmonization and differential abundance analyses. Furthermore, variability in these data characteristics across datasets plays a crucial role, leading to diverging outcomes in benchmarking studies, which are essential for guiding the selection of appropriate analysis methods in all omics fields. Additionally, downstream analysis tools are often developed and applied within specific omics communities due to the presumed differences in data characteristics attributed to each omics technology.
Sci Data
January 2025
School of Medicine, Anhui University of Science and Technology, Huainan, 232001, China.
Ultrasound is a primary diagnostic tool commonly used to evaluate internal body structures, including organs, blood vessels, the musculoskeletal system, and fetal development. Due to challenges such as operator dependence, noise, limited field of view, difficulty in imaging through bone and air, and variability across different systems, diagnosing abnormalities in ultrasound images is particularly challenging for less experienced clinicians. The development of artificial intelligence (AI) technology could assist in the diagnosis of ultrasound images.
Sci Data
January 2025
Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
As the occurrence of human diseases and conditions increases, questions continue to arise about their linkages to chemical exposure, especially for per- and polyfluoroalkyl substances (PFAS). Currently, many chemicals of concern have limited experimental information available for their use in analytical assessments. Here, we aim to increase this knowledge by providing the scientific community with multidimensional characteristics for 175 PFAS and their resulting 281 ion types.