The ability to manipulate sequence, alignment, and phylogenetic tree files has become an increasingly important skill in the life sciences, whether to generate summary information or to prepare data for further downstream analysis. The command line can be an extremely powerful environment for interacting with these resources, but only if the user has the appropriate general-purpose tools on hand. BuddySuite is a collection of four independent yet interrelated command-line toolkits that facilitate each step in the workflow of sequence discovery, curation, alignment, and phylogenetic reconstruction. Most common sequence, alignment, and tree file formats are automatically detected and parsed, and over 100 tools have been implemented for manipulating these data. The project has been engineered to easily accommodate the addition of new tools, is written in the popular programming language Python, and is hosted on the Python Package Index and GitHub to maximize accessibility. Documentation for each BuddySuite tool, including usage examples, is available at http://tiny.cc/buddysuite_wiki. All software is open source and freely available through http://research.nhgri.nih.gov/software/BuddySuite.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5850830 | PMC |
| http://dx.doi.org/10.1093/molbev/msx089 | DOI Listing |
Front Plant Sci
December 2024
Tobacco Research Institute, Chinese Academy of Agricultural Sciences, Qingdao, China.
Genomic prediction is a powerful approach for improving genetic gain and shortening the breeding cycles in animal and crop breeding programs. A series of statistical and machine learning models has been developed to increase the prediction performance continuously. However, the application of these models requires advanced R programming skills and command-line tools to perform quality control, format input files, and install packages and dependencies, posing challenges for breeders.
Bioinform Adv
December 2024
Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States.
Motivation: The expansion of genetic association data from genome-wide association studies has increased the importance of methodologies like Polygenic Risk Scores (PRS) and Mendelian Randomization (MR) in genetic epidemiology. However, their application is often impeded by complex, multi-step workflows requiring specialized expertise and the use of disparate tools with varying data formatting requirements. Existing solutions are frequently standalone packages or command-line based, largely due to dependencies on tools like PLINK, which limits accessibility for researchers without computational experience.
Bioinform Adv
November 2024
Department of Biology and Evolution of Marine Organisms (BEOM), Stazione Zoologica Anton Dohrn, Roma I-00198, Italy.
Motivation: Accurate sequence length profiling is essential in bioinformatics, particularly in genomics and proteomics. Existing tools like SeqKit and the Trinity toolkit provide basic sequence statistics but often fall short in offering comprehensive analytics and plotting options. For instance, SeqKit is a very complete and fast tool for sequence analysis, delivering useful metrics (e.
BMC Bioinformatics
December 2024
Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain.
Background: Phenotypic data comparison is essential for disease association studies, patient stratification, and genotype-phenotype correlation analysis. To support these efforts, the Global Alliance for Genomics and Health (GA4GH) established Phenopackets v2 and Beacon v2 standards for storing, sharing, and discovering genomic and phenotypic data. These standards provide a consistent framework for organizing biological data, simplifying their transformation into computer-friendly formats.
J Chem Inf Model
November 2024
Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.
The construction, management, and analysis of large molecular libraries are critical in many areas of modern chemistry. Herein, we introduce the MOLecular LIbrary toolkit, "molli", which is a Python 3 cheminformatics module that provides a streamlined interface for manipulating large libraries. Three-dimensional, combinatorial molecule libraries can be expanded directly from two-dimensional chemical structure fragments stored in CDXML files with high stereochemical fidelity.