Annotating protein-coding genes can be challenging, especially when searching for the best hits against multiple functional databases. This is partly because of "bad words" appearing as top hits, such as hypothetical or uncharacterized proteins. To help alleviate some of these issues, we designed a bioinformatics tool called NoBadWordsCombiner, which efficiently merges the hits from various databases, strengthening gene definitions by minimizing functional descriptions containing "bad words." Unlike other available tools, NoBadWordsCombiner is user friendly, but it does require users to have some general bioinformatics skills, including a basic understanding of the BLAST package and dash shell in Linux/Unix environments. For complete details on the use and execution of this protocol, please refer to Zhang et al. (2021a).
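The merging idea described above can be illustrated with a minimal sketch. This is not the NoBadWordsCombiner implementation. The bad-word list (only the two examples named in the abstract), the database names `db_A` and `db_B`, the priority order, and the function names `has_bad_word` and `combine_hits` are all assumptions made for illustration. The sketch assumes the top hit per gene has already been extracted from each database search.

```python
# Illustrative sketch only; not the NoBadWordsCombiner implementation.
# BAD_WORDS holds just the two examples named in the abstract (assumed list).
BAD_WORDS = ("hypothetical", "uncharacterized")


def has_bad_word(description, bad_words=BAD_WORDS):
    """Return True if the description contains any bad word (case-insensitive)."""
    lowered = description.lower()
    return any(word in lowered for word in bad_words)


def combine_hits(hits_by_db, db_priority):
    """Merge per-database top hits into one (database, description) per gene.

    hits_by_db:  {database name: {gene id: top-hit description}}
    db_priority: database names in the order they are consulted (assumed order).
    """
    # Collect gene ids in first-seen order across the prioritized databases.
    genes = []
    for db in db_priority:
        for gene in hits_by_db.get(db, {}):
            if gene not in genes:
                genes.append(gene)

    merged = {}
    for gene in genes:
        candidates = [
            (db, hits_by_db[db][gene])
            for db in db_priority
            if gene in hits_by_db.get(db, {})
        ]
        # Prefer the first description without a bad word; otherwise
        # fall back to the highest-priority hit so no gene is dropped.
        informative = [c for c in candidates if not has_bad_word(c[1])]
        merged[gene] = informative[0] if informative else candidates[0]
    return merged


if __name__ == "__main__":
    hits = {
        "db_A": {"gene1": "hypothetical protein",
                 "gene2": "ATP synthase subunit alpha"},
        "db_B": {"gene1": "DNA polymerase III subunit beta",
                 "gene3": "Uncharacterized protein"},
    }
    for gene, (db, desc) in combine_hits(hits, ["db_A", "db_B"]).items():
        print(gene, db, desc, sep="\t")
```

In this toy input, `gene1` takes its description from `db_B` because the `db_A` hit is a "hypothetical protein", while `gene3` keeps its uninformative description because no better hit exists.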

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8521201
DOI: http://dx.doi.org/10.1016/j.xpro.2021.100888
