Search for SINE repeats in the rice genome using correlation-based position weight matrices.

AI Article Synopsis

  • Transposable elements (TEs) make up a significant part of eukaryotic genomes. Short interspersed nuclear elements (SINEs), a class of non-autonomous TEs, are hard to identify because copies accumulate mutations quickly after insertion.
  • On an artificial chromosome seeded with mutated SINE copies, the Highly Divergent Repeat Search Method (HDRSM) identified more of the inserted repeats than RepeatMasker and determined their boundaries more accurately. In the rice genome, a search for copies of 39 SINE families produced 14,030 hits, 5,704 of which RepeatMasker did not detect.
  • To achieve a complete understanding of SINE distribution, using both HDRSM and RepeatMasker is advised, as HDRSM excels in detecting divergent copies while RepeatMasker is more effective for shorter, more similar copies.

Article Abstract

Background: Transposable elements (TEs) constitute a significant part of eukaryotic genomes. Short interspersed nuclear elements (SINEs) are non-autonomous TEs, which are widely represented in mammalian genomes and also found in plants. After insertion in a new position in the genome, TEs quickly accumulate mutations, which complicate their identification and annotation by modern bioinformatics methods. In this study, we searched for highly divergent SINE copies in the genome of rice (Oryza sativa subsp. japonica) using the Highly Divergent Repeat Search Method (HDRSM).

Results: The HDRSM considers correlations of neighboring symbols to construct a position weight matrix (PWM) for a SINE family, which is then used to search for new copies. To evaluate the accuracy of the method and compare it with the RepeatMasker program, we generated a set of SINE copies containing nucleotide substitutions and indels and inserted them into an artificial chromosome for analysis. The HDRSM showed better results both in the number of identified inserted repeats and in the accuracy of determining their boundaries. A search for copies of 39 SINE families in the rice genome produced 14,030 hits; among them, 5,704 were not detected by RepeatMasker.
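
To make the idea of a PWM that captures correlations of neighboring symbols more concrete, here is a minimal Python sketch. It is not the HDRSM algorithm, whose details are not given in this abstract. It assumes gap-free, equal-length aligned copies, a uniform background, and a pseudocount of 1, and it scores adjacent symbol pairs instead of single symbols. The function names (`build_pair_pwm`, `score_window`, `best_hit`) are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"
PAIRS = [a + b for a in ALPHABET for b in ALPHABET]  # 16 adjacent-symbol pairs


def build_pair_pwm(aligned, pseudocount=1.0):
    """Build a log-odds matrix over adjacent symbol pairs.

    aligned: equal-length, gap-free uppercase sequences (simplifying assumption).
    Returns a list of dicts; entry i scores the pair at positions (i, i + 1).
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(PAIRS)  # uniform background, an assumption
    total = len(aligned) + pseudocount * len(PAIRS)
    pwm = []
    for i in range(length - 1):
        counts = Counter(s[i:i + 2] for s in aligned)
        pwm.append({
            p: math.log2((counts[p] + pseudocount) / total / background)
            for p in PAIRS
        })
    return pwm


def score_window(pwm, window):
    """Sum pair scores along a window; unknown pairs (e.g. with N) score 0."""
    return sum(col.get(window[i:i + 2], 0.0) for i, col in enumerate(pwm))


def best_hit(pwm, sequence):
    """Slide the matrix along the sequence and return (score, start) of the best window."""
    width = len(pwm) + 1
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix width")
    return max(
        (score_window(pwm, sequence[s:s + width]), s)
        for s in range(len(sequence) - width + 1)
    )


# Toy family of four aligned copies and a toy "genome" containing one copy.
family = ["ACGTAC", "ACGTAC", "ACGAAC", "ACGTAC"]
pwm = build_pair_pwm(family)
score, start = best_hit(pwm, "TTTTTTACGTACTTTTTT")
print(start, round(score, 2))
```

A real search would also need to handle indels and assess the statistical significance of each hit, both of which this sketch leaves out.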

Conclusions: The HDRSM could find divergent SINE copies, correctly determine their boundaries, and offer a high level of statistical significance. We also found that RepeatMasker is able to find relatively short copies of the SINE families with a higher level of similarity, while HDRSM is able to find more diverged copies. To obtain a comprehensive profile of SINE distribution in the genome, combined application of the HDRSM and RepeatMasker is recommended.
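
One way to combine the output of two tools is to treat each hit as a genomic interval, list the hits that one tool reports and the other does not, and merge everything into a single annotation. The Python sketch below illustrates this under stated assumptions: hits are half-open `(start, end)` intervals on a single chromosome, and a hit counts as shared if it overlaps the other tool's hit by at least one base. The abstract does not state the criterion the authors used, and the coordinates and function names are invented for illustration.

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def unique_hits(hits, other_hits):
    """Return hits that overlap nothing in other_hits (simple O(n*m) scan)."""
    return [h for h in hits if not any(overlaps(h, o) for o in other_hits)]


def merge_hits(hits_a, hits_b):
    """Union of two hit lists, with overlapping intervals merged."""
    merged = []
    for start, end in sorted(hits_a + hits_b):
        if merged and start < merged[-1][1]:
            # Extend the previous interval instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Invented coordinates, for illustration only.
hdrsm_hits = [(100, 250), (400, 520), (900, 1000)]
repeatmasker_hits = [(120, 240), (600, 650)]

only_hdrsm = unique_hits(hdrsm_hits, repeatmasker_hits)
combined = merge_hits(hdrsm_hits, repeatmasker_hits)
print(only_hdrsm)
print(combined)
```

For genome-scale hit lists, a sorted sweep or an interval tree would replace the quadratic scan in `unique_hits`.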

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7852121
DOI: http://dx.doi.org/10.1186/s12859-021-03977-0

Publication Analysis

Top Keywords

  • sine copies: 12
  • rice genome: 8
  • position weight: 8
  • highly divergent: 8
  • divergent sine: 8
  • search copies: 8
  • copies sine: 8
  • sine families: 8
  • hdrsm find: 8
  • sine: 7
