A beginner's guide to manual curation of transposable elements.

Mob DNA

Department of Pathology, Tennis Court Road, Cambridge, CB1 2PQ, UK.

Published: March 2022

Background: In the study of transposable elements (TEs), generating a high-confidence set of consensus sequences that represents the diversity of TEs in a given genome is a key step towards investigating these fascinating genomic elements. Many algorithms and pipelines are available to automatically identify putative TE families present in a genome. Despite the availability of these valuable resources, producing a library of high-quality, full-length TE consensus sequences largely remains a process of manual curation. This know-how is often passed on from mentor to mentee within research groups, making it difficult for those outside the field to access this highly specialised skill.

Results: Our manuscript attempts to fill this gap by providing a set of detailed computer protocols, software recommendations and video tutorials for those aiming to manually curate TEs. Detailed step-by-step protocols, aimed at the complete beginner, are presented in the Supplementary Methods.

Conclusions: The set of programs and tools presented here will make manual curation achievable for all researchers, especially those new to the field of TEs.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8969392
DOI: http://dx.doi.org/10.1186/s13100-021-00259-7

Similar Publications

Data Checking of Asymmetric Catalysis Literature Using a Graph Neural Network Approach.

Molecules

January 2025

GSK Carbon Neutral Laboratories for Sustainable Chemistry, Jubilee Campus, University of Nottingham, Triumph Road, Nottingham NG7 2TU, UK.

The range of chemical databases available has dramatically increased in recent years, but the reliability and quality of their data are often negatively affected by human error. The size of chemical databases can make manual data curation and checking of such sets time-consuming; thus, automated tools to help this process are highly desirable. Herein, we propose the use of Graph Neural Networks (GNNs) to identify potential stereochemical misassignments in the primary asymmetric catalysis literature.

Haplotype-resolved phased assemblies aim to capture the full allelic diversity in heterozygous and polyploid species to enable accurate genetic analyses. However, building non-collapsed references still presents a challenge. Here, we used long-range interaction Hi-C reads (high-throughput chromatin conformation capture) and HiFi PacBio reads to assemble the genome of the apomictic cultivar Basilisks from Urochloa decumbens (2n = 4x = 36), an outcrossed tetraploid Paniceae grass widely cropped to feed livestock in the tropics.

Background: Cohort studies contain rich clinical data across large and diverse patient populations and are a common source of observational data for clinical research. Because large-scale cohort studies are both time- and resource-intensive, one alternative is to harmonize data from existing cohorts through multicohort studies. However, given differences in variable encoding, accurate variable harmonization is difficult.

Standardized pipelines support and facilitate integration of diverse datasets at the Rat Genome Database.

Database (Oxford)

January 2025

Rat Genome Database, Department of Physiology, Medical College of Wisconsin, 8701 Watertown Plank Rd, Milwaukee, WI 53226, United States.

The Rat Genome Database (RGD) is a multispecies knowledgebase which integrates genetic, multiomic, phenotypic, and disease data across 10 mammalian species. To support cross-species, multiomics studies and to enhance and expand on data manually extracted from the biomedical literature by the RGD team of expert curators, RGD imports and integrates data from multiple sources. These include major databases and a substantial number of domain-specific resources, as well as direct submissions by individual researchers.

With the increasing maturity of genetic profiling, an essential and routine task in cancer research is to model disease outcomes/phenotypes using genetic variables. Many methods have been successfully developed. However, oftentimes, empirical performance is unsatisfactory because of a "lack of information."
