Publications by authors named "Wratko Hlavina"

Reference sequences and annotations serve as the foundation for many lines of research today, from organism and sequence identification to providing a core description of the genes, transcripts and proteins found in an organism's genome. Interpretation of data including transcriptomics, proteomics, sequence variation and comparative analyses based on reference gene annotations informs our understanding of gene function and possible disease mechanisms, leading to new biomedical discoveries. The Reference Sequence (RefSeq) resource created at the National Center for Biotechnology Information (NCBI) leverages both automatic processes and expert curation to create a robust set of reference sequences of genomic, transcript and protein data spanning the tree of life.

To explore complex biological questions, it is often necessary to access various data types from public data repositories. As the volume and complexity of biological sequence data grow, public repositories face significant challenges in ensuring that the data is easily discoverable and usable by the biological research community. To address these challenges, the National Center for Biotechnology Information (NCBI) has created NCBI Datasets.

Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.

Article Synopsis
  • FCS-GX is a new tool developed by NCBI to quickly identify and remove contamination from genomic sequences.
  • It screens genomes in 0.1-10 minutes and has high sensitivity (>95%) and specificity (>99.93%) for detecting various contaminant species.
  • The tool was used to analyze 1.6 million GenBank assemblies, uncovering 36.8 Gbp of contamination, which led to improved genome accuracy in NCBI's databases.

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.

The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly.

Article Synopsis
  • The cattle genome was sequenced to enhance the understanding of ruminant biology and evolution; it contains at least 22,000 genes, with 14,345 orthologs shared across seven mammal species.
  • Certain regions in the cattle genome have a higher density of segmental duplications, indicating unique evolutionary changes, particularly in genes linked to lactation and immune responses.
  • This genome sequence serves as a valuable resource for studying mammalian evolution and improving livestock genetics, which can lead to better milk and meat production.

Tribolium castaneum is a member of the most species-rich eukaryotic order, a powerful model organism for the study of generalized insect development, and an important pest of stored agricultural products. We describe its genome sequence here. This omnivorous beetle has evolved the ability to interact with a diverse chemical environment, as shown by large expansions in odorant and gustatory receptors, as well as P450 and other detoxification enzymes.

We report the sequence and analysis of the 814-megabase genome of the sea urchin Strongylocentrotus purpuratus, a model for developmental and systems biology. The sequencing strategy combined whole-genome shotgun and bacterial artificial chromosome (BAC) sequences. This use of BAC clones, aided by a pooling strategy, overcame difficulties associated with high heterozygosity of the genome.

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences.
