Publications by Dan Bolser | LitMetric

Publications by authors named "Dan Bolser"

Korea4K: whole genome sequences of 4,157 Koreans with 107 phenotypes derived from extensive health check-ups.

Sungwon Jeon Hansol Choi Yeonsu Jeon Whan-Hyuk Choi Hyunjoo Choi Dan Bolser

Gigascience

January 2024

Background: Phenome-wide association studies (PheWASs) have been conducted on Asian populations, including Koreans, but many were based on chip or exome genotyping data. Such studies have limitations regarding whole genome-wide association analysis, making it crucial to have genome-to-phenome association information with the largest possible whole genome and matched phenome data to conduct further population-genome studies and develop health care services based on population genomics.

Results: Here, we present 4,157 whole genome sequences (Korea4K) coupled with 107 health check-up parameters as the largest genomic resource of the Korean Genome Project.

Comparative analysis of repeat content in plant genomes, large and small.

Joris Argentin Dan Bolser Paul J Kersey Paul Flicek

Front Plant Sci

July 2023

The DNA Features pipeline is the analysis pipeline at EMBL-EBI that annotates repeat elements, including transposable elements. With Ensembl's goal to stay at the cutting edge of genome annotation, we proved that this pipeline needed an update. We then created a new analysis that allowed the Ensembl database to store the repeat classification from the PGSB repeat classification (Recat).

LT1, an ONT long-read-based assembly scaffolded with Hi-C data and polished with short reads.

Hui-Su Kim Asta Blazyte Sungwon Jeon Changhan Yoon Yeonkyung Kim Dan Bolser

GigaByte

May 2022

We present LT1, the first high-quality human reference genome from the Baltic States. LT1 is a female human reference genome assembly, constructed using 57× nanopore long reads and polished using 47× short paired-end reads. We utilized 72 GB of Hi-C chromosomal mapping data for scaffolding, to maximize assembly contiguity and accuracy.

A chromosome-scale genome assembly and annotation of the spring orchid (Cymbidium goeringii).

Oksung Chung Jungeun Kim Dan Bolser Hak-Min Kim Je Hoon Jun

Mol Ecol Resour

April 2022

Cymbidium goeringii, commonly known as the spring orchid, has long been favoured for horticultural purposes in Asian countries. It is a popular orchid with much demand for improvement and development for its valuable varieties. Until now, its reference genome has not been published despite its popularity and conservation efforts.

Regional TMPRSS2 V197M Allele Frequencies Are Correlated with COVID-19 Case Fatality Rates.

Sungwon Jeon Asta Blazyte Changhan Yoon Hyojung Ryu Yeonsu Jeon Dan Bolser

Mol Cells

September 2021

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has a higher case fatality rate in European countries than in others, especially East Asian ones. One potential explanation for this regional difference is the diversity of the viral infection efficiency. Here, we analyzed the allele frequencies of a nonsynonymous variant rs12329760 (V197M) in the TMPRSS2 gene, which encodes a key enzyme essential for viral infection, and found a significant association between the COVID-19 case fatality rate and the V197M allele frequencies, using over 200,000 present-day and ancient genomic samples.

Comparative analysis of 7 short-read sequencing platforms using the Korean Reference Genome: MGI and Illumina sequencing benchmark for whole-genome sequencing.

Hak-Min Kim Sungwon Jeon Oksung Chung Je Hoon Jun Hui-Su Kim Dan M Bolser

Gigascience

March 2021

Background: DNBSEQ-T7 is a new whole-genome sequencer developed by Complete Genomics and MGI using DNA nanoball and combinatorial probe anchor synthesis technologies to generate short reads at a very large scale, up to 60 human genomes per day. However, it has not been objectively and systematically compared against Illumina short-read sequencers.

Findings: By using the same KOREF sample, the Korean Reference Genome, we have compared 7 sequencing platforms including BGISEQ-500, DNBSEQ-T7, HiSeq2000, HiSeq2500, HiSeq4000, HiSeqX10, and NovaSeq6000.

Welfare Genome Project: A Participatory Korean Personal Genome Project With Free Health Check-Up and Genetic Report Followed by Counseling.

Yeonsu Jeon Sungwon Jeon Asta Blazyte Yeo Jin Kim Jasmin Junseo Lee Dan Bolser

Front Genet

February 2021

The Welfare Genome Project (WGP) provided 1,000 healthy Korean volunteers with detailed genetic and health reports to test the social perception of integrating personal genetic and healthcare data on a large scale. WGP was launched in 2016 in the Ulsan Metropolitan City as the first large-scale genome project with public participation in Korea. The project produced a set of genetic materials, genotype information, clinical data, and lifestyle survey answers from participants aged 20-96.

Korean Genome Project: 1094 Korean personal genomes with clinical information.

Sungwon Jeon Youngjune Bhak Yeonsong Choi Yeonsu Jeon Seunghoon Kim Dan Bolser

Sci Adv

May 2020

We present the initial phase of the Korean Genome Project (Korea1K), including 1094 whole genomes (sequenced at an average depth of 31×), along with data of 79 quantitative clinical traits. We identified 39 million single-nucleotide variants and indels of which half were singleton or doubleton and detected Korean-specific patterns based on several types of genomic variations. A genome-wide association study illustrated the power of whole-genome sequences for analyzing clinical traits, identifying nine more significant candidate alleles than previously reported from the same linkage disequilibrium blocks.

Efficient mutation screening for cervical cancers from circulating tumor DNA in blood.

Sun-Young Lee Dong-Kyu Chae Sung-Hun Lee Yohan Lim Jahyun An Dan Bolser

BMC Cancer

July 2020

Background: Early diagnosis and continuous monitoring are necessary for efficient management of cervical cancers (CC). Liquid biopsy, such as detecting circulating tumor DNA (ctDNA) from blood, is a simple, non-invasive method for testing and monitoring cancer markers. However, tumor-specific alterations in ctDNA have not been extensively investigated or compared to other circulating biomarkers in the diagnosis and monitoring of CC.

Decoding a highly mixed Kazakh genome.

Madina Seidualy Asta Blazyte Sungwon Jeon Youngjune Bhak Yeonsu Jeon Dan Bolser

Hum Genet

May 2020

We provide a Kazakh whole genome sequence (MJS) and analyses with the largest comparative Kazakh genomic data available to date. We found 102,240 novel SNVs and a high level of heterozygosity. ADMIXTURE analysis confirmed a significant proportion of variations in this individual coming from all continents except Africa and Oceania.

Ensembl Genomes 2020-enabling non-vertebrate genomic research.

Kevin L Howe Bruno Contreras-Moreira Nishadi De Silva Gareth Maslen Wasiu Akanni Dan M Bolser

Nucleic Acids Res

January 2020

Article Synopsis

Ensembl Genomes is an online resource that offers genome-scale data specifically for non-vertebrate species, complementing the vertebrate data available through the Ensembl project.
The resource provides a consistent interface for accessing various genomic data, including genome sequences, gene models, and genetic variations, which is updated four times a year.
Recent developments have focused on better organizing orthologues and paralogues, enhancing gene expression data, particularly in plants, and strengthening integration with the Ensembl project to manage the growing amount of genomic data.

Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species.

Paul Julian Kersey James E Allen Alexis Allot Matthieu Barba Sanjay Boddu Dan M Bolser

Nucleic Acids Res

January 2018

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.

An improved assembly and annotation of the allohexaploid wheat genome identifies complete families of agronomic genes and provides genomic evidence for chromosomal translocations.

Bernardo J Clavijo Luca Venturini Christian Schudoma Gonzalo Garcia Accinelli Gemy Kaithakottil Dan M Bolser

Genome Res

May 2017

Advances in genome sequencing and assembly technologies are generating many high-quality genome sequences, but assemblies of large, repeat-rich polyploid genomes, such as that of bread wheat, remain fragmented and incomplete. We have generated a new wheat whole-genome shotgun sequence assembly using a combination of optimized data types and an assembly algorithm designed to deal with large and complex genomes. The new assembly represents >78% of the genome with a scaffold N50 of 88.

Ensembl Plants: Integrating Tools for Visualizing, Mining, and Analyzing Plant Genomic Data.

Dan M Bolser Daniel M Staines Emily Perry Paul J Kersey

Methods Mol Biol

January 2018

Ensembl Plants ( http://plants.ensembl.org ) is an integrative resource presenting genome-scale information for 39 sequenced plant species.

Tools and data services registry: a community effort to document bioinformatics resources.

Jon Ison Kristoffer Rapacki Hervé Ménager Matúš Kalaš Emil Rydza Dan Bolser

Nucleic Acids Res

January 2016

Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information.

Ensembl Plants: Integrating Tools for Visualizing, Mining, and Analyzing Plant Genomics Data.

Dan Bolser Daniel M Staines Emily Pritchard Paul Kersey

Methods Mol Biol

May 2016

Ensembl Plants ( http://plants.ensembl.org ) is an integrative resource presenting genome-scale information for a growing number of sequenced plant species (currently 33).

Triticeae resources in Ensembl Plants.

Dan M Bolser Arnaud Kerhornou Brandon Walts Paul Kersey

Plant Cell Physiol

January 2015

Recent developments in DNA sequencing have enabled the large and complex genomes of many crop species to be determined for the first time, even those previously intractable due to their polyploid nature. Indeed, over the course of the last 2 years, the genome sequences of several commercially important cereals, notably barley and bread wheat, have become available, as well as those of related wild species. While still incomplete, comparison with other, more completely assembled species suggests that coverage of genic regions is likely to be high.

De novo transcriptome assembly and analyses of gene expression during photomorphogenesis in diploid wheat Triticum monococcum.

Samuel E Fox Matthew Geniza Mamatha Hanumappa Sushma Naithani Chris Sullivan Dan Bolser

PLoS One

January 2015

Background: Triticum monococcum (2n) is a close ancestor of T. urartu, the A-genome progenitor of cultivated hexaploid wheat, and is therefore a useful model for the study of components regulating photomorphogenesis in diploid wheat. In order to develop genetic and genomic resources for such a study, we constructed genome-wide transcriptomes of two Triticum monococcum subspecies, the wild winter wheat T.

Gramene 2013: comparative plant genomics resources.

Marcela K Monaco Joshua Stein Sushma Naithani Sharon Wei Palitha Dharmawardhana Dan Bolser

Nucleic Acids Res

January 2014

Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38.

The NGS WikiBook: a dynamic collaborative online training effort with long-term sustainability.

Jing-Woei Li Dan Bolser Magnus Manske Federico Manuel Giorgi Nikolay Vyahhi

Brief Bioinform

September 2013

Next-generation sequencing (NGS) is increasingly being adopted as the backbone of biomedical research. With the commercialization of various affordable desktop sequencers, NGS will be reached by increasing numbers of cellular and molecular biologists, necessitating community consensus on bioinformatics protocols to tackle the exponential increase in quantity of sequence data. The current resources for NGS informatics are extremely fragmented.

EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats.

Jon Ison Matús Kalas Inge Jonassen Dan Bolser Mahmut Uludag

Bioinformatics

May 2013

Motivation: Advancing the search, publication and integration of bioinformatics tools and resources demands consistent machine-understandable descriptions. A comprehensive ontology allowing such descriptions is therefore required.

Results: EDAM is an ontology of bioinformatics operations (tool or workflow functions), types of data and identifiers, application domains and data formats.

Analysis of the bread wheat genome using whole-genome shotgun sequencing.

Rachel Brenchley Manuel Spannagl Matthias Pfeifer Gary L A Barker Rosalinda D'Amore Dan Bolser

Nature

November 2012

Bread wheat (Triticum aestivum) is a globally important crop, accounting for 20 per cent of the calories consumed by humans. Major efforts are underway worldwide to increase wheat production by extending genetic diversity and analysing key traits, and genomic resources can accelerate progress. But so far the very large size and polyploid complexity of the bread wheat genome have been substantial barriers to genome analysis.

Identification and localisation of the NB-LRR gene family within the potato genome.

Florian Jupe Leighton Pritchard Graham J Etherington Katrin Mackenzie Peter J A Cock Dan Bolser

BMC Genomics

February 2012

Background: The potato genome sequence derived from the Solanum tuberosum Group Phureja clone DM1-3 516 R44 provides unparalleled insight into the genome composition and organisation of this important crop. A key class of genes that comprises the vast majority of plant resistance (R) genes contains a nucleotide-binding and leucine-rich repeat domain, and is collectively known as NB-LRRs.

Results: As part of an effort to accelerate the process of functional R gene isolation, we performed an amino acid motif based search of the annotated potato genome and identified 438 NB-LRR type genes among the ~39,000 potato gene models.

MetaBase--the wiki-database of biological databases.

Dan M Bolser Pierre-Yves Chibon Nicolas Palopoli Sungsam Gong Daniel Jacob

Nucleic Acids Res

January 2012

Biology is generating more data than ever. As a result, there is an ever increasing number of publicly available databases that analyse, integrate and summarize the available data, providing an invaluable resource for the biological community. As this trend continues, there is a pressing need to organize, catalogue and rate these resources, so that the information they contain can be most effectively exploited.

The SEQanswers wiki: a wiki database of tools for high-throughput sequencing analysis.

Jing-Woei Li Keith Robison Marcel Martin Andreas Sjödin Björn Usadel Dan M Bolser

Nucleic Acids Res

January 2012

Recent advances in sequencing technology have created unprecedented opportunities for biological research. However, the increasing throughput of these technologies has created many challenges for data management and analysis. As the demand for sophisticated analyses increases, the development time of software and algorithms is outpacing the speed of traditional publication.
