Publications by Flicek P

Author Correction: Large-scale benchmarking of circRNA detection tools reveals large differences in sensitivity but not in precision.

Marieke Vromman Jasper Anckaert Stefania Bortoluzzi Alessia Buratin Chia-Ying Chen

Nat Methods

January 2025

GENCODE: massively expanding the lncRNA catalog through capture long-read RNA sequencing.

Gazaldeep Kaur Tamara Perteghella Sílvia Carbonell-Sala Jose Gonzalez-Martinez Toby Hunt

bioRxiv

October 2024

Article Synopsis

- Accurate gene annotations are essential for interpreting how genomes function, and the GENCODE consortium has spent twenty years creating reference annotations for human and mouse genomes, serving as a vital resource for researchers globally.
- Previous annotations of long non-coding RNAs (lncRNAs) were incomplete and poorly organized, which hindered research. This prompted GENCODE to launch a comprehensive effort that added nearly 18,000 novel human genes and over 22,000 novel mouse genes, significantly expanding the catalog of transcripts.
- The new annotations not only show evolutionary patterns and link to genetic variants associated with traits but also improve understanding of previously unclear genomic functions, greatly advancing research into both human and mouse genetic diseases.

DNA methylation patterns of transcription factor binding regions characterize their functional and evolutionary contexts.

Martina Rimoldi Ning Wang Jilin Zhang Diego Villar Duncan T Odom

Genome Biol

June 2024

Background: DNA methylation is an important epigenetic modification which has numerous roles in modulating genome function. Its levels are spatially correlated across the genome, typically high in repressed regions but low in transcription factor (TF) binding sites and active regulatory regions. However, the mechanisms establishing genome-wide and TF binding site methylation patterns are still unclear.

Multiple genomic solutions for local adaptation in two closely related species (sheep and goats) facing the same climatic constraints.

Badr Benjelloun Kevin Leempoel Frédéric Boyer Sylvie Stucki Ian Streeter

Mol Ecol

October 2024

How local adaptation takes place remains a fundamental question in evolutionary biology. The variation of allele frequencies in genes under selection over environmental gradients remains mainly theoretical, and its empirical assessment would help in understanding how adaptation happens over environmental clines. To bring new insights to this issue, we set up a broad framework to compare the adaptive trajectories over environmental clines in two domesticated mammal species co-distributed in diversified landscapes.

Transcriptomics and chromatin accessibility in multiple African population samples.

Marianne K DeGorter Page C Goddard Emre Karakoc Soumya Kundu Stephanie M Yan

bioRxiv

November 2023

Mapping the functional human genome and impact of genetic variants is often limited to European-descendent population samples. To aid in overcoming this limitation, we measured gene expression using RNA sequencing in lymphoblastoid cell lines (LCLs) from 599 individuals from six African populations to identify novel transcripts including those not represented in the hg38 reference genome. We used whole genomes from the 1000 Genomes Project and 164 Maasai individuals to identify 8,881 expression and 6,949 splicing quantitative trait loci (eQTLs/sQTLs), and 2,611 structural variants associated with gene expression (SV-eQTLs).

GET_PANGENES: calling pangenes from plant genome alignments confirms presence-absence variation.

Bruno Contreras-Moreira Shradha Saraf Guy Naamati Ana M Casas Sandeep S Amberkar

Genome Biol

October 2023

Crop pangenomes made from individual cultivar assemblies promise easy access to conserved genes, but genome content variability and inconsistent identifiers hamper their exploration. To address this, we define pangenes, which summarize a species' coding potential and link back to original annotations. The protocol get_pangenes performs whole genome alignments (WGA) to call syntenic gene models based on coordinate overlaps.

The complete sequence of a human Y chromosome.

Arang Rhie Sergey Nurk Monika Cechova Savannah J Hoyt Dylan J Taylor

Nature

September 2023

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region.

Comparative analysis of repeat content in plant genomes, large and small.

Joris Argentin Dan Bolser Paul J Kersey Paul Flicek

Front Plant Sci

July 2023

The DNA Features pipeline is the analysis pipeline at EMBL-EBI that annotates repeat elements, including transposable elements. With Ensembl's goal to stay at the cutting edge of genome annotation, we proved that this pipeline needed an update. We then created a new analysis that allowed the Ensembl database to store the repeat classification from the PGSB repeat classification (Recat).

Large-scale benchmarking of circRNA detection tools reveals large differences in sensitivity but not in precision.

Marieke Vromman Jasper Anckaert Stefania Bortoluzzi Alessia Buratin Chia-Ying Chen

Nat Methods

August 2023

Article Synopsis

- A study was conducted to compare 16 computational tools for detecting circular RNA (circRNA) using RNA sequencing data, identifying over 315,000 unique circRNAs across three human cell types.
- The validation of 1,516 predicted circRNAs showed high precision across different methods (around 95-98%), but sensitivity varied significantly (1,372 to 58,032 predicted circRNAs).
- The research emphasizes the importance of using multiple tools together for better detection sensitivity and provides suggestions for improving future circRNA detection methods.

A draft human pangenome reference.

Wen-Wei Liao Mobin Asri Jana Ebler Daniel Doerr Marina Haukness

Nature

May 2023

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels.

Atlantic herring (Clupea harengus) population structure in the Northeast Atlantic Ocean.

Sunnvør Í Kongsstovu Svein-Ole Mikalsen Eydna Í Homrum Jan Arge Jacobsen Thomas D Als

Fish Res

May 2022

The Atlantic herring Clupea harengus L. has a vast geographical distribution and a complex population structure with a few very large migratory units and many small local populations. Each population has its own spawning ground and/or time, thereby maintaining its genetic integrity. Several herring populations migrate between common feeding grounds and over-wintering areas, resulting in frequent mixing of populations.

Analysis of genome-wide knockout mouse database identifies candidate ciliopathy genes.

Kendall Higgins Bret A Moore Zorana Berberovic Hibret A Adissu Mohammad Eskandarian

Sci Rep

December 2022

We searched a database of single-gene knockout (KO) mice produced by the International Mouse Phenotyping Consortium (IMPC) to identify candidate ciliopathy genes. We first screened for phenotypes in mouse lines with both ocular and renal or reproductive trait abnormalities. The STRING protein interaction tool was used to identify interactions between known cilia gene products and those encoded by the genes in individual knockout mouse strains in order to generate a list of "candidate ciliopathy genes."

GENCODE: reference annotation for the human and mouse genomes in 2023.

Adam Frankish Sílvia Carbonell-Sala Mark Diekhans Irwin Jungreis Jane E Loveland

Nucleic Acids Res

January 2023

GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function.

The IPD-IMGT/HLA Database.

Dominic J Barker Giuseppe Maccari Xenia Georgiou Michael A Cooper Paul Flicek

Nucleic Acids Res

January 2023

It is 24 years since the IPD-IMGT/HLA Database, http://www.ebi.ac.

Ensembl 2023.

Fergal J Martin M Ridwan Amode Alisha Aneja Olanrewaju Austine-Orimoloye Andrey G Azov

Nucleic Acids Res

January 2023

Ensembl (https://www.ensembl.org) has produced high-quality genomic resources for vertebrates and model organisms for more than twenty years.

Author Correction: Comparative and demographic analysis of orang-utan genomes.

Devin P Locke LaDeana W Hillier Wesley C Warren Kim C Worley Lynne V Nazareth

Nature

August 2022

Standardized annotation of translated open reading frames.

Jonathan M Mudge Jorge Ruiz-Orera John R Prensner Marie A Brunet Ferriol Calvet

Nat Biotechnol

July 2022

Author Correction: Perspectives on ENCODE.

Nature

May 2022

The Human Pangenome Project: a global resource to map genomic diversity.

Ting Wang Lucinda Antonacci-Fulton Kerstin Howe Heather A Lawson Julian K Lucas

Nature

April 2022

The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation.

A joint NCBI and EMBL-EBI transcript set for clinical genomics and research.

Joannella Morales Shashikant Pujar Jane E Loveland Alex Astashyn Ruth Bennett

Nature

April 2022

Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE and RefSeq launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins.

GA4GH: International policies and standards for data sharing across genomic research and healthcare.

Heidi L Rehm Angela J H Page Lindsay Smith Jeremy B Adams Gil Alterovitz

Cell Genom

November 2021

The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution.

Standards recommendations for the Earth BioGenome Project.

Mara K N Lawniczak Richard Durbin Paul Flicek Kerstin Lindblad-Toh Xiaofeng Wei

Proc Natl Acad Sci U S A

January 2022

A global international initiative, such as the Earth BioGenome Project (EBP), requires both agreement and coordination on standards to ensure that the collective effort generates rapid progress toward its goals. To this end, the EBP initiated five technical standards committees comprising volunteer members from the global genomics scientific community: Sample Collection and Processing, Sequencing and Assembly, Annotation, Analysis, and IT and Informatics. The current versions of the resulting standards documents are available on the EBP website, with the recognition that opportunities, technologies, and challenges may improve or change in the future, requiring flexibility for the EBP to meet its goals.

The Earth BioGenome Project 2020: Starting the clock.

Harris A Lewin Stephen Richards Erez Lieberman Aiden Miguel L Allende John M Archibald

Proc Natl Acad Sci U S A

January 2022

Scripting Analyses of Genomes in Ensembl Plants.

Bruno Contreras-Moreira Guy Naamati Marc Rosello James E Allen Sarah E Hunt

Methods Mol Biol

March 2022

Ensembl Plants (http://plants.ensembl.org) offers genome-scale information for plants, with four releases per year.

The European Bioinformatics Institute (EMBL-EBI) in 2021.

Gaia Cantelli Alex Bateman Cath Brooksbank Anton I Petrov Rahuman S Malik-Sheriff

Nucleic Acids Res

January 2022

The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing resources, including the COVID-19 Data Platform, trRosetta and RoseTTAfold models introduced in Pfam and InterPro, and the launch of Genome Integrations with Function and Sequence by UniProt and Ensembl. Furthermore, we highlight projects through which EMBL-EBI has contributed to the development of community-driven data standards and guidelines, including the Recommended Metadata for Biological Images (REMBI), and the BioModels Reproducibility Scorecard.
