Publications by authors named "Francesco Venco"

Next Generation Sequencing (NGS), a family of technologies for reading DNA and RNA, is changing biological research, and will soon change medical practice, by quickly providing sequencing data and high-level features of numerous individual genomes in different biological and clinical conditions. The availability of millions of whole genome sequences may soon become the biggest and most important "big data" problem of mankind. In this exciting framework, we recently proposed a new paradigm to raise the level of abstraction in NGS data management, by introducing a GenoMetric Query Language (GMQL) and demonstrating its usefulness through several biological query examples.

The Encyclopedia of DNA Elements (ENCODE) is a large and still expanding public repository of more than 4,000 experiments and 25,000 data files, assembled by a large international consortium since 2007. New biological knowledge can be extracted from these largely unexplored data, leading to data-driven genomic, transcriptomic, and epigenomic discoveries. Yet searching for datasets relevant to knowledge discovery is poorly supported: the metadata describing ENCODE datasets are simple and incomplete, and are not described by a coherent underlying ontology. Here, we show how to overcome this limitation by adopting an ENCODE metadata search approach that uses high-quality ontological knowledge and state-of-the-art indexing technologies.

Motivation: Improvements in sequencing technologies and data processing pipelines are rapidly providing sequencing data, with associated high-level features, for many individual genomes in multiple biological and clinical conditions. These data allow for data-driven genomic, transcriptomic and epigenomic characterizations, but require state-of-the-art 'big data' computing strategies, with abstraction levels beyond the capabilities of available tools.

Results: We propose a high-level, declarative GenoMetric Query Language (GMQL) and a toolkit for its use.

Article Synopsis
  • SMITH, a web application developed by scientists and database experts, stores comprehensive data from NGS experiments and supports metadata creation, making data search and statistical analysis easier through a MySQL backend.
  • The tool automates many processes, limiting human involvement to administrative tasks, and standardizes data delivery, making it easier for biologists and analysts to access and navigate the sequencing data.