The Pfam protein families database is a comprehensive collection of protein domains and families used for genome annotation and protein structure and function analysis (https://www.ebi.ac.
The InterPro database (https://www.ebi.ac.
The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs.
The InterPro database (https://www.ebi.ac.
The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species.
Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms.
The Ensembl project (https://www.ensembl.org) makes key genomic data sets available to the entire scientific community without restrictions.
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high-quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference-quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation.
Loss-of-function studies are key for investigating gene function, and CRISPR technology has made genome editing widely accessible in model organisms and cells. However, conditional gene inactivation in diploid cells is still difficult to achieve. Here, we present CRISPR-FLIP, a strategy that provides an efficient, rapid and scalable method for biallelic conditional gene knockouts in diploid or aneuploid cells, such as pluripotent stem cells, 3D organoids and cell lines, by co-delivery of CRISPR-Cas9 and a universal conditional intronic cassette.
The rapid development of CRISPR-Cas9 mediated genome editing techniques has given rise to a number of online and stand-alone tools to find and score CRISPR sites for whole genomes. Here we describe the Wellcome Trust Sanger Institute Genome Editing database (WGE), which uses novel methods to compute, visualize and select optimal CRISPR sites in a genome browser environment. The WGE database currently stores single and paired CRISPR sites and pre-calculated off-target information for CRISPRs located in the mouse and human exomes.
With the amount of chemical data being produced and reported in the literature growing at a fast pace, it is increasingly important to retrieve this information efficiently. To tackle this issue, text mining tools have been applied, but despite their good performance they still produce many errors that we believe can be filtered out by using semantic similarity. Thus, this paper proposes a novel method that receives the results of chemical entity identification systems, such as Whatizit, and exploits the semantic relationships in ChEBI to measure the similarity between the entities found in the text.
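The filtering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which similarity measure was used, so the sketch substitutes a simple Jaccard overlap of is-a ancestor sets. The hierarchy `TOY_PARENTS`, the function names, and the threshold are all hypothetical and do not reflect real ChEBI content.

```python
from typing import Dict, List, Set

# Hypothetical toy is-a hierarchy (term -> parents); NOT real ChEBI data.
TOY_PARENTS: Dict[str, List[str]] = {
    "chemical entity": [],
    "organic compound": ["chemical entity"],
    "alcohol": ["organic compound"],
    "ethanol": ["alcohol"],
    "methanol": ["alcohol"],
    "metal": ["chemical entity"],
    "lead": ["metal"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term together with all of its is-a ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def similarity(a: str, b: str, parents: Dict[str, List[str]]) -> float:
    """Jaccard overlap of ancestor sets (an assumed stand-in measure)."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def filter_entities(
    entities: List[str], parents: Dict[str, List[str]], threshold: float
) -> List[str]:
    """Keep entities that resemble at least one other entity in the same text."""
    kept = []
    for entity in entities:
        others = [other for other in entities if other != entity]
        best = max((similarity(entity, o, parents) for o in others), default=0.0)
        if best >= threshold:
            kept.append(entity)
    return kept


# "lead" (possibly a mis-tagged verb) is unlike the two alcohols and is dropped.
print(filter_entities(["ethanol", "methanol", "lead"], TOY_PARENTS, 0.5))
```

In this sketch an entity that appears alone in a text has nothing to be compared with and is therefore dropped; a real system would need a policy for that case.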
Chemical entities are ubiquitous throughout the biomedical literature, and the development of text-mining systems that can efficiently identify those entities is required. Due to the lack of available corpora and data resources, the community has focused its efforts on the development of gene and protein named entity recognition systems, but with the release of ChEBI and the availability of an annotated corpus, this task can be addressed. We developed a machine-learning-based method for chemical entity recognition and a lexical-similarity-based method for chemical entity resolution and compared them with Whatizit, a popular dictionary-based method.
According to previous reports, flavonoids and nutraceuticals correct defective electrolyte transport in cystic fibrosis (CF) airways. Traditional medicinal plants from China and Thailand contain phytoflavonoids and other bioactive compounds. We examined herbal extracts of the common Thai medicinal euphorbiaceous plant Phyllanthus acidus for their potential effects on epithelial transport.