Pac Symp Biocomput
December 2024
Dynamic changes in protein glycosylation impact human health and disease progression. However, current resources that capture disease and phenotype information focus primarily on the macromolecules within the central dogma of molecular biology (DNA, RNA, proteins). To gain a better understanding of organisms, there is a need to capture the functional impact of glycans and glycosylation on biological processes.
View Article and Find Full Text PDFLarge Language Models (LLMs) are a type of artificial intelligence that has been revolutionizing various fields, including biomedicine. They have the capability to process and analyze large amounts of data, understand natural language, and generate new content, making them highly desirable in many biomedical applications and beyond. In this workshop, we aim to introduce the attendees to an in-depth understanding of the rise of LLMs in biomedicine, and how they are being used to drive innovation and improve outcomes in the field, along with associated challenges and pitfalls.
View Article and Find Full Text PDFMotivation: Figures in biomedical papers communicate essential information with the potential to identify relevant documents in biomedical and clinical settings. However, academic search interfaces mainly search over text fields.
Results: We describe a search system for biomedical documents that leverages image modalities and an existing index server.
The coronavirus disease 2019 (COVID-19) pandemic has compelled biomedical researchers to communicate data in real time to establish more effective medical treatments and public health policies. Nontraditional sources such as preprint publications, i.e.
View Article and Find Full Text PDFOver the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages.
View Article and Find Full Text PDFAutomated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for biomedical RE only focus on relations of a single type (e.g.
View Article and Find Full Text PDFThe UniProt knowledgebase is a public database for protein sequence and function, covering the tree of life and over 220 million protein entries. Now, the whole community can use a new crowdsourcing annotation system to help scale up UniProt curation and receive proper attribution for their biocuration work.
View Article and Find Full Text PDFMotivation: Biomedical research findings are typically disseminated through publications. To simplify access to domain-specific knowledge while supporting the research community, several biomedical databases devote significant effort to manual curation of the literature-a labor intensive process. The first step toward biocuration requires identifying articles relevant to the specific area on which the database focuses.
View Article and Find Full Text PDFmicroRNAs (miRNAs) are essential gene regulators, and their dysregulation often leads to diseases. Easy access to miRNA information is crucial for interpreting generated experimental data, connecting facts across publications and developing new hypotheses built on previous knowledge. Here, we present extracting miRNA Information from Text (emiRIT), a text-miningbased resource, which presents miRNA information mined from the literature through a user-friendly interface.
View Article and Find Full Text PDFThe Protein Ontology (PRO) provides an ontological representation of protein-related entities, ranging from protein families to proteoforms to complexes. Protein Ontology Linked Open Data (LOD) exposes, shares, and connects knowledge about protein-related entities on the Semantic Web using Resource Description Framework (RDF), thus enabling integration with other Linked Open Data for biological knowledge discovery. For example, proteins (or variants thereof) can be retrieved on the basis of specific disease associations.
View Article and Find Full Text PDFNumerous cell surface receptors and receptor-like proteins (RLPs) undergo activation or deactivation via a transmembrane domain (TMD). A subset of plant RLPs distinctively localizes to the plasma membrane-lined pores called plasmodesmata. Those RLPs include the Arabidopsis thaliana Plasmodesmata-located protein (PDLP) 5, which is well known for its vital function regulating plasmodesmal gating and molecular movement between cells.
View Article and Find Full Text PDFMotivation: The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes.
View Article and Find Full Text PDFiPTMnet is a bioinformatics resource that integrates protein post-translational modification (PTM) data from text mining and curated databases and ontologies to aid in knowledge discovery and scientific study. The current iPTMnet website can be used for querying and browsing rich PTM information but does not support automated iPTMnet data integration with other tools. Hence, we have developed a RESTful API utilizing the latest developments in cloud technologies to facilitate the integration of iPTMnet into existing tools and pipelines.
View Article and Find Full Text PDFIn the UniProt Knowledgebase (UniProtKB), publications providing evidence for a specific protein annotation entry are organized across different categories, such as function, interaction and expression, based on the type of data they contain. To provide a systematic way of categorizing computationally mapped bibliographies in UniProt, we investigate a convolutional neural network (CNN) model to classify publications with accession annotations according to UniProtKB categories. The main challenge of categorizing publications at the accession annotation level is that the same publication can be annotated with multiple proteins and thus be associated with different category sets according to the evidence provided for the protein.
View Article and Find Full Text PDFPublished literature is an important source of knowledge supporting biomedical research. Given the large and increasing number of publications, automated document classification plays an important role in biomedical research. Effective biomedical document classifiers are especially needed for bio-databases, in which the information stems from many thousands of biomedical publications that curators must read in detail and annotate.
View Article and Find Full Text PDFMethods focused on predicting 'global' annotations for proteins (such as molecular function, biological process and presence of domains or membership in a family) have reached a relatively mature stage. Methods to provide fine-grained 'local' annotation of functional sites (at the level of individual amino acid) are now coming to the forefront, especially in light of the rapid accumulation of genetic variant data. We have developed a computational method and workflow that predicts functional sites within proteins using position-specific conditional template annotation rules (namely PIR Site Rules or PIRSRs for short).
View Article and Find Full Text PDFNumerous efforts have been made for developing text-mining tools to extract information from biomedical text automatically. They have assisted in many biological tasks, such as database curation and hypothesis generation. Text-mining tools are usually different from each other in terms of programming language, system dependency and input/output format.
View Article and Find Full Text PDFOviductosomes (OVS) are nano-sized extracellular vesicles secreted in the oviductal luminal fluid by oviductal epithelial cells and known to be involved in sperm capacitation and fertility. Although they have been shown to transfer encapsulated proteins to sperm, cargo constituents other than proteins have not been identified. Using next-generation sequencing, we demonstrate that OVS are carriers of microRNAs (miRNAs), with 272 detected throughout the estrous cycle.
View Article and Find Full Text PDF