Publications by Ginter F | LitMetric

Publications by authors named "Ginter F"

Dependency parsing of biomedical text with BERT.

Jenna Kanerva, Filip Ginter, Sampo Pyysalo

BMC Bioinformatics

December 2020

Background: Syntactic analysis, or parsing, is a key task in natural language processing and a required component for many text mining approaches. In recent years, Universal Dependencies (UD) has emerged as the leading formalism for dependency parsing. While a number of recent tasks centering on UD have substantially advanced the state of the art in multilingual parsing, there has been only little study of parsing texts from specialized domains such as biomedicine.

Neural Network and Random Forest Models in Protein Function Prediction.

Kai Hakala, Suwisa Kaewphan, Jari Björne, Farrokh Mehryary, Hans Moen

IEEE/ACM Trans Comput Biol Bioinform

June 2022

Over the past decade, the demand for automated protein function prediction has increased due to the volume of newly sequenced proteins. In this paper, we address the function prediction task by developing an ensemble system automatically assigning Gene Ontology (GO) terms to the given input protein sequence. We develop an ensemble system which combines the GO predictions made by random forest (RF) and neural network (NN) classifiers.

Assisting nurses in care documentation: from automated sentence classification to coherent document structures with subject headings.

Hans Moen, Kai Hakala, Laura-Maria Peltonen, Hanna-Maria Matinolli, Henry Suhonen

J Biomed Semantics

September 2020

Background: Up to 35% of nurses' working time is spent on care documentation. We describe the evaluation of a system aimed at assisting nurses in documenting patient care and potentially reducing the documentation workload. Our goal is to enable nurses to write or dictate nursing notes in a narrative manner without having to manually structure their text under subject headings.

The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens.

Naihui Zhou, Yuxiang Jiang, Timothy R Bergquist, Alexandra J Lee, Balint Z Kacsoh

Genome Biol

November 2019

Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.

Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes.

Supporting the use of standardized nursing terminologies with automatic subject heading prediction: a comparison of sentence-level text classification methods.

Hans Moen, Kai Hakala, Laura-Maria Peltonen, Henry Suhonen, Filip Ginter

J Am Med Inform Assoc

January 2020

Objective: This study focuses on the task of automatically assigning standardized (topical) subject headings to free-text sentences in clinical nursing notes. The underlying motivation is to support nurses when they document patient care by developing a computer system that can assist in incorporating suitable subject headings that reflect the documented topics. Central in this study is performance evaluation of several text classification methods to assess the feasibility of developing such a system.

Potent pairing: ensemble of long short-term memory networks and support vector machine for chemical-protein relation extraction.

Farrokh Mehryary, Jari Björne, Tapio Salakoski, Filip Ginter

Database (Oxford)

January 2018

Biomedical researchers regularly discover new interactions between chemical compounds/drugs and genes/proteins, and report them in research literature. Having knowledge about these interactions is crucially important in many research areas such as precision medicine and drug discovery. The BioCreative VI Task 5 (CHEMPROT) challenge promotes the development and evaluation of computer systems that can automatically recognize and extract statements of such interactions from biomedical literature.

Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task.

Abeed Sarker, Maksim Belousov, Jasper Friedrichs, Kai Hakala, Svetlana Kiritchenko

J Am Med Inform Assoc

October 2018

Objective: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data.

Materials And Methods: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions.

Wide-scope biomedical named entity recognition and normalization with CRFs, fuzzy matching and character level modeling.

Suwisa Kaewphan, Kai Hakala, Niko Miekka, Tapio Salakoski, Filip Ginter

Database (Oxford)

January 2018

We present a system for automatically identifying a multitude of biomedical entities from the literature. This work is based on our previous efforts in the BioCreative VI: Interactive Bio-ID Assignment shared task in which our system demonstrated state-of-the-art performance with the highest achieved results in named entity recognition. In this paper we describe the original conditional random field-based system used in the shared task as well as experiments conducted since, including better hyperparameter tuning and character level modeling, which led to further performance improvements.

Finding novel relationships with integrated gene-gene association network analysis of PCC 6803 using species-independent text-mining.

Sanna M Kreula, Suwisa Kaewphan, Filip Ginter, Patrik R Jones

PeerJ

May 2018

Article Synopsis

Scientists are using new computer methods called text-mining to analyze lots of scientific articles quickly, which helps them build networks of information that are too complex to understand by just reading.
They focused on a type of bacteria called PCC 6803, which hasn't been studied as much, to show how this technique can help find connections between genes that weren't known before.
By combining their findings with previous research and using special rules to search for new gene connections, they created a helpful tool that anyone can access to learn more about gene interactions.

Improving Layman Readability of Clinical Narratives with Unsupervised Synonym Replacement.

Hans Moen, Laura-Maria Peltonen, Mikko Koivumäki, Henry Suhonen, Tapio Salakoski

Stud Health Technol Inform

June 2018

We report on the development and evaluation of a prototype tool aimed to assist laymen/patients in understanding the content of clinical narratives. The tool relies largely on unsupervised machine learning applied to two large corpora of unlabeled text - a clinical corpus and a general domain corpus. A joint semantic word-space model is created for the purpose of extracting easier to understand alternatives for words considered difficult to understand by laymen.

An expanded evaluation of protein function prediction methods shows an improvement in accuracy.

Yuxiang Jiang, Tal Ronnen Oron, Wyatt T Clark, Asma R Bankapur, Daniel D'Andrea

Genome Biol

September 2016

Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.

Filtering large-scale event collections using a combination of supervised and unsupervised learning for event trigger classification.

Farrokh Mehryary, Suwisa Kaewphan, Kai Hakala, Filip Ginter

J Biomed Semantics

November 2017

Background: Biomedical event extraction is one of the key tasks in biomedical text mining, supporting various applications such as database curation and hypothesis generation. Several systems, some of which have been applied at a large scale, have been introduced to solve this task. Past studies have shown that the identification of the phrases describing biological processes, also known as trigger detection, is a crucial part of event extraction, and notable overall performance gains can be obtained by solely focusing on this sub-task.

Application of the EVEX resource to event extraction and network construction: Shared Task entry and result analysis.

Kai Hakala, Sofie Van Landeghem, Tapio Salakoski, Yves Van de Peer, Filip Ginter

BMC Bioinformatics

June 2016

Background: Modern methods for mining biomolecular interactions from literature typically make predictions based solely on the immediate textual context, in effect a single sentence. No prior work has been published on extending this context to the information automatically gathered from the whole biomedical literature. Thus, our motivation for this study is to explore whether mutually supporting evidence, aggregated across several documents, can be utilized to improve the performance of state-of-the-art event extraction systems.

Cell line name recognition in support of the identification of synthetic lethality in cancer from text.

Suwisa Kaewphan, Sofie Van Landeghem, Tomoko Ohta, Yves Van de Peer, Filip Ginter

Bioinformatics

January 2016

Motivation: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating for instance the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic and broad-coverage applications such as extracting synthetically lethal genes from the cancer literature. In this study, we revisit the cell line name recognition task, evaluating both available systems and newly introduced methods on various resources to obtain a reliable tagger not tied to any specific subdomain.

Care episode retrieval: distributional semantic models for information retrieval in the clinical domain.

Hans Moen, Filip Ginter, Erwin Marsi, Laura-Maria Peltonen, Tapio Salakoski

BMC Med Inform Decis Mak

March 2016

Patients' health related information is stored in electronic health records (EHRs) by health service providers. These records include sequential documentation of care episodes in the form of clinical notes. EHRs are used throughout the health care sector by professionals, administrators and patients, primarily for clinical purposes, but also for secondary purposes such as decision support and research.

Statistical parsing of varieties of clinical Finnish.

Veronika Laippala, Timo Viljanen, Antti Airola, Jenna Kanerva, Sanna Salanterä

Artif Intell Med

July 2014

Objectives: In this paper, we study the development and domain-adaptation of statistical syntactic parsers for three different clinical domains in Finnish.

Methods And Materials: The materials include text from daily nursing notes written by nurses in an intensive care unit, physicians' notes from cardiology patients' health records, and daily nursing notes from cardiology patients' health records. The parsing is performed with the statistical parser of Bohnet (http://code.

Large-scale event extraction from literature with multi-level gene normalization.

Sofie Van Landeghem, Jari Björne, Chih-Hsuan Wei, Kai Hakala, Sampo Pyysalo

PLoS One

November 2013

Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique genes and proteins and broad gene families.

University of Turku in the BioNLP'11 Shared Task.

Jari Björne, Filip Ginter, Tapio Salakoski

BMC Bioinformatics

June 2012

Background: We present a system for extracting biomedical events (detailed descriptions of biomolecular interactions) from research articles, developed for the BioNLP'11 Shared Task. Our goal is to develop a system easily adaptable to different event schemes, following the theme of the BioNLP'11 Shared Task: generalization, the extension of event extraction to varied biomedical domains. Our system extends our BioNLP'09 Shared Task winning Turku Event Extraction System, which uses support vector machines to first detect event-defining words, followed by detection of their relationships.

Exploring Biomolecular Literature with EVEX: Connecting Genes through Events, Homology, and Indirect Associations.

Sofie Van Landeghem, Kai Hakala, Samuel Rönnqvist, Tapio Salakoski, Yves Van de Peer

Adv Bioinformatics

August 2012

Technological advancements in the field of genetics have led not only to an abundance of experimental data, but also caused an exponential increase of the number of published biomolecular studies. Text mining is widely accepted as a promising technique to help researchers in the life sciences deal with the amount of available literature. This paper presents a freely available web application built on top of 21.

U-Compare bio-event meta-service: compatible BioNLP event extraction services.

Yoshinobu Kano, Jari Björne, Filip Ginter, Tapio Salakoski, Ekaterina Buyko

BMC Bioinformatics

December 2011

Background: Bio-molecular event extraction from literature is recognized as an important task of bio text mining and, as such, many relevant systems have been developed and made available during the last decade. While such systems provide useful services individually, there is a need for a meta-service to enable comparison and ensemble of such services, offering optimal solutions for various purposes.

Results: We have integrated nine event extraction systems in the U-Compare framework, making them intercompatible and interoperable with other U-Compare components.

Complex event extraction at PubMed scale.

Jari Björne, Filip Ginter, Sampo Pyysalo, Jun'ichi Tsujii, Tapio Salakoski

Bioinformatics

June 2010

Motivation: There has recently been a notable shift in biomedical information extraction (IE) from relation models toward the more expressive event model, facilitated by the maturation of basic tools for biomedical text analysis and the availability of manually annotated resources. The event model allows detailed representation of complex natural language statements and can support a number of advanced text mining applications ranging from semantic search to pathway extraction. A recent collaborative evaluation demonstrated the potential of event extraction systems, yet there have so far been no studies of the generalization ability of the systems nor the feasibility of large-scale extraction.

Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling: method and clinical application.

Filip Ginter, Hanna Suominen, Sampo Pyysalo, Tapio Salakoski

Int J Med Inform

December 2009

Motivation: Topic segmentation and labeling systems enable fine-grained information search. However, previously proposed methods require annotated data to adapt to different information needs and have limited applicability to texts with short segment length.

Methods: We introduce an unsupervised method based on a combination of hidden Markov models and latent semantic analysis which allows the topics of interest to be defined freely, without the need for data annotation, and can identify short segments.

Towards automated processing of clinical Finnish: sublanguage analysis and a rule-based parser.

Veronika Laippala, Filip Ginter, Sampo Pyysalo, Tapio Salakoski

Int J Med Inform

December 2009

Introduction: In this paper, we present steps taken towards more efficient automated processing of clinical Finnish, focusing on daily nursing notes in a Finnish Intensive Care Unit (ICU). First, we analyze ICU Finnish as a sublanguage, identifying its specific features facilitating, for example, the development of a specialized syntactic analyser. The identified features include frequent omission of finite verbs, limitations in allowed syntactic structures, and domain-specific vocabulary.

All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning.

Antti Airola, Sampo Pyysalo, Jari Björne, Tapio Pahikkala, Filip Ginter

BMC Bioinformatics

November 2008

Background: Automated extraction of protein-protein interactions (PPI) is an important and widely studied task in biomedical text mining. We propose a graph kernel based approach for this task. In contrast to earlier approaches to PPI extraction, the introduced all-paths graph kernel has the capability to make use of full, general dependency graphs representing the sentence structure.

Comparative analysis of five protein-protein interaction corpora.

Sampo Pyysalo, Antti Airola, Juho Heimonen, Jari Björne, Filip Ginter

BMC Bioinformatics

April 2008

Background: Growing interest in the application of natural language processing methods to biomedical text has led to an increasing number of corpora and methods targeting protein-protein interaction (PPI) extraction. However, there is no general consensus regarding PPI annotation and consequently resources are largely incompatible and methods are difficult to evaluate.

Results: We present the first comparative evaluation of the diverse PPI corpora, performing quantitative evaluation using two separate information extraction methods as well as detailed statistical and qualitative analyses of their properties.
