In the post-genome era, many problems in bioinformatics have arisen from the generation of large amounts of imbalanced data. In particular, the computational classification of precursor microRNAs (pre-miRNAs) involves a high class imbalance. For this task, a classifier is trained to identify the RNA sequences most likely to be miRNA precursors. The main difficulty is that well-known pre-miRNAs usually number just a few in comparison with the hundreds of thousands of candidate sequences in a genome, which results in highly imbalanced data. This imbalance strongly influences most standard classifiers and, if not properly addressed, prevents the classifier from working properly in a real-life scenario. This work provides a comparative assessment of recent deep neural architectures for dealing with the large imbalanced data issue in the classification of pre-miRNAs. We present and analyze recent architectures in a benchmark framework with genomes of animals and plants, with increasing imbalance ratios up to 1:2000. We also propose a new graphical way of comparing classifier performance in the context of high class imbalance. The comparative results show that, at very high imbalance, deep belief neural networks can provide the best performance.
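To make concrete why a 1:2000 imbalance defeats standard evaluation, the following minimal sketch computes common metrics from a confusion matrix built at that ratio. It is an illustration only and not the benchmark or the graphical method of the paper: the helper `confusion_from_rates`, the count of 100 positives, and the sensitivity and specificity values are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary classifier."""
    tp: int
    fp: int
    tn: int
    fn: int

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total

    def recall(self) -> float:
        # Fraction of true pre-miRNAs that are recovered (sensitivity).
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def precision(self) -> float:
        # Fraction of predicted pre-miRNAs that are real; 0.0 if none predicted.
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


def confusion_from_rates(n_pos: int, imbalance: int,
                         sensitivity: float, specificity: float) -> Confusion:
    """Hypothetical helper: build counts for a 1:imbalance class ratio."""
    n_neg = n_pos * imbalance
    tp = round(n_pos * sensitivity)
    tn = round(n_neg * specificity)
    return Confusion(tp=tp, fp=n_neg - tn, tn=tn, fn=n_pos - tp)


# A classifier that rejects every candidate sequence (assumed scenario).
all_negative = confusion_from_rates(100, 2000, sensitivity=0.0, specificity=1.0)

# A seemingly strong classifier (assumed rates, not results from the paper).
decent = confusion_from_rates(100, 2000, sensitivity=0.90, specificity=0.99)

for name, cm in [("all-negative", all_negative), ("decent", decent)]:
    print(f"{name}: accuracy={cm.accuracy():.4f} recall={cm.recall():.2f} "
          f"precision={cm.precision():.3f} F1={cm.f1():.3f}")
```

Under these assumed rates, the classifier that never predicts a pre-miRNA still scores above 99.9% accuracy, while the one with 99% specificity produces 2000 false positives for 90 true positives. This is why accuracy alone says little at such ratios and why precision-oriented measures matter for comparing classifiers.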

Source
http://dx.doi.org/10.1109/TNNLS.2019.2914471
