As one of the main types of structural variation in the human genome, copy number variation (CNV) plays an important role in the occurrence and development of human cancers. Next-generation sequencing (NGS) technology provides base-level resolution, which favors the accurate detection of CNVs. However, accurately detecting CNVs in cancer samples with varying purity and low sequencing coverage remains very challenging. Local distance-based CNV detection (LDCNV), a new computational approach for predicting CNVs from NGS data, is proposed in this work. For each read depth (RD), LDCNV calculates the average distance between the RD and its K nearest neighbors (KNNs), defined as the distance of KNNs, and the average distance among the KNNs themselves, defined as their internal distance. Based on these definitions, a local distance score is constructed for each RD as the ratio of the distance of KNNs to the internal distance of KNNs. The local distance scores are fitted to a normal distribution to evaluate the significance level of each RD, and a hypothesis test is then used to predict the CNVs. The performance of the proposed method is verified on simulated and real data and compared with several popular methods. The experimental results show that the proposed method outperforms the other methods compared. Therefore, the proposed method can be helpful for cancer diagnosis and targeted drug development.
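The scoring and testing steps described above can be illustrated with a short Python sketch. It is not the authors' LDCNV implementation: the function names, the use of the absolute RD difference as the distance, the default K = 5, the small `eps` guard against a zero internal distance, the one-sided upper-tail test and the 0.05 threshold are all assumptions, and any preprocessing or segmentation applied in the paper is omitted.

```python
from statistics import NormalDist, mean, stdev


def local_distance_scores(rd, k=5, eps=1e-9):
    """Hypothetical LDCNV-style local distance score for each read-depth (RD) value."""
    n = len(rd)
    if not 2 <= k < n:
        raise ValueError("need 2 <= k < len(rd) so the internal distance is defined")
    scores = []
    for i, x in enumerate(rd):
        # K nearest neighbours (KNNs) of rd[i], using |RD difference| as the distance (assumption)
        ranked = sorted((abs(x - rd[j]), j) for j in range(n) if j != i)
        knn = [j for _, j in ranked[:k]]
        # Distance of KNNs: mean distance from rd[i] to each of its neighbours
        d_knn = mean(abs(x - rd[j]) for j in knn)
        # Internal distance: mean pairwise distance among the neighbours themselves
        d_int = mean(abs(rd[a] - rd[b]) for p, a in enumerate(knn) for b in knn[p + 1:])
        # Local distance score: ratio of the two (eps avoids division by zero)
        scores.append(d_knn / (d_int + eps))
    return scores


def predict_cnv_bins(rd, k=5, alpha=0.05):
    """Return indices of RD values whose score is significant under a fitted normal."""
    scores = local_distance_scores(rd, k)
    mu, sigma = mean(scores), stdev(scores)
    if sigma == 0:
        return []  # all scores identical: nothing stands out
    fitted = NormalDist(mu, sigma)
    # One-sided test (assumption): large scores mark RD values isolated from their neighbours
    return [i for i, s in enumerate(scores) if 1 - fitted.cdf(s) < alpha]


# Toy read depths around 30 with one elevated value at index 10 (not real data)
rd = [30, 31, 29, 30, 32, 28, 31, 30, 29, 31, 60, 30, 29, 31, 30, 28, 32, 30, 31, 29]
print(predict_cnv_bins(rd))  # [10]
```

Because the score is a ratio, an RD that sits far from a tight cluster of neighbours receives a large score whether it is unusually high (a possible gain) or unusually low (a possible loss), so a single upper-tail test flags both directions in this sketch.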

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10556732
DOI: http://dx.doi.org/10.3389/fgene.2023.1147761

Similar Publications

Objectives: To automatically recognize and classify thyroid nodules and to reduce the high classification error rates that occur when the samples are imbalanced.

Methods: An improved k-nearest neighbor (KNN) algorithm is proposed, and an automatic thyroid nodule classification method based on it is established. The improved KNN algorithm considers not only how many of the KNNs carry each class label but also the corresponding weights.
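The abstract does not specify the weighting scheme. As a hypothetical illustration only, the Python sketch below weights each neighbour's vote by inverse distance and by the inverse frequency of its class in the training set, one common way to counter class imbalance; it is not the authors' algorithm, and the function name, labels and data are assumptions.

```python
import math
from collections import Counter, defaultdict


def weighted_knn_predict(train_x, train_y, query, k=3, eps=1e-9):
    """Hypothetical weighted KNN vote: inverse distance x inverse class frequency."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Brute-force Euclidean distances from the query to every training sample
    ranked = sorted(
        zip((math.dist(x, query) for x in train_x), train_y), key=lambda t: t[0]
    )
    class_freq = Counter(train_y)
    n = len(train_y)
    votes = defaultdict(float)
    for d, label in ranked[:k]:
        # Closer neighbours and rarer classes contribute larger votes
        votes[label] += (1.0 / (d + eps)) * (n / class_freq[label])
    return max(votes, key=votes.get)


# Toy, imbalanced data: four "benign" points and two "malignant" points
train_x = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6)]
train_y = ["benign"] * 4 + ["malignant"] * 2
print(weighted_knn_predict(train_x, train_y, (4.5, 5)))  # malignant
```

Sorting with `key=lambda t: t[0]` compares distances only, so labels of any type can be used without triggering comparisons between them.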

The Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2) can measure underwater elevation owing to its strong penetration ability. However, the photons recorded by ICESat-2 include a large amount of noise that must be removed. Although density-based clustering methods can extract signal photons, the heterogeneous density and weak connectivity of the photon data impede their denoising performance, especially for sparse signals in deep water and in areas of drastic topographic change.

Assessing stationary distributions derived from chromatin contact maps.

BMC Bioinformatics

February 2020

Computational Biology, 23andMe, Inc., 899 West Evelyn Avenue, Mountain View, 94041, CA, USA.

Article Synopsis
  • The spatial arrangement of chromosomes is crucial for cellular function, and alterations to it can lead to cancer, so studying chromatin conformation is important but difficult because of its complexity.
  • Recent advancements, especially in Hi-C assays, have improved our understanding of chromatin structure, but the evaluation of 3D reconstructions based on this data is challenging due to a lack of gold standards.
  • This study investigates the use of stationary distributions (StatDns) from Hi-C contact matrices to assess the accuracy of chromatin reconstructions, aiming to enhance the identification of highly interactive genomic regions involved in chromosomal interactions.
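The synopsis does not say how StatDns are computed. One common construction, assumed here, treats the row-normalized contact matrix as a Markov transition matrix and finds its stationary distribution by power iteration. The Python sketch below follows that assumption; the function name, tolerance and toy matrix are hypothetical.

```python
def stationary_distribution(contacts, tol=1e-12, max_iter=10_000):
    """Stationary distribution of the random walk defined by a contact matrix."""
    n = len(contacts)
    row_sums = [sum(row) for row in contacts]
    if any(s <= 0 for s in row_sums):
        raise ValueError("every bin needs at least one contact")
    # Row-normalise the contacts into a Markov transition matrix P
    P = [[c / s for c in row] for row, s in zip(contacts, row_sums)]
    pi = [1.0 / n] * n
    # Power iteration: pi <- pi P until successive vectors stop changing
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Toy symmetric 3-bin contact matrix (not real Hi-C data)
contacts = [[10, 2, 1], [2, 8, 3], [1, 3, 6]]
print([round(p, 4) for p in stationary_distribution(contacts)])  # [0.3611, 0.3611, 0.2778]
```

For a symmetric matrix such as this one, the result should match the closed form in which each bin's probability is its row sum divided by the total of all row sums (here 13/36, 13/36 and 10/36), which gives a quick sanity check. Power iteration converges in this example because the chain is irreducible and every bin has self-contacts, which makes it aperiodic.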

Fast k-NNG construction with GPU-based quick multi-select.

PLoS One

October 2015

Department of Mechanical Engineering, Complex Systems Simulation Lab, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, United States of America.

In this paper, we describe a new brute-force algorithm for building the k-nearest neighbor graph (k-NNG). k-NNG construction has many applications in areas such as machine learning, bioinformatics, and clustering analysis. While very efficient algorithms exist for low-dimensional data, brute-force search is the best algorithm for high-dimensional data.
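A CPU-only Python sketch of brute-force k-NNG construction is shown below. `heapq.nsmallest` stands in for the paper's GPU-based quick multi-select, which is not reproduced here; the function name and the Euclidean metric are assumptions.

```python
import heapq
import math


def knn_graph(points, k):
    """Brute-force k-nearest-neighbour graph: row i lists the k nearest points to i."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < number of points")
    graph = []
    for i, p in enumerate(points):
        # Compare p with every other point (O(n^2) distance evaluations overall)
        candidates = ((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        # Select the k smallest distances; ties are broken by the lower index
        graph.append([j for _, j in heapq.nsmallest(k, candidates)])
    return graph


points = [(0, 0), (1, 0), (0, 2), (5, 5)]
print(knn_graph(points, 2))  # [[1, 2], [0, 2], [0, 1], [2, 1]]
```

The per-point selection step is the part a GPU multi-select would parallelize; the all-pairs distance loop is what makes the approach dimension-agnostic.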
