Comparing the 3D structures of proteins is an important but computationally hard problem in bioinformatics. In this paper, we study the problem in a setting where far less information is available and fewer assumptions can be made. We model the structural alignment of proteins as a combinatorial problem in which each protein is simply a set of points in 3D space, without sequence order information, and the objective is to discover all sufficiently large alignments for any subset of the input. We propose a data-mining approach to this problem. We first perform geometric hashing of the structures so that points with similar locations in 3D space are hashed into the same bin of the hash table. The novelty is that we treat each bin as a coincidence group and mine the bins for frequent patterns, a well-studied technique in data mining. We observe that these frequent patterns are themselves candidate large alignments. A simple heuristic is then used to extend the alignments where possible. We implemented the algorithm, tested it on real protein structures, and compared the results with those of existing tools. The results show that the algorithm can find conserved substructures that do not preserve sequence order, especially those in protein interfaces. The algorithm can also identify conserved substructures shared by functionally similar structures when these are mixed with dissimilar ones. The running time of the program was shorter than or comparable to that of the existing tools.
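The hashing and mining steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the structures are already expressed in a common coordinate frame (the search over reference frames that geometric hashing normally performs is omitted), it uses a plain grid quantization as the hash function, and it counts co-occurring subsets by brute-force enumeration rather than with a pruning miner such as Apriori. The names `hash_points`, `frequent_groups`, `cell`, `min_support`, and `min_size` are hypothetical, and the coordinates are made-up toy data.

```python
from collections import defaultdict
from itertools import combinations


def hash_points(structures, cell=2.0):
    """Map each grid cell (bin) to the set of structure ids with a point in it.

    Assumption: all structures share one coordinate frame, so quantizing
    raw coordinates is enough to bring nearby points into the same bin.
    """
    bins = defaultdict(set)
    for sid, points in structures.items():
        for x, y, z in points:
            key = (int(x // cell), int(y // cell), int(z // cell))
            bins[key].add(sid)
    return bins


def frequent_groups(bins, min_support=3, min_size=2):
    """Treat each bin as a coincidence group and keep frequent subsets.

    For every subset of structures that co-occurs in a bin, record the bin.
    A subset is frequent if it co-occupies at least `min_support` bins;
    those bins form a candidate alignment for that subset.
    Brute-force enumeration: exponential in the number of structures per bin.
    """
    support = defaultdict(list)
    for key, sids in bins.items():
        ordered = sorted(sids)
        for k in range(min_size, len(ordered) + 1):
            for combo in combinations(ordered, k):
                support[combo].append(key)
    return {c: keys for c, keys in support.items() if len(keys) >= min_support}


# Toy input: three "proteins" given as unordered 3D point sets.
structures = {
    "A": [(0.1, 0.2, 0.3), (2.5, 0.1, 0.4), (4.2, 4.1, 0.0), (9.0, 9.0, 9.0)],
    "B": [(0.4, 0.5, 0.1), (2.9, 0.3, 0.2), (4.9, 4.4, 0.6), (-7.0, 1.0, 1.0)],
    "C": [(0.3, 0.1, 0.9), (2.2, 0.8, 0.1), (20.0, 20.0, 20.0)],
}

bins = hash_points(structures, cell=2.0)
result = frequent_groups(bins, min_support=3)
for group, cells in result.items():
    print(group, "share", len(cells), "bins:", sorted(cells))
```

With this toy data, only the pair A and B shares three bins, so it is the only group reported at `min_support=3`; lowering the threshold to 2 also reports the subsets that include C. Note that points lying near a cell boundary can fall into different bins under plain grid quantization, which is one reason a follow-up extension step like the heuristic mentioned in the abstract is useful.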
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2951672
DOI: http://dx.doi.org/10.6026/97320630004366