The problem of optimally removing a set of vertices from a graph to minimize the size of the largest resulting component is known to be NP-complete. Prior work has provided near-optimal heuristics with high time complexity that operate on up to hundreds of nodes, as well as less optimal but faster techniques that scale to thousands of nodes. In this work, we analyze how to perform vertex partitioning on massive graphs of tens of millions of nodes. We use a previously known and very simple heuristic: iteratively removing the node of largest degree along with all of its edges. This approach appears to be quadratic in complexity since, upon removal of a node and its adjoining edges, the node degree calculations must be updated before choosing the next node. However, we describe a linear-time solution that uses an array whose indices map to node degree and whose values are hash tables indicating the presence or absence of each node at that degree. Memory usage also grows linearly, which is notable given that reductions in time complexity often come at the cost of additional space. We empirically demonstrate linear runtime and linear memory usage on random graphs of up to 15,000 nodes, and then demonstrate tractability on massive graphs through execution on a 34-million-node graph representing Internet-wide router connectivity.
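The degree-indexed structure is the core of the linear-time claim: index d of the array holds a hash table of the nodes whose current degree is d, so removing the maximum-degree node and decrementing its neighbors costs constant time per edge. Below is a minimal Python sketch of this bookkeeping, assuming the graph is given as an adjacency list of sets; the names (iterative_max_degree_removal, degree_buckets) are illustrative, not taken from the paper.

    # Minimal sketch of the degree-bucket bookkeeping described above.
    # adj: dict mapping each node to the set of its neighbors (undirected).
    def iterative_max_degree_removal(adj, num_to_remove):
        degree = {v: len(nbrs) for v, nbrs in adj.items()}
        max_deg = max(degree.values(), default=0)
        # Index d holds a hash set of the nodes whose current degree is d.
        degree_buckets = [set() for _ in range(max_deg + 1)]
        for v, d in degree.items():
            degree_buckets[d].add(v)
        removed = []
        cur = max_deg
        for _ in range(num_to_remove):
            # Slide the pointer down to the highest non-empty bucket;
            # degrees never increase, so the pointer never moves back up.
            while cur > 0 and not degree_buckets[cur]:
                cur -= 1
            if cur == 0:
                break  # only isolated nodes remain
            v = degree_buckets[cur].pop()
            removed.append(v)
            # Each neighbor loses one edge: move it down exactly one bucket.
            for u in adj[v]:
                adj[u].discard(v)
                degree_buckets[degree[u]].discard(u)
                degree[u] -= 1
                degree_buckets[degree[u]].add(u)
            adj[v] = set()
        return removed

Because degrees only decrease, the bucket pointer moves monotonically downward, so the total work across a full removal sequence is O(V + E).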


Source: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4913482 (PMC)

Publication Analysis

Top Keywords

massive graphs (12)
time complexity (12)
node degree (12)
linear time (8)
vertex partitioning (8)
partitioning massive (8)
memory usage (8)
linear (6)
node (6)
nodes (5)

Similar Publications

Percolation for 2D Classical Heisenberg Model and Exit Sets of Vector Valued GFF.

Commun Math Phys

January 2025

Centro de Modelamiento Matemático (AFB170001), UMI-CNRS 2807, Universidad de Chile, Beauchef 851, Santiago, Chile.

Our motivation in this paper is twofold. First, we study the geometry of a class of exploration sets that are naturally associated with a 2D vector-valued Gaussian Free Field. We prove that, somewhat surprisingly, these sets are […]


The steady state of a water distribution system abides by the laws of mass and energy conservation. Hydraulic solvers, such as the one used by EPANET, approach the simulation for a given topology with a Newton-Raphson algorithm. However, the iterative approximation involves a matrix inversion that acts as a computational bottleneck and may significantly slow down the process.
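For context, the shape of such a solver is a plain Newton-Raphson iteration on a nonlinear system F(h) = 0; the sketch below is generic, not EPANET's implementation, and F and jac are placeholder callables supplied by the caller.

    import numpy as np

    def newton_raphson(F, jac, h0, tol=1e-8, max_iter=50):
        # Generic Newton-Raphson for F(h) = 0; not EPANET's solver.
        h = np.asarray(h0, dtype=float)
        for _ in range(max_iter):
            r = F(h)
            if np.linalg.norm(r) < tol:
                break
            # The linear solve against the Jacobian each iteration
            # (or, worse, an explicit inversion) is the step the
            # abstract identifies as the computational bottleneck.
            h += np.linalg.solve(jac(h), -r)
        return h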


Medical large language models are vulnerable to data-poisoning attacks.

Nat Med

January 2025

Department of Neurosurgery, NYU Langone Health, New York, NY, USA.

The adoption of large language models (LLMs) in healthcare demands a careful analysis of their potential to spread false medical knowledge. Because LLMs ingest massive volumes of data from the open Internet during training, they are potentially exposed to unverified medical knowledge that may include deliberately planted misinformation. Here, we perform a threat assessment that simulates a data-poisoning attack against The Pile, a popular dataset used for LLM development.


To address the instability and performance issues of the classical K-Means algorithm when dealing with massive datasets, we propose SOSK-Means, an improved K-Means algorithm based on Spark optimization. SOSK-Means incorporates several key modifications to enhance the clustering process. First, a weighted jump-bank approach is introduced to enable efficient random sampling and preclustering.


BioMedGraphica: An All-in-One Platform for Biomedical Prior Knowledge and Omic Signaling Graph Generation.

bioRxiv

December 2024

Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA.

Artificial intelligence (AI) is revolutionizing scientific discovery through its capability, following neural scaling laws, to integrate and analyze large-scale datasets to mine knowledge. Foundation models, large language models (LLMs) and large vision models (LVMs), are among the most important foundations paving the way for general AI by pre-training on massive domain-specific datasets. Unlike the well-annotated, formatted, and integrated large textual and image datasets used for LLMs and LVMs, biomedical knowledge and datasets are fragmented, with data scattered across publications and inconsistent databases that often use diverse nomenclature systems in the field of AI for Precision Health and Medicine (AI4PHM).

