Approximate and Exact Optimization Algorithms for the Beltway and Turnpike Problems with Duplicated, Missing, Partially Labeled, and Uncertain Measurements.

C S Elder Minh Hoang Mohsen Ferdosi Carl Kingsford

J Comput Biol

Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh Pennsylvania, USA.

Published: October 2024

The Turnpike problem focuses on reconstructing one-dimensional points from their pairwise distances, crucial in fields like biology for tasks such as genomic sequencing and molecular structure determination.
Handling noisy data makes this problem NP-hard, traditionally requiring extensive time and space to solve, motivating a new approach based on optimization to enhance efficiency.
The proposed suite of algorithms includes a bilevel optimization framework capable of tackling large instances and adapting to constraints like missing or duplicated distances, achieving high scalability and performance in comparison to existing solutions.

The Turnpike problem aims to reconstruct a set of one-dimensional points from their unordered pairwise distances. Turnpike arises in biological applications such as molecular structure determination, genomic sequencing, tandem mass spectrometry, and molecular error-correcting codes. Under noisy observation of the distances, the Turnpike problem is NP-hard and can take exponential time and space to solve when using traditional algorithms. To address this, we reframe the noisy Turnpike problem through the lens of optimization, seeking to simultaneously find the unknown point set and a permutation that maximizes similarity to the input distances. Our core contribution is a suite of algorithms that robustly solve this new objective. This includes a bilevel optimization framework that can efficiently solve Turnpike instances with up to 100,000 points. We show that this framework can be extended to scenarios with domain-specific constraints that include duplicated, missing, and partially labeled distances. Using these, we also extend our algorithms to work for points distributed on a circle (the Beltway problem). For small-scale applications that require global optimality, we formulate an integer linear program (ILP) that (i) accepts an objective from a generic family of convex functions and (ii) uses an extended formulation to reduce the number of binary variables. On synthetic and real partial digest data, our bilevel algorithms achieved state-of-the-art scalability across challenging scenarios with performance that matches or exceeds competing baselines. On small-scale instances, our ILP efficiently recovered ground-truth assignments and produced reconstructions that match or exceed our alternating algorithms. Our implementations are available at https://github.com/Kingsford-Group/turnpikesolvermm.

Download full-text PDF	Source
http://dx.doi.org/10.1089/cmb.2024.0661	DOI Listing

Publication Analysis

Top Keywords

turnpike problem

duplicated missing

missing partially

partially labeled

distances turnpike

algorithms

turnpike

approximate exact

exact optimization

optimization algorithms

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!