Many modern applications of AI, such as web search, mobile browsing, image processing, and natural language processing, rely on finding similar items in a large database of complex objects. Because of the very large scale of the data involved (e.g., users' queries from commercial search engines), computing such near or nearest neighbors is a non-trivial task, as the computational cost grows significantly with the number of items. To address this challenge, we adopt Locality Sensitive Hashing (LSH) methods and evaluate four variants in a distributed computing environment (specifically, Hadoop). We identify several optimizations that improve performance and are suitable for deployment in very large scale settings. The experimental results demonstrate that our variants of LSH achieve robust performance with better recall than "vanilla" LSH, even when using the same amount of space.
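
As a rough illustration of the basic idea behind LSH, the sketch below hashes vectors with random hyperplanes so that vectors pointing in similar directions tend to land in the same bucket. It assumes a cosine-similarity hash family and a single in-memory hash table; the function names and parameters are hypothetical, and it does not reproduce the four variants or the Hadoop setup evaluated in the paper.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions (illustrative helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
        for h in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group item keys by signature; each distinct signature is one bucket."""
    buckets = defaultdict(list)
    for key, vec in vectors.items():
        buckets[signature(vec, hyperplanes)].append(key)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Return the keys that share the query's bucket (candidate near neighbors)."""
    return list(buckets.get(signature(query, hyperplanes), []))


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=8, dim=3, seed=1)
    index = build_index(
        {"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0], "c": [-3.0, 0.5, -1.0]},
        planes,
    )
    # "a" and "b" point in the same direction, so they always share a bucket.
    print(candidates([4.0, 8.0, 12.0], index, planes))
```

Only the items in the query's bucket need an exact similarity computation, which is where the savings over an all-pairs comparison come from. Using more bits gives smaller, more precise buckets at the cost of recall, which is the kind of trade-off the evaluation above is concerned with.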

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5773183 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0191175 (PLOS)

Publication Analysis

Top Keywords

locality sensitive: 8
sensitive hashing: 8
large scale: 8
evaluation multi-probe: 4
multi-probe locality: 4
hashing computing: 4
computing similarities: 4
similarities web-scale: 4
web-scale query: 4
query logs: 4

Similar Publications

Approximate nearest neighbor graph provides fast and efficient embedding with applications for large-scale biological data.

NAR Genom Bioinform

December 2024

Center for Bioinformatics and Computational Genomics, Georgia Institute of Technology, 225 North Avenue NW, Atlanta, GA, 30332, USA.

Dimension reduction (DR or embedding) algorithms such as t-SNE and UMAP have many applications in big data visualization but remain slow for large datasets. Here, we further improve the UMAP-like algorithms by (i) combining several aspects of t-SNE and UMAP to create a new DR algorithm; (ii) replacing its rate-limiting step, the K-nearest neighbor graph (K-NNG), with a Hierarchical Navigable Small World (HNSW) graph; and (iii) extending the functionality to DNA/RNA sequence data by combining HNSW with locality sensitive hashing algorithms (e.g.

This qualitative synthesis explores the experiences of UK communities facing growing health risks from climate change and extreme weather. The eight included studies show the profound impacts of extreme weather events such as floods on mental health, including challenges to self-identity and anxiety from the fear of flooding returning. Included data reveal individual and household impacts of extreme weather are mediated by a complex interaction of institutional support, community support, gender inequalities and personal agency.

Morphological identification of hookworm species in five regions of Cameroon.

Helminthologia

September 2024

Laboratory of Parasitology and Ecology, Department of Animal Biology and Physiology, Faculty of Science, University of Yaounde I, P.O. Box 812, Yaounde, Cameroon.

Infections with hookworms ( and ) remain a major public health problem in low- and middle-income countries. However, the information about the distribution of each species is inaccurate in many countries since their traditional diagnosis is based only on the identification of eggs in stool under a microscope. We aimed to identify the prevalence of hookworm species using morphological stools to identify L3 larvae to gain insights into the distribution of both species in five regions of Cameroon.

Bullous pemphigoid and mucous membrane pemphigoid humoral responses differ in reactivity towards BP180 midportion and BP230.

Front Immunol

December 2024

Molecular and Cell Biology Laboratory, Istituto Dermopatico dell'Immacolata (IDI)-IRCCS, Rome, Italy.

Background: Bullous pemphigoid (BP) and mucous membrane pemphigoid (MMP) are rare autoimmune blistering disorders characterized by autoantibodies (autoAbs) targeting dermo-epidermal junction components such as BP180 and BP230. The differential diagnosis, based on both the time of appearance and the extension of cutaneous and/or mucosal lesions, is crucial to distinguish these diseases for improving therapy outcomes and delineating the correct prognosis; however, in some cases, it can be challenging. In addition, negative results obtained by commercially available enzyme-linked immunosorbent assays (ELISAs) with BP and MMP sera, especially from patients with ocular involvement, often delay diagnosis and treatment, leading to a greater risk of poor outcomes.

Actigraphy validation in behavioral variant frontotemporal dementia.

Sleep Med

December 2024

Department of Translational Biomedicine and Neurosciences (DiBraiN), University of Bari Aldo Moro, Bari, Italy; Center for Neurodegenerative Diseases and the Aging Brain, University of Bari Aldo Moro, "Pia Fondazione Cardinale G. Panico", Tricase, Lecce, Italy.

Background: Actigraphy is increasingly being used to assess sleep in patients with neurodegenerative diseases. However, information on its accuracy relative to polysomnography (PSG) in this clinical population remains scarce. This study investigates the performance of actigraphy compared to PSG in patients with behavioral variant frontotemporal dementia (bvFTD), which is the leading form of early-onset dementia.
