Semantic aware-based instruction embedding for binary code similarity detection.

PLoS One

College of Information Engineering, Zhejiang University of Technology, Hangzhou, Zhejiang, China.

Published: June 2024

Binary code similarity detection plays a crucial role in various applications within binary security, including vulnerability detection, malicious software analysis, etc. However, existing methods suffer from limited differentiation in binary embedding representations across different compilation environments, lacking dynamic high-level semantics. Moreover, current approaches often neglect multi-level semantic feature extraction, thereby failing to acquire precise semantic information about the binary code. To address these limitations, this paper introduces a novel detection solution called BinBcla. This method employs an enhanced pre-training model to generate instruction embeddings with dynamic semantics for binary functions. Subsequently, multi-feature fusion technique is utilized to extract local semantic information and long-distance global features from the code, respectively, employing self-attention to comprehend the structure information of the code. Finally, an improved cosine similarity method is employed to learn relationships among all elements of the distance vectors, thereby enhancing the model's robustness to new sample functions. Experiments are conducted across different architectures, compilers, and optimization levels. The results indicate that BinBcla achieves higher accuracy, precision and F1 score compared to existing methods.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11166306PMC
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305299PLOS

Publication Analysis

Top Keywords

binary code
12
code similarity
8
similarity detection
8
existing methods
8
binary
6
code
5
semantic
4
semantic aware-based
4
aware-based instruction
4
instruction embedding
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!