This paper presents the work carried out in the Horizon 2020 DECODER project (an acronym for "DEveloper COmpanion for Documented and annotatEd code Reference", Grant Agreement no. 824231), which links the fields of natural language processing (NLP) and software engineering. The project as a whole develops a framework, the Persistent Knowledge Monitor (PKM), that acts as a central infrastructure to store, access, and trace all the data, information and knowledge related to a given piece of software or ecosystem. Its meta-model defines the knowledge base that can be queried and analysed by all the tools integrated and developed in DECODER. In addition, the project offers a user-friendly interface through which each of three predefined roles (developers, maintainers and reviewers) can access and query the PKM with a personal account. The paper focuses on the NLP tools developed and integrated into the PKM, namely the deep learning models for variable misuse detection, code summarisation and semantic parsing. These were developed under a common work package, "Activities for the developer", aimed specifically at developers: among the many functionalities DECODER offers, they can detect bugs, automatically generate documentation for source code, and generate code snippets from natural language instructions. These tools assist developers in their daily work, increasing productivity and saving the time otherwise lost to tedious tasks such as manual bug detection. Training and validation were conducted on four use cases in the Java, C and C++ programming languages to evaluate the performance, suitability and usability of the developed tools.
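To give a concrete sense of the variable-misuse task the paper addresses: DECODER's detector is a learned deep model over Java/C/C++ code, but the simplest class of misuse can be sketched with a toy static check. The Python snippet below (an illustration only, not the project's approach) flags names that are read inside a function without ever being bound there, the kind of typo-induced misuse a developer would otherwise hunt down by hand.

```python
import ast

def flag_unbound_reads(source):
    """Toy misuse check: for each function, collect bound names
    (parameters and assignment targets), then flag any name that is
    read without a binding. A real detector learns far subtler cases."""
    tree = ast.parse(source)
    suspects = []
    for fn in (n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)):
        bound = {a.arg for a in fn.args.args}
        bound |= {n.id for n in ast.walk(fn)
                  if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
        for n in ast.walk(fn):
            if (isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
                    and n.id not in bound):
                suspects.append((fn.name, n.id))
    return suspects

buggy = '''
def area(width, height):
    return width * widht  # typo: 'widht' is read but never defined
'''
print(flag_unbound_reads(buggy))  # → [('area', 'widht')]
```

Note that the harder, more interesting class of misuse, where the wrong but *bound* variable is used (e.g. `start` where `end` was intended), is invisible to such a static check; that is precisely why the paper turns to deep learning models trained on code.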


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11036033
DOI: http://dx.doi.org/10.12688/openreseurope.14507.2

Publication Analysis

Top Keywords: natural language (8), language processing (8), bug detection (8), code summarisation (8), decoder project (8), code (6), developed (5), open-source natural (4), processing toolkit (4), toolkit support (4)

