Motivation: Many viruses are organized into taxonomies of subtypes based on their genetic similarities. For human immunodeficiency virus 1 (HIV-1), subtype classification plays a crucial role in infection management. Sequence alignment-based methods for subtype classification are impractical for large datasets because they are costly and time-consuming. Alignment-free methods involve creating numerical representations for genetic sequences and applying statistical or machine learning methods. Despite their high overall accuracy, existing models perform poorly on less common subtypes. Furthermore, there is limited work investigating the impact of sequence vectorization methods, in particular natural language-inspired embedding methods, on HIV-1 subtype classification.
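As a rough illustration of the "numerical representation" step that alignment-free methods rely on, the sketch below counts overlapping k-mers in a nucleotide sequence and returns their relative frequencies in a fixed order. It is not taken from the authors' repository: the function name `kmer_vector`, the choice of k, and the decision to ignore k-mers containing ambiguous bases are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_vector(sequence, k=3, alphabet="ACGT"):
    """Return relative k-mer frequencies in a fixed (lexicographic) order.

    Hypothetical helper for illustration only; not the authors' code.
    """
    # Fixed vocabulary of all 4**k possible k-mers, so every sequence
    # maps to a vector of the same length.
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    seq = sequence.upper()
    # Slide a window of width k across the sequence, one base at a time.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Assumption: k-mers containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[kmer] for kmer in vocabulary)
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


vec = kmer_vector("ACGTACGTNA", k=2)
```

A vector like `vec` (here 16 values for k = 2) could then be passed to any standard classifier. Real pipelines would likely use a larger k and may treat ambiguous bases differently.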

Results: We present a comprehensive analysis of sequence vectorization methods across machine learning methods. We report a k-mer-based XGBoost model with a balanced accuracy of 0.84, indicating that it has good overall performance for both common and uncommon HIV-1 subtypes. We also report a Word2Vec-based support vector machine that achieves promising results on precision and balanced accuracy. Our study sheds light on the effect of sequence vectorization methods on HIV-1 subtype classification and suggests that natural language-inspired encoding methods show promise. Our results could help to develop improved HIV-1 subtype classification methods, leading to improved individual patient outcomes and the development of subtype-specific treatments.
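Balanced accuracy is the unweighted mean of per-class recall, which is why it is more informative than plain accuracy when some subtypes are rare. The sketch below computes both on a small made-up example. The labels and predictions are hypothetical and are not data from the study, and `balanced_accuracy` is an illustrative helper rather than the authors' implementation.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (illustrative helper)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Each class contributes equally, regardless of how many samples it has.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical data: 8 sequences of a common subtype, 2 of a rare one.
y_true = ["B"] * 8 + ["F1"] * 2
# The classifier gets every common sample right but misses one rare sample.
y_pred = ["B"] * 8 + ["B", "F1"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
```

Here plain accuracy is 9/10, while balanced accuracy is the mean of the two recalls (8/8 and 1/2), so the miss on the rare class is weighted much more heavily.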

Availability And Implementation: Source code is available at https://www.github.com/kwade4/HIV_Subtypes.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371153
DOI: http://dx.doi.org/10.1093/bioadv/vbae108

Publication Analysis

Top Keywords

hiv-1 subtype: 20
subtype classification: 20
machine learning: 12
learning methods: 12
methods hiv-1: 12
sequence vectorization: 12
vectorization methods: 12
methods: 11
natural language-inspired: 8
balanced accuracy: 8

Similar Publications

Objectives: This study aimed to evaluate the prevalence and characteristics of drug resistance mutations (DRMs) in patients with low-level viremia (LLV) in Southwestern China, as it has become a growing challenge in AIDS clinical practice.

Methods: This cross-sectional study was performed in Yunnan Province, Southwestern China. LLV was defined as 50-999 copies/mL of plasma viral load with antiretroviral therapy (ART) for at least 6 months.


To analyze the transmission characteristics of newly reported HIV-infected students aged ≥18 years in Nanjing City from 2016 to 2022 and provide evidence for AIDS publicity and intervention among young students. The pol region sequences of newly reported HIV-infected students and non-student HIV-infected individuals in Nanjing City from 2016 to 2022 were collected, and the BLAST tool was used to search the published global non-Nanjing reported HIV infection sequences in the LANL HIV database. The basic molecular transmission network and regional molecular transmission network were constructed using the HIV-TRACE in a pairwise genetic distance threshold of 1.


Indonesia has one of the highest HIV infection rates in Southeast Asia. The use of dolutegravir, an integrase strand transfer inhibitor (INSTI), as a first-line treatment underscores the need for detailed data on INSTI drug resistance mutations (DRMs). Currently, there is a lack of comprehensive data on INSTI DRMs and other HIV drug resistance in Indonesian patients, both pre- and post-treatment.


Development of a latency model for HIV-1 subtype C and the impact of long terminal repeat element genetic variation on latency reversal.

J Virus Erad

December 2024

HIV Pathogenesis Programme, The Doris Duke Medical Research Institute, Nelson R. Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa.

Sub-Saharan Africa accounts for almost 70 % of people living with HIV (PLWH) worldwide, with the greatest numbers centred in South Africa where 98 % of infections are caused by subtype C (HIV-1C). However, HIV-1 subtype B (HIV-1B), prevalent in Europe and North America, has been the focus of most cure research and testing despite making up only 12 % of HIV-1 infections globally. Development of latency models for non-subtype B viruses is a necessary step to address this disproportionate focus.


Major role of dolutegravir in the emergence of the S147G integrase resistance mutation.

J Antimicrob Chemother

December 2024

Department of Virology, Sorbonne Université, INSERM, UMR-S 1136, Institut Pierre Louis d'Epidémiologie et de Santé Publique, AP-HP, Hôpitaux Universitaires Pitié Salpêtrière - Charles Foix, 83 Boulevard de l'Hôpital 39, F-75013 Paris, France.

Background: The S147G mutation is associated with high-level resistance to the integrase strand transfer inhibitor (INSTI) elvitegravir. In several poorly documented cases, it was also selected in patients on dolutegravir. Given the widespread use of dolutegravir, further studies of S147G are required.

