Motivation: Many viruses are organized into taxonomies of subtypes based on their genetic similarities. For human immunodeficiency virus 1 (HIV-1), subtype classification plays a crucial role in infection management. Sequence alignment-based methods for subtype classification are impractical for large datasets because they are costly and time-consuming. Alignment-free methods involve creating numerical representations for genetic sequences and applying statistical or machine learning methods. Despite their high overall accuracy, existing models perform poorly on less common subtypes. Furthermore, there is limited work investigating the impact of sequence vectorization methods, in particular natural language-inspired embedding methods, on HIV-1 subtype classification.
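One common way to build the "numerical representations" used by alignment-free methods is to split each sequence into overlapping k-mers and count them. The minimal sketch below is an illustration under stated assumptions, not the authors' code from the linked repository. The function names `kmer_tokens` and `kmer_vector` are hypothetical, the value of k is left as a parameter, and k-mers containing ambiguity codes are simply skipped. The overlapping k-mer tokens are also the kind of "words" a natural language-inspired embedding model such as Word2Vec could be trained on.

```python
from collections import Counter
from itertools import product


def kmer_tokens(seq: str, k: int) -> list[str]:
    """Split a nucleotide sequence into overlapping k-mers ("words")."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_vector(seq: str, k: int, alphabet: str = "ACGT") -> list[float]:
    """Relative k-mer frequencies over a fixed, ordered k-mer vocabulary.

    Assumption: k-mers containing characters outside `alphabet`
    (e.g. ambiguity codes such as N or R) are skipped.
    """
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(t for t in kmer_tokens(seq, k)
                     if all(c in alphabet for c in t))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocab)  # no valid k-mers: all-zero vector
    return [counts[w] / total for w in vocab]


print(kmer_tokens("ACGTA", 3))  # ['ACG', 'CGT', 'GTA']
```

Every sequence maps to a vector of the same length (4^k for nucleotides), so standard classifiers can consume it directly without any alignment step.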
Results: We present a comprehensive analysis of sequence vectorization methods across several machine learning models. We report a k-mer-based XGBoost model with a balanced accuracy of 0.84, indicating good overall performance on both common and uncommon HIV-1 subtypes. We also report a Word2Vec-based support vector machine that achieves promising precision and balanced accuracy. Our study sheds light on the effect of sequence vectorization methods on HIV-1 subtype classification and suggests that natural language-inspired encoding methods show promise. Our results could help to develop improved HIV-1 subtype classification methods, leading to better individual patient outcomes and to the development of subtype-specific treatments.
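Balanced accuracy averages recall over classes, so a rare subtype counts as much as a common one. The minimal sketch below shows why this metric can sit well below plain accuracy on imbalanced data. The function name and toy labels are hypothetical, and this is not the authors' evaluation code. scikit-learn's `sklearn.metrics.balanced_accuracy_score` computes the same (unadjusted) quantity.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy example: 8 subtype B sequences (all correct), 2 subtype C (one missed).
y_true = ["B"] * 8 + ["C"] * 2
y_pred = ["B"] * 8 + ["B", "C"]
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

Plain accuracy here is 9/10 = 0.9. Balanced accuracy is (1.0 + 0.5) / 2 = 0.75, which exposes the weak recall on the minority subtype.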
Availability and Implementation: Source code is available at https://www.github.com/kwade4/HIV_Subtypes.
Source | Link
---|---
PMC | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371153
DOI | http://dx.doi.org/10.1093/bioadv/vbae108
J Antimicrob Chemother
January 2025
Department of Laboratory Medicine, Yunnan Provincial Infectious Disease Hospital, Kunming 650301, China.
Objectives: This study aimed to evaluate the prevalence and characteristics of drug resistance mutations (DRMs) in patients with low-level viremia (LLV) in Southwestern China, as LLV has become a growing challenge in AIDS clinical practice.
Methods: This cross-sectional study was performed in Yunnan Province, Southwestern China. LLV was defined as a plasma viral load of 50-999 copies/mL after at least 6 months of antiretroviral therapy (ART).
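The LLV inclusion criterion can be written as a simple predicate. This is a minimal sketch, and the function name is hypothetical. It assumes both ends of the 50-999 copies/mL range are inclusive and reads "at least 6 months" as six or more months on ART.

```python
def is_low_level_viremia(viral_load_copies_per_ml: float,
                         months_on_art: float) -> bool:
    """LLV: 50-999 copies/mL (inclusive, assumed) after >= 6 months of ART."""
    return months_on_art >= 6 and 50 <= viral_load_copies_per_ml <= 999


print(is_low_level_viremia(200, 12))  # True
```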
Zhonghua Yu Fang Yi Xue Za Zhi
January 2025
Department of AIDS/STD Control and Prevention, Nanjing Municipal Center for Disease Control and Prevention, Nanjing 210003, China.
To analyze the transmission characteristics of newly reported HIV-infected students aged ≥18 years in Nanjing City from 2016 to 2022 and to provide evidence for AIDS awareness and intervention efforts among young students. The pol region sequences of newly reported HIV-infected students and non-student HIV-infected individuals in Nanjing City from 2016 to 2022 were collected, and BLAST was used to retrieve published HIV sequences reported outside Nanjing from the LANL HIV database. A basic molecular transmission network and a regional molecular transmission network were constructed with HIV-TRACE using a pairwise genetic distance threshold of 1.
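Threshold-based molecular transmission networks link every pair of sequences whose genetic distance falls at or below a cutoff, then report connected components as clusters. The minimal sketch below illustrates that idea and is not HIV-TRACE itself. HIV-TRACE uses the TN93 distance, whereas this sketch substitutes a simpler p-distance (proportion of differing sites) on pre-aligned sequences. The function names are hypothetical. The threshold stays a parameter because the value in the text is incomplete. The 0.15 in the usage line is an arbitrary value chosen for toy 10-bp sequences, not the study's value.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping positions where either has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else float("nan")


def transmission_clusters(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Link pairs with distance <= threshold; return components of size >= 2."""
    parent = {name: name for name in seqs}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return [g for g in groups.values() if len(g) >= 2]


seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA",
        "s3": "ACGTACGTTA", "s4": "TTTTTTTTTT"}
print(sorted(transmission_clusters(seqs, 0.15)[0]))  # ['s1', 's2', 's3']
```

Clustering is transitive. s1 and s3 differ by 0.2, above the cutoff, yet they land in one cluster through s2, while the distant s4 is left out.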
Narra J
December 2024
Department of Clinical Pathology, Faculty of Medicine, Universitas Padjadjaran, Bandung, Indonesia.
Indonesia has one of the highest HIV infection rates in Southeast Asia. The use of dolutegravir, an integrase strand transfer inhibitor (INSTI), as a first-line treatment underscores the need for detailed data on INSTI drug resistance mutations (DRMs). Currently, comprehensive data on INSTI DRMs and other HIV drug resistance mutations in Indonesian patients, both before and after treatment, are lacking.
J Virus Erad
December 2024
HIV Pathogenesis Programme, The Doris Duke Medical Research Institute, Nelson R. Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa.
Sub-Saharan Africa accounts for almost 70% of people living with HIV (PLWH) worldwide, with the greatest numbers centred in South Africa, where 98% of infections are caused by subtype C (HIV-1C). However, HIV-1 subtype B (HIV-1B), prevalent in Europe and North America, has been the focus of most cure research and testing despite making up only 12% of HIV-1 infections globally. Development of latency models for non-subtype B viruses is a necessary step to address this disproportionate focus.
J Antimicrob Chemother
December 2024
Department of Virology, Sorbonne Université, INSERM, UMR-S 1136, Institut Pierre Louis d'Epidémiologie et de Santé Publique, AP-HP, Hôpitaux Universitaires Pitié Salpêtrière - Charles Foix, 83 Boulevard de l'Hôpital 39, F-75013 Paris, France.
Background: The S147G mutation is associated with high-level resistance to the integrase strand transfer inhibitor (INSTI) elvitegravir. In several poorly documented cases, it was also selected in patients on dolutegravir. Given the widespread use of dolutegravir, further studies of S147G are required.