Research into host-level intrusion and anomaly detectors typically pays close attention to extracting attributes from system call traces. These include window-based, Hidden Markov Model-based, and sequence-model-based attributes. Recently, several works have focused on sequence-model-based feature extractors, specifically Word2Vec and GloVe, to extract embeddings from system call traces because of their ability to capture semantic relationships among system calls. However, due to the nature of the data, these extractors introduce inconsistencies in the extracted features, causing the Machine Learning models built on them to yield inaccurate and potentially misleading results. In this paper, we first highlight the research challenges posed by these extractors. Then, we conduct experiments with new feature sets to assess their suitability for addressing the detected issues. Our experiments show that Word2Vec is prone to introducing more duplicated samples than GloVe. Regarding the proposed solutions, we found that concatenating the embedding vectors generated by Word2Vec and GloVe yields the best overall balanced accuracy. In addition to resolving the challenge of data leakage, this approach improves performance relative to the other alternatives.
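The duplicated-sample problem and the concatenation idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lookup tables `W2V` and `GLOVE` stand in for trained models, their values are invented, and averaging per-call vectors into one trace vector is only one possible aggregation, not necessarily the one used in the paper.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

# Hypothetical toy embeddings standing in for trained Word2Vec and GloVe models.
# The numbers are invented; "read" and "write" deliberately collide in W2V.
W2V: Dict[str, Vector] = {
    "open": (0.1, 0.3), "read": (0.2, 0.1), "write": (0.2, 0.1), "close": (0.4, 0.0),
}
GLOVE: Dict[str, Vector] = {
    "open": (0.5, 0.2), "read": (0.1, 0.7), "write": (0.3, 0.6), "close": (0.0, 0.9),
}


def trace_vector(trace: Sequence[str], table: Dict[str, Vector]) -> Vector:
    """Average the per-call embeddings of one trace (an assumed aggregation)."""
    dims = len(next(iter(table.values())))
    sums = [0.0] * dims
    for call in trace:
        for i, value in enumerate(table[call]):
            sums[i] += value
    # Round so that equal averages compare equal despite float noise.
    return tuple(round(s / len(trace), 6) for s in sums)


def concat_vector(trace: Sequence[str]) -> Vector:
    """Concatenate the two trace vectors into a single feature vector."""
    return trace_vector(trace, W2V) + trace_vector(trace, GLOVE)


def count_duplicates(vectors: List[Vector]) -> int:
    """Number of samples whose feature vector repeats an earlier sample."""
    return len(vectors) - len(set(vectors))


traces = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "read"],
]

w2v_only = [trace_vector(t, W2V) for t in traces]
combined = [concat_vector(t) for t in traces]

print("duplicates with one extractor:", count_duplicates(w2v_only))
print("duplicates after concatenation:", count_duplicates(combined))
```

In this toy setup, two distinct traces collapse to the same feature vector under a single extractor. If one of them lands in the training split and the other in the test split, the identical sample leaks across the split. Concatenating a second embedding separates them here, but that outcome depends entirely on the invented values above.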
| Full text | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10077957 | PMC |
| http://dx.doi.org/10.1007/s44248-023-00002-y | DOI |