Publications by authors named "Slobodan Vucetic"

Introduction: Social participation for emerging symbolic communicators on the autism spectrum is often restricted. This is due in part to the time and effort required for both children and partners to use traditional augmentative and alternative communication (AAC) technologies during fast-paced social routines. Innovations in artificial intelligence offer the potential for context-aware AAC technology that provides just-in-time communication options based on linguistic input from partners, minimizing the time and effort needed to use AAC technologies for social participation.

Millions of individuals who have limited or no functional speech use augmentative and alternative communication (AAC) technology to participate in daily life and exercise the human right to communication. While advances in AAC technology lag significantly behind those in other technology sectors, mainstream technology innovations such as artificial intelligence (AI) present potential for the future of AAC. However, a new future of AAC will only be as effective as it is responsive to the needs and dreams of the people who rely upon it every day.

Background: Biomedical Relation Extraction (RE) is essential for uncovering complex relationships between biomedical entities within text. However, training RE classifiers is challenging in low-resource biomedical applications with few labeled examples.

Methods: We explore the potential of Shortest Dependency Paths (SDPs) to aid biomedical RE, especially in situations with limited labeled examples.
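
As a rough illustration of what a shortest dependency path is, the sketch below finds the shortest chain of dependency links between two entity tokens using breadth-first search. It is not the authors' method: the function name `shortest_dependency_path`, the example sentence, and the hand-written (head, dependent) edges are all assumptions made for illustration, and a real pipeline would obtain the edges from a dependency parser.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the shortest token path between source and target, or None.

    `edges` is a list of (head, dependent) pairs from a dependency parse.
    The parse is treated as an undirected graph, as is usual for SDPs.
    """
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)

    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical, hand-written parse of "Aspirin inhibits the COX-1 enzyme."
example_edges = [
    ("inhibits", "Aspirin"),
    ("inhibits", "enzyme"),
    ("enzyme", "the"),
    ("enzyme", "COX-1"),
]
example_path = shortest_dependency_path(example_edges, "Aspirin", "COX-1")
```

In this toy parse the path between the two entities skips the determiner "the", which is the intuition behind using SDPs: they keep the words that connect the entities and drop much of the rest.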

Purpose: Augmentative and alternative communication (AAC) technology innovation is urgently needed to improve outcomes for children on the autism spectrum who are minimally verbal. One potential technology innovation is applying artificial intelligence (AI) to automate strategies such as augmented input to increase language learning opportunities while mitigating communication partner time and learning barriers. Innovation in AAC research and design methodology is also needed to empirically explore this and other applications of AI to AAC.

Relation Extraction (RE) is an important task in extracting structured data from free biomedical text. Obtaining labeled data needed to train RE models in specialized domains such as biomedicine can be very expensive because it requires expert knowledge. Thus, it is often the case that RE models need to be trained from relatively small labeled data sets.

Identification of procedures using International Classification of Diseases or Healthcare Common Procedure Coding System codes is challenging when conducting medical claims research. We demonstrate how Pointwise Mutual Information can be used to find associated codes. We apply the method to an investigation of racial differences in breast cancer outcomes.

Background: Additional evaluations, including second opinions, before breast cancer surgery may improve care, but may cause detrimental treatment delays that could allow disease progression.

Aims: We investigate which surgical delays are associated with survival benefits conferred by preoperative encounters and which are associated with potential harm.

Methods and Results: We investigated survival outcomes of SEER Medicare patients with stage 1-3 breast cancer using propensity score-based weighting.

Healthcare providers generate a medical claim after every patient visit. A medical claim consists of a list of medical codes describing the diagnosis and any treatment provided during the visit. Medical claims have been popular in medical research as a data source for retrospective cohort studies.

Background: Health providers create Electronic Health Records (EHRs) to describe the conditions and procedures used to treat their patients. Medical notes entered by medical staff in the form of free text are a particularly insightful component of EHRs. There is great interest in applying machine learning tools to medical notes in numerous medical informatics applications.

Given the growing number of cancer survivors, it is important to better understand socio-spatial mobility patterns of cancer patients after diagnosis that could have public health implications regarding post-diagnostic access to care for treatment and follow-up surveillance. In this exploratory study, residential histories from LexisNexis were linked to New Jersey colon cancer cases diagnosed from 2006 to 2011 to examine differences in socio-spatial mobility patterns after diagnosis by stage at cancer diagnosis, sex, and race/ethnicity. For the colon cancer cases, we summarized and compared the number of residences and changes in the residential census tract and neighborhood poverty after the diagnosis.

Potts models and variational autoencoders (VAEs) have recently gained popularity as generative protein sequence models (GPSMs) to explore fitness landscapes and predict mutation effects. Despite encouraging results, current model evaluation metrics leave unclear whether GPSMs faithfully reproduce the complex multi-residue mutational patterns observed in natural sequences due to epistasis. Here, we develop a set of sequence statistics to assess the "generative capacity" of three current GPSMs: the pairwise Potts Hamiltonian, the VAE, and the site-independent model.

Purpose: Cutaneous T-cell lymphoma (CTCL) is a rare type of non-Hodgkin lymphoma. Previous studies have reported geographic clustering of CTCL based on the residence at the time of diagnosis. We explore geographic clustering of CTCL using both the residence at the time of diagnosis and past residences using data from the New Jersey State Cancer Registry.

Landscape characteristics have been shown to influence health outcomes, but few studies have examined their relationship with cancer survival. We used data from the National Land Cover Database to examine associations between regional-stage colon cancer survival and 27 different landscape metrics. The study population included all adult New Jersey residents diagnosed between 2006 and 2011.

Incorporation of physical principles in a machine learning (ML) architecture is a fundamental step toward the continued development of artificial intelligence for inorganic materials. Inspired by Pauling's rule, we propose that structure motifs in inorganic crystals can serve as a central input to a machine learning framework. We demonstrate that the presence of structure motifs and their connections in a large set of crystalline compounds can be converted into unique vector representations using an unsupervised learning algorithm.

Background: Identifying geospatial cancer survival disparities is critical to focus interventions and prioritize efforts with limited resources. Incorporating residential mobility into spatial models may result in different geographic patterns of survival compared with the standard approach using a single location based on the patient's residence at the time of diagnosis.

Methods: Data on 3,949 regional-stage colon cancer cases diagnosed from 2006 to 2011 and followed until December 31, 2016, were obtained from the New Jersey State Cancer Registry.

The pointwise mutual information statistic (PMI), which measures how often two words occur together in a document corpus, is a cornerstone of recently proposed popular natural language processing algorithms such as word2vec. PMI and word2vec reveal semantic relationships between words and can be helpful in a range of applications such as document indexing, topic analysis, or document categorization. We use probability theory to demonstrate the relationship between PMI and word2vec.
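
For readers unfamiliar with the statistic, PMI for a pair of items a and b is log(P(a, b) / (P(a) P(b))): positive when the pair co-occurs more often than chance, negative when less often. The sketch below is a minimal illustration and not the paper's implementation; the function name `pmi_table` and the toy corpus are assumptions, and probabilities are estimated at the document level (an item counts once per document), whereas word2vec-style methods typically count co-occurrence within a sliding context window.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Return {(a, b): PMI} for every item pair that co-occurs in a document.

    Assumption: P(a) is the fraction of documents containing a, and
    P(a, b) is the fraction of documents containing both a and b.
    Pair keys are sorted tuples, so ("a", "b") is used, never ("b", "a").
    """
    n_docs = len(documents)
    item_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        items = sorted(set(doc))  # count each item once per document
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    table = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_docs
        p_a = item_counts[a] / n_docs
        p_b = item_counts[b] / n_docs
        table[(a, b)] = math.log(p_ab / (p_a * p_b))
    return table


# Toy corpus (hypothetical): each inner list is one document.
toy_documents = [["a", "b"], ["a", "b"], ["c"], ["a", "c"]]
toy_pmi = pmi_table(toy_documents)
```

Pairs that never co-occur are simply absent from the result, because their PMI would be the logarithm of zero. The same function works unchanged when the "documents" are medical claims and the items are medical codes.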

Background: Residential histories linked to cancer registry data provide new opportunities to examine cancer outcomes by neighborhood socioeconomic status (SES). We examined differences in regional stage colon cancer survival estimates comparing models using a single neighborhood SES at diagnosis to models using neighborhood SES from residential histories.

Methods: We linked regional stage colon cancers from the New Jersey State Cancer Registry diagnosed from 2006 to 2011 to LexisNexis administrative data to obtain residential histories.

Protein secondary structure prediction remains a vital topic with broad applications. Due to the lack of a widely accepted standard for evaluating secondary structure predictors, a fair comparison of predictors is challenging. A detailed examination of factors that contribute to higher accuracy is also lacking.

Representing words as low-dimensional vectors is very useful in many natural language processing tasks. This idea has been extended to the medical domain, where medical codes listed in medical claims are represented as vectors to facilitate exploratory analysis and predictive modeling. However, depending on the type of medical provider, medical claims can use medical codes from different ontologies or from a combination of ontologies, which complicates learning of the representations.

Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.

Results: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes.

Various deep learning models have recently been applied to predictive modeling of Electronic Health Records (EHR). In medical claims data, which is a particular type of EHR data, each patient is represented as a sequence of temporally ordered, irregularly sampled visits to health providers, where each visit is recorded as an unordered set of medical codes specifying the patient's diagnosis and the treatment provided during the visit. Based on the observation that different patient conditions have different temporal progression patterns, in this paper we propose a novel interpretable deep learning model, called Timeline.

Protein loops connect regular secondary structures and contain 4-residue beta turns which represent 63% of the residues in loops. The commonly used classification of beta turns (Type I, I', II, II', VIa1, VIa2, VIb, and VIII) was developed in the 1970s and 1980s from analysis of a small number of proteins of average resolution, and represents only two thirds of beta turns observed in proteins (with a generic class Type IV representing the rest). We present a new clustering of beta-turn conformations from a set of 13,030 turns from 1074 ultra-high resolution protein structures (≤1.

Background: There has been increasing interest in learning low-dimensional vector representations of medical concepts from Electronic Health Records (EHRs). Vector representations of medical concepts facilitate exploratory analysis and predictive modeling of EHR data to gain insights about patterns of care and health outcomes. EHRs contain structured data such as diagnostic codes and laboratory tests, as well as unstructured free-text data in the form of clinical notes, which provide more detail about the condition and treatment of patients.

There has been an increasing interest in learning low-dimensional vector representations of medical concepts from electronic health records (EHRs). While EHRs contain structured data such as diagnostic codes and laboratory tests, they also contain unstructured clinical notes, which provide more nuanced details on a patient's health status. In this work, we propose a method that jointly learns medical concept and word representations.

Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.
