Publications by authors named "Scott Crossley"

This study documents and assesses the Tool for Automatic Measurement of Morphological Information (TAMMI), which calculates measures related to basic morpheme counts, morphological variety, morphological complexity, morpheme type-token counts, and variables found in the MorphoLex database (Sánchez-Gutiérrez et al., 2018) including morpheme frequency/length, morpheme family size counts and frequency, and morpheme hapax counts. These measures are assessed in two studies that include a word frequency measure as a control variable.

This study examines differences in lexical and phraseological complexity features between second language (L2) written and spoken opinion responses using classification analysis. The study further examines the characteristics of L2 written and spoken responses that were misclassified in terms of lexical and phraseological differences, L2 learners' vocabulary knowledge, and raters' judgments of L2 use. The goal is to more thoroughly explore potential differences in lexical and phraseological production based on modality.

This paper introduces the Persuasive Essays for Rating, Selecting, and Understanding Argumentative and Discourse Elements (PERSUADE) corpus. The PERSUADE corpus is a large-scale corpus of writing with annotated discourse elements. The goal of the corpus is to spur the development of new, open-source scoring algorithms that identify discourse elements in argumentative writing, opening new avenues for automatic writing evaluation systems that focus more specifically on the semantic and organizational elements of student writing.

Comprehension monitoring is a meta-cognitive skill that is defined as the ability to self-evaluate one's comprehension of text. Although it is known that struggling adult readers are poor at monitoring their comprehension, additional research is needed to understand the mechanisms underlying comprehension monitoring and their role in reading comprehension in this population. This study used a comprehension monitoring task with struggling adult readers, which included online eye movements (reread and regression path durations) and an offline verbal protocol (oral explanations of key information).

This paper introduces the CommonLit Ease of Readability (CLEAR) corpus, which provides unique readability scores for ~5000 text excerpts along with information about the excerpt's year of publishing, genre, and other metadata. The CLEAR corpus will provide researchers interested in discourse processing and reading with a resource from which to develop and test readability metrics and to model text readability. The CLEAR corpus includes a number of improvements in comparison to previous readability corpora, including its size, the breadth of the excerpts available, which cover over 250 years of writing in two different genres, and the unique readability criterion provided for each text based on teachers' ratings of text difficulty for student readers.

Age of acquisition (AoA) is a measure of word complexity that refers to the age at which a word is typically learned. AoA measures have shown strong correlations with reading comprehension, lexical decision times, and writing quality. AoA scores based on both adult and child data have limitations that allow for error in measurement and increase the cost and effort required to produce them.

Little quantitative research has explored which clinician skills and behaviors facilitate communication. Mutual understanding is especially challenging when patients have limited health literacy (HL). Two strategies hypothesized to improve communication are matching the complexity of language to patients’ HL (“universal tailoring”) and always using simple language (“universal precautions”).

Limited health literacy (HL) partially mediates health disparities. Measurement constraints, including lack of validity assessment across racial/ethnic groups and administration challenges, have undermined the field and impeded scaling of HL interventions. We employed computational linguistics to develop an automated and novel HL measure, analyzing >300,000 messages sent by >9,000 diabetes patients via a patient portal to create a Literacy Profile.

Objective: In the National Library of Medicine funded ECLIPPSE Project (Employing Computational Linguistics to Improve Patient-Provider Secure Emails exchange), we attempted to create novel, valid, and scalable measures of both patients' health literacy (HL) and physicians' linguistic complexity by employing natural language processing (NLP) techniques and machine learning (ML). We applied these techniques to > 400,000 patients' and physicians' secure messages (SMs) exchanged via an electronic patient portal, developing and validating an automated patient literacy profile (LP) and physician complexity profile (CP). Herein, we describe the challenges faced and the solutions implemented during this innovative endeavor.

The substantial expansion of secure messaging (SM) via the patient portal in the last decade suggests that it is becoming a standard of care, but few have examined SM use longitudinally. We examined SM patterns among a diverse cohort of patients with diabetes (N = 19 921) and the providers they exchanged messages with within a large, integrated health system over 10 years (2006-2015), linking patient demographics to SM use. We found a 10-fold increase in messaging volume.

Objective: To develop novel, scalable, and valid literacy profiles for identifying limited health literacy patients by harnessing natural language processing.

Data Source: With respect to the linguistic content, we analyzed 283 216 secure messages sent by 6941 diabetes patients to physicians within an integrated system's electronic portal. Sociodemographic, clinical, and utilization data were obtained via questionnaire and electronic health records.

Patients with diabetes and limited health literacy (HL) may have suboptimal communication exchange with their health care providers and be at elevated risk of adverse health outcomes. These difficulties are generally attributed to patients' reduced ability to both communicate and understand health-related ideas as well as physicians' lack of skill in identifying those with limited HL. Understanding and identifying patients with barriers posed by lower HL to improve healthcare delivery and outcomes is an important research avenue.

Background: Low literacy skills impact important aspects of communication, including health-related information exchanges. Unsuccessful communication on the part of physician or patient contributes to lower quality of care, is associated with poorer chronic disease control, jeopardizes patient safety and can lead to unfavorable healthcare utilization patterns. To date, very little research has focused on digital communication between physicians and patients, such as secure messages sent via electronic patient portals.

Background: Little is known about patients who have caregiver proxies communicate with healthcare providers via portal secure messaging (SM). Since proxy portal use is often informal (e.g.

Limited health literacy is a barrier to optimal healthcare delivery and outcomes. Current measures requiring patients to self-report limitations are time-consuming and may be considered intrusive by some. This makes widespread classification of patient health literacy challenging.

This article introduces the second version of the Tool for the Automatic Analysis of Cohesion (TAACO 2.0). Like its predecessor, TAACO 2.

This study introduces the second release of the Tool for the Automatic Analysis of Lexical Sophistication (TAALES 2.0), a freely available and easy-to-use text analysis tool. TAALES 2.

Health systems are heavily promoting patient portals. However, limited health literacy (HL) can restrict online communication via secure messaging (SM) because patients' literacy skills must be sufficient to convey and comprehend content while clinicians must encourage and elicit communication from patients and match patients' literacy level. This paper describes the Employing Computational Linguistics to Improve Patient-Provider Secure Email (ECLIPPSE) study, an interdisciplinary effort bringing together scientists in communication, computational linguistics, and health services to employ computational linguistic methods to (1) create a novel Linguistic Complexity Profile (LCP) to characterize communications of patients and clinicians and demonstrate its validity and (2) examine whether providers accommodate communication needs of patients with limited HL by tailoring their SM responses.

This study introduces the Sentiment Analysis and Cognition Engine (SEANCE), a freely available text analysis tool that is easy to use, works on most operating systems (Windows, Mac, Linux), is housed on a user's hard drive (as compared to being accessed via an Internet interface), allows for batch processing of text files, includes negation and part-of-speech (POS) features, and reports on thousands of lexical categories and 20 component scores related to sentiment, social cognition, and social order. In the study, we validated SEANCE by investigating whether its indices and related component scores can be used to classify positive and negative reviews in two well-known sentiment analysis test corpora. We contrasted the results of SEANCE with those from Linguistic Inquiry and Word Count (LIWC), a similar tool that is popular in sentiment analysis, but is pay-to-use and does not include negation or POS features.

This study introduces the Tool for the Automatic Analysis of Cohesion (TAACO), a freely available text analysis tool that is easy to use, works on most operating systems (Windows, Mac, and Linux), is housed on a user's hard drive (rather than having an Internet interface), allows for the batch processing of text files, and incorporates over 150 classic and recently developed indices related to text cohesion. The study validates TAACO by investigating how its indices related to local, global, and overall text cohesion can predict expert judgments of text coherence and essay quality. The findings of this study provide predictive validation of TAACO and support the notion that expert judgments of text coherence and quality are either negatively correlated with or not predicted by local and overall text cohesion indices, but are positively predicted by global indices of cohesion.

The Writing Pal is an intelligent tutoring system that provides writing strategy training. A large part of its artificial intelligence resides in the natural language processing algorithms that assess essay quality and guide feedback to students. Because writing is often highly nuanced and subjective, the development of these algorithms must consider a broad array of linguistic, rhetorical, and contextual features.
