CRIE: An automated analyzer for Chinese texts.

Behav Res Methods

Graduate Institute of Information and Computer Education, National Taiwan Normal University, Taipei, Taiwan.

Published: December 2016

Textual analysis has been applied to various fields, such as discourse analysis, corpus studies, text leveling, and automated essay evaluation. Several tools have been developed for analyzing texts written in alphabetic languages such as English and Spanish. However, currently there is no tool available for analyzing Chinese-language texts. This article introduces a tool for the automated analysis of simplified and traditional Chinese texts, called the Chinese Readability Index Explorer (CRIE). Composed of four subsystems and incorporating 82 multilevel linguistic features, CRIE is able to conduct the major tasks of segmentation, syntactic parsing, and feature extraction. Furthermore, the integration of linguistic features with machine learning models enables CRIE to provide leveling and diagnostic information for texts in language arts, texts for learning Chinese as a foreign language, and texts with domain knowledge. The usage and validation of the functions provided by CRIE are also introduced.
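As a rough illustration of what feature extraction on segmented text can look like, the minimal sketch below computes three generic surface features. It is not CRIE's implementation: the abstract does not list the 82 features, so the function name `surface_features`, the chosen features, and the hand-segmented sample are all assumptions made for illustration. Word segmentation is assumed to have been done already.

```python
from statistics import mean


def surface_features(sentences):
    """Compute a few generic surface features from pre-segmented text.

    `sentences` is a list of sentences, each a list of word tokens.
    These features are illustrative assumptions, not CRIE's feature set.
    """
    words = [w for sent in sentences for w in sent]
    if not words:
        raise ValueError("text contains no words")
    return {
        # Average number of words per sentence.
        "mean_sentence_length": mean(len(sent) for sent in sentences),
        # Average number of characters per word.
        "mean_word_length": mean(len(w) for w in words),
        # Distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words),
    }


# Hypothetical, hand-segmented sample of two short sentences.
sample = [["我", "喜歡", "閱讀"], ["閱讀", "很", "有趣"]]
features = surface_features(sample)
print(features)
```

A feature dictionary of this kind could then be passed to a classifier for text leveling, which is the general pattern the abstract describes when it mentions combining linguistic features with machine learning models.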

Source: http://dx.doi.org/10.3758/s13428-015-0649-1

Similar Publications

A phenotype-based AI pipeline outperforms human experts in differentially diagnosing rare diseases using EHRs.

NPJ Digit Med

January 2025

Department of Computer Science and Technology & Institute for Artificial Intelligence & BNRist, Tsinghua University, Beijing, China.

Rare diseases, affecting ~350 million people worldwide, pose significant challenges in clinical diagnosis due to the lack of experienced physicians and the complexity of differentiating between numerous rare diseases. To address these challenges, we introduce PhenoBrain, a fully automated artificial intelligence pipeline. PhenoBrain utilizes a BERT-based natural language processing model to extract phenotypes from clinical texts in EHRs and employs five new diagnostic models for differential diagnoses of rare diseases.

Social media platforms such as Weibo have provided a broad platform for the expression of public sentiment during the pandemic. This study aims to explore the emotional attitudes of Chinese netizens toward the COVID-19 opening-up policies and the related thematic characteristics. Using Python, 145,851 texts were collected from the Weibo platform.

This paper aims to propose a deep learning algorithm based on Bidirectional Encoder Representations from Transformers (BERT) to identify privacy information in Chinese clinical texts, and to verify the feasibility of the method for privacy protection in the Chinese clinical context. We collected and double-annotated 33,017 discharge summaries from 151 medical institutions on a municipal regional health information platform, developed a BERT-based Bidirectional Long Short-Term Memory (BiLSTM) and Conditional Random Field (CRF) model, and tested its privacy-identification performance on the dataset. To explore the performance of different substructures of the neural network, we created five additional baseline models and evaluated the impact of different models on performance.

Collocations typically refer to habitual word combinations, which not only occur in texts but also constitute an essential component of the mental lexicon. This study focuses on the mental lexicon of Chinese learners of English as a foreign language (EFL), investigating the representation of collocations and the influence of input frequency and L2 proficiency by employing a phrasal decision task. The findings reveal the following: (1) Collocations elicited faster response times and higher accuracy rates than non-collocations.

Purpose: The Chinese community constitutes the largest demographic and faces the highest rates of cancer incidence in Singapore. Given this, palliative care plays a crucial role in supporting individuals, particularly those nearing the end of life, with family serving as their primary source of support. Many Chinese family caregivers in Singapore reported significant unmet needs in cancer care provision, with studies indicating that they often bear the brunt of caregiving responsibilities.
