Publications by authors named "John McCrae"

DravidianCodeMix: sentiment analysis and offensive language identification dataset for Dravidian languages in code-mixed text.

Bharathi Raja Chakravarthi, Ruba Priyadharshini, Vigneshwaran Muralidaran, Navya Jose, Shardul Suryawanshi, John P McCrae

Lang Resour Eval

February 2022

This paper describes the development of a multilingual, manually annotated dataset for three under-resourced Dravidian languages generated from social media comments. The dataset was annotated for sentiment analysis and offensive language identification for a total of more than 60,000 YouTube comments. The dataset consists of around 44,000 comments in Tamil-English, around 7,000 comments in Kannada-English, and around 20,000 comments in Malayalam-English.

Toward an Integrative Approach for Making Sense Distinctions.

John P McCrae, Theodorus Fransen, Sina Ahmadi, Paul Buitelaar, Koustava Goswami

Front Artif Intell

February 2022

Word senses are the fundamental unit of description in lexicography, yet it is rarely the case that different dictionaries reach any agreement on the number and definition of senses in a language. With the recent rise in natural language processing and other computational approaches, there is an increasing demand for quantitatively validated sense catalogues of words, yet no consensus methodology exists. In this paper, we look at four main approaches to making sense distinctions (formal, cognitive, distributional, and intercultural) and examine the strengths and weaknesses of each approach.

A Survey of Orthographic Information in Machine Translation.

Bharathi Raja Chakravarthi, Priya Rani, Mihael Arcan, John P McCrae

SN Comput Sci

June 2021

Machine translation is one of the applications of natural language processing that has been explored in many languages. Recently, researchers have started paying attention to machine translation for resource-poor languages and closely related languages. A widespread underlying problem for these machine translation systems is the linguistic difference and variation in orthographic conventions, which cause many issues for traditional approaches.

Putting patients in control of data from electronic health records.

John P New, David Leather, Nawar Diar Bakerly, John McCrae, J Martin Gibson

BMJ

January 2018

Monitoring safety in a phase III real-world effectiveness trial: use of novel methodology in the Salford Lung Study.

Sue Collier, Catherine Harvey, Jill Brewster, Nawar Diar Bakerly, Hanaa F Elkhenini, John McCrae

Pharmacoepidemiol Drug Saf

March 2017

Background: The Salford Lung Study (SLS) programme, encompassing two phase III pragmatic randomised controlled trials, was designed to generate evidence on the effectiveness of a once-daily treatment for asthma and chronic obstructive pulmonary disease in routine primary care using electronic health records.

Objective: The objective of this study was to describe and discuss the safety monitoring methodology and the challenges associated with ensuring patient safety in the SLS. Refinements to safety monitoring processes and infrastructure are also discussed.

Synonym set extraction from the biomedical literature by lexical pattern discovery.

John McCrae, Nigel Collier

BMC Bioinformatics

March 2008

Background: Although there are a large number of thesauri for the biomedical domain, many of them lack coverage of terms and their variant forms. Automatic thesaurus construction based on patterns was first suggested by Hearst [1], but it is still not clear how to automatically construct such patterns for different semantic relations and domains. In particular, it is not certain which patterns are useful for capturing synonymy.
