Empirical evaluation of a hybrid intelligent monitoring system using different measures of effectiveness.

Artif Intell Med

Laboratory for Research and Development in Artificial Intelligence (LIDIA), Department of Computer Science, University of A Coruña, Campus de Elviña s/n, 15071, A Coruña, Spain.

Published: January 2002

The validation of a software product is a fundamental part of its development, and focuses on an analysis of whether the software correctly resolves the problems it was designed to tackle. Traditional approaches to validation are based on a comparison of results with what is called a gold standard. Nevertheless, in certain domains, it is not always easy or even possible to establish such a standard. This is the case for intelligent systems that endeavour to simulate or emulate a model of expert behaviour. This article describes the validation of the intelligent system computer-aided foetal evaluator (CAFE), developed for intelligent monitoring of the antenatal condition based on data from the non-stress test (NST), and how this validation was accomplished through a methodology designed to address the problem of validating intelligent systems. System performance was compared to that of three obstetricians using 3450 min of cardiotocographic (CTG) records corresponding to 53 different patients. From these records different parameters were extracted and interpreted, and thus the validation was carried out on a parameter-by-parameter basis using measurement techniques such as percentage agreement, the Kappa statistic or cluster analysis. Results showed that the system's agreement with the experts is, in general, similar to the agreement between the experts themselves, which, in turn, permits our system to be considered at least as skillful as our experts. Throughout our article, the results obtained are commented on with a view to demonstrating how the use of different measures of the level of agreement between system and experts can assist not only in assessing the aptness of a system, but also in highlighting its weaknesses. This kind of assessment means that the system can be fine-tuned repeatedly until the expected results are obtained.
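As a minimal sketch of the agreement measures named in the abstract, the snippet below computes percentage agreement and Cohen's Kappa for two raters over categorical labels. The category names and data are hypothetical illustrations, not values from the CAFE study, and the per-parameter setup of the original validation is not reproduced here.

```python
# Illustrative sketch only: percentage agreement and Cohen's Kappa between
# two raters. Labels and data below are hypothetical, not from the CAFE study.
from collections import Counter

def percentage_agreement(rater_a, rater_b):
    """Fraction of cases on which the two raters assign the same label."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percentage_agreement(rater_a, rater_b)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters pick the same label
    # if each chose independently according to their own label frequencies.
    p_chance = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a if c in freq_b)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical interpretations of one CTG parameter (e.g. baseline class)
# by the system and by one obstetrician over five record segments.
system = ["normal", "normal", "tachycardia", "normal", "bradycardia"]
expert = ["normal", "normal", "normal",      "normal", "bradycardia"]

print(f"percentage agreement: {percentage_agreement(system, expert):.2f}")
print(f"Cohen's kappa:        {cohens_kappa(system, expert):.2f}")
```

In a system-versus-expert validation of this kind, Kappa is typically preferred over raw percentage agreement because it discounts the agreement that would occur by chance alone.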


Source
http://dx.doi.org/10.1016/s0933-3657(01)00091-4
