Estimating dataset size requirements for classifying DNA microarray data.

Sayan Mukherjee, Pablo Tamayo, Simon Rogers, Ryan Rifkin, Anna Engle, Colin Campbell, Todd R Golub, Jill P Mesirov

J Comput Biol

Whitehead Institute/Massachusetts Institute of Technology Center for Genome Research, Cambridge, MA 02139, USA.

Published: July 2003

A statistical methodology for estimating dataset size requirements for classifying microarray data using learning curves is introduced. The goal is to use existing classification results to estimate dataset size requirements for future classification experiments and to evaluate the gain in accuracy and significance of classifiers built with additional data. The method is based on fitting inverse power-law models to construct empirical learning curves. It also includes a permutation test procedure to assess the statistical significance of classification performance for a given dataset size. This procedure is applied to several molecular classification problems representing a broad spectrum of levels of complexity.
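The two ingredients named in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the three-parameter inverse power-law form e(n) = a * n^(-alpha) + b, where e is the error rate at training-set size n and b is the asymptotic error. It fits this curve by grid-searching alpha and solving for a and b with ordinary least squares, which is a swapped-in fitting method and may not match the paper's. It also uses a 1-nearest-neighbour classifier as a hypothetical stand-in for whatever classifier is being evaluated. All function names are illustrative.

```python
# Sketch only: the model form, the fitting method and the 1-NN classifier are
# assumptions for illustration, not the paper's code.
import random


def fit_inverse_power_law(ns, errors, alphas=None):
    """Least-squares fit of e(n) = a * n**(-alpha) + b.

    For a fixed alpha the model is linear in (a, b), so (a, b) come from
    ordinary linear regression on x = n**(-alpha); alpha is grid-searched.
    """
    if alphas is None:
        alphas = [i / 100 for i in range(1, 301)]  # candidate decay rates 0.01..3.00
    best = None
    for alpha in alphas:
        xs = [n ** (-alpha) for n in ns]
        mx = sum(xs) / len(xs)
        my = sum(errors) / len(errors)
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0:
            continue  # degenerate: all sample sizes identical
        a = sum((x - mx) * (y - my) for x, y in zip(xs, errors)) / sxx
        b = my - a * mx
        sse = sum((a * x + b - y) ** 2 for x, y in zip(xs, errors))
        if best is None or sse < best[0]:
            best = (sse, a, alpha, b)
    if best is None:
        raise ValueError("need at least two distinct sample sizes")
    _, a, alpha, b = best
    return a, alpha, b


def predict_error(a, alpha, b, n):
    """Error rate predicted by the fitted learning curve at training-set size n."""
    return a * n ** (-alpha) + b


def size_for_target_error(a, alpha, b, target):
    """Training-set size at which the fitted curve reaches `target`.

    Returns None when the target is at or below the asymptote b (unreachable).
    """
    if a <= 0 or target <= b:
        return None
    return (a / (target - b)) ** (1 / alpha)


def one_nn_error(train_x, train_y, test_x, test_y):
    """Test error of a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    wrong = 0
    for x, y in zip(test_x, test_y):
        nearest = min(range(len(train_x)),
                      key=lambda i: sum((p - q) ** 2 for p, q in zip(x, train_x[i])))
        wrong += train_y[nearest] != y
    return wrong / len(test_y)


def permutation_p_value(train_x, train_y, test_x, test_y, n_perm=200, seed=0):
    """Observed error and the fraction of label-permuted runs doing at least as well."""
    rng = random.Random(seed)
    observed = one_nn_error(train_x, train_y, test_x, test_y)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(train_y)
        rng.shuffle(shuffled)  # break any association between features and labels
        if one_nn_error(train_x, shuffled, test_x, test_y) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)  # +1 counts the unpermuted labelling
```

In use, `ns` would be the training-set sizes at which error rates were measured (for example, averaged over random train/test splits), and `errors` would be the corresponding error rates. `size_for_target_error` inverts the fitted curve, n = (a / (target - b))^(1/alpha). For example, with a = 0.5, alpha = 0.5, b = 0.05 and a target error of 0.1, it gives n = 100. The permutation p-value adds one to both the numerator and the denominator so that it is never exactly zero. This is a common convention and may differ from the procedure used in the paper.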

DOI: http://dx.doi.org/10.1089/106652703321825928
