ALCHEMY: a reliable method for automated SNP genotype calling for small batch sizes and highly homozygous populations.

Bioinformatics

Department of Biological Statistics and Computational Biology, 102 Weill Hall, Cornell University, Ithaca, NY 14853, USA.

Published: December 2010

Motivation: The development of new high-throughput genotyping products requires a significant investment in testing and training samples to evaluate and optimize the product before it can be used reliably on new samples. One reason is that current methods for automated genotype calling are based on clustering approaches, which require either a large number of samples to be analyzed simultaneously or an extensive training dataset to seed clusters. In systems where inbred samples are of primary interest, current clustering approaches perform poorly because they cannot clearly identify a heterozygote cluster.

Results: As part of the development of two custom single nucleotide polymorphism genotyping products for Oryza sativa (domestic rice), we have developed a new genotype calling algorithm called 'ALCHEMY', based on statistical modeling of the raw intensity data rather than modelless clustering. A novel feature of the model is the ability to estimate and incorporate inbreeding information on a per-sample basis, allowing accurate genotyping of both inbred and heterozygous samples even when they are analyzed simultaneously. Because clustering is not used explicitly, ALCHEMY performs well on small sample sizes, with accuracy exceeding 99% with as few as 18 samples.
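
To illustrate why per-sample inbreeding information can matter for genotype calling, here is a minimal sketch in Python. It is not ALCHEMY's actual model, which the abstract does not specify. It assumes the textbook genotype priors under Wright's inbreeding coefficient F and a plain Bayes-rule combination with per-genotype likelihoods. The function names `genotype_priors` and `call_genotype`, the allele frequency, and the likelihood values are all hypothetical and chosen only for illustration.

```python
def genotype_priors(p, f):
    """Prior probabilities of genotypes AA, AB, BB.

    p: frequency of allele A, in [0, 1].
    f: inbreeding coefficient of the sample, in [0, 1].
    Uses the standard inbreeding-adjusted Hardy-Weinberg proportions
    (an assumption for this sketch, not taken from the ALCHEMY paper).
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("p and f must lie in [0, 1]")
    q = 1.0 - p
    return {
        "AA": p * p + f * p * q,
        "AB": 2.0 * p * q * (1.0 - f),
        "BB": q * q + f * p * q,
    }


def call_genotype(likelihoods, priors):
    """Combine per-genotype likelihoods with priors via Bayes' rule.

    Returns the most probable genotype and the full posterior.
    """
    unnorm = {g: likelihoods[g] * priors[g] for g in priors}
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("all posterior weights are zero")
    posterior = {g: w / total for g, w in unnorm.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# Hypothetical likelihoods for an ambiguous intensity measurement.
likelihoods = {"AA": 0.4, "AB": 0.5, "BB": 0.1}

outbred_call, _ = call_genotype(likelihoods, genotype_priors(0.5, 0.0))
inbred_call, _ = call_genotype(likelihoods, genotype_priors(0.5, 0.95))
print(outbred_call, inbred_call)
```

With the same ambiguous signal, the outbred sample (F = 0) is called heterozygous, while the highly inbred sample (F = 0.95) is called homozygous, because the prior probability of a heterozygote shrinks by the factor (1 - F).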

Availability: ALCHEMY is available for both commercial and academic use free of charge and distributed under the GNU General Public License at http://alchemy.sourceforge.net/

Contact: mhw6@cornell.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2982150
DOI: http://dx.doi.org/10.1093/bioinformatics/btq533

Similar Publications

Replication Study and Meta-Analysis of the Contribution of Seven Genetic Polymorphisms in Immune-Related Genes to the Risk of Gastric and Colorectal Cancers.

Int J Immunogenet

January 2025

Department of Biological Science and Technology, School of Chemistry, Chemical Engineering and Life Sciences, Wuhan University of Technology, Wuhan, Hubei, China.

Recently, it has been realized that immune processes participate in the pathogenesis of human cancers. A large number of genetic polymorphisms in immune-related genes have been extensively examined for their roles in the susceptibility of gastric cancer (GC) and colorectal cancer (CRC), including IL4 gene rs2070874, IL4RA gene rs1801275, IL18 gene rs187238, IL18RAP gene rs917997, IL17A gene rs8193036, IL23R gene rs1884444 and IL23R gene rs10889677. However, there is no consistent conclusion, which calls for further research.

Genotypic and phenotypic diversity of Mycobacterium tuberculosis strains from eastern India.

Infect Genet Evol

January 2025

Immunogenomics & Systems Biology group, Institute of Life Sciences (ILS), Bhubaneswar, Odisha, India; School of Biotechnology, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, Odisha, India.

Whole genome sequencing has been used to investigate the genomic diversity of M. tuberculosis in the northern and southern states of India, but information about the eastern part of the country is still limited. Through a sequencing-based strategy, this study seeks to comprehend the diversity and drug resistance pattern in the eastern region.

Applying artificial intelligence to uncover the genetic landscape of coagulation factors.

J Thromb Haemost

January 2025

Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20090 Pieve Emanuele, Milan, Italy; IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano, Milan, Italy.

Artificial intelligence (AI) is rapidly advancing our ability to identify and interpret genetic variants associated with coagulation factor deficiencies. This review introduces AI, with a specific focus on machine learning (ML) methods, and examines its applications in the field of coagulation genetics over the past decade. We observed a significant increase in AI-related publications, with a focus on hemophilia A and B.

The frequency of mitochondrial DNA haplogroups (mtDNA-HG) in humans is known to be shaped by migration and repopulation. Mounting evidence indicates that mtDNA-HG are not phenotypically neutral, and selection may contribute to its distribution. Haplogroup H, the most abundant in Europe, improved survival in sepsis.

Background: Anorexia nervosa (AN) is a polygenic, severe metabopsychiatric disorder with poorly understood aetiology. Eight significant loci have been identified by genome-wide association studies (GWAS), and single nucleotide polymorphism (SNP)-based heritability was estimated to be ~11-17%, yet causal variants remain elusive. It is therefore important to define the full spectrum of genetic variants in the wider regions surrounding these significantly associated loci.
