AI Article Synopsis

  • Existing cancer benchmark data sets rely on germline variants, synthetic methods, or costly validations, none of which provides a large collection of true somatic variation across a whole genome.
  • The proposed data set, Lineage derived Somatic Truth (LinST), consists of short somatic mutations in the HT115 colon cancer cell line, validated using a known cell lineage.
  • The data set includes thousands of mutations and a high-confidence region covering 2.7 gigabases per sample.

Article Abstract

Existing cancer benchmark data sets for human sequencing data use germline variants, synthetic methods, or expensive validations, none of which is satisfactory for providing a large collection of true somatic variation across a whole genome. Here we propose a data set, Lineage derived Somatic Truth (LinST), of short somatic mutations in the HT115 colon cancer cell line, validated using a known cell lineage. The data set includes thousands of mutations and a high-confidence region covering 2.7 gigabases per sample.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7722876
DOI: http://dx.doi.org/10.1038/s42003-020-01460-9

Publication Analysis

Top Keywords

somatic truth: 8
validated lineage-derived: 4
somatic: 4
lineage-derived somatic: 4
truth data: 4
data set: 4
set enables: 4
enables benchmarking: 4
benchmarking cancer: 4
cancer genome: 4
