A PHP Error was encountered

Severity: Warning

Message: file_get_contents(https://...@pubfacts.com&api_key=b8daa3ad693db53b1410957c26c9a51b4908&a=1): Failed to open stream: HTTP request failed! HTTP/1.1 429 Too Many Requests

Filename: helpers/my_audit_helper.php

Line Number: 176

Backtrace:

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 176
Function: file_get_contents

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 250
Function: simplexml_load_file_from_url

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 1034
Function: getPubMedXML

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 3152
Function: GetPubMedArticleOutput_2016

File: /var/www/html/application/controllers/Detail.php
Line: 575
Function: pubMedSearch_Global

File: /var/www/html/application/controllers/Detail.php
Line: 489
Function: pubMedGetRelatedKeyword

File: /var/www/html/index.php
Line: 316
Function: require_once

DACE: a scalable DP-means algorithm for clustering extremely large sequence data. | LitMetric

DACE: a scalable DP-means algorithm for clustering extremely large sequence data.

Bioinformatics

Bioinformatics Division, Center for Synthetic and Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, China.

Published: March 2017

Motivation: Advancements in next-generation sequencing technology have produced large amounts of reads at low cost in a short time. In metagenomics, 16S and 18S rRNA gene have been widely used as marker genes to profile diversity of microorganisms in environmental samples. Through clustering of sequencing reads we can determine both number of OTUs and their relative abundance. In many applications, clustering of very large sequencing data with high efficiency and accuracy is essential for downstream analysis.

Results: Here, we report a scalable D irichlet Process Means (DP-means) a lgorithm for c lustering e xtremely large sequencing data, termed . With an efficient random projection partition strategy for parallel clustering, DACE can cluster billions of sequences within a couple of hours. Experimental results show that DACE runs between 6 and 80 times faster than state-of-the-art programs, while maintaining overall better clustering accuracy. Using 80 cores, DACE clustered the Lake Taihu 16S rRNA gene sequencing data (∼316M reads, 30 GB) in 25 min, and the Ocean TARA Eukaryotic 18S rRNA gene sequencing data (∼500M reads, 88 GB) into ∼100 000 clusters within an hour. When applied to the IGC gene catalogs in human gut microbiome (∼10M genes), DACE produced 9.8M clusters with 52K redundant genes in 1.5 hours of running time.

Availability And Implementation: DACE is available at https://github.com/tinglab/DACE .

Contacts: tingchen@mail.tsinghua.edu.cn or ningchen@mail.tsinghua.edu.cn.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Download full-text PDF

Source
http://dx.doi.org/10.1093/bioinformatics/btw722DOI Listing

Publication Analysis

Top Keywords

sequencing data
16
rrna gene
12
18s rrna
8
large sequencing
8
gene sequencing
8
dace
6
data
6
sequencing
6
clustering
5
dace scalable
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!