Motivation: Read alignment is an essential first step in the characterization of DNA sequence variation. The accuracy of variant-calling results depends not only on the quality of read alignment and variant-calling software but also on the interaction between these complex software tools.
Results: In this review, we evaluate short-read aligner performance with the goal of optimizing germline variant-calling accuracy.
Multispectral, multiplex immunofluorescence (mIF) microscopy has been used to great effect in research to identify cellular co-expression profiles and spatial relationships within tissue, providing a myriad of diagnostic advantages. As these technologies mature, it is essential that image data from mIF microscopes is reproducible and standardizable across devices. We sought to characterize and correct differences in illumination intensity and spectral sensitivity between three multispectral microscopes.
Multiplex immunohistochemistry/immunofluorescence (mIHC/mIF) is a developing technology that facilitates the evaluation of the simultaneous expression of multiple proteins at single-cell resolution while preserving tissue architecture. These approaches have shown great potential for biomarker discovery, yet many challenges remain. Importantly, streamlined cross-registration of multiplex immunofluorescence images with additional imaging modalities and immunohistochemistry (IHC) can help increase the plex and/or improve the quality of the data generated by potentiating downstream processes such as cell segmentation.
Clin Cancer Res
August 2022
Astronomy was among the first disciplines to embrace Big Data and use it to characterize spatial relationships between stars and galaxies. Today, medicine, in particular pathology, has similar needs with regard to characterizing the spatial relationships between cells, with an emphasis on understanding the organization of the tumor microenvironment. In this article, we chronicle the emergence of data-intensive science through the development of the Sloan Digital Sky Survey and describe how analysis patterns and approaches similarly apply to multiplex immunofluorescence (mIF) pathology image exploration.
Research organizations critically need directed growth toward future interoperability and federation. The purpose of this Viewpoint is to alert the government, academia, professional societies, foundations, and industries to the further need to treat data in chemistry and materials as a long-term, sustained development in the US. This paper is a call for coordinated action from the government, academia, and industry to establish a national strategy and concomitant infrastructure focused on research data.
Summary: Over the past decade, short-read sequence alignment has become a mature technology. Optimized algorithms, careful software engineering and high-speed hardware have contributed to greatly increased throughput and accuracy. With these improvements, many opportunities for performance optimization have emerged.
Next-generation tissue-based biomarkers for immunotherapy will likely include the simultaneous analysis of multiple cell types and their spatial interactions, as well as distinct expression patterns of immunoregulatory molecules. Here, we introduce a comprehensive platform for multispectral imaging and mapping of multiple parameters in tumor tissue sections with high-fidelity single-cell resolution. Image analysis and data handling components were drawn from the field of astronomy.
Overwhelming evidence has shown the significant role of the tumor microenvironment (TME) in governing triple-negative breast cancer (TNBC) progression. Digital pathology can provide key information about the spatial heterogeneity within the TME using image analysis and spatial statistics. These analyses have been applied to CD8+ T cells, but quantitative analyses of other important markers and their correlations are limited.
In large DNA sequence repositories, archival data storage is often coupled with computers that provide 40 or more CPU threads and multiple GPU (general-purpose graphics processing unit) devices. This presents an opportunity for DNA sequence alignment software to exploit high-concurrency hardware to generate short-read alignments at high speed. Arioc, a GPU-accelerated short-read aligner, can compute WGS (whole-genome sequencing) alignments ten times faster than comparable CPU-only alignment software.
Motivation: DNA sequencing archives have grown to enormous scales in recent years, and thousands of human genomes have already been sequenced. The size of these data sets has made searching the raw read data infeasible without high-performance data-query technology. Additionally, it is challenging to search a repository of short-read data using relational logic and to apply that logic across multiple whole-genome sequencing samples.
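To make "relational logic across samples" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The one-table schema (`kmer` with `sample` and `kmer` columns), the choice of k-mers as the queried unit, and the function names are hypothetical illustrations, not the design of the system described above; the point is only that a set operation such as `EXCEPT` expresses a cross-sample question ("which sequences occur in sample A but not in sample B?") in a single query.

```python
import sqlite3


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_db(samples, k):
    """Load reads into a hypothetical (sample, kmer) table in an in-memory DB.

    samples maps a sample name to a list of read sequences.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE kmer (sample TEXT, kmer TEXT, PRIMARY KEY (sample, kmer))"
    )
    for sample, reads in samples.items():
        # Deduplicate per sample so the primary key is never violated.
        rows = {(sample, km) for read in reads for km in kmers(read, k)}
        con.executemany("INSERT INTO kmer VALUES (?, ?)", rows)
    return con


def kmers_unique_to(con, sample, other):
    """k-mers present in `sample` but absent from `other` (set difference)."""
    query = """
        SELECT kmer FROM kmer WHERE sample = ?
        EXCEPT
        SELECT kmer FROM kmer WHERE sample = ?
        ORDER BY kmer
    """
    return [row[0] for row in con.execute(query, (sample, other))]
```

For example, with `samples = {"A": ["ACGTAC"], "B": ["ACGTTT"]}` and `k = 4`, `kmers_unique_to(build_db(samples, 4), "A", "B")` returns the k-mers of sample A that sample B lacks. At archive scale, a real system would need indexing and partitioning far beyond this toy.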
Motivation: The alignment of bisulfite-treated DNA sequences (BS-seq reads) to a large genome involves a significant computational burden beyond that required to align non-bisulfite-treated reads. In the analysis of BS-seq data, this can present an important performance bottleneck that can be mitigated by appropriate algorithmic and software-engineering improvements. One strategy is to modify the read-alignment algorithms by integrating the logic related to BS-seq alignment, with the goal of making the software implementation amenable to optimizations that lead to higher speed and greater sensitivity than might otherwise be attainable.
When computing alignments of DNA sequences to a large genome, a key element in achieving high processing throughput is to prioritize locations in the genome where high-scoring mappings might be expected. We formulated this task as a series of list-processing operations that can be efficiently performed on graphics processing unit (GPU) hardware. We followed this approach in implementing a read aligner called Arioc that uses GPU-based parallel sort and reduction techniques to identify high-priority locations where potential alignments may be found.
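The "sort, then reduce" pattern can be illustrated on the CPU with NumPy. This is a minimal sketch under stated assumptions, not Arioc's implementation: it assumes each seed hit is summarized by its reference position minus its offset within the read (its "diagonal"), and it treats diagonals supported by more seeds as higher priority. On a GPU, the sort and the reduction by key would each be a data-parallel primitive; the `min_seeds` threshold is a hypothetical parameter.

```python
import numpy as np


def prioritize_locations(hit_ref_pos, hit_read_offset, min_seeds=2):
    """Rank candidate alignment diagonals by the number of supporting seeds.

    Returns (diagonals, counts), ordered by descending seed count.
    """
    diag = np.asarray(hit_ref_pos, dtype=np.int64) - np.asarray(
        hit_read_offset, dtype=np.int64
    )
    if diag.size == 0:
        return diag, np.zeros(0, dtype=np.int64)

    # Step 1: sort so that hits on the same diagonal become adjacent.
    diag.sort()

    # Step 2: reduction by key (run-length encoding of the sorted list).
    # A segment starts wherever the value differs from its predecessor.
    starts = np.flatnonzero(np.r_[True, diag[1:] != diag[:-1]])
    counts = np.diff(np.r_[starts, diag.size])
    values = diag[starts]

    # Step 3: filter weakly supported diagonals and order by support.
    keep = counts >= min_seeds
    values, counts = values[keep], counts[keep]
    order = np.argsort(-counts, kind="stable")
    return values[order], counts[order]
```

For example, four seeds at reference positions 1000, 1010, 1020, and 1030 with read offsets 0, 10, 20, and 30 all fall on diagonal 1000, so that location outranks isolated hits elsewhere. Expressing prioritization as whole-list operations, rather than per-read branching logic, is what makes it a good fit for wide parallel hardware.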
The analysis of data requires computation: originally by hand and more recently by computers. Different models of computing are designed and optimized for different kinds of data. In data-intensive science, the scale and complexity of data exceed the comfort zone of local data stores on scientific workstations.
We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space.
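A set-associative cache partitions its slots into many small sets and maps each page to exactly one set, so a lookup only examines (and locks) that set. The sketch below illustrates the idea in Python with a per-set lock and LRU eviction within each set. The class name, the `load_page` callback, the modulo set mapping, and the LRU policy are assumptions made for illustration; this is not the page cache of the system described above, which operates in user space on SSD arrays.

```python
import threading
from collections import OrderedDict


class SetAssociativePageCache:
    """Minimal set-associative page cache (illustrative sketch only)."""

    def __init__(self, num_sets, ways, load_page):
        self.sets = [OrderedDict() for _ in range(num_sets)]
        # One lock per set: threads touching different sets never contend.
        self.locks = [threading.Lock() for _ in range(num_sets)]
        self.ways = ways            # maximum pages held by each set
        self.load_page = load_page  # called on a miss to fetch page data
        self.hits = 0
        self.misses = 0
        self._stats_lock = threading.Lock()

    def get(self, page_id):
        idx = page_id % len(self.sets)  # each page maps to exactly one set
        s = self.sets[idx]
        with self.locks[idx]:
            if page_id in s:
                s.move_to_end(page_id)  # mark as most recently used
                with self._stats_lock:
                    self.hits += 1
                return s[page_id]
            with self._stats_lock:
                self.misses += 1
            data = self.load_page(page_id)
            if len(s) >= self.ways:
                s.popitem(last=False)   # evict the least recently used page
            s[page_id] = data
            return data
```

With two sets of two ways each, pages 0, 2, and 4 all map to set 0, so touching page 4 after pages 0 and 2 evicts whichever of them was used least recently. The trade-off is that a hot set can evict pages while other sets still have room, which is the price paid for short, independent critical sections.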
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build neural connectivity maps of the brain using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.
In this paper, we use a statistical estimator developed in astrophysics to study the distribution and organization of features of the human genome. Using the human reference sequence, we quantify the global distribution of CpG islands (CGI) in each chromosome and demonstrate that the organization of the CGI across a chromosome is non-random, exhibits surprisingly long-range correlations (10 Mb) and varies significantly among chromosomes. These correlations of CGI summarize functional properties of the genome that are not captured when considering variation in any single, local feature.
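The abstract does not name the estimator. As an assumption for illustration only, the sketch below uses the Landy–Szalay two-point correlation estimator, a standard estimator from astrophysics, applied to one-dimensional positions (for example, CGI start coordinates along a chromosome). It compares normalized pair counts in the data (DD), in a uniform random catalog (RR), and across the two (DR):

$$\hat\xi(r) = \frac{DD(r) - 2\,DR(r) + RR(r)}{RR(r)}$$

Positive values indicate clustering at separation $r$; negative values indicate avoidance. The brute-force pair counting is O(n²) in memory and is meant only for small examples.

```python
import numpy as np


def pair_counts(a, b, edges, same):
    """Histogram of |a_i - b_j| separations into the bins defined by edges.

    If same is True (a and b are the same catalog), each unordered pair
    i < j is counted once and self-pairs are excluded.
    """
    d = np.abs(a[:, None] - b[None, :])
    d = d[np.triu_indices(len(a), k=1)] if same else d.ravel()
    counts, _ = np.histogram(d, bins=edges)
    return counts.astype(float)


def landy_szalay(data, randoms, edges):
    """Landy-Szalay estimate of xi(r) for 1-D positions, one value per bin."""
    data = np.asarray(data, dtype=float)
    randoms = np.asarray(randoms, dtype=float)
    nd, nr = len(data), len(randoms)
    # Normalize each pair count by the total number of pairs of that kind.
    dd = pair_counts(data, data, edges, True) / (nd * (nd - 1) / 2)
    rr = pair_counts(randoms, randoms, edges, True) / (nr * (nr - 1) / 2)
    dr = pair_counts(data, randoms, edges, False) / (nd * nr)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (dd - 2 * dr + rr) / rr
```

For example, positions that occur in tight pairs (0 and 10, 1000 and 1010, 2000 and 2010), compared against evenly spaced random positions over the same span, give a positive estimate in a short-separation bin such as [0, 50) and a negative one in [50, 500).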
We show that the apparent redshift-space clustering of galaxies in the redshift range of 0.2-0.4 provides surprisingly useful constraints on dark-energy components in the Universe, because of the right balance between the density of objects and the survey depth.