Publications by authors named "Alexander S Szalay"

Motivation: Read alignment is an essential first step in the characterization of DNA sequence variation. The accuracy of variant-calling results depends not only on the quality of read alignment and variant-calling software but also on the interaction between these complex software tools.

Results: In this review, we evaluate short-read aligner performance with the goal of optimizing germline variant-calling accuracy.

Multispectral, multiplex immunofluorescence (mIF) microscopy has been used to great effect in research to identify cellular co-expression profiles and spatial relationships within tissue, providing a myriad of diagnostic advantages. As these technologies mature, it is essential that image data from mIF microscopes is reproducible and standardizable across devices. We sought to characterize and correct differences in illumination intensity and spectral sensitivity between three multispectral microscopes.

Multiplex immunohistochemistry/immunofluorescence (mIHC/mIF) is a developing technology that facilitates the evaluation of multiple, simultaneous protein expressions at single-cell resolution while preserving tissue architecture. These approaches have shown great potential for biomarker discovery, yet many challenges remain. Importantly, streamlined cross-registration of multiplex immunofluorescence images with additional imaging modalities and immunohistochemistry (IHC) can help increase the plex and/or improve the quality of the data generated by potentiating downstream processes such as cell segmentation.

Article Synopsis
  • The human brain is an incredibly efficient computing system, operating on just 20 watts of power, and is unmatched in processing information and learning.
  • Recent advancements in stem cell technology have led to the creation of three-dimensional brain organoids that better mimic human brain functions, paving the way for Organoid Intelligence (OI).
  • The first Organoid Intelligence Workshop at Johns Hopkins University aimed to foster a community focused on establishing OI as a new discipline, exploring its potential to revolutionize fields like computing, neuroscience, and drug development.

Astronomy was among the first disciplines to embrace Big Data and use it to characterize spatial relationships between stars and galaxies. Today, medicine, in particular pathology, has similar needs with regard to characterizing the spatial relationships between cells, with an emphasis on understanding the organization of the tumor microenvironment. In this article, we chronicle the emergence of data-intensive science through the development of the Sloan Digital Sky Survey and describe how analysis patterns and approaches similarly apply to multiplex immunofluorescence (mIF) pathology image exploration.

Research organizations are critically in need of directed growth toward future interoperability and federation. The purpose of this Viewpoint is to alert the government, academia, professional societies, foundations, and industries of a further need for consideration of data in chemistry and materials as a long-term and sustained development in the US. This paper is a call for coordinated action from the government, academia, and industry to establish a national strategy and concomitant infrastructure focused on research data.

Article Synopsis
  • Research revealed significant differences in gene expression related to antipsychotic drug toxicity in the human caudate nucleus, but no major differences in DNA methylation, echoing similarities found between schizophrenia cases and controls.
  • The study highlights varying gene expression changes in different brain regions and compares findings to a mouse model, noting some similarities but also significant differences, indicating that the effects of antipsychotics may stabilize over the long term.

Summary: Over the past decade, short-read sequence alignment has become a mature technology. Optimized algorithms, careful software engineering and high-speed hardware have contributed to greatly increased throughput and accuracy. With these improvements, many opportunities for performance optimization have emerged.

Article Synopsis
  • DNA methylation (DNAm) is crucial for gene regulation and is influenced by environmental factors, and this study investigates its role in the brains of neurotypical individuals versus those with schizophrenia.
  • Using whole-genome bisulfite sequencing on 344 brain tissue samples, researchers found that a significant proportion of genetic variation (like SNPs) affects local methylation levels, particularly around CpG and CpH sites.
  • The study suggests that regions of the genome differentially affected by schizophrenia risk variants explain much of the heritability linked to schizophrenia, highlighting the potential of these epigenetic changes in understanding the disorder.

Next-generation tissue-based biomarkers for immunotherapy will likely include the simultaneous analysis of multiple cell types and their spatial interactions, as well as distinct expression patterns of immunoregulatory molecules. Here, we introduce a comprehensive platform for multispectral imaging and mapping of multiple parameters in tumor tissue sections with high-fidelity single-cell resolution. Image analysis and data handling components were drawn from the field of astronomy.

Overwhelming evidence has shown the significant role of the tumor microenvironment (TME) in governing the progression of triple-negative breast cancer (TNBC). Digital pathology can provide key information about the spatial heterogeneity within the TME using image analysis and spatial statistics. These analyses have been applied to CD8+ T cells, but quantitative analyses of other important markers and their correlations are limited.

In large DNA sequence repositories, archival data storage is often coupled with computers that provide 40 or more CPU threads and multiple GPU (general-purpose graphics processing unit) devices. This presents an opportunity for DNA sequence alignment software to exploit high-concurrency hardware to generate short-read alignments at high speed. Arioc, a GPU-accelerated short-read aligner, can compute WGS (whole-genome sequencing) alignments ten times faster than comparable CPU-only alignment software.

Article Synopsis
  • DNA methylation (DNAm) plays a crucial role in regulating gene expression during the development of the prenatal brain, which is a complex and changing tissue.
  • Researchers investigated methylation patterns at over 39 million sites in the prenatal cortex and discovered dynamic changes that are linked to nearby gene expression and associated with neuropsychiatric disorders.
  • The study identified differences in DNAm between sexes during prenatal development and confirmed that the changes primarily involve CpG methylation, offering insights into both brain development and potential mental health issues later in life.

Motivation: DNA sequencing archives have grown to enormous scales in recent years, and thousands of human genomes have already been sequenced. The size of these data sets has made searching the raw read data infeasible without high-performance data-query technology. Additionally, it is challenging to search a repository of short-read data using relational logic and to apply that logic across multiple whole-genome sequencing samples.

Motivation: The alignment of bisulfite-treated DNA sequences (BS-seq reads) to a large genome involves a significant computational burden beyond that required to align non-bisulfite-treated reads. In the analysis of BS-seq data, this can present an important performance bottleneck that can be mitigated by appropriate algorithmic and software-engineering improvements. One strategy is to modify the read-alignment algorithms by integrating the logic related to BS-seq alignment, with the goal of making the software implementation amenable to optimizations that lead to higher speed and greater sensitivity than might otherwise be attainable.

When computing alignments of DNA sequences to a large genome, a key element in achieving high processing throughput is to prioritize locations in the genome where high-scoring mappings might be expected. We formulated this task as a series of list-processing operations that can be efficiently performed on graphics processing unit (GPU) hardware. We followed this approach in implementing a read aligner called Arioc that uses GPU-based parallel sort and reduction techniques to identify high-priority locations where potential alignments may be found.
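The idea of ranking candidate genome locations with sort and reduction steps can be illustrated with a minimal CPU-only sketch. This is not Arioc's implementation: the function name `prioritize_locations`, the `(ref_pos, read_offset)` seed-hit format, and the `min_seeds` threshold are all assumptions made for illustration, and a sequential sort plus `itertools.groupby` stands in for the GPU-parallel sort and reduce-by-key primitives.

```python
from itertools import groupby


def prioritize_locations(seed_hits, min_seeds=2):
    """Rank candidate alignment start positions by seed support.

    seed_hits: iterable of (ref_pos, read_offset) pairs (hypothetical format),
    where ref_pos is where a seed matched the reference and read_offset is
    where that seed starts within the read.
    """
    # Each seed hit implies a candidate start position (its "diagonal").
    diagonals = sorted(ref_pos - read_offset for ref_pos, read_offset in seed_hits)
    # Reduce-by-key over the sorted list: count the hits on each diagonal.
    counts = [(d, sum(1 for _ in group)) for d, group in groupby(diagonals)]
    # Keep well-supported diagonals, best supported first, ties by position.
    kept = [(d, n) for d, n in counts if n >= min_seeds]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Three seeds agree on position 1000, two on 5000, and one lone hit at 8988.
hits = [(1000, 0), (1010, 10), (1020, 20), (5005, 5), (5030, 30), (9000, 12)]
print(prioritize_locations(hits))  # [(1000, 3), (5000, 2)]
```

Sorting first means equal diagonals become adjacent, so the counting step is a single pass over the list; the same sort-then-reduce pattern is what maps naturally onto data-parallel hardware.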

The analysis of data requires computation: originally by hand and more recently by computers. Different models of computing are designed and optimized for different kinds of data. In data-intensive science, the scale and complexity of data exceed the comfort zone of local data stores on scientific workstations.

We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space.

We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build neural connectivity maps of the brain using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.

In this paper, we use a statistical estimator developed in astrophysics to study the distribution and organization of features of the human genome. Using the human reference sequence, we quantify the global distribution of CpG islands (CGI) in each chromosome and demonstrate that the organization of the CGI across a chromosome is non-random, exhibits surprisingly long-range correlations (10 Mb), and varies significantly among chromosomes. These correlations of CGI summarize functional properties of the genome that are not captured when considering variation in any particular separate (and local) feature.

We show that the apparent redshift-space clustering of galaxies in the redshift range of 0.2-0.4 provides surprisingly useful constraints on dark-energy components in the Universe, because of the right balance between the density of objects and the survey depth.
