J Am Med Inform Assoc
November 2024
Objective: The COVID-19 pandemic emphasized the value of geospatial visual analytics for both epidemiologists and the general public. However, systems struggled to encode temporal and geospatial trends of multiple, potentially interacting variables, such as active cases, deaths, and vaccinations. We sought to ask (1) how epidemiologists interact with visual analytics tools, (2) how multiple, time-varying, geospatial variables can be conveyed in a unified view, and (3) how complex spatiotemporal encodings affect utility for both experts and non-experts.
Pattern Recognit Lett
April 2023
Region expansion, the growth of regions to include all points within a certain distance of their perimeters, is a basic, widely applicable operation, but is expensive to perform exactly. It has been shown that, if the solution is approximated by relaxing the distance metric to the L∞-norm, efficiency can be greatly improved using properties of quadtrees. The method as described, however, requires the quadtrees to be square, both for the metric and the particular details of the algorithm.
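As a rough illustration of the operation itself, the sketch below expands a region on a small raster, assuming the relaxed metric is the chessboard (L∞) distance, under which the neighborhood of a cell is a square. It is a brute-force grid version for intuition only, not the quadtree algorithm the abstract refers to; the function name `expand_region` and the list-of-lists grid are assumptions made for the example.

```python
def expand_region(grid, radius):
    """Fill every cell within `radius` (chessboard distance) of a filled cell.

    `grid` is a list of equal-length rows holding 0 (empty) or 1 (filled).
    Brute force: each filled cell stamps a (2*radius + 1)-wide square.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            # Clamp the square neighborhood to the grid bounds.
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    out[rr][cc] = 1
    return out
```

Expanding a single filled cell by a radius of 1 yields a 3 by 3 block, which is the square footprint that makes this metric a natural fit for block-based structures such as quadtrees.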
Though exponentially growing health-related literature has been made available to a broad audience online, the language of scientific articles can be difficult for the general public to understand. Therefore, adapting this expert-level language into plain language versions is necessary for the public to reliably comprehend the vast health-related literature. Deep Learning algorithms for automatic adaptation are a possible solution; however, gold standard datasets are needed for proper evaluation.
J Am Med Inform Assoc
October 2022
Objective: Plain language in medicine has long been advocated as a way to improve patient understanding and engagement. As the field of Natural Language Processing has progressed, increasingly sophisticated methods have been explored for the automatic simplification of existing biomedical text for consumers. We survey the literature in this area with the goals of characterizing approaches and applications, summarizing existing resources, and identifying remaining challenges.
Data visualizations convert numbers into visual marks so that our visual system can extract data from an image instead of raw numbers. Clearly, the visual system does not compute these values as a computer would, as an arithmetic mean or a correlation. Instead, it extracts these patterns using perceptual proxies: heuristic shortcuts of the visual marks, such as a center of mass or a shape envelope.
The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets.
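The difference between resemblance and containment can be sketched in a few lines. The example below is a minimal illustration, not the published algorithm: it sketches a genome by its smallest k-mer hashes, then streams reads one at a time and reports the fraction of sketch hashes observed. The function names, the SHA-1 based hash, and the default `k` and `s` values are all assumptions for the example, and reverse complements are ignored.

```python
import hashlib

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_hash(kmer):
    """Hash a k-mer to a 64-bit integer (illustrative choice of hash)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")

def sketch(seq, k, s):
    """Bottom-s sketch: the s smallest k-mer hashes of seq."""
    return set(sorted(kmer_hash(x) for x in kmers(seq, k))[:s])

def containment(genome, reads, k=5, s=50):
    """Fraction of the genome's sketch hashes seen in a stream of reads."""
    genome_sketch = sketch(genome, k, s)
    seen = set()
    for read in reads:  # reads are consumed one at a time (online)
        for km in kmers(read, k):
            hv = kmer_hash(km)
            if hv in genome_sketch:
                seen.add(hv)
    return len(seen) / len(genome_sketch)
```

Because only the genome is sketched and the read set is scanned in full, a small genome that is entirely present in a large read set scores close to 1, whereas a resemblance (Jaccard) estimate for the same pair would be close to 0.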
Perceptual tasks in visualizations often involve comparisons. Of two sets of values depicted in two charts, which set had values that were the highest overall? Which had the widest range? Prior empirical work found that the performance on different visual comparison tasks (e.g.
Microbiome
November 2018
Data are often viewed as a single set of values, but those values frequently must be compared with another set. The existing evaluations of designs that facilitate these comparisons tend to be based on intuitive reasoning, rather than quantifiable measures. We build on this work with a series of crowdsourced experiments that use low-level perceptual comparison tasks that arise frequently in comparisons within data visualizations (e.
When performing bioforensic casework, it is important to be able to reliably detect the presence of a particular organism in a metagenomic sample, even if the organism is only present in a trace amount. For this task, it is common to use a sequence classification program that determines the taxonomic affiliation of individual sequence reads by comparing them to reference database sequences. As metagenomic data sets often consist of millions or billions of reads that need to be compared to reference databases containing millions of sequences, such sequence classification programs typically use search heuristics and databases with reduced sequence diversity to speed up the analysis, which can lead to incorrect assignments.
Mash extends the MinHash dimensionality-reduction technique to include a pairwise mutation distance and P value significance test, enabling the efficient clustering and search of massive sequence collections. Mash reduces large sequences and sequence sets to small, representative sketches, from which global mutation distances can be rapidly estimated. We demonstrate several use cases, including the clustering of all 54,118 NCBI RefSeq genomes in 33 CPU h; real-time database search using assembled or unassembled Illumina, Pacific Biosciences, and Oxford Nanopore data; and the scalable clustering of hundreds of metagenomic samples by composition.
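To make the step from sketches to a mutation distance concrete, the following is a small sketch, not Mash's implementation. It estimates the Jaccard index j from two bottom-s sketches (given here as plain sets of integer hashes) and converts it with the Mash distance formula D = -(1/k) ln(2j / (1 + j)), where k is the k-mer size. The function names are assumptions for the example, and returning 1.0 when no hashes are shared is treated here as a convention.

```python
import math

def jaccard_estimate(sketch_a, sketch_b, s):
    """Estimate the Jaccard index from two bottom-s sketches.

    Take the s smallest hashes of the union and count how many
    of them appear in both sketches.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:s]
    shared = sum(1 for h in merged if h in sketch_a and h in sketch_b)
    return shared / len(merged)

def mash_distance(j, k):
    """Convert a Jaccard estimate j into a mutation distance for k-mer size k."""
    if j == 0:
        return 1.0  # no shared hashes: report the maximum distance
    return -math.log(2 * j / (1 + j)) / k
```

For instance, `mash_distance(jaccard_estimate({1, 2, 3}, {2, 3, 4}, 3), 21)` combines both steps for two toy sketches of size 3; identical sketches give a Jaccard estimate of 1 and therefore a distance of 0.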
High consequence human pathogenic viruses must be handled at biosafety level 2, 3 or 4 and must be rendered non-infectious before they can be utilized for molecular or immunological applications at lower biosafety levels. Here we evaluate psoralen-inactivated Arena-, Bunya-, Corona-, Filo-, Flavi- and Orthomyxoviruses for their suitability as antigen in immunological processes and as template for reverse transcription PCR and sequencing. The method of virus inactivation using a psoralen molecule appears to have broad applicability to RNA viruses and to leave both the particle and RNA of the treated virus intact, while rendering the virus non-infectious.
Whole-genome sequences are now available for many microbial species and clades; however, existing whole-genome alignment methods are limited in their ability to perform sequence comparisons of multiple sequences simultaneously. Here we present the Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform.
Staphylococcus aureus subsp. aureus ATCC 25923 is commonly used as a control strain for susceptibility testing to antibiotics and as a quality control strain for commercial products. We present the completed genome sequence for the strain, consisting of the chromosome and a 27.
The Bacillus anthracis Carbosap genome, which includes the pXO1 and pXO2 plasmids, has been shown to encode the major B. anthracis virulence factors, yet this strain's attenuation has not yet been explained. Here we report the draft genome sequence of this strain, and a comparison to fully virulent B.
We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open reading frames and taxonomic or functional annotations. MetAMOS can aid in reducing assembly errors, commonly encountered when assembling metagenomic samples, and improves taxonomic assignment accuracy while also reducing computational cost.
Background: Although genome-wide transcriptional analysis has been used for many years to study bacterial gene expression, many aspects of the bacterial transcriptome remain undefined. One example is antisense transcription, which has been observed in a number of bacteria, though the function of antisense transcripts, and their distribution across the bacterial genome, is still unclear.
Methodology/principal Findings: Single-stranded RNA-seq results revealed a widespread and non-random pattern of antisense transcription covering more than two thirds of the B.
Background: A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables.
Summary: Bisulfite sequencing allows cytosine methylation, an important epigenetic marker, to be detected via nucleotide substitutions. Since the Applied Biosystems SOLiD System uses a unique di-base encoding that increases confidence in the detection of nucleotide substitutions, it is a potentially advantageous platform for this application. However, the di-base encoding also makes reads with many nucleotide substitutions difficult to align to a reference sequence with existing tools, preventing the platform's potential utility for bisulfite sequencing from being realized.
Although gene expression has been studied in bacteria for decades, many aspects of the bacterial transcriptome remain poorly understood. Transcript structure, operon linkages, and information on absolute abundance all provide valuable insights into gene function and regulation, but none has ever been determined on a genome-wide scale for any bacterium. Indeed, these aspects of the prokaryotic transcriptome have been explored on a large scale in only a few instances, and consequently little is known about the absolute composition of the mRNA population within a bacterial cell.
Here, we report the development of SOCS (short oligonucleotide color space), a program designed for efficient and flexible mapping of Applied Biosystems SOLiD sequence data onto a reference genome. SOCS performs its mapping within the context of 'color space', and it maximizes usable data by allowing a user-specified number of mismatches. Sequence census functions facilitate a variety of functional genomics applications, including transcriptome mapping and profiling, as well as ChIP-Seq.