The true accuracy of a machine-learning model is a population-level statistic that cannot be observed directly. In practice, predictor performance is estimated against one or more test datasets, and the accuracy of this estimate strongly depends on how well the test sets represent all possible unseen datasets. Here we describe paired evaluation as a simple, robust approach for evaluating performance of machine-learning models in small-sample biological and clinical studies.
We performed quantitative proteomics on 60 human-derived breast cancer cell line models to a depth of ~13,000 proteins. The resulting high-throughput datasets were assessed for quality and reproducibility. We used the datasets to identify and characterize the subtypes of breast cancer and showed that they conform to known transcriptional subtypes, revealing that molecular subtypes are preserved even in under-sampled protein feature sets.
Advanced solid cancers are complex assemblies of tumor, immune, and stromal cells characterized by high intratumoral variation. We use highly multiplexed tissue imaging, 3D reconstruction, spatial statistics, and machine learning to identify cell types and states underlying morphological features of known diagnostic and prognostic significance in colorectal cancer. Quantitation of these features in high-plex marker space reveals recurrent transitions from one tumor morphology to the next, some of which are coincident with long-range gradients in the expression of oncogenes and epigenetic regulators.
Highly multiplexed tissue imaging makes detailed molecular analysis of single cells possible in a preserved spatial context. However, reproducible analysis of large multichannel images poses a substantial computational challenge. Here, we describe a modular and open-source computational pipeline, MCMICRO, for performing the sequential steps needed to transform whole-slide images into single-cell data.
Bacterial cells construct many structures, such as the flagellar hook and the type III secretion system (T3SS) injectisome, that aid in crucial physiological processes such as locomotion and pathogenesis. Both of these structures involve long extracellular channels, and the length of these channels must be highly regulated in order for these structures to perform their intended functions. There are two leading models for how length control is achieved in the flagellar hook and T3SS needle: the substrate switching model, in which the length is controlled by assembly of an inner rod, and the ruler model, in which a molecular ruler controls the length.
Protein turnover is vital to cellular homeostasis. Many proteins are degraded efficiently only after they have been post-translationally "tagged" with a polyubiquitin chain. Ubiquitylation is a form of Post-Translational Modification (PTM): addition of a ubiquitin to the chain is catalyzed by E3 ligases, and removal of ubiquitin is catalyzed by a De-UBiquitylating enzyme (DUB).
There is growing interest in generating physicochemical and biological analytical data sets to compare complex mixture drugs, for example, products from different manufacturers. In this work, we compare various crofelemer samples prepared from a single lot by filtration with varying molecular weight cutoffs combined with incubation for different times at different temperatures. The 2 preceding articles describe experimental data sets generated from analytical characterization of fractionated and degraded crofelemer samples.
Crofelemer is a botanical polymeric proanthocyanidin that inhibits chloride channel activity and is used clinically for treating HIV-associated secretory diarrhea. Crofelemer lots may exhibit significant physicochemical variation due to the natural source of the raw material. A variety of physical, chemical, and biological assays were used to identify potential critical quality attributes (CQAs) of crofelemer, which may be useful in characterizing differently sourced and processed drug products.
As the second of a 3-part series of articles in this issue concerning the development of a mathematical model for comparative characterization of complex mixture drugs using crofelemer (CF) as a model compound, this work focuses on the evaluation of the chemical stability profile of CF. CF is a biopolymer containing a mixture of proanthocyanidin oligomers which are primarily composed of gallocatechin with a small contribution from catechin. CF extracted from drug product was subjected to molecular weight-based fractionation and thiolysis.
Type III Secretion Systems (T3SS) are complex bacterial structures that provide Gram-negative pathogens with a unique virulence mechanism whereby they grow a needle-like structure in order to inject bacterial effector proteins into the cytoplasm of a host cell. Numerous experiments have been performed to understand the structural details of this nanomachine during the past decade. Despite the concerted efforts of molecular and structural biologists, several crucial aspects of the assembly of this structure, such as the regulation of the length of the needle itself, remain unclear.