Background: Due to its late stage of diagnosis, lung cancer is the commonest cause of death from cancer in the UK. Existing epidemiological risk models in clinical use, which have Positive Predictive Values (PPV) of less than 10%, do not consider the temporal relations expressed in sequential electronic health record (EHR) data. We aimed to build a model for lung cancer early detection in primary care using machine learning with deep 'transformer' models on EHR data to learn from these complex sequential 'care pathways'.
Raman spectroscopy is widely used across scientific domains to characterize the chemical composition of samples in a nondestructive, label-free manner. Many applications entail the unmixing of signals from mixtures of molecular species to identify the individual components present and their proportions, yet conventional methods for chemometrics often struggle with complex mixture scenarios encountered in practice. Here, we develop hyperspectral unmixing algorithms based on autoencoder neural networks, and we systematically validate them using both synthetic and experimental benchmark datasets created in-house.
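To make the unmixing task concrete, the following is a minimal sketch of the conventional linear mixing model, in which each measured spectrum is a weighted sum of component ("endmember") spectra. It is not the autoencoder approach described in the abstract: it assumes the endmember spectra are already known and uses ordinary least squares. The function names, the synthetic Gaussian peaks and the wavenumber range are all hypothetical choices for illustration.

```python
import numpy as np

def gaussian_peak(axis, centre, width):
    """Synthetic spectral band: a Gaussian centred at `centre` (illustrative only)."""
    return np.exp(-0.5 * ((axis - centre) / width) ** 2)

def unmix_least_squares(spectra, endmembers):
    """Estimate abundances A such that spectra ~= A @ endmembers.

    spectra:    (n_spectra, n_bands) array of mixed spectra
    endmembers: (n_components, n_bands) array of known component spectra
    """
    # Solve endmembers.T @ A.T = spectra.T in the least-squares sense.
    coeffs, *_ = np.linalg.lstsq(endmembers.T, spectra.T, rcond=None)
    return coeffs.T

# Hypothetical three-component system on a 400-1800 cm^-1 axis.
wavenumbers = np.linspace(400.0, 1800.0, 300)
endmembers = np.stack([
    gaussian_peak(wavenumbers, 800.0, 15.0),
    gaussian_peak(wavenumbers, 1000.0, 20.0)
    + 0.5 * gaussian_peak(wavenumbers, 1450.0, 25.0),
    gaussian_peak(wavenumbers, 1600.0, 18.0),
])

# Two noise-free mixtures with known proportions.
true_abundances = np.array([[0.7, 0.2, 0.1],
                            [0.1, 0.3, 0.6]])
mixtures = true_abundances @ endmembers

estimated = unmix_least_squares(mixtures, endmembers)
print(np.round(estimated, 3))
```

This baseline only applies when the endmembers are known and the mixing is linear. Unknown endmembers, noise and nonlinear mixing are among the complex scenarios the abstract says conventional methods struggle with.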
The high binding affinity of antibodies toward their cognate targets is key to eliciting effective immune responses, as well as to the use of antibodies as research and therapeutic tools. Here, we propose ANTIPASTI, a convolutional neural network model that achieves state-of-the-art performance in the prediction of antibody binding affinity using as input a representation of antibody-antigen structures in terms of normal mode correlation maps derived from elastic network models. This representation captures not only structural features but energetic patterns of local and global residue fluctuations.
Current anticancer therapies suffer from issues such as off-target side effects and the emergence of drug resistance; therefore, the discovery of alternative therapeutic approaches is vital. These can include the development of drugs with different modes of action, and the exploration of new biomolecular targets. For the former, there has been increasing interest in drugs that are activated by an external stimulus (e.
Background: Identifying clusters of diseases may aid understanding of shared aetiology, management of co-morbidities, and the discovery of new disease associations. Our study aims to identify disease clusters using a large set of long-term conditions and comparing methods that use the co-occurrence of diseases versus methods that use the sequence of disease development in a person over time.
Methods: We use electronic health records from over ten million people with multimorbidity registered to primary care in England.
A signal mixer facilitates rich computation, which has been the building block of modern telecommunication. This frequency mixing produces new signals at the sum and difference frequencies of input signals, enabling powerful operations such as heterodyning and multiplexing. Here, we report that a neuron is a signal mixer.
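The sum and difference frequencies follow from the identity cos(a)cos(b) = ½[cos(a − b) + cos(a + b)]. The sketch below checks this numerically for an idealised multiplicative mixer. It is a generic signal-processing illustration, not the neuronal mechanism reported in the study; the function name, sampling rate and input frequencies are hypothetical.

```python
import numpy as np

def mixing_peaks(f1, f2, fs=1000.0, duration=1.0, n_peaks=2):
    """Return the n_peaks strongest frequencies (Hz) in the product of two cosines."""
    n = int(round(fs * duration))
    t = np.arange(n) / fs
    # Idealised mixer: pointwise multiplication of the two inputs.
    mixed = np.cos(2 * np.pi * f1 * t) * np.cos(2 * np.pi * f2 * t)
    spectrum = np.abs(np.fft.rfft(mixed))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Indices of the largest spectral magnitudes.
    strongest = np.argsort(spectrum)[-n_peaks:]
    return sorted(float(f) for f in freqs[strongest])

peaks = mixing_peaks(50.0, 70.0)
print(peaks)
```

For inputs at 50 Hz and 70 Hz, the two dominant components of the product lie at the difference frequency (20 Hz) and the sum frequency (120 Hz), while the input frequencies themselves are absent from the output.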
Raman spectroscopy is a nondestructive and label-free chemical analysis technique, which plays a key role in the analysis and discovery cycle of various branches of science. Nonetheless, progress in Raman spectroscopic analysis is still impeded by the lack of software, methodological and data standardization, and the ensuing fragmentation and lack of reproducibility of analysis workflows thereof. To address these issues, we introduce , an open-source Python package for Raman spectroscopic research and analysis.
Objective: Natural language processing (NLP) algorithms are increasingly being applied to obtain unsupervised representations of electronic health record (EHR) data, but their comparative performance at predicting clinical endpoints remains unclear. Our objective was to compare the performance of unsupervised representations of sequences of disease codes generated by bag-of-words versus sequence-based NLP algorithms at predicting clinically relevant outcomes.
Materials And Methods: This cohort study used primary care EHRs from 6 286 233 people with Multiple Long-Term Conditions in England.
Background: Identifying clusters of co-occurring diseases may help characterise distinct phenotypes of Multiple Long-Term Conditions (MLTC). Understanding the associations of disease clusters with health-related outcomes requires a strategy to assign clusters to people, but it is unclear how the performance of different strategies compares.
Aims: First, to compare the performance of methods of assigning disease clusters to people at explaining mortality, emergency department attendances and hospital admissions over one year.
Objective: To determine the extent to which the choice of timeframe used to define a long term condition affects the prevalence of multimorbidity and whether this varies with sociodemographic factors.
Design: Retrospective study of disease code frequency in primary care electronic health records.
Data Sources: Routinely collected, general practice, electronic health record data from the Clinical Practice Research Datalink Aurum were used.
Allostery is one of the cornerstones of biological function, as it plays a fundamental role in regulating protein activity. The modelling of allostery has gradually moved from a conformation-based framework, linked to structural changes, to dynamics-based allostery, whereby the effects of ligand binding propagate via signal transduction from the allosteric site to other regions of the protein via inter-residue interactions. Characterising such allosteric signalling pathways, which do not necessarily lead to conformational changes, has been pursued experimentally and complemented by computational analysis of protein networks to detect subtle dynamic propagation paths.
Measurements of systems taken along a continuous functional dimension, such as time or space, are ubiquitous in many fields, from the physical and biological sciences to economics and engineering. Such measurements can be viewed as realisations of an underlying smooth process sampled over the continuum. However, traditional methods for independence testing and causal learning are not directly applicable to such data, as they do not take into account the dependence along the functional dimension.
Multivariate time-series data that capture the temporal evolution of interconnected systems are ubiquitous in diverse areas. Understanding the complex relationships and potential dependencies among co-observed variables is crucial for the accurate statistical modelling and analysis of such systems. Here, we introduce kernel-based statistical tests of joint independence in multivariate time series by extending the -variable Hilbert-Schmidt independence criterion to encompass both stationary and non-stationary processes, thus allowing broader real-world applications.
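For orientation, the sketch below computes the standard two-variable Hilbert-Schmidt independence criterion (biased estimator, trace(KHLH)/n²) with Gaussian kernels, together with a simple permutation test. It is not the multivariate time-series extension described in the abstract: it assumes independent and identically distributed samples, because permuting one variable destroys any temporal dependence. All function names, the fixed bandwidth and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for samples stacked along axis 0."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def hsic(x, y, bandwidth=1.0):
    """Biased two-variable HSIC estimate: trace(K H L H) / n^2."""
    n = len(x)
    K = gaussian_gram(x, bandwidth)
    L = gaussian_gram(y, bandwidth)
    H = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    return float(np.trace(K @ H @ L @ H)) / n ** 2

def permutation_pvalue(x, y, n_perm=100, seed=0):
    """Permutation p-value for independence; valid for i.i.d. samples only."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = hsic(x, y)
    # Shuffling y breaks any pairing with x, giving draws from the null.
    null = [hsic(x, y[rng.permutation(len(y))]) for _ in range(n_perm)]
    return (1 + sum(s >= observed for s in null)) / (1 + n_perm)

# Synthetic check: a nonlinear dependence versus an independent variable.
rng = np.random.default_rng(1)
x = rng.normal(size=150)
y_dependent = x ** 2 + 0.1 * rng.normal(size=150)
y_independent = rng.normal(size=150)

p_dependent = permutation_pvalue(x, y_dependent)
p_independent = permutation_pvalue(x, y_independent)
print(p_dependent, p_independent)
```

The quadratic relationship has near-zero linear correlation, which is the kind of dependence a kernel-based statistic is designed to pick up. Testing joint independence of more than two variables, or of serially dependent data, needs the extended criterion and resampling schemes that respect the temporal structure.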
From the perspective of human mobility, the COVID-19 pandemic constituted a natural experiment of enormous reach in space and time. Here, we analyse the inherent multiple scales of human mobility using Facebook Movement maps collected before and during the first UK lockdown. Firstly, we obtain the pre-lockdown UK mobility graph and employ multiscale community detection to extract, in an unsupervised manner, a set of robust partitions into flow communities at different levels of coarseness.
Objectives: To determine whether the frequency of diagnostic codes for long-term conditions (LTCs) in primary care electronic healthcare records (EHRs) is associated with (1) disease coding incentives, (2) General Practice (GP), (3) patient sociodemographic characteristics and (4) calendar year of diagnosis.
Design: Retrospective cohort study.
Setting: GPs in England from 2015 to 2022 contributing to the Clinical Practice Research Datalink Aurum dataset.
The statistical structure of the environment is often important when making decisions. There are multiple theories of how the brain represents statistical structure. One such theory states that neural activity spontaneously samples from probability distributions.
Recently, random lasing in complex networks has shown efficient lasing over more than 50 localised modes, promoted by multiple scattering over the underlying graph. If controlled, these network lasers can lead to fast-switching multifunctional light sources with synthesised spectrum. Here, we observe both in experiment and theory high sensitivity of the network laser spectrum to the spatial shape of the pump profile, with some modes for example increasing in intensity by 280% when switching off 7% of the pump beam.
Background: Real-time prediction is key to prevention and control of infections associated with health-care settings. Contacts enable spread of many infections, yet most risk prediction frameworks fail to account for their dynamics. We developed, tested, and internationally validated a real-time machine-learning framework, incorporating dynamic patient-contact networks to predict hospital-onset COVID-19 infections (HOCIs) at the individual level.
Inhibiting the main protease of SARS-CoV-2 is of great interest in tackling the COVID-19 pandemic caused by the virus. Most efforts have been centred on inhibiting the binding site of the enzyme. However, considering allosteric sites, distant from the active or orthosteric site, broadens the search space for drug candidates and confers the advantages of allosteric drug targeting.
Allostery commonly refers to the mechanism that regulates protein activity through the binding of a molecule at a different, usually distal, site from the orthosteric site. The omnipresence of allosteric regulation in nature and its potential for drug design and screening render the study of allostery invaluable. Nevertheless, challenges remain as few computational methods are available to effectively predict allosteric sites, identify signalling pathways involved in allostery, or to aid with the design of suitable molecules targeting such sites.
Directed acyclic graphs (DAGs) are a useful tool to represent, in a graphical format, researchers' assumptions about the causal structure among variables while providing a rationale for the choice of confounding variables to adjust for. With origins in the field of probabilistic graphical modelling, DAGs are yet to be widely adopted in applied health research, where causal assumptions are frequently made for the purpose of evaluating health services initiatives. In this context, there is still limited practical guidance on how to construct and use DAGs.
Dimension is a fundamental property of objects and the space in which they are embedded. Yet ideal notions of dimension, as in Euclidean spaces, do not always translate to physical spaces, which can be constrained by boundaries and distorted by inhomogeneities, or to intrinsically discrete systems such as networks. To take into account locality, finiteness and discreteness, dynamical processes can be used to probe the space geometry and define its dimension.
Background: Global sustainability is an enmeshed system of complex socioeconomic, climatological, and ecological interactions. The numerous objectives of the UN's Sustainable Development Goals (SDGs) and the Paris Agreement have various levels of interdependence, making it difficult to ascertain the influence of changes to particular indicators across the whole system. In this analysis, we aimed to detect and rank the complex interlinkages between objectives of sustainability agendas.