63 results match your criteria: "Institute of Statistical Sciences[Affiliation]"

Taking into account the local dependence structure in large-scale multiple testing is expected to improve both the efficiency of the testing procedure and the interpretability of scientific findings. The hidden Markov model (HMM), as an effective model to describe the sequential dependence, has been successfully applied to large-scale multiple testing with local correlations. However, in many applications, the first-order Markov chain is not flexible enough to capture the complexity of local correlations.

Large-scale selection of highly informative microhaplotypes for ancestry inference and population specific informativeness.

Forensic Sci Int Genet

January 2025

Institute of Statistical Sciences, School of Mathematics, Woodland Road, University of Bristol, Bristol BS8 1UG, UK; MRC Integrative Epidemiology Unit, School of Medicine, Oakfield Grove, University of Bristol, Bristol BS8 2BN, UK.

Microhaplotypes (MHs) describe physically close genetic markers that are inherited together and are gaining prominence due to their efficiency in forensic, clinical, and population studies. They excel in kinship analysis, DNA mixture detection, and ancestry inference, offering advantages in precision over individual SNPs and STRs. In this study, a pipeline was developed to efficiently select highly informative MHs from large-scale genomic datasets.

Benchmarking Mendelian randomization methods for causal inference using genome-wide association study summary statistics.

Am J Hum Genet

August 2024

Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong, China; Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Big Data Bio-Intelligence Lab, The Hong Kong University of Science and Technology, Hong Kong SAR, China.

Mendelian randomization (MR), which utilizes genetic variants as instrumental variables (IVs), has gained popularity as a method for causal inference between phenotypes using genetic data. While efforts have been made to relax IV assumptions and develop new methods for causal inference in the presence of invalid IVs due to confounding, the reliability of MR methods in real-world applications remains uncertain. Instead of using simulated datasets, we conducted a benchmark study evaluating 16 two-sample summary-level MR methods using real-world genetic datasets to provide guidelines for the best practices.

How admixed captive breeding populations could be rescued using local ancestry information.

Mol Ecol

April 2024

RZSS WildGenes Laboratory, Conservation Department, Royal Zoological Society of Scotland, Edinburgh, UK.

This paper asks the question: can genomic information be used to recover a species that is already on the pathway to extinction due to genetic swamping from a related and more numerous population? We show that a breeding strategy in a captive breeding program can use whole genome sequencing to identify and remove segments of DNA introgressed through hybridisation. The proposed policy uses a generalized measure of kinship or heterozygosity accounting for local ancestry, that is, whether a specific genetic location was inherited from the target of conservation. We then show that optimizing these measures would minimize undesired ancestry while also controlling kinship and/or heterozygosity, in a simulated breeding population.

Article Synopsis
  • The study analyzes 317 ancient genomes from Mesolithic and Neolithic periods across northern and western Eurasia to understand human migration impacts during the Holocene.
  • Findings show a significant genetic divide between eastern and western populations, with the west experiencing major gene replacement due to the introduction of farming, while the east maintained its hunter-gatherer ancestry longer.
  • The Yamnaya culture, which emerged around 5,000 BP, played a crucial role in spreading ancestry across western Eurasia, leading to significant genetic changes in European populations.

Major migration events in Holocene Eurasia have been characterized genetically at broad regional scales. However, insights into the population dynamics in the contact zones are hampered by a lack of ancient genomic data sampled at high spatiotemporal resolution. Here, to address this, we analysed shotgun-sequenced genomes from 100 skeletons spanning 7,300 years of the Mesolithic period, Neolithic period and Early Bronze Age in Denmark and integrated these with proxies for diet (¹³C and ¹⁵N content), mobility (⁸⁷Sr/⁸⁶Sr ratio) and vegetation cover (pollen).

The Holocene (beginning around 12,000 years ago) encompassed some of the most significant changes in human evolution, with far-reaching consequences for the dietary, physical and mental health of present-day populations. Using a dataset of more than 1,600 imputed ancient genomes, we modelled the selection landscape during the transition from hunting and gathering, to farming and pastoralism across West Eurasia. We identify key selection signals related to metabolism, including that selection at the FADS cluster began earlier than previously reported and that selection near the LCT locus predates the emergence of the lactase persistence allele by thousands of years.

Motivation: The utilization of single-cell bisulfite sequencing (scBS-seq) methods allows for precise analysis of DNA methylation patterns at the individual cell level, enabling the identification of rare populations, revealing cell-specific epigenetic changes, and improving differential methylation analysis. Nonetheless, the presence of sparse data and an overabundance of zeros and ones, attributed to limited sequencing depth and coverage, frequently results in reduced precision and accuracy during differential methylation detection using scBS-seq. Consequently, there is a pressing demand for an innovative differential methylation analysis approach that effectively tackles these data characteristics and enhances recognition accuracy.

Article Synopsis
  • The study investigates the link between dietary knowledge and muscle mass in Chinese individuals aged 60 and older, utilizing data from the China Health and Nutrition Survey from 2006 and 2011.
  • There is a significant prevalence of low muscle mass (31.20%), especially among females, and those with lower muscle mass had notably lower dietary knowledge scores.
  • While the cross-sectional results suggest that higher dietary knowledge correlates with lower odds of low muscle mass, the longitudinal analysis did not find a strong association, indicating the need for further research.

Hemoglobin level is negatively associated with sarcopenia and its components in Chinese aged 60 and above.

Front Public Health

March 2023

Department of Rehabilitation Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan, China.

Introduction: Sarcopenia and low hemoglobin level are common in older adults. Few studies have evaluated the association between hemoglobin level and sarcopenia, and their findings are inconsistent. The multifaceted effects of sarcopenia on the human body and the high prevalence of anemia in the Chinese population make it necessary to explore the association between the two.

Aims: To investigate the association between alanine transaminase (ALT) and in-hospital death in patients admitted to the intensive care unit for diabetic ketoacidosis (DKA).

Methods: A cohort of 2,684 patients was constructed from the eICU Collaborative Research Database. Baseline demographic and clinical characteristics were summarized.

scDLC: a deep learning framework to classify large sample single-cell RNA-seq data.

BMC Genomics

July 2022

Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, Kunming, China.

Background: Using single-cell RNA sequencing (scRNA-seq) data to diagnose disease is an effective technique in medical research. Several statistical methods have been developed for the classification of RNA sequencing (RNA-seq) data, including, for example, Poisson linear discriminant analysis (PLDA), negative binomial linear discriminant analysis (NBLDA), and zero-inflated Poisson logistic discriminant analysis (ZIPLDA). Nevertheless, few existing methods perform well for large sample scRNA-seq data, in particular when the distribution assumption is also violated.

CLARITY: comparing heterogeneous data using dissimilarity.

R Soc Open Sci

December 2021

Unité Eco-Anthropologie (EA), Muséum National d'Histoire Naturelle, 17 place du Trocadero, Paris 75016, France.

Integrating datasets from different disciplines is hard because the data are often qualitatively different in meaning, scale and reliability. When two datasets describe the same entities, many scientific questions can be phrased around whether the (dis)similarities between entities are conserved across such different data. Our method, CLARITY, quantifies consistency across datasets, identifies where inconsistencies arise and aids in their interpretation.

Background: The incidence of nontuberculous mycobacterial lung disease (NTM-LD) is increasing worldwide. Immune exhaustion has been reported in NTM-LD, but T-cell immunoglobulin and mucin domain-containing protein 3 (TIM3), a co-inhibitory receptor on T cells, has been scarcely studied.

Methods: Patients with NTM-LD and healthy controls were prospectively recruited from July 2014 to August 2019 at three tertiary referral centers in Taiwan.

Controlling latent tuberculosis infection (LTBI) is important for preventing tuberculosis (TB). However, the immune regulation of LTBI remains uncertain. Immune checkpoints and CD14+ monocytes are pivotal for immune defense but have been scarcely studied in LTBI.

In medical studies, the collected covariates may contain underlying outliers. For clustered/longitudinal data with censored observations, the traditional Gehan-type estimator is robust to outliers in the response but sensitive to outliers in the covariate domain, and it also ignores the within-cluster correlations. To take account of within-cluster correlations, varying cluster sizes, and outliers in covariates, we propose weighted Gehan-type estimating functions for parameter estimation in the accelerated failure time model for clustered data.

Background: Identifying differentially expressed genes within the same species or between different species is urgently needed in biological and medical research. For RNA-seq data, systematic technical effects and different sequencing depths are usually encountered when conducting experiments. Normalization is regarded as an essential step in the discovery of biologically important changes in expression.

Article Synopsis
  • The study investigates intrinsic resistance (IR) in lung adenocarcinoma patients who do not respond to EGFR-tyrosine kinase inhibitors (TKIs), focusing on the role of epigenomic factors like DNA methylation.
  • Researchers analyzed DNA methylation in tumors from 79 patients and confirmed their findings in a larger group of 163 patients, identifying 216 CpG sites linked to patient response to treatment.
  • The results highlight that specific homeobox gene methylation patterns can help identify patients less likely to benefit from EGFR-TKIs, potentially leading to more personalized treatment approaches.

Bulk and single-cell RNA-seq (scRNA-seq) data are being used as alternatives to traditional technology in biology and medicine research. These data are used, for example, for the detection of differentially expressed (DE) genes. Several statistical methods have been developed for the classification of bulk and single-cell RNA-seq data.

Model estimation and selection for partial linear varying coefficient EV models with longitudinal data.

J Appl Stat

March 2021

College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Shenzhen University, Shenzhen, People's Republic of China.

In this paper, we consider estimation and model selection for longitudinal partial linear varying coefficient errors-in-variables (EV) models when the covariates are measured with additive errors. A bias-corrected penalized quadratic inference functions method is proposed, based on quadratic inference functions with two penalty function terms. The proposed method can not only handle the measurement errors of covariates and within-subject correlations but also estimate and select significant non-zero parametric and nonparametric components simultaneously.

Selecting Classification Methods for Small Samples of Next-Generation Sequencing Data.

Front Genet

March 2021

Shenzhen Key Laboratory of Advanced Machine Learning and Applications, College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen University, Shenzhen, China.

Next-generation sequencing has emerged as an essential technology for the quantitative analysis of gene expression. In medical research, RNA sequencing (RNA-seq) data are commonly used to identify which type of disease a patient has. Because of the discrete nature of RNA-seq data, the existing statistical methods that have been developed for microarray data cannot be directly applied to RNA-seq data.

Early detection is crucial to improve breast cancer (BC) patients' outcomes and survival. Mammogram and ultrasound adopting the Breast Imaging Reporting and Data System (BI-RADS) categorization are widely used for BC early detection, but they suffer from a high false-positive rate that leads to unnecessary biopsies, especially in BI-RADS category-4 patients. Plasma cell-free DNA (cfDNA) carrying DNA methylation information has emerged as a non-invasive approach for cancer detection.

Epidemiology of Virus Infection and Human Cancer.

Recent Results Cancer Res

January 2021

Genomics Research Center, Academia Sinica, 128 Academia Road, Sect. 2, Taipei, 115, Taiwan.

Article Synopsis
  • Seven viruses, including EBV, HBV, HCV, KSHV, HIV-1, HTLV-1, and HPV, are classified as Group 1 human carcinogens by the IARC based on epidemiological and mechanistic studies.
  • These viruses can directly or indirectly cause various cancers, with some individuals developing cancer while others do not, highlighting the complexity of cancer risk associated with these infections.
  • Research has led to the development of risk calculators to predict the likelihood of specific cancers related to these viruses, and effective interventions like vaccination and antiviral therapies have shown a reduction in cancer incidence.

The M1/M2 spectrum and plasticity of malignant pleural effusion-macrophage in advanced lung cancer.

Cancer Immunol Immunother

May 2021

Graduate Institute of Toxicology, College of Medicine, National Taiwan University, No.1, Section 4, Ren-Ai Rd, Taipei, 100, Taiwan.

Article Synopsis
  • MPE-Mφ from lung cancer patients exhibit a mix of M1 and M2 macrophage characteristics, showing the ability to shift between these states and affecting patient outcomes.
  • A study of 147 stage-IV lung adenocarcinoma patients revealed a two-gene MPE-Mφ signature (IL-1β and TGF-β1) that could effectively predict survival rates, demonstrating significant associations with different expression patterns and immune markers.
  • Additionally, a strategy to repolarize MPE-Mφ towards the anti-cancer M1 phenotype using β-glucan and IFN-γ was found to enhance anti-cancer activity, suggesting potential therapeutic approaches to improve treatment outcomes.