Publications by authors named "Bida Gu"

Article Synopsis
  • It achieves a high level of completeness, closing 92% of previous assembly gaps and fully assembling complex regions, including 1,852 complex structural variants and 1,246 human centromeres.
  • The findings lead to significant improvements in genotyping accuracy and enable the detection of over 26,000 structural variants per sample, enhancing the potential for future disease association research.

Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies.


Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies because of challenges with variant calling and representation and the lack of a genome-wide standard. To promote the development of TR methods, we create a comprehensive catalog of TR regions and explore its properties across 86 samples.


Background: Homology-based recombination (HR) is the cornerstone of genetic mapping. However, a lack of sufficient sequence homology or the presence of a genomic rearrangement prevents HR through crossing, which inhibits genetic mapping in the affected genomic regions. This is particularly true in species hybrids, whose genomic sequences are highly divergent and carry various genome rearrangements, making it impractical to map genetic loci, such as hybrid incompatibility (HI) loci, through crossing.


Roughly 3% of the human genome is composed of variable-number tandem repeats (VNTRs): arrays of motifs of at least six bases. These loci are highly polymorphic, yet current approaches that define and merge variants based on alignment breakpoints do not capture their full diversity. Here we present vamos (VNTR Annotation using efficient Motif Sets), a method that instead annotates VNTRs by repeat composition under different levels of motif diversity.


Modeling longitudinal trajectories and identifying latent classes of trajectories are of great interest in biomedical research, and software to identify such latent classes is readily available for latent class trajectory analysis (LCTA), growth mixture modeling (GMM) and covariance pattern mixture models (CPMM). In biomedical applications, the level of within-person correlation is often non-negligible, which can affect model choice and interpretation. LCTA does not incorporate this correlation.


Background: Noninvasive prenatal testing (NIPT) is one of the most commonly employed clinical measures for screening of fetal aneuploidy. Fetal fraction (ff) has been demonstrated to be one of the key factors affecting the performance of NIPT. Accurate quantification of ff plays a vital role in NIPT.
