High-throughput complement component 4 genomic sequence analysis with C4Investigator.

bioRxiv

Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States.

Published: July 2023

AI Article Synopsis

  • The complement component 4 (C4) gene locus on chromosome 6 encodes the C4 protein, which modulates immune system activity and helps clear immune complexes and cellular debris. The locus exhibits copy number variation that has been implicated in autoimmune and other diseases.
  • C4 genes vary in size depending on whether the HERV retrovirus is present in intron 9 (long and short forms), and the Rodgers and Chido blood group antigens are generally found on the C4A and C4B proteins, respectively.
  • To characterize this variability, the authors developed C4Investigator, a bioinformatic pipeline that determines C4 gene copy number and aligned sequence from short-read sequencing data, and applied it to the 1000 Genomes Project dataset.

Article Abstract

The complement component 4 (C4) gene locus, composed of the C4A and C4B genes and located on chromosome 6, encodes the C4 protein, a key intermediate in the classical and lectin pathways of the complement system. The complement system is an important modulator of immune system activity and is also involved in the clearance of immune complexes and cellular debris. The C4 gene locus exhibits copy number variation, with each composite gene varying between 0 and 5 copies per haplotype. C4 genes also vary in size depending on the presence of the HERV retrovirus in intron 9, giving long-form and short-form genes; the insertion modulates expression and is found in both C4A and C4B. Additionally, the human blood group antigens Rodgers and Chido are located on the C4 protein, with the Rodgers epitope generally found on C4A protein and the Chido epitope generally found on C4B protein. C4 copy number variation has been implicated in numerous autoimmune and pathogenic diseases. Despite the central role of C4 in immune function and regulation, high-throughput genomic sequence analysis of C4 variants has been impeded by the high degree of sequence similarity and complex genetic variation exhibited by these genes. To investigate C4 variation using genomic sequencing data, we have developed a novel bioinformatic pipeline for comprehensive, high-throughput characterization of human C4 sequence from short-read sequencing data, named C4Investigator. Using paired-end targeted or whole genome sequence data as input, C4Investigator determines gene copy number for overall C4 and its gene variants. Additionally, C4Investigator reports the full overall C4 aligned sequence, enabling nucleotide-level analysis of C4. To demonstrate the utility of this workflow, we have analyzed C4 variation in the 1000 Genomes Project dataset, showing that the C4 genes are highly poly-allelic, with many variants that have the potential to impact C4 protein function.
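The abstract does not say how C4Investigator derives copy number, so the following is only a generic illustration of one common idea: comparing read depth over a target locus with depth over a region assumed to be diploid, then apportioning the total using the read fraction at a distinguishing site. The function names, the diploid-reference assumption, and all numbers are hypothetical and are not taken from the C4Investigator software.

```python
def estimate_copy_number(target_depth: float, reference_depth: float,
                         reference_copies: int = 2) -> int:
    """Estimate total copies of a locus per diploid genome from mean read depth.

    Assumption of this sketch: reference_depth is the mean depth over a
    region present at reference_copies copies per diploid genome.
    """
    if reference_depth <= 0:
        raise ValueError("reference_depth must be positive")
    return round(reference_copies * target_depth / reference_depth)


def split_by_variant_fraction(total_copies: int, variant_reads: int,
                              total_reads: int) -> tuple[int, int]:
    """Split total copies into (variant, other) copies.

    Uses the fraction of reads carrying the variant-specific allele at a
    site that distinguishes the two gene types (hypothetical input).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    variant_copies = round(total_copies * variant_reads / total_reads)
    return variant_copies, total_copies - variant_copies


# Hypothetical sample: 60x mean depth over C4, 30x over a diploid region,
# and 45 of 60 reads carrying the C4A-specific allele at a distinguishing site.
total = estimate_copy_number(60.0, 30.0)
c4a, c4b = split_by_variant_fraction(total, variant_reads=45, total_reads=60)
print(total, c4a, c4b)
```

A depth ratio like this is sensitive to mapping ambiguity between near-identical genes, which is the difficulty the abstract highlights; a real pipeline would need alignment and normalization steps that this sketch omits.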


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10370142
DOI: http://dx.doi.org/10.1101/2023.07.18.549551

