Transcriptional activation domains (ADs) of gene activators have remained enigmatic for decades as short, extremely variable, and structurally disordered sequences. Using rational design and high-throughput experimentation, we determine the grammar rules, and their exceptions, for the language of ADs. According to the identified rules, billions of highly active ADs can be composed from balanced amounts of acidic and aromatic amino acids, using either a mixture of aromatic residues or only one kind of aromatic residue combined with acidic residues.
Background: A highly effective vaccine for malaria remains an elusive target, at least in part due to under-appreciated natural parasite variation. This study aimed to investigate the genetic and structural variation, and immune selection, of leading malaria vaccine candidates across the Plasmodium falciparum life cycle.
Methods: We analyzed 325 P. falciparum whole genome sequences from Zambia, in addition to 791 genomes from five other African countries available in the MalariaGEN Pf3k database.
Analyzing the factors that make transcriptional activation domains functional remains a crucial yet challenging task, owing to the significant diversity of their sequences and their intrinsically disordered nature. Almost all existing methods for predicting activation domains have relied either on traditional machine learning approaches, such as logistic regression, which cannot capture complex patterns in the data, or on plain convolutional neural networks, and they have been limited in their exploration of structural features. However, there is tremendous potential in inspecting the structural properties of activation domains, and an opportunity to investigate complex relationships between the features of residues in the sequence.
The mechanisms by which transcriptional activation domains (tADs) initiate eukaryotic gene expression have been an enigma for decades because most tADs lack specificity in sequence, structure, and interactions with targets. Machine learning analysis of the generated data sets of tAD sequences elucidated several functionality rules. Functional tAD sequences should (i) be devoid of or depleted of basic amino acid residues; (ii) be enriched in aromatic and acidic residues; (iii) have aromatic residues localized mostly near the terminus of the sequence and acidic residues localized more internally, within a span of 20-30 amino acids; (iv) have both aromatic and acidic residues spread out in the sequence rather than clustered; and (v) not have these residues separated by occasional basic residues. These and other, more subtle rules are not absolute, which reflects the absence of a tAD consensus sequence and the enormous variability of tADs, and is consistent with their surfactant-like biochemical properties.
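The rules listed above can be turned into simple per-sequence descriptors. The following is only a minimal sketch and not the authors' machine learning method: the function name `tad_features`, the residue groupings (histidine is counted as basic here, which is an assumption), and the choice of descriptors are all illustrative. It reports the fraction of basic, acidic, and aromatic residues (rules i and ii), their mean normalized position along the sequence (rule iii), and the longest uninterrupted run of acidic or aromatic residues as a crude clustering measure (rule iv).

```python
from itertools import groupby

# Residue groupings are assumptions for illustration; the source does not list them.
BASIC = set("KRH")
ACIDIC = set("DE")
AROMATIC = set("FWY")


def classify(residue):
    """Map a one-letter amino acid code to a coarse residue class."""
    if residue in BASIC:
        return "basic"
    if residue in ACIDIC:
        return "acidic"
    if residue in AROMATIC:
        return "aromatic"
    return "other"


def tad_features(seq):
    """Return simple descriptors related to the tAD rules (hypothetical helper)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    classes = [classify(r) for r in seq]
    n = len(classes)
    denom = max(n - 1, 1)  # avoids division by zero for one-residue input
    feats = {}
    for name in ("basic", "acidic", "aromatic"):
        positions = [i for i, c in enumerate(classes) if c == name]
        # Rules (i) and (ii): depletion or enrichment of each class.
        feats[f"{name}_fraction"] = len(positions) / n
        # Rule (iii): 0.0 is the N-terminus, 1.0 is the C-terminus.
        feats[f"{name}_mean_position"] = (
            sum(positions) / len(positions) / denom if positions else None
        )
    # Rule (iv): long runs of one class suggest clustering rather than spreading.
    longest = {"acidic": 0, "aromatic": 0}
    for cls, group in groupby(classes):
        if cls in longest:
            longest[cls] = max(longest[cls], len(list(group)))
    feats["longest_acidic_run"] = longest["acidic"]
    feats["longest_aromatic_run"] = longest["aromatic"]
    return feats


if __name__ == "__main__":
    # Made-up example sequence, not taken from any data set.
    for key, value in tad_features("DEDFDEWDEY").items():
        print(key, value)
```

Any cut-offs applied to these descriptors would have to be chosen empirically; the source stresses that the rules are not absolute, so the values are better read as soft indicators than as a classifier.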