The structural maintenance of chromosomes (SMC) complexes, cohesin and condensins, are crucial for chromosome separation and compaction during cell division. During interphase, mammalian cohesins additionally fold the genome into loops and domains. Here we show that, in Caenorhabditis elegans, a species with holocentric chromosomes, condensin I is the primary, long-range loop extruder.
The computational design of synthetic DNA sequences with designer in vivo properties is gaining traction in the field of synthetic genomics. We propose here a computational method which combines a kinetic Monte Carlo framework with a deep mutational screening based on deep learning predictions. We apply our method to build regular nucleosome arrays with tailored nucleosomal repeat lengths (NRL) in yeast.
Transcription generates local topological and mechanical constraints on the DNA fiber, leading to the generation of supercoiled chromosome domains in bacteria. However, the global impact of transcription on chromosome organization remains elusive, as the scale of genes and operons in bacteria remains well below the resolution of chromosomal contact maps generated using Hi-C (~5-10 kb). Here we combined sub-kb Hi-C contact maps and chromosome engineering to visualize individual transcriptional units.
We designed and synthesized , which is ∼21.6% shorter than native , the smallest chromosome in . was designed for attachment to another synthetic chromosome due to concerns surrounding potential instability and karyotype imbalance and is now attached to , yielding the first synthetic yeast fusion chromosome.
Here, we report the design, construction, and characterization of a tRNA neochromosome, a designer chromosome that functions as an additional, de novo counterpart to the native complement of Saccharomyces cerevisiae. Intending to address one of the central design principles of the Sc2.0 project, the ∼190-kb tRNA neochromosome houses all 275 relocated nuclear tRNA genes.
The hallmarks of chromosome organization in multicellular eukaryotes are chromosome territories (CT), chromatin compartments, and insulated domains, including topologically associated domains (TADs). Yet, most of these elements of chromosome organization are derived from analyses of a limited set of model organisms, while large eukaryotic groups, including insects, remain mostly unexplored. Here we combine Hi-C, biophysical modeling, and microscopy to characterize the 3D genome architecture of the silkworm, Bombyx mori.
Eukaryotic genomes vary in terms of size, chromosome number, and genetic complexity. Their temporal organization is complex, reflecting coordination between DNA folding and function. Here, we used fused karyotypes of budding yeast to characterize the effects of chromosome length on nuclear architecture.
The tremendous amount of biological sequence data available, combined with recent methodological breakthroughs in deep learning in domains such as computer vision and natural language processing, is transforming bioinformatics through the emergence of deep genomics, the application of deep learning to genomic sequences. We review here the new applications that deep learning enables in the field, focusing on three aspects: the functional annotation of genomes, the sequence determinants of genome functions, and the possibility of writing synthetic genomic sequences.
The artificial 601 DNA sequence is often used to constrain the position of nucleosomes on a DNA molecule in vitro. Although the ability of the 147 base pair sequence to precisely position a nucleosome in vitro is well documented, application of this property in vivo has been explored only in a few studies and yielded contradictory conclusions. Our goal in the present study was to test the ability of the 601 sequence to dictate nucleosome positioning in Saccharomyces cerevisiae in the context of a long tandem repeat array inserted in a yeast chromosome.
Background: Genome-wide association studies have identified statistical associations between various diseases, including cancers, and a large number of single-nucleotide polymorphisms (SNPs). However, they provide no direct explanation of the mechanisms underlying the association. Based on the recent discovery that changes in three-dimensional genome organization may have functional consequences on gene regulation favoring diseases, we investigated systematically the genome-wide distribution of disease-associated SNPs with respect to a specific feature of 3D genome organization: topologically associating domains (TADs) and their borders.
While many computational methods have been proposed for 3D chromosome reconstruction from chromosomal contact maps, these methods are rarely used for the interpretation of such experimental data, in particular Hi-C data. We posit that this is due to the lack of an easy-to-use implementation of the proposed algorithms, as well as to the high computational cost of most methods. Here we give a detailed implementation of the fast ShRec3D algorithm.
Background: Centromeric regions of human chromosomes contain large numbers of tandemly repeated α-satellite sequences. These sequences are covered with constitutive heterochromatin which is enriched in trimethylation of histone H3 on lysine 9 (H3K9me3). Although well studied using artificial chromosomes and global perturbations, the contribution of this epigenetic mark to chromatin structure and genome stability remains poorly known in a more natural context.
An increasing number of genomic tracks such as DNA methylation, histone modifications or transcriptomes are being produced to annotate genomes with functional states. The comparison of such high dimensional vectors obtained under various experimental conditions requires the use of a distance or dissimilarity measure. Pearson, Cosine and $L_{p}$-norm distances are commonly used for both count and binary vectors.
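As an illustration of the three dissimilarity measures named in this abstract, here is a minimal Python sketch using only the standard library. The function names and the toy vectors are assumptions made for this example and are not taken from the article; the Pearson and cosine distances are written in the common "one minus similarity" form, which may differ from the exact definitions used by the authors.

```python
import math

def pearson_distance(x, y):
    """1 - Pearson correlation; 0 for perfectly correlated vectors, 2 for anti-correlated."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("Pearson distance is undefined for a constant vector")
    return 1.0 - cov / (norm_x * norm_y)

def cosine_distance(x, y):
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_x * norm_y)

def lp_distance(x, y, p=2):
    """L_p-norm of the difference vector (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

# Toy count vectors (hypothetical read counts over three genomic bins).
counts_a = [1, 2, 3]
counts_b = [2, 4, 6]

# Toy binary vectors (hypothetical presence/absence of a mark over four bins).
binary_a = [1, 0, 1, 0]
binary_b = [0, 1, 0, 1]
```

On the toy count vectors, which differ only by a scale factor, the Pearson and cosine distances are both zero while the $L_{1}$ distance is 6, which shows how the choice of measure decides whether overall signal amplitude matters.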
Summary: Genomic sequences are widely used to infer the evolutionary history of a given group of individuals. Many methods have been developed for sequence clustering and tree building. In the early days of genome sequencing, these were often limited to hundreds of sequences but due to the surge of high throughput sequencing, it is now common to have millions of sampled sequences at hand.
The application of deep neural networks is a rapidly expanding field now reaching many disciplines, including genomics. In particular, convolutional neural networks have been exploited for identifying the functional role of short genomic sequences. These approaches rely on gathering large sets of sequences with known functional roles, extracting those sequences from whole-genome annotations.
Genetically modified genomes are often used today in many areas of fundamental and applied research. In many studies, coding or noncoding regions are modified in order to change protein sequences or gene expression levels. Modifying one or several nucleotides in a genome can also lead to unexpected changes in the epigenetic regulation of genes.
Summary: Prediction of genomic annotations from DNA sequences using deep learning is becoming a flourishing field with many applications. Nevertheless, there are still difficulties in handling data in order to conveniently build and train models dedicated to specific end-user tasks. keras_dna is designed for an easy implementation of Keras models (TensorFlow high level API) for genomics.
To investigate novel aspects of pattern formation in spin systems, we use a mapping between reactive concentrations in a reaction-diffusion system and spin orientations in a dynamic multiple-spin Ising model. While pattern formation in Ising models always relies on infinite-range interactions, this mapping allows us to design a finite-range-interactions Ising model that can produce patterns observed in reaction-diffusion systems including Turing patterns with a tunable typical length scale. This model has asymmetric interactions and several spin types coexisting at a site.
We revisit the notion of gene regulatory code in embryonic development in the light of recent findings about genome spatial organization. By analogy with the genetic code, we posit that the concept of code can only be used if the corresponding adaptor can clearly be identified. An adaptor is here defined as an intermediary physical entity mediating the correspondence between codewords and objects in a gratuitous and evolvable way.
We investigate the kinetics of a polymer collapse due to the formation of irreversible cross-links between its monomers. Using the contact probability P(s) as a scale-dependent order parameter depending on the chemical distance s, our simulations show the emergence of a cooperative pearling instability. Namely, the polymer undergoes a sharp conformational transition to a set of absorbing states characterized by a length scale ξ corresponding to the mean pearl size.
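To make the contact probability P(s) mentioned in this abstract concrete, the following Python sketch estimates it from a single polymer conformation. It is only an illustration under stated assumptions: the function name, the contact radius, and the toy straight-chain coordinates are hypothetical, and the article's simulations presumably average over many conformations rather than one.

```python
import math

def contact_probability(coords, radius):
    """Estimate P(s) for one conformation.

    coords: list of (x, y, z) monomer positions, ordered along the chain.
    radius: distance below which two monomers count as being in contact
            (a hypothetical cutoff chosen for this example).
    Returns a dict mapping chemical distance s to the fraction of monomer
    pairs (i, i + s) that are in contact.
    """
    n = len(coords)
    p_of_s = {}
    for s in range(1, n):
        pairs = n - s  # number of monomer pairs separated by s along the chain
        contacts = sum(
            1
            for i in range(pairs)
            if math.dist(coords[i], coords[i + s]) <= radius
        )
        p_of_s[s] = contacts / pairs
    return p_of_s

# Toy conformation: four monomers on a straight line with unit spacing.
straight_chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
p_straight = contact_probability(straight_chain, radius=1.5)
```

For the fully extended toy chain only nearest neighbours are in contact, so P(1) is 1 and P(s) is 0 for larger s; a collapsed conformation would instead show nonzero P(s) at larger chemical distances.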
In chromosome conformation capture experiments (Hi-C), the accuracy with which contacts are detected varies due to the uneven distribution of restriction sites along genomes. In addition, repeated sequences or homologous regions remain indistinguishable because of the ambiguities they introduce during the alignment of the sequencing reads. We addressed both limitations by designing and engineering 144 kb of a yeast chromosome with regularly spaced restriction sites (Syn-HiC design).
As in eukaryotes, bacterial genomes are not randomly folded. Bacterial genetic information is generally carried on a circular chromosome with a single origin of replication from which two replication forks proceed bidirectionally toward the opposite terminus region. Here, we investigate the higher-order architecture of the Escherichia coli genome, showing its partition into two structurally distinct entities by a complex and intertwined network of contacts: the replication terminus (ter) region and the rest of the chromosome.
Within cells, soluble RNPs can switch states to coassemble and condense into liquid or solid bodies. Although these phase transitions have been reconstituted in vitro, for endogenous bodies the diversity of the components, the specificity of the interaction networks, and the function of the coassemblies remain to be characterized. Here, by developing a fluorescence-activated particle sorting (FAPS) method to purify cytosolic processing bodies (P-bodies) from human epithelial cells, we identified hundreds of proteins and thousands of mRNAs that structure a dense network of interactions, separating P-body from non-P-body RNPs.
Duplication and segregation of chromosomes involves dynamic reorganization of their internal structure by conserved architectural proteins, including the structural maintenance of chromosomes (SMC) complexes cohesin and condensin. Despite active investigation of the roles of these factors, a genome-wide view of dynamic chromosome architecture at both small and large scale during cell division is still missing. Here, we report the first comprehensive 4D analysis of the higher-order organization of the genome throughout the cell cycle and investigate the roles of SMC complexes in controlling structural transitions.