Publications by authors named "C C Orengo"

CATH (https://www.cathdb.info) is a structural classification database that assigns domains to the structures in the Protein Data Bank (PDB) and AlphaFold Protein Structure Database (AFDB) and adds layers of biological information, including homology and functional annotation.

View Article and Find Full Text PDF

The AlphaFold Protein Structure Database (AFDB) contains more than 214 million predicted protein structures composed of domains, which are independently folding units found in multiple structural and functional contexts. Identifying domains can enable many functional and evolutionary analyses but has remained challenging because of the sheer scale of the data. Using deep learning methods, we have detected and classified every domain in the AFDB, producing The Encyclopedia of Domains.

View Article and Find Full Text PDF

The β and β' subunits of the RNA polymerase (RNAP) are large proteins with complex multi-domain architectures that include several insertional domains. Here, we analyze the domain organizations of RNAP-β and RNAP-β' using sequence, experimentally determined structures and AlphaFold structure predictions. We observe that lineage-specific insertional domains in bacterial RNAP-β belong to a group that we call BEAN (broadly embedded annex).

View Article and Find Full Text PDF

Proteins, fundamental to cellular activities, reveal their function and evolution through their structure and sequence. CATH functional families (FunFams) are coherent clusters of protein domain sequences in which the function is conserved across their members. The increasing volume and complexity of protein data enabled by large-scale repositories like MGnify or AlphaFold Database requires more powerful approaches that can scale to the size of these new resources.

View Article and Find Full Text PDF