Large-scale protein structure prediction methods for enhanced annotation

Publication date

DOI

Document Type

Master Thesis

Collections

License

CC-BY-NC-ND

Abstract

The advent of next-generation sequencing led to a boom in the number of available protein sequences, opening a gap between sequence data on one side and structural and functional data on the other. In recent years, deep learning algorithms such as AlphaFold2 have managed to predict protein structures from sequence with an accuracy approaching that of experimentally determined structures, bridging the gap with structural data. Protein structure alignment tools have also become much faster with Foldseek, enabling large-scale comparisons. In proteins, structure is more conserved than sequence, so large-scale structural comparisons open the door to structure-based detection of distant homologs. The efficiency of protein structure predictors permits the generation of structures on a large scale; once homologs have been detected by structure, their annotations can be transferred by inference. This methodology has been applied at scales up to the whole of UniProtKB, yielding promising evolutionary insights and perspectives. The fruitful combination of different methods with large datasets highlights the potential of protein structure-based tools and offers a new approach to the computational study of evolution.

Keywords

protein structure; structure prediction; structure alignment; annotation; large-scale; deep learning; AlphaFold2; Foldseek

Citation