Tandem repetition of domains in protein sequences occurs in all three domains of life. It creates protein diversity and adds functional complexity to organisms. In this work, we analyzed 52 streptococcal genomes and found that 3748 proteins contain domain repeats. Proteins without domain repeats are significantly enriched in the cytoplasm, whereas proteins with domain repeats are significantly enriched in the cytoplasmic membrane, cell wall and extracellular locations. Domain repetition occurs most frequently in S. pneumoniae and least frequently in S. thermophilus and S. pyogenes. DUF1542 is the most highly repeated domain within a single protein, followed by Rib, CW_binding_1, G5 and HemolysinCabind. The 3D structures of 24 repeat-containing proteins were predicted to investigate the structural and functional effects of domain repetition. Several repeat-containing streptococcal cell surface proteins are known to be virulence-associated. Surface-associated proteins that contain tandem domains but lack experimental functional characterization may be involved in the pathogenesis of streptococci and deserve further investigation.
- Domain repeats
- Domain repetition
- Protein structure modeling
- Protein subcellular localization
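
As an illustration of what counting tandem domain repeats in a single protein can look like, the following minimal Python sketch finds runs of consecutive identical domains in a domain architecture listed from N- to C-terminus. It is an assumption-laden sketch rather than the pipeline used in this work: the function name `tandem_repeats`, the `min_copies` threshold and the example architecture are all hypothetical, and real annotations would also need to account for domain coordinates and the spacing between copies.

```python
from itertools import groupby


def tandem_repeats(domains, min_copies=2):
    """Return (domain, copies) for each run of consecutive identical domains.

    `domains` is a list of domain names ordered along the protein.
    `min_copies` is an assumed threshold for calling a run a repeat.
    """
    runs = ((name, sum(1 for _ in group)) for name, group in groupby(domains))
    return [(name, copies) for name, copies in runs if copies >= min_copies]


# Hypothetical architecture, invented for illustration only.
architecture = ["YSIRK_signal", "Rib", "Rib", "Rib", "Gram_pos_anchor"]
print(tandem_repeats(architecture))  # [('Rib', 3)]
```

Only adjacent copies are grouped, so an architecture in which the same domain appears twice with a different domain in between is not reported as a tandem repeat.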