Ribosomal Protein Database Profiling Lends Clarity to Ribosomal Protein Evolution and Mass Distribution
Author(s): Wenfa Ng
The existence of a theoretical ribosomal protein mass fingerprint, together with the utility of ribosomal proteins as biomarkers in mass spectrometry-based microbial identification, suggests phylogenetic significance for this class of proteins. To support both applications, a facile means of identifying ribosomal proteins and extracting their important attributes from the proteome data file of a microbial species is needed. Additionally, important properties of ribosomal proteins, such as molecular weight and nucleotide sequence, need to be calculated from the amino acid sequence information in the FASTA proteome file. This work sought to support this endeavour by developing MATLAB software that extracts the amino acid sequences of all ribosomal proteins from the FASTA proteome data file of a microbial species downloaded from UniProt. Built-in MATLAB functions are subsequently employed to calculate important properties of the extracted ribosomal proteins, such as number of amino acid residues, molecular weight and nucleotide sequence. All of this information is output, as a database, to an Excel file for ease of storage and retrieval. Analysis of an Escherichia coli K-12 proteome revealed that the bacterium possesses a total of 59 ribosomal proteins distributed between the large and small ribosomal subunits. The ribosomal proteins range in sequence length from 38 residues (50S ribosomal protein L36) to 557 residues (30S ribosomal protein S1). In terms of molecular weight, the profiled ribosomal proteins range from 4364.305 Da (50S ribosomal protein L36) to 61157.66 Da (30S ribosomal protein S1). More importantly, the distribution of molecular weights across the E. coli ribosomal proteins follows a smooth curve, which suggests strong co-evolution of ribosomal protein sequence and mass under the tight constraints imposed by a functional ribosome. Finally, cluster analysis reveals a preponderance o