The new generation of fast, low-cost DNA sequencing methods has led to a rapid accumulation of genomic data in DNA databases since 2000. Hundreds of bacterial genomes have been sequenced during the last decade; however, only rarely have they been analyzed in depth. Today it is evidently much easier to generate raw DNA sequence data than to decipher its biological meaning. This study aims to shed light on the relationship between codon usage and gene expression in 158 prokaryotes belonging to the two major domains, Archaea and Bacteria. The Archaea comprise 2 phyla (Crenarchaeota and Euryarchaeota) and the Bacteria 13 phyla (Actinobacteria, Aquificae, Bacteroidetes, Chlamydiae, Chlorobi, Cyanobacteria, Deinococcus-Thermus, Firmicutes, Fusobacteria, Planctomycetes, Proteobacteria, Spirochaetes and Thermotogae); in total, 9 archaeal and 22 bacterial genera were studied. To study the link between codon usage and gene expression, four local DNA databases were created, containing respectively: a) all protein-coding sequences of each organism; b) highly expressed genes; c) ribosomal protein genes; and d) lowly expressed genes. These databases were used to compare the codon usage patterns of: a) different strains of one bacterial species; b) different subsets of genes within one species; and c) different bacterial genomes. They were also used to identify missing and atypical codons and to search for correlations between codon usage and formal prokaryotic taxonomy. The results indicate that the formal taxonomy of many bacterial species does not agree with their codon usage patterns, although the "Genome Theory" predicts that it should. Conversely, there are obvious similarities in the codon usage patterns of organisms that are distant according to the formal taxonomy. This suggests that either the Genome Theory or the official taxonomy of microorganisms needs revision.
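Comparing codon usage between gene subsets (for example, highly expressed genes versus all coding sequences) starts with counting codons in each subset and normalizing the counts. The abstract does not say which statistic was used for the comparisons, so the following is only a minimal sketch: it assumes in-frame sequences that start at the first base, expresses usage as occurrences per 1000 codons, and uses a Euclidean distance between profiles as an illustrative, hypothetical metric. The helper names (`codon_counts`, `usage_distance`, `missing_codons`) and the toy sequences are hypothetical as well.

```python
import math
from collections import Counter
from itertools import product

# The 64 codons in a fixed order.
CODONS = ["".join(p) for p in product("TCAG", repeat=3)]

def codon_counts(cds_list):
    """Count codons over a set of coding sequences, reading in frame from the first base."""
    counts = Counter({c: 0 for c in CODONS})
    for seq in cds_list:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in counts:  # skip codons containing ambiguous bases such as N
                counts[codon] += 1
    return counts

def per_thousand(counts):
    """Codon usage as occurrences per 1000 codons."""
    total = sum(counts.values())
    return {c: (1000 * counts[c] / total if total else 0.0) for c in CODONS}

def usage_distance(a, b):
    """Euclidean distance between two codon-usage profiles (an illustrative metric, not the authors')."""
    pa, pb = per_thousand(a), per_thousand(b)
    return math.sqrt(sum((pa[c] - pb[c]) ** 2 for c in CODONS))

def missing_codons(counts):
    """Codons that never occur in the given gene set."""
    return sorted(c for c in CODONS if counts[c] == 0)

# Toy data standing in for two of the gene subsets (hypothetical sequences).
highly_expressed = ["ATGAAAGCTTAA", "ATGAAGGCTTAA"]
all_genes = highly_expressed + ["ATGCCCGGGTGA"]
hx, whole = codon_counts(highly_expressed), codon_counts(all_genes)
print(hx["GCT"], len(missing_codons(hx)))  # 2 59
print(usage_distance(hx, whole) > 0)        # True
```

Any other profile distance (or a per-amino-acid measure such as relative synonymous codon usage) could be swapped into `usage_distance` without changing the counting step.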
Given that at least two tRNAs occupy the two functional ribosomal sites (A and P) during the translation elongation step, we also studied the effect of neighboring codons (codon pairs) on gene expression. To identify preferred and rare codon pairs, we developed our own criteria for determining their frequency of occurrence and for evaluating their possible effect on translation. This approach was applied to the distribution of all 3904 possible codon pairs (61 sense codons, each followed by any of the 64 codons) in the Escherichia coli genome, and their frequency of occurrence ranged from 0 to 4913. The predicted effect of some codon pairs on gene expression, particularly 3'-terminal ones, was confirmed experimentally.
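The figure of 3904 follows from the fact that a stop codon can only be the second member of an in-frame pair, giving 61 × 64 possible pairs. The sketch below enumerates these pairs, counts them over a set of coding sequences, and reports zero-count (unobserved) pairs. The authors' own frequency criteria are not described in the abstract, so the observed/expected ratio here is a generic substitute, assuming the first and second positions of a pair are independent. It is not the method used in the study. All function names and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}
CODONS = ["".join(p) for p in product("TCAG", repeat=3)]
SENSE = [c for c in CODONS if c not in STOPS]
# A stop codon can only occupy the second position of a pair, so there are
# 61 sense codons x 64 codons = 3904 possible codon pairs.
ALL_PAIRS = [(a, b) for a in SENSE for b in CODONS]
PAIR_SET = set(ALL_PAIRS)

def codon_pairs(seq):
    """Adjacent in-frame codon pairs of one coding sequence (frame assumed to start at base 1)."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    return list(zip(codons, codons[1:]))

def pair_counts(cds_list):
    """Count valid codon pairs over a set of coding sequences."""
    counts = Counter()
    for seq in cds_list:
        for pair in codon_pairs(seq):
            if pair in PAIR_SET:  # drops pairs with ambiguous bases or a stop codon in first position
                counts[pair] += 1
    return counts

def observed_expected(counts):
    """Observed/expected ratio for each observed pair, assuming independent first and second positions."""
    total = sum(counts.values())
    first, second = Counter(), Counter()
    for (a, b), n in counts.items():
        first[a] += n
        second[b] += n
    return {(a, b): n / (first[a] * second[b] / total) for (a, b), n in counts.items()}

def terminal_pair(seq):
    """The 3'-terminal pair: the last sense codon followed by the stop codon."""
    return codon_pairs(seq)[-1]

genes = ["ATGAAAAAATAA", "ATGAAAGCTTAA"]  # hypothetical toy genes
counts = pair_counts(genes)
ratios = observed_expected(counts)
unseen = [p for p in ALL_PAIRS if counts[p] == 0]
print(len(ALL_PAIRS), len(unseen))  # 3904 3899
print(ratios[("ATG", "AAA")])      # 2.0
print(terminal_pair(genes[1]))     # ('GCT', 'TAA')
```

Ratios well above 1 flag candidate preferred pairs, and unobserved or low-ratio pairs flag candidate rare pairs. Any threshold for either category would be an additional assumption.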

Department of Biology

Bachvarov, B.I. (Boris I.), Kirilov, K.T. (Kiril T.), Ivanov, I.G. (Ivan G.), & Golshany, A. (2010). Codon and codon pairs usage in bacteria. In Bacterial DNA, DNA Polymerase and DNA Helicases (pp. 3–49).